Stacked Ensemble Models (Stack Ensemble) in GEE

Academic   2024-09-15 18:00   Yunnan

// Code: https://code.earthengine.google.com/bd19f473c37ebe258bfebe5d55ca2fd6?noload=true

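Stacking is a machine-learning ensemble strategy that combines many models to improve overall performance. The main idea is to feed the predictions of several base models into a higher-level model, called the meta-model or blender, which combines them into the final prediction. In practice, each base model outputs its predicted probability for every class, and those probabilities become the input variables the meta-model uses to produce the stacked result.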
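To make the idea concrete outside Earth Engine, here is a minimal plain-JavaScript sketch of how the meta-model's input can be assembled for a single sample. The function name `stackFeatures` and the probability values are hypothetical and exist only for illustration; they are not part of the script linked above.

```javascript
// Hypothetical helper: concatenate every base model's per-class
// probability vector into one feature vector for the meta-model.
function stackFeatures(probabilitiesByModel) {
  return probabilitiesByModel.reduce(function (features, probs) {
    return features.concat(probs);
  }, []);
}

// Made-up probabilities for one sample, three classes, three base models.
var sample = [
  [0.7, 0.2, 0.1],  // base model A
  [0.5, 0.3, 0.2],  // base model B
  [0.6, 0.1, 0.3]   // base model C
];

var metaInput = stackFeatures(sample);
console.log(metaInput.length);  // 3 models x 3 classes = 9 features
```

With four base models and seven classes, as in the example later in this post, the same construction gives 28 input features.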

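Stacking improves predictive performance by merging the results of several base models. It can reduce both bias and variance in the predictions, and using diverse models lowers the risk of overfitting and makes the result more robust across different data. That said, if every base model performs badly, a stack ensemble just piles garbage on top of garbage (as in my example).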

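In recent years many papers have combined GEE with stack ensembles, and I never knew how they did it. Then my good friend at GEEer成长日记 pointed out that the MULTIPROBABILITY output mode returns per-class probabilities, which makes stacking easy to implement.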

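The article quoted here is unrelated to this post:

爱探索的GEEer, official account GEEer成长日记, "Diary 128: Saving and Using a Random Forest Classification Model"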

The ordinary classification method

// Split the samples into a training set and a validation set
var withRandom = SamplePoints.randomColumn('random');
var split = 0.7;
var trainingPartition = withRandom.filter(ee.Filter.lt('random', split));  // about 70% of the samples for training
var testingPartition = withRandom.filter(ee.Filter.gte('random', split));  // about 30% of the samples for testing

// Sample the band values at the training points
var training = image.sampleRegions({
  collection: trainingPartition,
  properties: ['landcover'],
  scale: 30
});
print('training', training);

// Classifier: smileRandomForest()
var classifierRF = ee.Classifier.smileRandomForest(37).train({
  features: training,
  classProperty: 'landcover',
  inputProperties: image.bandNames()
});

// Classify the Landsat-8 image
var classifiedRF = image.classify(classifierRF).clip(geometry);

// Sample the classification result at the validation points
var verification = classifiedRF.sampleRegions({
  collection: testingPartition,
  properties: ['landcover'],
  // tileScale: 16,
  scale: 30
});
Map.addLayer(classifiedRF, {min: 0, max: 6, palette: ['black','green','lightgreen','pink','red','black','blue']}, 'Classification result');

// Compute the confusion matrix
var confusionMatrix = verification.errorMatrix('landcover', 'classification');
print('Method 1: confusionMatrix', confusionMatrix);                 // confusion matrix
print('Method 1: Overall accuracy:', confusionMatrix.accuracy());    // overall accuracy
print('Method 1: Kappa:', confusionMatrix.kappa());                  // kappa coefficient
print('Method 1: User acc:', confusionMatrix.consumersAccuracy());   // user's accuracy
print('Method 1: Prod acc:', confusionMatrix.producersAccuracy());   // producer's accuracy
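For readers who want to see what the printed accuracy measures mean, the following plain-JavaScript sketch computes them from a small confusion matrix by hand. It assumes rows hold the actual classes and columns the predicted classes; the function name `matrixMetrics` and the toy matrix are hypothetical, and this is an illustration of the standard formulas rather than Earth Engine's own code.

```javascript
// Hypothetical helper: accuracy measures from a square confusion matrix
// (rows = actual class, columns = predicted class).
function matrixMetrics(m) {
  var n = m.length;
  var total = 0;
  var diagonal = 0;
  var rowSums = new Array(n).fill(0);
  var colSums = new Array(n).fill(0);
  for (var i = 0; i < n; i++) {
    for (var j = 0; j < n; j++) {
      total += m[i][j];
      rowSums[i] += m[i][j];
      colSums[j] += m[i][j];
    }
    diagonal += m[i][i];
  }
  var overall = diagonal / total;
  // Agreement expected by chance, from the row and column marginals.
  var expected = 0;
  for (var k = 0; k < n; k++) {
    expected += rowSums[k] * colSums[k];
  }
  expected /= total * total;
  return {
    overall: overall,
    kappa: (overall - expected) / (1 - expected),
    // Producer's accuracy: correct / row total (per actual class).
    producers: m.map(function (row, r) { return row[r] / rowSums[r]; }),
    // User's (consumer's) accuracy: correct / column total (per predicted class).
    consumers: m.map(function (row, c) { return row[c] / colSums[c]; })
  };
}

// Made-up two-class matrix: 17 of 20 validation points are correct.
var toyMatrix = [
  [8, 2],
  [1, 9]
];
var metrics = matrixMetrics(toyMatrix);
console.log(metrics);
```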

The stack ensemble classification method:

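Train several base models on the training data. I chose random forest, SVM, gradient tree boosting, and CART as the base models, with minimum distance as the meta-model.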

var classifier_RF = ee.Classifier.smileRandomForest(37).setOutputMode('MULTIPROBABILITY');
var classifier_SVM = ee.Classifier.libsvm().setOutputMode('MULTIPROBABILITY');
var classifier_Cart = ee.Classifier.smileCart().setOutputMode('MULTIPROBABILITY');
var classifier_GTB = ee.Classifier.smileGradientTreeBoost(37).setOutputMode('MULTIPROBABILITY');
var classifier_MD = ee.Classifier.minimumDistance();

// Assumption: bandnames is the list of input bands, as used in method one
var bandnames = image.bandNames();
var trained_RF = classifier_RF.train(training, 'landcover', bandnames);
var trained_SVM = classifier_SVM.train(training, 'landcover', bandnames);
var trained_Cart = classifier_Cart.train(training, 'landcover', bandnames);
var trained_GTB = classifier_GTB.train(training, 'landcover', bandnames);
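The meta-model chosen above is a minimum-distance classifier. As a rough illustration of the idea, the plain-JavaScript sketch below assigns a sample to the class whose mean feature vector is nearest in Euclidean distance. The function names `trainMinimumDistance` and `classifyMinimumDistance` and the sample values are hypothetical; this is not the Earth Engine implementation.

```javascript
// Hypothetical helper: compute the mean feature vector of each class.
function trainMinimumDistance(samples) {
  var sums = {};
  var counts = {};
  samples.forEach(function (s) {
    if (!sums[s.label]) {
      sums[s.label] = s.features.map(function () { return 0; });
      counts[s.label] = 0;
    }
    s.features.forEach(function (v, i) { sums[s.label][i] += v; });
    counts[s.label] += 1;
  });
  var means = {};
  Object.keys(sums).forEach(function (label) {
    means[label] = sums[label].map(function (v) { return v / counts[label]; });
  });
  return means;
}

// Hypothetical helper: return the label of the nearest class mean.
function classifyMinimumDistance(means, features) {
  var best = null;
  var bestDistance = Infinity;
  Object.keys(means).forEach(function (label) {
    var d = 0;  // squared Euclidean distance to this class mean
    means[label].forEach(function (m, i) {
      d += (features[i] - m) * (features[i] - m);
    });
    if (d < bestDistance) {
      bestDistance = d;
      best = Number(label);
    }
  });
  return best;
}

// Made-up stacked probabilities: two base models x two classes per sample.
var stackedSamples = [
  {features: [0.9, 0.1, 0.8, 0.2], label: 0},
  {features: [0.7, 0.3, 0.6, 0.4], label: 0},
  {features: [0.2, 0.8, 0.3, 0.7], label: 1},
  {features: [0.4, 0.6, 0.1, 0.9], label: 1}
];
var classMeans = trainMinimumDistance(stackedSamples);
console.log(classifyMinimumDistance(classMeans, [0.75, 0.25, 0.55, 0.45]));
console.log(classifyMinimumDistance(classMeans, [0.35, 0.65, 0.4, 0.6]));
```

Because this meta-model only compares distances to class means, it has little capacity to correct base models that are consistently wrong.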

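Use the base models to predict on the held-out validation data (here the whole image is classified, which covers the validation points)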

var classified_RF = image.classify(trained_RF);
var classified_SVM = image.classify(trained_SVM);
var classified_Cart = image.classify(trained_Cart);
var classified_GTB = image.classify(trained_GTB);

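Use the base models' predictions as input features to train the meta-model. The procedure calls for held-out validation data at this step; note that the code here samples the stacked probabilities at trainingPartition, the same points the base models were trained on.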

// Each MULTIPROBABILITY result is an array image; arrayGet(k) extracts the
// probability at array position k (classes 0 to 6 here) as a scalar band.
var classified_RF_C0 = classified_RF.arrayGet(0);
var classified_RF_C1 = classified_RF.arrayGet(1);
var classified_RF_C2 = classified_RF.arrayGet(2);
var classified_RF_C3 = classified_RF.arrayGet(3);
var classified_RF_C4 = classified_RF.arrayGet(4);
var classified_RF_C5 = classified_RF.arrayGet(5);
var classified_RF_C6 = classified_RF.arrayGet(6);
var classified_SVM_C0 = classified_SVM.arrayGet(0);
var classified_SVM_C1 = classified_SVM.arrayGet(1);
var classified_SVM_C2 = classified_SVM.arrayGet(2);
var classified_SVM_C3 = classified_SVM.arrayGet(3);
var classified_SVM_C4 = classified_SVM.arrayGet(4);
var classified_SVM_C5 = classified_SVM.arrayGet(5);
var classified_SVM_C6 = classified_SVM.arrayGet(6);
var classified_Cart_C0 = classified_Cart.arrayGet(0);
var classified_Cart_C1 = classified_Cart.arrayGet(1);
var classified_Cart_C2 = classified_Cart.arrayGet(2);
var classified_Cart_C3 = classified_Cart.arrayGet(3);
var classified_Cart_C4 = classified_Cart.arrayGet(4);
var classified_Cart_C5 = classified_Cart.arrayGet(5);
var classified_Cart_C6 = classified_Cart.arrayGet(6);
var classified_GTB_C0 = classified_GTB.arrayGet(0);
var classified_GTB_C1 = classified_GTB.arrayGet(1);
var classified_GTB_C2 = classified_GTB.arrayGet(2);
var classified_GTB_C3 = classified_GTB.arrayGet(3);
var classified_GTB_C4 = classified_GTB.arrayGet(4);
var classified_GTB_C5 = classified_GTB.arrayGet(5);
var classified_GTB_C6 = classified_GTB.arrayGet(6);

// Stack the 4 x 7 probability bands into one image for the meta-model
var classified_Stack = classified_RF_C0.addBands(classified_RF_C1).addBands(classified_RF_C2).addBands(classified_RF_C3).addBands(classified_RF_C4).addBands(classified_RF_C5).addBands(classified_RF_C6)
        .addBands(classified_SVM_C0).addBands(classified_SVM_C1).addBands(classified_SVM_C2).addBands(classified_SVM_C3).addBands(classified_SVM_C4).addBands(classified_SVM_C5).addBands(classified_SVM_C6)
        .addBands(classified_Cart_C0).addBands(classified_Cart_C1).addBands(classified_Cart_C2).addBands(classified_Cart_C3).addBands(classified_Cart_C4).addBands(classified_Cart_C5).addBands(classified_Cart_C6)
        .addBands(classified_GTB_C0).addBands(classified_GTB_C1).addBands(classified_GTB_C2).addBands(classified_GTB_C3).addBands(classified_GTB_C4).addBands(classified_GTB_C5).addBands(classified_GTB_C6);
var bandnames_MD = classified_Stack.bandNames();
var training_MD = classified_Stack.sampleRegions({collection:trainingPartition,properties: ['landcover'],scale: 30,tileScale:16,geometries:true});
var trained_MD = classifier_MD.train(training_MD, 'landcover', bandnames_MD);
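Stacking is usually described with the meta-model trained on points the base models have not seen. One simple way to arrange that is a three-way split on the random column instead of the two-way split used earlier. The plain-JavaScript sketch below illustrates the logic only; the function name `threeWaySplit`, the 50/20/30 shares, and the random values are assumptions for illustration and are not taken from the linked script.

```javascript
// Hypothetical helper: split points into base-model training, meta-model
// training, and testing partitions using each point's random value in [0, 1).
function threeWaySplit(points, baseShare, metaShare) {
  var parts = {base: [], meta: [], test: []};
  points.forEach(function (p) {
    if (p.random < baseShare) {
      parts.base.push(p);          // trains the base models
    } else if (p.random < baseShare + metaShare) {
      parts.meta.push(p);          // trains the meta-model
    } else {
      parts.test.push(p);          // evaluates the stacked result
    }
  });
  return parts;
}

// Made-up random values standing in for the 'random' column.
var randomValues = [0.05, 0.25, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95, 0.35, 0.15];
var samplePoints = randomValues.map(function (r, i) {
  return {id: i, random: r};
});
var partitions = threeWaySplit(samplePoints, 0.5, 0.2);
console.log(partitions.base.length, partitions.meta.length, partitions.test.length);
```

In Earth Engine the same thresholds could be applied with range filters on the random column, at the cost of fewer points for each model.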

Predict on new data and evaluate the stacked model's performance

var classified_MD = classified_Stack.classify(trained_MD);
Map.addLayer(classified_MD, {min: 0, max: 6, palette: ['black','green','lightgreen','pink','red','black','blue']}, 'classified_MD');

// Sample the stacked classification at the validation points
var testing_MD = classified_MD.sampleRegions({collection:testingPartition,properties: ['landcover'],scale: 30,tileScale:16,geometries:true});

// Compute the confusion matrix
var confusionMatrix2 = testing_MD.errorMatrix('landcover', 'classification');
print('Method 2: confusionMatrix', confusionMatrix2);                 // confusion matrix
print('Method 2: Overall accuracy:', confusionMatrix2.accuracy());    // overall accuracy
print('Method 2: Kappa:', confusionMatrix2.kappa());                  // kappa coefficient
print('Method 2: User acc:', confusionMatrix2.consumersAccuracy());   // user's accuracy
print('Method 2: Prod acc:', confusionMatrix2.producersAccuracy());   // producer's accuracy

Hahaha, the results actually got worse after stacking.

Reference

https://medium.com/@brijesh_soni/stacking-to-improve-model-performance-a-comprehensive-guide-on-ensemble-learning-in-python-9ed53c93ce28

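走天涯徐小洋地理数据科学
A home-grown geography PhD who loves life, sharing hands-on tutorials, classic papers, and data resources on GIS, remote sensing, spatial analysis, R, landscape ecology, and other geographic data science topics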