Spatial dependence between training and test sets: another pitfall of classification accuracy assessment in remote sensing

Machine Learning (IF 4.3), Pub Date: 2021-04-26, DOI: 10.1007/s10994-021-05972-1
N. Karasiak, J.-F. Dejoux, C. Monteil, D. Sheeren

Spatial autocorrelation is inherent to remotely sensed data. Nearby pixels are more similar than distant ones. This property can help to improve classification performance when spatial or contextual features are added to the model. However, it can also lead to overestimation of generalisation capabilities if the spatial dependence between training and test sets is ignored. In this paper, we review existing approaches that deal with spatial autocorrelation for image classification in remote sensing and demonstrate how strongly accuracy metrics are biased when spatial independence between the training and test sets is not respected. We compare three spatial and non-spatial cross-validation strategies at pixel and object levels and study how performance varies with sample size. Experiments based on Sentinel-2 data for mapping two simple forest classes show that spatial leave-one-out cross-validation is the better strategy for providing unbiased estimates of predictive error. Its performance metrics are consistent with the real quality of the resulting map, in contrast to traditional non-spatial cross-validation, which overestimates accuracy. This highlights the need to change practices in classification accuracy assessment. To encourage this change, we developed Museo ToolBox, an open-source Python library that makes spatial cross-validation possible.
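
To make the idea of spatial leave-one-out cross-validation concrete, here is a minimal sketch in plain Python. It is not the Museo ToolBox API and not necessarily the authors' exact procedure. The function name `spatial_leave_one_out`, the `buffer_distance` parameter, and the toy coordinates are assumptions introduced for illustration. Each fold tests on a single sample and trains only on samples lying farther than the buffer distance from it. In practice, that distance would presumably be chosen from the spatial autocorrelation range of the data.

```python
import math
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, float]


def spatial_leave_one_out(
    coords: Sequence[Point], buffer_distance: float
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs, one fold per sample.

    Hypothetical helper: the test set is a single sample, and every other
    sample within `buffer_distance` of it is dropped from the training set.
    """
    for test_idx, test_point in enumerate(coords):
        train = [
            i
            for i, point in enumerate(coords)
            if i != test_idx and math.dist(point, test_point) > buffer_distance
        ]
        yield train, [test_idx]


# Toy data (assumed): three samples 10 units apart and one distant sample.
coords = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (100.0, 0.0)]
folds = list(spatial_leave_one_out(coords, buffer_distance=15.0))

for train, test in folds:
    print(f"test={test} train={train}")
```

With a buffer of 15 units, the fold that tests on the second sample keeps only the distant fourth sample for training. This shows the trade-off of the approach: independence between training and test sets is bought at the cost of discarding nearby training samples.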

Updated: 2021-04-27
