Determining representative sample size for validation of continuous, large continental remote sensing data
International Journal of Applied Earth Observation and Geoinformation ( IF 7.6 ) Pub Date : 2020-09-21 , DOI: 10.1016/j.jag.2020.102235
Megan L. Blatchford , Chris M. Mannaerts , Yijian Zeng

The validation of global remote sensing data comprises multiple methods, including comparison to field measurements, cross-comparisons, and verification of physical consistency. Physical consistency and cross-comparisons are typically assessed for all pixels of the entire product extent, which requires intensive computing. This paper proposes a statistically representative sampling approach to reduce the time and effort associated with validating remote sensing data of large volume. A progressive sampling approach, as typically applied in machine learning to train algorithms, combined with two performance measures, was applied to estimate the required sample size. The confidence interval (CI) and maximum entropy probability distribution were used as indicators of accuracy. The approach was tested on 8 continental remote sensing-based data products over the Middle East and Africa. Without considering climate classes, a sample size of 10,000–100,000, depending on the product, met the nominally set CI and entropy indicators. This corresponds to <0.01 % of the total image for the high-resolution images. All continuous datasets showed the same trend of CI and entropy with increasing sample size. The actual evapotranspiration and interception (ETIa) product was further analysed by climate class, which increased the sample size required to meet the performance requirements, but the required size remained significantly smaller than the entire dataset. The proposed approach can significantly reduce processing time while still providing a statistically valid representation of a large remote sensing dataset. This will become increasingly useful as more high-resolution remote sensing data becomes available.
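The abstract describes progressive sampling: the sample size is increased stepwise until the CI and entropy indicators stabilise. A minimal sketch of that idea is below, assuming random sampling without replacement, a 95 % CI half-width on the sample mean, and Shannon entropy of a fixed-bin histogram as the two performance measures; the tolerance values and function names are illustrative, not the paper's actual parameters.

```python
import numpy as np

def ci_half_width(sample, z=1.96):
    """95% confidence-interval half-width of the sample mean."""
    return z * sample.std(ddof=1) / np.sqrt(sample.size)

def histogram_entropy(sample, bins=100, value_range=None):
    """Shannon entropy (bits) of the sample's histogram distribution."""
    counts, _ = np.histogram(sample, bins=bins, range=value_range)
    p = counts / counts.sum()
    p = p[p > 0]  # drop empty bins to avoid log(0)
    return -np.sum(p * np.log2(p))

def progressive_sample_size(data, sizes, ci_tol, entropy_tol, seed=0):
    """Increase the sample size stepwise until both indicators stabilise.

    Stops at the first size where the CI half-width is below ci_tol and
    the entropy change from the previous step is below entropy_tol.
    """
    rng = np.random.default_rng(seed)
    value_range = (data.min(), data.max())  # fixed bins across all steps
    prev_entropy = None
    for n in sizes:
        sample = rng.choice(data, size=n, replace=False)
        ci = ci_half_width(sample)
        ent = histogram_entropy(sample, value_range=value_range)
        if prev_entropy is not None:
            if ci <= ci_tol and abs(ent - prev_entropy) <= entropy_tol:
                return n
        prev_entropy = ent
    return sizes[-1]  # indicators never stabilised; fall back to largest size
```

In this sketch the entropy criterion is a convergence check between consecutive steps rather than a comparison against the full-image distribution, which keeps the procedure independent of processing every pixel; the paper's exact stopping rule may differ.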




Updated: 2020-09-21