Improving the predictions of soil properties from VNIR–SWIR spectra in an unlabeled region using semi-supervised and active learning
Geoderma (IF 5.6), Pub Date: 2021-01-16, DOI: 10.1016/j.geoderma.2020.114830
Nikolaos L. Tsakiridis , John B. Theocharis , Andreas L. Symeonidis , George C. Zalidis

Monitoring the status of the soil ecosystem, in order to identify the spatio-temporal extent of the pressures exerted on it and to mitigate the effects of climate change and land degradation, requires reliable and cost-effective solutions. To address this need, soil spectroscopy in the visible, near- and shortwave-infrared (VNIR–SWIR) has emerged as a viable alternative to traditional analytical approaches. Accordingly, large-scale soil spectral libraries have been developed and coupled with advanced machine learning tools to infer soil properties from hyperspectral signatures. However, models developed for one region may perform worse when applied to a new region unseen by the model, owing to the large inherent variability of soils (e.g. pedogenetic differences, diverse soil types). Given an existing spectral library with labeled data and a new unlabeled region (i.e. one where no soil samples have been analytically measured), the question then becomes how best to develop a model that predicts the soil properties of the unlabeled region more accurately.

In this paper, we propose a machine learning technique that combines semi-supervised learning, which exploits the distribution of the predictors in the unlabeled dataset, with active learning, which expertly selects a small set of samples from the unlabeled dataset to serve as a spiking subset, in order to develop a more robust model. The semi-supervised learning approach is Laplacian Support Vector Regression, following the manifold regularization framework. For the active learning component, a pool-based approach is used, as it best matches the use-case scenario described above: it iteratively selects a subset of samples from the unlabeled region to spike the calibration set. As the query strategy, a novel machine-learning-based strategy is proposed to best identify the spiking subset at each iteration. The experimental analysis used data from the Land Use and Coverage Area Frame Survey of 2009, which covered most of the then member states of the European Union, focusing in particular on mineral cropland soil samples from 5 different countries. The statistical analysis confirmed the efficacy of our approach compared with the current state of the art in soil spectroscopy.
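The combination of manifold-regularized regression and pool-based spiking described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions. The paper uses Laplacian Support Vector Regression, whereas this sketch swaps in Laplacian Regularized Least Squares (the squared-loss member of the same manifold-regularization family) because it has a closed-form solution. The query function `query_farthest` is a hypothetical, distance-based stand-in (greedy max-min distance to the current calibration set) for the paper's machine-learning-based query strategy. `oracle` stands for the laboratory analysis of the selected samples. The RBF kernel and graph, and the hyperparameters `gamma`, `lam_a` and `lam_i`, are illustrative choices.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """RBF kernel matrix between the rows (spectra) of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def graph_laplacian(X, gamma):
    """Unnormalized Laplacian L = D - W of a fully connected RBF similarity graph."""
    W = rbf_kernel(X, X, gamma)
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(1)) - W


def fit_laprls(X_lab, y_lab, X_unl, gamma=1.0, lam_a=1e-2, lam_i=1e-2):
    """Laplacian RLS (stand-in for Laplacian SVR): labeled and unlabeled
    spectra both enter the kernel expansion and the manifold penalty.
    Solves alpha = (J K + lam_a*l*I + lam_i*l/n^2 * L K)^{-1} Y."""
    X = np.vstack([X_lab, X_unl])
    l, n = len(X_lab), len(X)
    K = rbf_kernel(X, X, gamma)
    L = graph_laplacian(X, gamma)
    J = np.diag(np.r_[np.ones(l), np.zeros(n - l)])  # selects labeled rows
    Y = np.r_[y_lab, np.zeros(n - l)]                 # zeros for unlabeled rows
    A = J @ K + lam_a * l * np.eye(n) + (lam_i * l / n**2) * (L @ K)
    alpha = np.linalg.solve(A, Y)
    return lambda X_new: rbf_kernel(X_new, X, gamma) @ alpha


def query_farthest(X_cal, X_pool, k):
    """Hypothetical query strategy: greedily pick k pool spectra that are
    farthest (max-min Euclidean distance) from the calibration set."""
    ref, chosen = X_cal.copy(), []
    for _ in range(k):
        diff = X_pool[:, None, :] - ref[None, :, :]
        d = np.sqrt((diff**2).sum(-1)).min(1)
        if chosen:
            d[chosen] = -np.inf  # never pick the same sample twice
        i = int(np.argmax(d))
        chosen.append(i)
        ref = np.vstack([ref, X_pool[i]])
    return chosen


def spike_with_active_learning(X_lib, y_lib, X_pool, oracle, n_iter=3, k=5, **kw):
    """Pool-based loop: at each iteration, query k samples from the unlabeled
    region, obtain their labels from `oracle`, and spike the calibration set.
    The remaining pool stays unlabeled but still shapes the final model."""
    X_cal, y_cal = X_lib.copy(), np.asarray(y_lib, dtype=float).copy()
    pool_idx = np.arange(len(X_pool))
    for _ in range(n_iter):
        picks = query_farthest(X_cal, X_pool[pool_idx], k)
        sel = pool_idx[picks]
        X_cal = np.vstack([X_cal, X_pool[sel]])
        y_cal = np.r_[y_cal, oracle(sel)]
        pool_idx = np.setdiff1d(pool_idx, sel)
    model = fit_laprls(X_cal, y_cal, X_pool[pool_idx], **kw)
    return model, X_cal, y_cal
```

With 40 library spectra and an unlabeled region of 60 spectra, three iterations of five queries leave a calibration set of 55 labeled samples, while the 45 spectra that are never analyzed still enter the graph Laplacian and regularize the final model.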

Updated: 2021-01-18