A self-training semi-supervised machine learning method for predictive mapping of soil classes with limited sample data
Geoderma (IF 6.1), Pub Date: 2021-02-01, DOI: 10.1016/j.geoderma.2020.114809
Lei Zhang, Lin Yang, Tianwu Ma, Feixue Shen, Yanyan Cai, Chenghu Zhou

Abstract Numerous machine learning models have been developed for constructing the relationship between soil classes or properties and their environmental covariates in digital soil mapping (DSM). Most machine learning models are trained with a supervised learning (SL) method based on training samples. However, the collected sample data are often limited in practice because field sampling is expensive and time-consuming. Insufficient samples may severely limit the learning ability of the model. Semi-supervised machine learning, a machine learning paradigm that makes use of both unsampled data and a small amount of sampled data in the learning process, is a potentially effective method for DSM. In this study, we present a self-training semi-supervised learning (SSL) method for DSM. Unlike the SL method, the SSL method utilizes not only the sampled locations but also the abundant environmental covariate information at the unvisited locations. Its basic idea is to iteratively enlarge the training data set by adding unsampled points with high prediction confidence from the unvisited locations until a stopping criterion is reached. The proposed SSL method was applied to machine learning models for predicting soil classes in Heshan Farm of Nenjiang County in Heilongjiang Province, China. Three machine learning models, multinomial logistic regression (MLR), k-nearest neighbor (KNN), and random forest (RF), were selected to evaluate the effectiveness of the SSL method. The entropy threshold was an important parameter in the SSL method, and a sensitivity analysis on this parameter was conducted using a series of entropy thresholds. The SSL method was compared with the SL method for the three machine learning models for soil prediction. Cross-validation was employed to evaluate the accuracy of the predicted soil class maps generated by each method. 
The results showed that the prediction accuracies (the proportion of correctly predicted samples over the total number of validation samples) of the SSL method were higher than those of the SL method for MLR, KNN, and RF by 5.9%, 12.2%, and 6.0%, respectively. RF-SSL was the most accurate model in the study area, followed by KNN-SSL. Meanwhile, the self-training SSL method yielded a larger improvement for the KNN model than for the other two models. Furthermore, the soil maps predicted using the SSL method showed a more reasonable spatial variation pattern of soil classes. In the study area, a suitable value of the entropy threshold was between 0.8 and 1.0. We concluded that the SSL method improved the soil prediction accuracy compared with the SL method when applying machine learning models for DSM, and thus is a potentially efficient method for DSM with limited sample data.
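
The self-training idea summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a toy KNN classifier, Shannon entropy with the natural logarithm as the confidence measure, and a stopping rule of "no unsampled point passes the threshold, or `max_iter` is reached"; the abstract does not specify the log base or the stopping criterion. The function names (`knn_proba`, `entropy`, `self_train`), the one-dimensional toy covariates, and the threshold 0.5 are hypothetical and are not values from the paper.

```python
import math
from collections import Counter

def knn_proba(train_X, train_y, x, classes, k=3):
    """Class membership proportions among the k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    counts = Counter(label for _, label in nearest)
    return [counts[c] / len(nearest) for c in classes]

def entropy(probs):
    """Shannon entropy (natural log, an assumption); lower means more confident."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def self_train(labeled_X, labeled_y, unlabeled_X, threshold, k=3, max_iter=10):
    """Iteratively move low-entropy unsampled points into the training set."""
    X, y = list(labeled_X), list(labeled_y)
    pool = list(unlabeled_X)
    classes = sorted(set(y))
    for _ in range(max_iter):
        confident, remaining = [], []
        for x in pool:
            probs = knn_proba(X, y, x, classes, k)
            if entropy(probs) <= threshold:
                # Pseudo-label the point with its most probable class.
                confident.append((x, classes[probs.index(max(probs))]))
            else:
                remaining.append(x)
        if not confident:
            break  # assumed stopping criterion: nothing confident left to add
        for x, label in confident:
            X.append(x)
            y.append(label)
        pool = remaining
    return X, y, pool

# Toy data: one covariate per location, two soil classes "A" and "B".
sampled_X = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
sampled_y = ["A", "A", "A", "B", "B", "B"]
unsampled_X = [(1.5,), (10.5,), (6.2,)]

X, y, pool = self_train(sampled_X, sampled_y, unsampled_X, threshold=0.5)
print(len(X), pool)
```

In this toy run, the two points that sit inside a cluster are pseudo-labeled and added to the training set, while the ambiguous point at 6.2 stays in the unsampled pool because its neighbors disagree. Raising the threshold admits less certain points, which is the trade-off the sensitivity analysis in the abstract examines.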
Updated: 2021-02-01