Comparison of PLS and SVM models for soil organic matter and particle size using vis-NIR spectral libraries
Geoderma Regional (IF 4.1), Pub Date: 2021-08-27, DOI: 10.1016/j.geodrs.2021.e00436
Felipe B. de Santana 1, Sandro K. Otani 1, André M. de Souza 2, Ronei J. Poppi 1

In this study, a systematic comparison was carried out to assess differences in accuracy between partial least squares (PLS) and support vector machine (SVM) regression algorithms for soil organic matter (SOM) and particle size determinations using vis-NIR spectroscopy. The comparison investigated the influence of calibration set size on the accuracy obtained for the external validation set. For this purpose, three vis-NIR soil libraries containing 14,212, 15,330 and 42,471 soil samples were used to determine sand, clay, and SOM content, respectively. To increase the variability of the results obtained, each calibration subset was randomly generated 49 times, and for each iteration PLS, SVM-Linear and SVM-RBF (radial basis function) regression models were built. These calibration subsets comprised 250, 1000, 2000, 5000 and 8000 or 10,000 samples.
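
The repeated random subsampling described above can be sketched as follows. This is only an illustration of the design, not the authors' code: the function name `draw_calibration_subsets`, the seed, and the pool size used in the usage note are hypothetical, and the abstract does not say how the external validation set was separated from the calibration pool. The largest subset size is given in the abstract as 8000 or 10,000; 8000 is used here as a default.

```python
import random

# Calibration subset sizes and repeat count quoted in the abstract.
# The largest size is stated as "8000 or 10,000"; 8000 is assumed here.
SUBSET_SIZES = [250, 1000, 2000, 5000, 8000]
N_REPEATS = 49


def draw_calibration_subsets(n_calibration_pool, sizes=SUBSET_SIZES,
                             n_repeats=N_REPEATS, seed=0):
    """Yield (size, repeat, indices) for every random calibration subset.

    n_calibration_pool is the number of library samples available for
    calibration once an external validation set has been set aside.
    (Hypothetical helper; the split itself is not described in the abstract.)
    """
    rng = random.Random(seed)
    pool = range(n_calibration_pool)
    for size in sizes:
        for repeat in range(n_repeats):
            # Draw `size` distinct sample indices without replacement.
            yield size, repeat, rng.sample(pool, size)
```

For example, `list(draw_calibration_subsets(10_000))` produces 5 × 49 = 245 index lists; in a design like the one described, each list would be used to fit one PLS, one SVM-Linear and one SVM-RBF model, all evaluated on the same external validation set.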

In all situations, SVM-Linear gave the worst accuracy. For sand and clay determinations, SVM-RBF models showed a significant improvement in accuracy compared to PLS when the calibration model was built using at least 1000 samples, resulting in a reduction of ~14–29% in the root mean square error of prediction (RMSEP). For SOM determinations, the difference in RMSEP between SVM-RBF and PLS became significant when 2000 or more samples were used in the calibration set, with a reduction of ~8–22% in RMSEP. In addition, for all soil attributes investigated, between 20 and 27% of the external validation set (1173–2241 samples) were considered outliers and excluded by the PLS regression models.
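
For readers unfamiliar with the metric, the root mean square error of prediction and a percentage reduction between two models can be computed as in the sketch below. The function names are hypothetical, and the relative-reduction formula is one common way to express such a percentage; the abstract does not state exactly how its reductions were calculated.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a validation set."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = ((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sum(squared_errors) / len(y_true))


def relative_reduction(rmsep_reference, rmsep_candidate):
    """Percentage by which the candidate's RMSEP is lower than the reference's.

    Assumed definition: 100 * (reference - candidate) / reference.
    """
    return 100.0 * (rmsep_reference - rmsep_candidate) / rmsep_reference
```

Under this definition, a PLS model with an RMSEP of 10 and an SVM-RBF model with an RMSEP of 8 (made-up numbers) would correspond to a 20% reduction.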

This loss of PLS performance for large calibration sets indicates that the correlation between the vis-NIR spectra and clay, sand and SOM contents tends to become more complex as the variability/number of samples increases. This requires the use of machine learning models with high generalization capacity, such as SVM-RBF, whose performance increased as the number of samples in the calibration set increased.




Updated: 2021-09-02