A framework for optimizing environmental covariates to support model interpretability in digital soil mapping
Geoderma (IF 6.1), Pub Date: 2024-04-04, DOI: 10.1016/j.geoderma.2024.116873
Babak Kasraei, Margaret G. Schmidt, Jin Zhang, Chuck E. Bulmer, Deepa S. Filatow, Adrienne Arbor, Travis Pennell, Brandon Heung

A common practice in digital soil mapping (DSM) is to incorporate many environmental covariates into a machine-learning algorithm to predict the spatial patterns of soil attributes. Variance inflation factor (VIF), principal component analysis (PCA), and recursive feature elimination (RFE) are three statistical methods that can be used to reduce the number of covariates. This study aims 1) to compare the VIF and PCA approaches; 2) to identify an approach, applying RFE after VIF, for determining the minimum number of covariates needed to ensure model parsimony in DSM; and 3) to examine methods for interpreting the impact of covariates on the variability of the predicted soil properties. The study area was the province of British Columbia (BC), Canada. Legacy data for four soil properties were used to make digital soil maps: soil organic carbon (SOC%), pH, clay%, and coarse fragments (CF%). Seven models were built for each soil property to determine how using different numbers of covariates, produced by the various reduction methods, affected validation results. The results showed that the number of covariates could be reduced from 70 to between 4 and 12 with little or no difference in concordance correlation coefficient (CCC) validation results. For example, the pH models using 70 and 7 covariates both had a CCC of 0.74, and for the other soil properties the difference was negligible. Validation results of the PCA models showed that PCA was less effective than VIF at reducing the number of covariates. Moreover, covariates related to precipitation were the most important for modeling SOC%, soil pH, and clay%, whereas topographic covariates were the most influential for modeling soil CF%. This study emphasizes the potential benefits of combining various data reduction methods to achieve optimal outcomes and generate the most parsimonious and interpretable models.
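VIF-based covariate reduction is commonly done by stepwise elimination. The sketch below shows one common variant. For each covariate j, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing covariate j on all the others. The covariate with the largest VIF is removed repeatedly until every remaining VIF falls below a threshold. The threshold of 10 and the function names `vif` and `drop_by_vif` are illustrative assumptions. The abstract does not state the cutoff the authors used.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """VIF of each column of X (rows = sampling sites, columns = covariates).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an ordinary least-squares
    regression (with intercept) of covariate j on all the other covariates.
    """
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        ss_res = np.sum((y - A @ coef) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def drop_by_vif(X: np.ndarray, names: list[str], threshold: float = 10.0) -> list[str]:
    """Repeatedly remove the covariate with the largest VIF until every
    remaining VIF is at or below `threshold` (hypothetical default of 10)."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        v = vif(X[:, keep])  # recompute: VIFs change after each removal
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        del keep[worst]
    return [names[i] for i in keep]


if __name__ == "__main__":
    # Synthetic, made-up covariates: temperature nearly duplicates elevation.
    rng = np.random.default_rng(0)
    n = 500
    elevation = rng.normal(size=n)
    temperature = -0.9 * elevation + 0.1 * rng.normal(size=n)
    precipitation = rng.normal(size=n)
    slope = rng.normal(size=n)
    X = np.column_stack([elevation, temperature, precipitation, slope])
    print(drop_by_vif(X, ["elevation", "temperature", "precipitation", "slope"]))
```

VIFs are recomputed after every removal because dropping one collinear covariate usually lowers the VIFs of its partners. In the demo, only one of elevation and temperature should survive.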
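For comparison, PCA reduces covariates differently. Instead of keeping a subset of the original covariates, it replaces them with a few principal components. The sketch below keeps the fewest components whose cumulative explained variance reaches a chosen share. The 95% share, the SVD route, and the name `pca_reduce` are assumptions for illustration. The abstract does not say how many components the study retained. Each component is a linear combination of all the input covariates, so it is harder to attribute an effect to a single named covariate than with a VIF-selected subset.

```python
import numpy as np


def pca_reduce(X: np.ndarray, var_threshold: float = 0.95):
    """Replace covariates with the fewest principal components whose
    cumulative explained variance reaches `var_threshold` (assumed 0.95).

    Returns (component_scores, explained_variance_ratio_of_kept_components).
    """
    # Standardize first, because covariates come in different units.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    ratio = s**2 / np.sum(s**2)  # share of total variance per component
    k = int(np.searchsorted(np.cumsum(ratio), var_threshold)) + 1
    k = min(k, len(ratio))
    scores = Z @ Vt[:k].T  # each score column mixes all original covariates
    return scores, ratio[:k]
```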
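The abstract describes applying RFE after VIF to find the minimum number of covariates. The sketch below shows a generic backward-elimination loop under several labeled assumptions:

- A linear least-squares model stands in for the study's machine-learning algorithm.
- Importance is taken as the absolute coefficient of standardized covariates.
- Validation uses a single holdout set scored with R².
- The function `most_parsimonious` applies a hypothetical stopping rule: pick the smallest subset whose score is within `tol` of the best.

None of these choices is claimed to be the authors' procedure.

```python
import numpy as np


def _fit_ols(X, y):
    """Least-squares fit with intercept; returns (intercept, slopes)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]


def _r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


def rfe_path(X_tr, y_tr, X_va, y_va):
    """Backward elimination: refit, score on the validation set, then drop the
    covariate with the smallest |coefficient|. Returns [(kept_indices, R^2)],
    starting from all covariates and ending with one."""
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
    Z_tr, Z_va = (X_tr - mu) / sd, (X_va - mu) / sd  # standardize with training stats
    keep = list(range(X_tr.shape[1]))
    path = []
    while keep:
        b0, b = _fit_ols(Z_tr[:, keep], y_tr)
        path.append((list(keep), _r2(y_va, b0 + Z_va[:, keep] @ b)))
        del keep[int(np.argmin(np.abs(b)))]  # least important covariate goes
    return path


def most_parsimonious(path, tol=0.01):
    """Hypothetical rule: smallest subset whose validation R^2 is within `tol` of the best."""
    best = max(score for _, score in path)
    return min((kept for kept, score in path if score >= best - tol), key=len)
```

In a VIF-then-RFE workflow, `X_tr` and `X_va` would hold only the covariates that survived the VIF step.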
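The validation metric named in the abstract is Lin's concordance correlation coefficient:

CCC = 2·s_xy / (s_x² + s_y² + (x̄ − ȳ)²)

Here x holds the observed values and y the predicted values, with population (1/n) variances and covariance. Unlike Pearson's r, CCC also penalizes systematic bias between predictions and observations. A minimal NumPy version follows. The function name `ccc` is an illustrative choice.

```python
import numpy as np


def ccc(observed, predicted) -> float:
    """Lin's concordance correlation coefficient (population moments, 1/n)."""
    x = np.asarray(observed, dtype=float)
    y = np.asarray(predicted, dtype=float)
    mx, my = x.mean(), y.mean()
    sxy = np.mean((x - mx) * (y - my))  # covariance of observed and predicted
    # np.var defaults to ddof=0, i.e. the population variance
    return 2.0 * sxy / (x.var() + y.var() + (mx - my) ** 2)
```

For example, predictions [2, 3, 4, 5] against observations [1, 2, 3, 4] have a Pearson r of 1. Their CCC is 2(1.25) / (1.25 + 1.25 + 1) = 2.5 / 3.5 ≈ 0.714, because the constant offset of 1 counts against concordance.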

Updated: 2024-04-04