Performance of statistical and machine learning-based methods for predicting biogeographical patterns of fungal productivity in forest ecosystems
Forest Ecosystems (IF 4.1), Pub Date: 2021-03-15, DOI: 10.1186/s40663-021-00297-w
Albert Morera, Juan Martínez de Aragón, José Antonio Bonet, Jingjing Liang, Sergio de-Miguel

The prediction of biogeographical patterns from a large number of driving factors with complex interactions, correlations and non-linear dependences requires advanced analytical methods and modeling tools. This study compares different statistical and machine learning-based models for predicting biogeographical patterns of fungal productivity, as a case study for thoroughly assessing how well alternative modeling approaches provide accurate and ecologically consistent predictions. We evaluated and compared the performance of two statistical modeling techniques, namely generalized linear mixed models and geographically weighted regression, and four techniques based on different machine learning algorithms, namely random forest, extreme gradient boosting, support vector machine and artificial neural network, in predicting fungal productivity. Models were evaluated with a systematic methodology that combined random, spatial and environmental blocking with an assessment of the ecological consistency of spatially explicit model predictions in light of scientific knowledge. Fungal productivity predictions were sensitive to the modeling approach and to the number of predictors used. Moreover, the importance assigned to different predictors varied among the machine learning approaches. Decision tree-based models increased prediction accuracy by more than 10% compared with the other machine learning approaches and by more than 20% compared with the statistical models, and they produced predicted biogeographical patterns of fungal productivity with higher ecological consistency. Decision tree-based models were the best approach for prediction both in environments similar to those sampled and in extrapolation beyond the spatial and climatic range of the modeling data. In this study, we show that proper variable selection is crucial for building robust models for extrapolation to biophysically differentiated areas.
Variable selection reduces the dimensionality of the ecosystem space described by the model predictors, which increases the similarity between the modeling data and the environmental conditions across the whole study area. When analyzing biogeographical patterns with spatio-temporal data, environmental blocking is proposed as a highly informative cross-validation technique for assessing prediction error over larger scales.
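The abstract does not describe how the three blocking schemes were constructed. The Python sketch below shows one common way to assign cross-validation folds under random, spatial and environmental blocking and to score a model on the held-out folds. It rests on assumptions that are not from the paper: plots carry planar coordinates, spatial blocks are square grid cells of a hypothetical `cell_size`, and environmental blocks are k-means clusters of standardized predictors (a swapped-in choice, not necessarily the authors' procedure). `Plot`, `random_blocks`, `spatial_blocks`, `environmental_blocks`, `cross_validate`, `fit_mean` and `predict_mean` are hypothetical names. The training-mean baseline merely stands in for the six models compared in the study, and the demo data are synthetic.

```python
"""Sketch: random, spatial and environmental blocking for cross-validation.

Assumptions (not from the paper): plots have planar coordinates and a tuple
of environmental predictors; spatial blocks are square grid cells;
environmental blocks are k-means clusters of standardized predictors.
"""
import math
import random
import statistics
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Plot:
    x: float                 # easting (hypothetical planar units)
    y: float                 # northing
    env: tuple[float, ...]   # environmental predictors, e.g. climate variables
    target: float            # observed response, e.g. fungal productivity


def random_blocks(n: int, k: int, seed: int = 0) -> list[int]:
    """Assign n plots to k folds uniformly at random, with balanced fold sizes."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [0] * n
    for rank, i in enumerate(order):
        folds[i] = rank % k
    return folds


def spatial_blocks(plots: Sequence[Plot], k: int, cell_size: float, seed: int = 0) -> list[int]:
    """Group plots into square grid cells, then assign whole cells to folds."""
    cells = [(math.floor(p.x / cell_size), math.floor(p.y / cell_size)) for p in plots]
    unique = sorted(set(cells))
    random.Random(seed).shuffle(unique)
    fold_of = {cell: i % k for i, cell in enumerate(unique)}
    return [fold_of[c] for c in cells]


def environmental_blocks(plots: Sequence[Plot], k: int, iters: int = 20, seed: int = 0) -> list[int]:
    """Cluster plots in standardized predictor space (k-means); clusters become folds."""
    cols = list(zip(*(p.env for p in plots)))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero
    z = [tuple((v - m) / s for v, m, s in zip(p.env, means, sds)) for p in plots]
    centers = random.Random(seed).sample(z, k)
    labels = [0] * len(z)
    for _ in range(iters):
        # Assign each plot to its nearest center, then move centers to cluster means.
        labels = [min(range(k), key=lambda c: math.dist(pt, centers[c])) for pt in z]
        for c in range(k):
            members = [pt for pt, lab in zip(z, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = tuple(statistics.fmean(d) for d in zip(*members))
    return labels


def cross_validate(plots: Sequence[Plot], folds: Sequence[int],
                   fit: Callable[[list[Plot]], object],
                   predict: Callable[[object, Plot], float]) -> float:
    """Hold out each fold in turn and return the pooled RMSE over held-out plots."""
    errors: list[float] = []
    for f in sorted(set(folds)):
        train = [p for p, g in zip(plots, folds) if g != f]
        test = [p for p, g in zip(plots, folds) if g == f]
        model = fit(train)
        errors.extend(predict(model, p) - p.target for p in test)
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def fit_mean(train: list[Plot]) -> float:
    """Baseline stand-in model: the mean training response."""
    return statistics.fmean(p.target for p in train)


def predict_mean(model: float, plot: Plot) -> float:
    return model


if __name__ == "__main__":
    rng = random.Random(42)
    plots = []
    for _ in range(200):  # synthetic plots, not the study's data
        x, y = rng.uniform(0, 100), rng.uniform(0, 100)
        temp = 5 + 0.1 * y + rng.gauss(0, 1)
        precip = 600 + 3 * x + rng.gauss(0, 50)
        plots.append(Plot(x, y, (temp, precip), 2.0 + 0.3 * temp + rng.gauss(0, 0.5)))

    schemes = {
        "random": random_blocks(len(plots), 5),
        "spatial": spatial_blocks(plots, 5, cell_size=25.0),
        "environmental": environmental_blocks(plots, 5),
    }
    for name, folds in schemes.items():
        rmse = cross_validate(plots, folds, fit_mean, predict_mean)
        print(f"{name:>13}: RMSE = {rmse:.3f}")
```

Spatial and environmental folds hold out whole regions of geographic or predictor space, so the error they report tends to be a closer proxy for extrapolation error than the random-fold error. This is the motivation the abstract gives for using environmental blocking to assess prediction error over larger scales.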

Updated: 2021-03-15