Testing the ability of species distribution models to infer variable importance
Ecography (IF 5.9), Pub Date: 2020-09-02, DOI: 10.1111/ecog.05317
Adam B. Smith, Maria J. Santos

Models of species’ distributions and niches are frequently used to infer the importance of range‐ and niche‐defining variables. However, the degree to which these models can reliably identify important variables and quantify their influence remains unknown. Here we use a series of simulations to explore how well models can 1) discriminate between variables with different influence and 2) calibrate the magnitude of influence relative to an ‘omniscient’ model. To quantify variable importance, we trained generalized additive models (GAMs), Maxent and boosted regression trees (BRTs) on simulated data and tested their sensitivity to permutations in each predictor. Importance was inferred by calculating the correlation between permuted and unpermuted predictions, and by comparing predictive accuracy of permuted and unpermuted predictions using AUC and the continuous Boyce index. In scenarios with one influential and one uninfluential variable, models failed to discriminate reliably between variables when training occurrences were < 8–64, prevalence was > 0.5, spatial extent was small, environmental data had coarse resolution and spatial autocorrelation was low, or when pairwise correlation between environmental variables was |r| > 0.7. When two variables influenced the distribution equally, importance was underestimated when species had narrow or intermediate niche breadth. Interactions between variables in how they shaped the niche did not affect inferences about their importance. When variables acted unequally, the effect of the stronger variable was overestimated. GAMs and Maxent discriminated between variables more reliably than BRTs, but no algorithm was consistently well‐calibrated vis‐à‐vis the omniscient model. Algorithm‐specific measures of importance like Maxent's change‐in‐gain metric were less robust than the permutation test. Overall, high predictive accuracy did not connote robust inferential capacity. 
As a result, requirements for reliably measuring variable importance are likely more stringent than for creating models with high predictive accuracy.
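
The permutation test described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the response function `omniscient_model`, the helper names, the simulated sites, and the convention of reporting importance as one minus the correlation are all assumptions made for illustration. The sketch uses a known "true" response in which only the first variable matters, permutes each predictor in turn, and compares permuted with unpermuted predictions.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for constant predictions")
    return cov / math.sqrt(var_a * var_b)


def omniscient_model(row):
    """Hypothetical 'true' response: logistic in x1 only; x2 has no influence."""
    x1, _x2 = row
    return 1.0 / (1.0 + math.exp(-2.0 * x1))


def permutation_importance(predict, rows, column, n_reps=20, seed=0):
    """Mean of (1 - correlation) between unpermuted and permuted predictions.

    Values near 0 mean that shuffling the column barely changes predictions;
    values near 1 mean predictions depend strongly on that column.
    (Assumed convention, chosen for this sketch.)
    """
    rng = random.Random(seed)
    baseline = [predict(r) for r in rows]
    scores = []
    for _ in range(n_reps):
        # Shuffle one predictor across sites, leaving the others in place.
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted_rows = [
            r[:column] + (v,) + r[column + 1:] for r, v in zip(rows, shuffled)
        ]
        permuted = [predict(r) for r in permuted_rows]
        scores.append(1.0 - pearson(baseline, permuted))
    return sum(scores) / n_reps


# Simulated environment: two independent standard-normal variables per site.
site_rng = random.Random(42)
sites = [(site_rng.gauss(0, 1), site_rng.gauss(0, 1)) for _ in range(500)]

imp_x1 = permutation_importance(omniscient_model, sites, column=0)
imp_x2 = permutation_importance(omniscient_model, sites, column=1)
print(f"importance of x1: {imp_x1:.3f}")
print(f"importance of x2: {imp_x2:.3f}")
```

In this sketch the influential variable should score close to 1 and the uninfluential one close to 0. Replacing `omniscient_model` with the prediction function of a fitted model gives the comparison the abstract calls calibration against the omniscient model.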

Updated: 2020-09-02