Is more data always better? A simulation study of benefits and limitations of integrated distribution models
Ecography (IF 5.4), Pub Date: 2020-07-14, DOI: 10.1111/ecog.05146
Emily G. Simmonds¹, Susan G. Jarvis², Peter A. Henrys², Nick J. B. Isaac³, Robert B. O'Hara¹

Species distribution models are popular and widely applied ecological tools. Recent increases in data availability have led to opportunities and challenges for species distribution modelling. Each data source has different qualities, determined by how it was collected. Although several data sources can inform on a single species, ecologists have often analysed just one of them, which loses information because the other data sources are discarded. Integrated distribution models (IDMs) were developed to enable inclusion of multiple datasets in a single model, whilst accounting for different data collection protocols. This is advantageous because it allows efficient use of all available data, can improve estimation and can account for biases in data collection. What is not yet known is when integrating different data sources does not bring advantages. Here, for the first time, we explore the potential limits of IDMs using a simulation study integrating a spatially biased, opportunistic, presence‐only dataset with a structured, presence–absence dataset. We explore four scenarios based on real ecological problems: small sample sizes, low detection probability, correlations between covariates and a lack of knowledge of the drivers of bias in data collection. For each scenario we ask: do we see improvements in parameter estimation or in the accuracy of spatial pattern prediction in the IDM compared with modelling either data source alone? We found that integration alone was unable to correct for spatial bias in presence‐only data. Including a covariate to explain the bias or adding a flexible spatial term improved IDM performance beyond single-dataset models, with the models including a flexible spatial term producing the most accurate and robust estimates. Increasing the sample size of the presence–absence data and using uncorrelated covariates also improved estimation. These results demonstrate under which conditions integrated models provide benefits over modelling single data sources.
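To make the idea of "one model, several datasets" concrete, here is a minimal sketch of an integrated likelihood in Python with NumPy. It is not the authors' model or code. It assumes a log-linear abundance per grid cell, mu = exp(beta0 + beta1·x); presence-only (PO) counts that are Poisson with mean mu scaled by a log-linear sampling-effort term driven by a bias covariate z; and presence–absence (PA) detections with probability p·(1 − exp(−mu)) at surveyed cells. The function names (`simulate`, `loglik_po`, `loglik_pa`, `loglik_idm`) and all parameter values are hypothetical and chosen only for illustration. The key point is that both likelihood components share the ecological coefficients beta.

```python
import numpy as np


def simulate(rng, n_cells=5000, beta=(-0.5, 1.0), alpha=(-1.0, 0.8),
             n_pa=400, p_detect=0.7):
    """Simulate hypothetical PO counts and PA detections from one shared process."""
    x = rng.normal(size=n_cells)               # environmental covariate
    z = rng.normal(size=n_cells)               # covariate driving sampling bias
    mu = np.exp(beta[0] + beta[1] * x)         # expected abundance per cell
    effort = np.exp(alpha[0] + alpha[1] * z)   # relative PO sampling effort
    po_counts = rng.poisson(mu * effort)       # PO: biased (thinned) Poisson counts
    sites = rng.choice(n_cells, size=n_pa, replace=False)   # structured PA survey cells
    occupied = rng.random(n_pa) < -np.expm1(-mu[sites])     # P(N > 0) = 1 - exp(-mu)
    detected = occupied & (rng.random(n_pa) < p_detect)     # imperfect detection
    return x, z, po_counts, sites, detected.astype(float)


def loglik_po(beta, alpha, x, z, counts):
    """Poisson log-likelihood of the PO counts (constant log(y!) dropped)."""
    log_m = beta[0] + beta[1] * x + alpha[0] + alpha[1] * z
    return np.sum(counts * log_m - np.exp(log_m))


def loglik_pa(beta, p_detect, x_sites, detected):
    """Bernoulli log-likelihood of PA detections (cloglog occupancy times detection)."""
    mu = np.exp(beta[0] + beta[1] * x_sites)
    q = p_detect * -np.expm1(-mu)              # P(detected) = p * (1 - exp(-mu))
    return np.sum(detected * np.log(q) + (1.0 - detected) * np.log1p(-q))


def loglik_idm(beta, alpha, p_detect, data):
    """Joint IDM log-likelihood: the two components share the ecological betas."""
    x, z, counts, sites, detected = data
    return (loglik_po(beta, alpha, x, z, counts)
            + loglik_pa(beta, p_detect, x[sites], detected))


rng = np.random.default_rng(42)
data = simulate(rng)
# Profile the joint likelihood over the slope beta1, holding the other
# parameters at their simulated values (a demonstration, not a full fit).
grid = np.linspace(-1.0, 3.0, 81)
profile = [loglik_idm((-0.5, b1), (-1.0, 0.8), 0.7, data) for b1 in grid]
print("beta1 maximising the joint likelihood:", grid[int(np.argmax(profile))])
```

One design consequence is worth noting: in the PO component only the sum beta0 + alpha0 enters, so presence-only data alone cannot separate overall abundance from overall sampling effort. The PA component pins down beta0. Dropping the `alpha[1] * z` term would leave the PO sampling bias unmodelled, which is the situation in which the abstract reports that integration alone does not correct the bias.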

Updated: 2020-07-14