A new joint species distribution model for faster and more accurate inference of species associations from big community data
Methods in Ecology and Evolution ( IF 6.3 ) Pub Date : 2021-07-28 , DOI: 10.1111/2041-210x.13687
Maximilian Pichler 1 , Florian Hartig 1

  1. Joint species distribution models (JSDMs) explain spatial variation in community composition by contributions of the environment, biotic associations and possibly spatially structured residual covariance. They show great promise as a general analytical framework for community ecology and macroecology, but current JSDMs, even when approximated by latent variables, scale poorly on large datasets, limiting their usefulness for currently emerging big (e.g. metabarcoding and metagenomics) community datasets.
  2. Here, we present a novel, more scalable JSDM (sjSDM) that circumvents the need for latent variables by using Monte Carlo integration of the joint JSDM likelihood together with flexible elastic net regularization on all model components. We implemented sjSDM in PyTorch, a modern machine learning framework that supports both CPU and GPU calculations. Using simulated communities with known species–species associations and different numbers of species and sites, we compare sjSDM with state-of-the-art JSDM implementations to determine computational runtimes and the accuracy of the inferred species–species and species–environment associations.
  3. We find that sjSDM is orders of magnitude faster than existing JSDM algorithms (even when run on the CPU) and can be scaled to very large datasets. Despite the dramatically improved speed, sjSDM produces more accurate estimates of species association structures than alternative JSDM implementations. We demonstrate the applicability of sjSDM to big community data using an eDNA case study with thousands of fungal operational taxonomic units (OTUs).
  4. Our sjSDM approach makes it possible to fit JSDMs to large community datasets with hundreds or thousands of species, substantially extending the applicability of JSDMs in ecology. We provide our method as an R package to facilitate its use in practical data analysis.
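
The Monte Carlo integration and elastic net regularization named in point 2 can be illustrated with a small sketch. This is not the authors' PyTorch implementation or the API of their R package. It assumes a multivariate probit model for a single site in which the species covariance is written as L L^T + I, so that, conditional on a standard normal draw eps, the species are independent with P(y_j = 1 | eps) = Phi(eta_j + (L eps)_j). The function names, the parameters `lam` and `alpha`, and the penalty parametrization are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF


def mc_log_likelihood(y, eta, L, n_samples=1000, seed=0):
    """Monte Carlo estimate of log P(y) for one site (illustrative sketch).

    y   : list of 0/1 observations, one per species
    eta : list of linear predictors (environment effect), one per species
    L   : species x k factor matrix; ASSUMED covariance is L L^T + I
    """
    rng = random.Random(seed)
    n_species, k = len(y), len(L[0])
    total = 0.0
    for _ in range(n_samples):
        # Draw the shared noise that induces correlation between species.
        eps = [rng.gauss(0.0, 1.0) for _ in range(k)]
        prob = 1.0
        for j in range(n_species):
            # Conditional on eps, each species is an independent probit.
            z = eta[j] + sum(L[j][f] * eps[f] for f in range(k))
            p = _PHI(z)
            prob *= p if y[j] == 1 else 1.0 - p
        total += prob
    # Average the conditional likelihoods, then take the log.
    return math.log(total / n_samples)


def elastic_net(weights, lam=0.1, alpha=0.5):
    """Elastic net penalty, ASSUMED form: lam * (alpha * L1 + (1 - alpha) * L2)."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return lam * (alpha * l1 + (1.0 - alpha) * l2)
```

In a fitting loop one would minimize the negative of `mc_log_likelihood`, summed over sites, plus `elastic_net` applied to the model parameters. A gradient-based framework with GPU support makes the sampling step cheap to parallelize, which is consistent with the abstract's description of a PyTorch implementation.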


Updated: 2021-07-28