Multi‐output Gaussian processes for species distribution modelling
Methods in Ecology and Evolution (IF 6.3), Pub Date: 2020-09-23, DOI: 10.1111/2041-210x.13496
Martin Ingram 1, Damjan Vukcevic 2,3, Nick Golding 1

  1. Species distribution modelling is an active area of research in ecology. In recent years, interest has grown in modelling multiple species simultaneously, partly due to the ability to ‘borrow strength’ from similar species to improve predictions. Mixed and hierarchical models allow this but typically assume a (generalised) linear relationship between covariates and species presence and absence. On the other hand, popular machine learning techniques such as random forests and boosted regression trees are able to model complex nonlinear relationships but consider only one species at a time.
  2. We apply multi‐output Gaussian processes (MOGPs) to the problem of species distribution modelling. MOGPs model each species' response to the environment as a weighted sum of a small number of nonlinear functions, each modelled by a Gaussian process. While Gaussian process models are notoriously computationally intensive, recent techniques from the machine learning literature, together with the use of graphics processing units (GPUs), allow us to scale the model to datasets with hundreds of species at thousands of sites.
  3. We evaluate the MOGP against four baseline models on six different datasets. Overall, the MOGP is competitive with the best single‐species and joint‐species models, while being much faster to fit. On single‐species metrics (AUC and log likelihood), the MOGP and single‐output GPs outperformed tree‐based models (random forest and boosted regression trees) and a joint species distribution model (JSDM). Compared to single‐output GPs, the MOGP generally has a higher AUC for rare species with fewer than 50 observations in the dataset. When evaluated using joint‐species log likelihood, the MOGP outperforms all models apart from the JSDM, which has a better joint likelihood on three datasets and similar performance on the other three. A key advantage of the MOGP is speed: on the largest dataset, it is around 18 times faster to fit than single‐output GPs, and over 80 times faster to fit than the JSDM.
  4. Our results suggest that both MOGPs and single‐output GPs (SOGPs) are accurate predictive models of species distributions and that the MOGP is particularly compelling when predictions for rare species are of interest.
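
The "weighted sum of a small number of nonlinear functions" in point 2 can be illustrated with a toy sketch. This is not the authors' implementation: it only draws latent functions from a Gaussian process prior and mixes them per species, with no fitting to data. The squared-exponential kernel, the logistic link, the single covariate, and all names (`rbf_kernel`, `sample_gp`, `mogp_presence_probs`) are assumptions made for illustration.

```python
import math
import random

def rbf_kernel(xs, lengthscale=1.0, jitter=1e-6):
    """Squared-exponential covariance matrix for 1-D inputs (assumed kernel).
    A small jitter on the diagonal keeps the matrix positive definite."""
    n = len(xs)
    return [[math.exp(-0.5 * ((xs[i] - xs[j]) / lengthscale) ** 2)
             + (jitter if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def sample_gp(chol, rng):
    """One draw from the GP prior at the sites: L z with z ~ N(0, I)."""
    n = len(chol)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(n)]

def mogp_presence_probs(latents, weights, intercepts):
    """Mix Q shared latent functions into per-species presence probabilities.

    latents:    Q lists of length N (latent function values at N sites)
    weights:    S lists of length Q (one weight vector per species)
    intercepts: S floats (one baseline per species)
    Returns an S x N nested list of probabilities.
    """
    n_sites = len(latents[0])
    probs = []
    for w, b in zip(weights, intercepts):
        row = []
        for i in range(n_sites):
            f = b + sum(w_q * g[i] for w_q, g in zip(w, latents))
            row.append(1.0 / (1.0 + math.exp(-f)))  # logistic link (assumed)
        probs.append(row)
    return probs

rng = random.Random(0)
sites = [0.5 * i for i in range(15)]               # one toy environmental covariate
chol = cholesky(rbf_kernel(sites, lengthscale=1.0))
latents = [sample_gp(chol, rng) for _ in range(2)]  # Q = 2 shared nonlinear functions
weights = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(5)]  # S = 5 species
intercepts = [-1.0] * 5
probs = mogp_presence_probs(latents, weights, intercepts)
```

Because every species reuses the same two latent functions, a species with few observations only needs its weights and intercept estimated, which is one way to read the "borrowing strength" described in point 1.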


Updated: 2020-09-23