Spatiotemporal clustering using Gaussian processes embedded in a mixture model
Environmetrics (IF 1.7), Pub Date: 2021-05-07, DOI: 10.1002/env.2681
Jarno Vanhatalo, Scott D. Foster, Geoffrey R. Hosack

The categorization of multidimensional data into clusters is a common task in statistics. Many applications of clustering, including the majority of tasks in ecology, use data that are inherently spatial and often temporal as well. However, spatiotemporal dependence is typically ignored when clustering multivariate data. We present a finite mixture model for spatial and spatiotemporal clustering that incorporates spatial and spatiotemporal autocorrelation by including appropriate Gaussian processes (GPs) in a model for the mixing proportions. We also allow for flexible, semiparametric dependence on environmental covariates, again using GPs. We propose Bayesian inference through three tiers of approximate methods: a Laplace approximation that allows efficient analysis of large datasets, and both partial and full Markov chain Monte Carlo (MCMC) approaches that improve accuracy at the cost of increased computational time. Comparison of the methods shows that the Laplace approximation is a useful alternative to the MCMC methods. A decadal analysis of 253 species of teleost fish from 854 samples collected along the biodiverse northwestern continental shelf of Australia between 1986 and 1997 shows the added clarity provided by accounting for spatial autocorrelation. For these data, the temporal dependence is comparatively small, which is an important finding given the changing human pressures over this period.
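The abstract does not state the model equations, so the following is only a minimal NumPy sketch of the general idea of GP-driven mixing proportions, under assumptions that are ours rather than the paper's: each site's mixing proportions are a softmax of latent GP values (one GP per cluster, with the last cluster pinned to zero as a reference category), the GPs use a squared-exponential covariance over 2-D site coordinates, and each cluster has its own vector of Bernoulli presence probabilities across species. All function names, parameter values, and the simulated sites and species are hypothetical. The sketch covers only the generative model and the mixture log-likelihood; it omits environmental covariates, the temporal component, and the Laplace and MCMC inference described above.

```python
import numpy as np

def sq_exp_kernel(x1, x2, magnitude=1.0, length_scale=1.0):
    """Squared-exponential covariance between two sets of coordinates (assumed kernel)."""
    d2 = ((x1[:, None, :] - x2[None, :, :]) ** 2).sum(axis=-1)
    return magnitude**2 * np.exp(-0.5 * d2 / length_scale**2)

def softmax(f):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = f - f.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mixing_proportions(coords, n_clusters, rng, length_scale=1.0):
    """Draw spatially correlated mixing proportions pi_k(s) = softmax(f(s))_k.

    Hypothetical construction: one latent GP per cluster except the last,
    which is fixed at zero as a reference category for identifiability.
    """
    n = coords.shape[0]
    cov = sq_exp_kernel(coords, coords, length_scale=length_scale)
    chol = np.linalg.cholesky(cov + 1e-6 * np.eye(n))  # small jitter keeps Cholesky stable
    f = chol @ rng.standard_normal((n, n_clusters - 1))  # independent GP draws, one per column
    f = np.hstack([f, np.zeros((n, 1))])
    return softmax(f)

def mixture_loglik(y, pi, theta):
    """Total log-likelihood and per-site cluster responsibilities.

    y:     (n_sites, n_species) 0/1 presence-absence matrix (assumed data type)
    pi:    (n_sites, n_clusters) site-specific mixing proportions
    theta: (n_clusters, n_species) presence probability of each species in each cluster
    """
    # log p(y_i | k) = sum_j [ y_ij log theta_kj + (1 - y_ij) log(1 - theta_kj) ]
    log_comp = y @ np.log(theta).T + (1 - y) @ np.log1p(-theta).T
    log_joint = np.log(pi) + log_comp
    # log p(y_i) = log sum_k pi_k(s_i) p(y_i | k), computed with a log-sum-exp
    m = log_joint.max(axis=1, keepdims=True)
    log_marg = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
    resp = np.exp(log_joint - log_marg[:, None])  # posterior cluster membership per site
    return log_marg.sum(), resp

# Simulated example: every value below is hypothetical, not data from the paper.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 5, size=(50, 2))          # 50 site locations
pi = mixing_proportions(coords, n_clusters=3, rng=rng)
theta = rng.uniform(0.05, 0.95, size=(3, 20))     # cluster profiles for 20 species
z = np.array([rng.choice(3, p=p) for p in pi])    # latent cluster label of each site
y = (rng.uniform(size=(50, 20)) < theta[z]).astype(float)
ll, resp = mixture_loglik(y, pi, theta)
print(f"log-likelihood: {ll:.2f}; responsibilities shape: {resp.shape}")
```

Because neighbouring sites share correlated latent values, their mixing proportions, and hence their most probable cluster labels, tend to be similar. A standard non-spatial finite mixture corresponds to replacing the latent GPs with constants shared by all sites.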
