Parallel cross-validation: A scalable fitting method for Gaussian process models, Computational Statistics & Data Analysis

Computational Statistics & Data Analysis (IF 1.8), Pub Date: 2021-03-01, DOI: 10.1016/j.csda.2020.107113
Florian Gerber, Douglas W. Nychka

Gaussian process (GP) models are widely used to analyze spatially referenced data and to predict values at locations without observations. In contrast to many algorithmic procedures, GP models are based on a statistical framework, which enables uncertainty quantification of the model structure and predictions. Both the evaluation of the likelihood and the prediction involve solving linear systems. Hence, the computational costs are large and limit the amount of data that can be handled. While there are many approximation strategies that lower the computational cost of GP models, they often provide only sub-optimal support for the parallel computing capabilities of current (high-performance) computing environments. We aim at bridging this gap with a parameter estimation and prediction method that is designed to be parallelizable. More precisely, we divide the spatial domain into overlapping subsets and use cross-validation (CV) to estimate the covariance parameters in parallel. We present simulation studies, which assess the accuracy of the parameter estimates and predictions. Moreover, we show that our implementation has good weak and strong parallel scaling properties. For illustration, we fit an exponential covariance model to a scientifically relevant canopy height dataset with 5 million observations. Using 512 processor cores in parallel brings the evaluation time of one covariance parameter configuration to less than 1.5 minutes. The parallel CV method can be easily extended to include approximate likelihood methods, multivariate and spatio-temporal data, as well as non-stationary covariance models.
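The core idea of the abstract (overlapping spatial subsets, with cross-validation errors computed per subset in parallel) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the simulated data, the fixed nugget and sill, the mean squared prediction error as CV criterion, and the use of a thread pool in place of a high-performance computing setup are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def exp_cov(a, b, range_, sill=1.0):
    """Exponential covariance sill * exp(-d / range_) between point sets a and b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sill * np.exp(-d / range_)


def make_subsets(locs, n_blocks, overlap):
    """Split the unit square into n_blocks x n_blocks cells (hypothetical layout).

    Each subset is a pair (window, core): `core` holds the points inside the
    cell, `window` holds the cell plus a margin of width `overlap`.
    """
    edges = np.linspace(0.0, 1.0, n_blocks + 1)
    subsets = []
    for i in range(n_blocks):
        for j in range(n_blocks):
            lo = np.array([edges[i], edges[j]])
            hi = np.array([edges[i + 1], edges[j + 1]])
            in_window = np.all((locs >= lo - overlap) & (locs <= hi + overlap), axis=1)
            in_core = np.all((locs >= lo) & (locs <= hi), axis=1)
            subsets.append((np.flatnonzero(in_window), np.flatnonzero(in_core)))
    return subsets


def subset_cv_error(locs, y, window, core, range_, nugget, n_folds=5):
    """Sum of squared K-fold prediction errors over the core points of one subset."""
    sse = 0.0
    for fold in range(n_folds):
        test = core[fold::n_folds]
        if test.size == 0:
            continue
        train = np.setdiff1d(window, test)
        K = exp_cov(locs[train], locs[train], range_) + nugget * np.eye(train.size)
        k_star = exp_cov(locs[test], locs[train], range_)
        # Zero-mean kriging predictor: one linear solve per fold.
        pred = k_star @ np.linalg.solve(K, y[train])
        sse += float(np.sum((y[test] - pred) ** 2))
    return sse


def parallel_cv_score(locs, y, subsets, range_, nugget=0.05, workers=4):
    """Mean squared CV error for one range value; subsets are scored concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(
            lambda s: subset_cv_error(locs, y, s[0], s[1], range_, nugget), subsets))
    n_scored = sum(core.size for _, core in subsets)
    return sum(parts) / n_scored


# Simulated example data (assumption: true range 0.2, nugget 0.05, unit sill).
rng = np.random.default_rng(0)
locs = rng.uniform(size=(400, 2))
K_true = exp_cov(locs, locs, range_=0.2) + 0.05 * np.eye(400)
y = np.linalg.cholesky(K_true) @ rng.standard_normal(400)

subsets = make_subsets(locs, n_blocks=2, overlap=0.1)
grid = [0.05, 0.1, 0.2, 0.4]
scores = {r: parallel_cv_score(locs, y, subsets, r) for r in grid}
best_range = min(scores, key=scores.get)
```

Each subset only requires linear solves of the size of its own window, so the per-subset work is independent and can be distributed across workers. In this sketch the candidate range values come from a fixed grid; a numerical optimizer over the same score would be an equally plausible choice.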

Updated: 2021-03-01
