A sequential split‐and‐conquer approach for the analysis of big dependent data in computer experiments
The Canadian Journal of Statistics (IF 0.8), Pub Date: 2020-07-28, DOI: 10.1002/cjs.11559
Chengrui Li 1, Ying Hung 2, Minge Xie 2

Massive correlated data with many inputs are often generated from computer experiments to study complex systems. The Gaussian process (GP) model is a widely used tool for the analysis of computer experiments. Although GPs provide a simple and effective approximation to computer experiments, two critical issues remain unresolved. One is the computational issue in GP estimation and prediction where intensive manipulations of a large correlation matrix are required. For a large sample size and with a large number of variables, this task is often unstable or infeasible. The other issue is how to improve the naive plug‐in predictive distribution which is known to underestimate the uncertainty. In this article, we introduce a unified framework that can tackle both issues simultaneously. It consists of a sequential split‐and‐conquer procedure, an information combining technique using confidence distributions (CD), and a frequentist predictive distribution based on the combined CD. It is shown that the proposed method maintains the same asymptotic efficiency as the conventional likelihood inference under mild conditions, but dramatically reduces the computation in both estimation and prediction. The predictive distribution contains comprehensive information for inference and provides a better quantification of predictive uncertainty as compared with the plug‐in approach. Simulations are conducted to compare the estimation and prediction accuracy with some existing methods, and the computational advantage of the proposed method is also illustrated. The proposed method is demonstrated by a real data example based on tens of thousands of computer experiments generated from a computational fluid dynamic simulator.
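To make the information-combining step in the abstract more concrete, here is a minimal sketch of how block-wise estimates might be pooled when each block yields an approximately normal confidence distribution. It is an illustration under simplifying assumptions, not the authors' procedure: the helper name `combine_normal_cds` is hypothetical, the blocks are treated as approximately independent, and the sequential splitting that the paper uses to handle dependence between blocks is not attempted here.

```python
import numpy as np


def combine_normal_cds(estimates, covariances):
    """Pool block-wise estimates by inverse-covariance weighting.

    Assumption (not from the paper): block i supplies an approximately
    normal confidence distribution N(estimates[i], covariances[i]),
    and the blocks are approximately independent.
    """
    # Precision (inverse covariance) of each block; scalars become 1x1 matrices.
    precisions = [np.linalg.inv(np.atleast_2d(c)) for c in covariances]

    # Precisions add under independence, so the pooled covariance is the
    # inverse of their sum.
    combined_cov = np.linalg.inv(sum(precisions))

    # Precision-weighted sum of the block estimates.
    weighted_sum = sum(
        p @ np.atleast_1d(e) for p, e in zip(precisions, estimates)
    )
    combined_est = combined_cov @ weighted_sum
    return combined_est, combined_cov


# Two blocks with equal variance: the pooled estimate is their average,
# and the pooled variance is half the per-block variance.
est, cov = combine_normal_cds([1.0, 3.0], [1.0, 1.0])
```

Each block only requires work on its own small correlation matrix, which is where the computational saving described in the abstract comes from; the pooling step itself is a handful of small matrix operations.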

Updated: 2020-07-28