Predictive Subdata Selection for Computer Models
Journal of Computational and Graphical Statistics (IF 1.4), Pub Date: 2022-07-25, DOI: 10.1080/10618600.2022.2097247
Ming-Chung Chang

Abstract

An explosion in the availability of rich data, driven by technological advances, is hindering statistical analysis because of constraints on time and memory storage, regardless of whether researchers employ simple methods (e.g., linear regression) or complex models (e.g., Gaussian processes). Recent approaches to overcoming these limits include information-based optimal subdata selection and Latin hypercube subagging. In the current study, we develop a novel subdata selection method for large-scale computer models based on expected improvement optimization. Numerical and empirical analyses using real-world data are used to select subdata from which to derive accurate predictions. During the optimization procedure, the proposed scheme uses the geometry of the input feature region as well as information related to output values. The data points associated with the largest improvement in prediction accuracy are combined to construct a subdataset that can be used to formulate predictions within affordable computing time. Supplementary materials for this article, including proofs of theorems and additional numerical results, are available online.




Updated: 2022-07-25