LowCon: A Design-based Subsampling Approach in a Misspecified Linear Model
Journal of Computational and Graphical Statistics (IF 2.4), Pub Date: 2020-12-11, DOI: 10.1080/10618600.2020.1844215
Cheng Meng 1 , Rui Xie 2 , Abhyuday Mandal 3 , Xinlian Zhang 4 , Wenxuan Zhong 3 , Ping Ma 3

Abstract

We consider a measurement-constrained supervised learning problem, in which (i) the full sample of predictors is given, and (ii) the response observations are unavailable and expensive to measure. It is therefore desirable to select a subsample of the predictor observations, measure the corresponding responses, and then fit the supervised learning model on that subsample of predictors and responses. However, model fitting is a trial-and-error process, and a postulated model for the data could be misspecified. Our empirical studies demonstrate that most existing subsampling methods perform unsatisfactorily when the model is misspecified. In this paper, we develop a novel subsampling method, called “LowCon,” which outperforms the competing methods when the working linear model is misspecified. Our method uses orthogonal Latin hypercube designs to achieve robust estimation. We show that the proposed design-based estimator approximately minimizes the so-called worst-case bias with respect to many possible misspecification terms. Both simulated and real-data analyses demonstrate that the proposed estimator is more robust than several subsample least-squares estimators obtained by state-of-the-art subsampling methods. Supplementary materials for this article are available online.
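The workflow described in the abstract (choose a subsample of predictor rows, measure only those responses, then fit a working linear model) can be illustrated with a rough sketch. This is not the authors' LowCon algorithm. It rests on several assumptions that do not come from the abstract: a plain random Latin hypercube stands in for an orthogonal Latin hypercube design, each design point is matched to its nearest unused data row, and the function names (`latin_hypercube`, `design_based_subsample`, `fit_ols`) and the simulated response are hypothetical.

```python
import numpy as np


def latin_hypercube(n_points, dim, rng):
    """Plain random Latin hypercube in [0, 1)^dim (assumption: not an orthogonal one)."""
    # One random permutation of the n strata per dimension, jittered inside each stratum.
    strata = np.stack([rng.permutation(n_points) for _ in range(dim)], axis=1)
    return (strata + rng.random((n_points, dim))) / n_points


def design_based_subsample(X, n_sub, rng):
    """Return indices of the rows of X closest to the points of a space-filling design."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Rescale the unit-cube design to the bounding box of the predictors.
    design = lo + latin_hypercube(n_sub, X.shape[1], rng) * (hi - lo)
    available = np.ones(len(X), dtype=bool)
    chosen = []
    for point in design:
        dist = np.linalg.norm(X - point, axis=1)
        dist[~available] = np.inf  # never pick the same row twice
        idx = int(np.argmin(dist))
        available[idx] = False
        chosen.append(idx)
    return np.array(chosen)


def fit_ols(X, y):
    """Least-squares fit of the working linear model with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(5000, 2))  # full sample of predictors
idx = design_based_subsample(X, 50, rng)

# Only the selected rows get a (costly) response measurement. The quadratic
# term is a made-up misspecification that the working linear model ignores.
y_sub = (1.0 + 2.0 * X[idx, 0] - X[idx, 1] + 0.5 * X[idx, 0] ** 2
         + rng.normal(scale=0.1, size=len(idx)))
beta_hat = fit_ols(X[idx], y_sub)
```

The intent of a space-filling selection is to spread the subsample over the predictor domain rather than concentrate it on high-leverage rows. The abstract attributes the robustness of LowCon to its orthogonal Latin hypercube design, which this sketch does not reproduce.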




Updated: 2020-12-11