Optimal Sampling for Generalized Linear Models under Measurement Constraints
Journal of Computational and Graphical Statistics (IF 2.4), Pub Date: 2020-07-20, DOI: 10.1080/10618600.2020.1778483
Tao Zhang 1, Yang Ning 1, David Ruppert 1, 2

Under "measurement constraints," responses are expensive to measure and initially unavailable for most records in the dataset, but the covariates are available for the entire dataset. Our goal is to sample a relatively small portion of the dataset on which the expensive responses will be measured, such that the resultant sampling estimator is statistically efficient. Measurement constraints require that the sampling probabilities depend only on a very small set of the responses. A sampling procedure that uses responses on at most a small pilot sample will be called "response-free." We propose a response-free sampling procedure (OSUMC) for generalized linear models (GLMs). Under the A-optimality criterion, which minimizes the trace of the asymptotic variance, the resultant estimator is statistically efficient within a class of sampling estimators. We establish the unconditional asymptotic distribution of a general class of response-free sampling estimators. This result is novel compared with the existing conditional results, which are obtained by conditioning on both covariates and responses. Under our unconditional framework, the subsamples are no longer independent, and new martingale techniques are developed for our asymptotic theory. We further derive the A-optimal response-free sampling distribution. Since this distribution depends on population-level quantities, we propose the Optimal Sampling Under Measurement Constraints (OSUMC) algorithm to approximate the theoretical optimal sampling. Finally, we conduct an extensive empirical study to demonstrate the advantages of the OSUMC algorithm over existing methods from both statistical and computational perspectives.
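The response-free workflow described in the abstract can be illustrated with a minimal Python sketch for logistic regression. It is not the authors' implementation. The abstract does not state the form of the A-optimal probabilities, so the score used here, sqrt(b''(x'β)) · ||Φ⁻¹x|| with Φ estimated from the pilot fit, is an assumption, as are the function names `weighted_logistic_fit`, `response_free_subsample`, and `demo`, and all sample sizes.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def weighted_logistic_fit(X, y, w, n_iter=50, tol=1e-10):
    """Newton's method for a weighted logistic log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def response_free_subsample(X, measure_response, n_pilot, n_sub, rng):
    """Hypothetical sketch: pilot fit, covariate-only probabilities, weighted fit."""
    N = X.shape[0]

    # Step 1: uniform pilot sample; the only responses used to build probabilities.
    pilot_idx = rng.choice(N, size=n_pilot, replace=False)
    beta_pilot = weighted_logistic_fit(
        X[pilot_idx], measure_response(pilot_idx), np.ones(n_pilot)
    )

    # Step 2: probabilities from covariates and the pilot estimate only.
    # ASSUMPTION: score_i = sqrt(b''(x_i' beta)) * ||Phi^{-1} x_i||.
    p = sigmoid(X @ beta_pilot)
    v = p * (1.0 - p)                      # b''(x' beta) for the logistic model
    Phi = (X * v[:, None]).T @ X / N       # plug-in estimate of E[b'' x x']
    scores = np.sqrt(v) * np.linalg.norm(np.linalg.solve(Phi, X.T).T, axis=1)
    probs = scores / scores.sum()

    # Step 3: draw the subsample, measure its responses, and fit with
    # inverse-probability weights.
    idx = rng.choice(N, size=n_sub, replace=True, p=probs)
    weights = 1.0 / (N * probs[idx])
    beta_hat = weighted_logistic_fit(X[idx], measure_response(idx), weights)
    return beta_hat, probs


def demo(seed=0):
    """Synthetic check; the data-generating values are illustrative only."""
    rng = np.random.default_rng(seed)
    N = 20000
    beta_true = np.array([0.5, -1.0, 1.0])
    X = rng.normal(size=(N, 3))
    y = rng.binomial(1, sigmoid(X @ beta_true)).astype(float)
    beta_hat, probs = response_free_subsample(
        X, lambda idx: y[idx], n_pilot=500, n_sub=2000, rng=rng
    )
    return beta_true, beta_hat, probs
```

Calling `demo()` returns the true coefficients, the subsample estimate, and the sampling probabilities. In this sketch the `measure_response` callback stands in for the expensive measurement step, so responses are requested only for the pilot indices and the drawn subsample.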

Updated: 2020-07-20