Two-phase sampling designs for data validation in settings with covariate measurement error and continuous outcome, The Journal of the Royal Statistical Society, Series A (Statistics in Society)



The Journal of the Royal Statistical Society, Series A (Statistics in Society) (IF 1.5), Pub Date: 2021-04-15, DOI: 10.1111/rssa.12689
Gustavo Amorim ₁, Ran Tao ₁,₂, Sarah Lotspeich ₁, Pamela A Shaw ₃, Thomas Lumley ₄, Bryan E Shepherd ₁


Measurement errors are present in many data collection procedures and can harm analyses by biasing estimates. To correct for measurement error, researchers often validate a subsample of records and then incorporate the information learned from this validation sample into estimation. In practice, the validation sample is often selected using simple random sampling (SRS). However, SRS leads to inefficient estimates because it ignores information on the error-prone variables, which can be highly correlated with the unknown truth. Applying and extending ideas from the two-phase sampling literature, we propose optimal and nearly optimal designs for selecting the validation sample in the classical measurement-error framework. We target designs to improve the efficiency of model-based and design-based estimators, and show how the resulting designs compare to each other. Our results suggest that sampling schemes that extract more information from the error-prone data are substantially more efficient than SRS, for both design- and model-based estimators. The optimal procedure, however, depends on the analysis method and can differ substantially from one method to another. These findings are supported by theory and simulations. We illustrate the various designs using data from an HIV cohort study.
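The efficiency claim in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' optimal design or either of their estimators; it is a minimal illustration under stated assumptions: a classical error model X* = X + U, a linear outcome Y = beta * X + eps, and a plain least-squares fit on the validated records only, which ignores all phase-1 information. The function names (`validated_slope`, `compare_designs`), the "extreme_tails" rule (validate the records with the smallest and largest X*), and all parameter values are hypothetical choices made for this example.

```python
import random
import statistics


def validated_slope(design, rng, n_phase1=1000, n_phase2=100, beta=1.0):
    """Simulate one study and return the slope fitted on validated records."""
    # Phase 1: the true covariate x is unobserved; x_star and y are observed.
    x = [rng.gauss(0.0, 1.0) for _ in range(n_phase1)]
    x_star = [xi + rng.gauss(0.0, 0.5) for xi in x]  # classical error
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]

    # Phase 2: choose which records to validate (i.e. to observe x for).
    indices = range(n_phase1)
    if design == "srs":
        chosen = rng.sample(indices, n_phase2)
    elif design == "extreme_tails":
        # Illustrative rule: take the lowest and highest error-prone values.
        order = sorted(indices, key=lambda i: x_star[i])
        half = n_phase2 // 2
        chosen = order[:half] + order[n_phase1 - (n_phase2 - half):]
    else:
        raise ValueError(f"unknown design: {design}")

    # Least-squares slope of y on the validated x (with an intercept).
    xs = [x[i] for i in chosen]
    ys = [y[i] for i in chosen]
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    return sxy / sxx


def compare_designs(n_reps=200, seed=2021):
    """Empirical mean and variance of the slope estimate under each design."""
    summary = {}
    for design in ("srs", "extreme_tails"):
        rng = random.Random(seed)
        slopes = [validated_slope(design, rng) for _ in range(n_reps)]
        summary[design] = {
            "mean": statistics.fmean(slopes),
            "variance": statistics.variance(slopes),
        }
    return summary


if __name__ == "__main__":
    for design, stats in compare_designs().items():
        print(design, stats)
```

Because selection here depends only on X*, which under the assumed error model carries no information about the outcome noise once X is given, the fit on validated records stays unbiased in this toy setup while the wider spread of X in the tails reduces the variance of the slope. The designs in the paper are tailored to specific design-based and model-based estimators and need not resemble this rule.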


Updated: 2021-04-15
