Semiparametric Estimation with Data Missing Not at Random Using an Instrumental Variable
Statistica Sinica (IF 1.4), Pub Date: 2018-01-01, DOI: 10.5705/ss.202016.0324
Eric Tchetgen Tchetgen, Baoluo Sun, Lan Liu, Wang Miao, Kathleen Wirth, Robin James

Missing data occur frequently in empirical studies in health and social sciences, often compromising our ability to make accurate inferences. An outcome is said to be missing not at random (MNAR) if, conditional on the observed variables, the missing data mechanism still depends on the unobserved outcome. In such settings, identification is generally not possible without imposing additional assumptions. Identification is sometimes possible, however, if an instrumental variable (IV) is observed for all subjects which satisfies the exclusion restriction that the IV affects the missingness process without directly influencing the outcome. In this paper, we provide necessary and sufficient conditions for nonparametric identification of the full data distribution under MNAR with the aid of an IV. In addition, we give sufficient identification conditions that are more straightforward to verify in practice. For inference, we focus on estimation of a population outcome mean, for which we develop a suite of semiparametric estimators that extend methods previously developed for data missing at random. Specifically, we propose inverse probability weighted estimation, outcome regression-based estimation and doubly robust estimation of the mean of an outcome subject to MNAR. For illustration, the methods are used to account for selection bias induced by HIV testing refusal in the evaluation of HIV seroprevalence in Mochudi, Botswana, using interviewer characteristics such as gender, age and years of experience as IVs.
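
To make the inverse probability weighting idea concrete, here is a minimal sketch on simulated data. It is not the authors' estimator: the data-generating model, its coefficients, and the function names (`simulate`, `complete_case_mean`, `ipw_mean`) are assumptions chosen for illustration. In particular, the sketch weights by the true response probability, which is known here only because the data are simulated; in practice that probability must be estimated, and the abstract's point is that an IV can make it identifiable under MNAR.

```python
import math
import random


def expit(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n, seed=0):
    """Simulate (z, y, r, pi) rows under an assumed MNAR mechanism.

    z  : binary instrument, drawn independently of y (exclusion restriction)
    y  : binary outcome with true mean 0.3
    r  : 1 if y is observed, 0 if missing
    pi : true response probability; depends on y (MNAR) and on z
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.randint(0, 1)
        y = 1 if rng.random() < 0.3 else 0
        pi = expit(0.5 + 1.0 * z - 1.5 * y)  # assumed coefficients
        r = 1 if rng.random() < pi else 0
        rows.append((z, y, r, pi))
    return rows


def complete_case_mean(rows):
    """Naive mean over observed outcomes only; biased under MNAR."""
    observed = [y for _, y, r, _ in rows if r == 1]
    return sum(observed) / len(observed)


def ipw_mean(rows):
    """IPW mean: each observed outcome is weighted by 1 / pi.

    Rows with r == 0 contribute zero, so missing outcomes are never used.
    """
    return sum(r * y / pi for _, y, r, pi in rows) / len(rows)


if __name__ == "__main__":
    rows = simulate(200_000, seed=1)
    print("complete-case mean:", round(complete_case_mean(rows), 3))
    print("IPW mean (oracle weights):", round(ipw_mean(rows), 3))
```

Because subjects with y = 1 respond less often in this simulation, the complete-case mean understates the true mean of 0.3, while the weighted mean should land close to it.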

Updated: 2018-01-01