A Nuisance-Free Inference Procedure Accounting for the Unknown Missingness with Application to Electronic Health Records



Entropy (IF 2.1), Pub Date: 2020-10-14, DOI: 10.3390/e22101154
Jiwei Zhao, Chi Chen


We study how to conduct statistical inference in a regression model where the outcome variable is prone to missing values and the missingness mechanism is unknown. The model we consider may be a traditional low-dimensional setting or a modern high-dimensional setting, where a sparsity assumption is usually imposed and regularization is commonly used. Motivated by the fact that the missingness mechanism, although usually treated as a nuisance, is difficult to specify correctly, we adopt the conditional likelihood approach so that the nuisance can be completely ignored throughout our procedure. We establish the asymptotic theory of the proposed estimator and develop an easy-to-implement algorithm via a data manipulation strategy. In particular, under the high-dimensional setting where regularization is needed, we propose a data perturbation method for post-selection inference. The proposed methodology is especially appealing when the true missingness mechanism tends to be missing not at random, e.g., in patient-reported outcomes or in real-world data such as electronic health records. The performance of the proposed method is evaluated by comprehensive simulation experiments as well as a study of the albumin level in the MIMIC-III database.
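The abstract does not spell out the conditional likelihood or the data manipulation step, so the following is only an illustrative sketch of one standard way such a nuisance-free fit can work, not the authors' algorithm. It assumes a linear model with Gaussian errors and a missingness probability that depends on the outcome alone. Under those assumptions, conditioning on the unordered outcomes of each pair of observed subjects cancels the missingness mechanism and leaves a logistic-type term in gamma = beta / sigma^2, which can be fitted as an intercept-free logistic regression on pairwise products. The function names (`simulate`, `pairwise_fit`, `solve`, `sigmoid`), the simulated missingness mechanism, and all parameter values are hypothetical; regularization and post-selection inference are not covered.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def simulate(n, beta, sigma, rng):
    """Draw y = x'beta + eps and keep only the subjects whose y is observed.

    The observation probability depends on y itself (missing not at random).
    This mechanism is a made-up example and is never used by the fit.
    """
    kept = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        y = sum(b * v for b, v in zip(beta, x)) + rng.gauss(0.0, sigma)
        if rng.random() < sigmoid(0.5 + y):
            kept.append((x, y))
    return kept


def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def pairwise_fit(data, n_iter=25, tol=1e-8):
    """Maximize sum over pairs i<j of log sigmoid((y_i - y_j)(x_i - x_j)'gamma).

    Data manipulation: each pair becomes one pseudo-covariate
    z = (y_i - y_j)(x_i - x_j) with pseudo-response 1 and no intercept.
    The objective is concave, so plain Newton steps are used.
    """
    p = len(data[0][0])
    z = [[(yi - yj) * (u - v) for u, v in zip(xi, xj)]
         for k, (xi, yi) in enumerate(data)
         for (xj, yj) in data[k + 1:]]
    gamma = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # negative Hessian
        for zi in z:
            s = sigmoid(sum(g * v for g, v in zip(gamma, zi)))
            w = s * (1.0 - s)
            for a in range(p):
                grad[a] += (1.0 - s) * zi[a]
                for b in range(p):
                    info[a][b] += w * zi[a] * zi[b]
        step = solve(info, grad)
        gamma = [g + d for g, d in zip(gamma, step)]
        if max(abs(d) for d in step) < tol:
            break
    return gamma


rng = random.Random(0)
observed = simulate(500, [1.0, -0.5], 1.0, rng)  # true gamma = beta / sigma^2 = (1.0, -0.5)
gamma_hat = pairwise_fit(observed)
print(len(observed), gamma_hat)
```

Note that this sketch identifies only the ratio beta / sigma^2 and no intercept, because any term that does not vary within a pair cancels along with the missingness mechanism.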


Updated: 2020-10-14
