Diagnosing missing always at random in multivariate data
Biometrika (IF 2.4), Pub Date: 2019-11-23, DOI: 10.1093/biomet/asz061
Iavor I Bojinov, Natesh S Pillai, Donald B Rubin

Models for analyzing multivariate data sets with missing values require strong, often unassessable, assumptions. The most common of these is that the mechanism that created the missing data is ignorable, a twofold assumption that depends on the mode of inference. The first part, which is the focus here, requires under the Bayesian and direct-likelihood paradigms that the missing data are missing at random; in contrast, the frequentist-likelihood paradigm demands that the missing data mechanism always produces missing at random data, a condition known as missing always at random. Under certain regularity conditions, assuming missing always at random leads to an assumption that can be tested using the observed data alone: namely, that the missing data indicators depend only on fully observed variables. Here, we propose three different diagnostic tests that not only indicate when this assumption is incorrect but also suggest which variables are the most likely culprits. Although missing always at random is not a necessary condition to ensure validity under the Bayesian and direct-likelihood paradigms, it is sufficient, and evidence for its violation should encourage the careful statistician to conduct targeted sensitivity analyses.
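
To make the testable implication concrete, here is a minimal Python sketch of one way such a check could look. It is not one of the three diagnostic tests proposed in the paper, whose details the abstract does not give; it is a generic stratified permutation test swapped in for illustration. The setting is an assumption: one fully observed binary variable `x`, a partially observed variable `y2`, and the missingness indicator `r1` of a second variable. If the indicators depend only on fully observed variables, then within each level of `x`, and among rows where `y2` is observed, `r1` should be unrelated to `y2`. The function names `stratified_permutation_pvalue` and `simulate` are hypothetical.

```python
import random


def stratified_permutation_pvalue(x, y2, r1, n_perm=2000, seed=0):
    """Permutation p-value for association between r1 and y2 within strata of x.

    x  : fully observed categorical values (assumption: few distinct levels)
    y2 : partially observed values, None where missing
    r1 : 1 if the other variable is observed in that row, else 0
    """
    rng = random.Random(seed)
    # Only rows where y2 is observed can be used; group them by x.
    strata = {}
    for xi, yi, ri in zip(x, y2, r1):
        if yi is not None:
            strata.setdefault(xi, []).append((yi, ri))

    def statistic(groups):
        # Size-weighted sum of absolute mean differences of y2
        # between rows with r1 == 1 and rows with r1 == 0.
        total = 0.0
        for pairs in groups.values():
            obs = [y for y, r in pairs if r == 1]
            mis = [y for y, r in pairs if r == 0]
            if obs and mis:
                diff = sum(obs) / len(obs) - sum(mis) / len(mis)
                total += len(pairs) * abs(diff)
        return total

    observed = statistic(strata)
    exceed = 0
    for _ in range(n_perm):
        permuted = {}
        for key, pairs in strata.items():
            labels = [r for _, r in pairs]
            rng.shuffle(labels)  # shuffle r1 labels within the stratum only
            permuted[key] = [(y, r) for (y, _), r in zip(pairs, labels)]
        if statistic(permuted) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def simulate(n, depends_on_y2, seed):
    """Toy data generator (illustrative assumption, not from the paper)."""
    rng = random.Random(seed)
    x, y2, r1 = [], [], []
    for _ in range(n):
        xi = rng.randint(0, 1)
        y = rng.gauss(xi, 1.0)
        # Missingness of y2 itself depends only on the fully observed x.
        y2_observed = rng.random() < (0.8 if xi else 0.6)
        if depends_on_y2:
            p1 = 0.8 if y > xi else 0.3  # violates the implication
        else:
            p1 = 0.7 if xi else 0.5      # depends on x only
        x.append(xi)
        y2.append(y if y2_observed else None)
        r1.append(1 if rng.random() < p1 else 0)
    return x, y2, r1


p_violation = stratified_permutation_pvalue(*simulate(1000, True, seed=1))
p_x_only = stratified_permutation_pvalue(*simulate(1000, False, seed=2))
print("indicator depends on y2:", p_violation)
print("indicator depends on x only:", p_x_only)
```

A small p-value here would be evidence that the indicator depends on a partially observed variable, and running the check for each pair of indicator and variable gives a rough sense of which variables are involved. Conditioning on a continuous or high-dimensional fully observed variable would need a different approach than exact stratification.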

Updated: 2019-11-23