Selecting the model for multiple imputation of missing data: Just use an IC! (Statistics in Medicine)


Selecting the model for multiple imputation of missing data: Just use an IC!
Statistics in Medicine (IF 1.8), Pub Date: 2021-02-24, DOI: 10.1002/sim.8915
Firouzeh Noghrehchi (1), Jakub Stoklosa (2, 3), Spiridon Penev (2), David I Warton (2, 3)

Multiple imputation and maximum likelihood estimation (via the expectation‐maximization algorithm) are two well‐known methods readily used for analyzing data with missing values. While these two methods are often considered as being distinct from one another, multiple imputation (when using improper imputation) is actually equivalent to a stochastic expectation‐maximization approximation to the likelihood. In this article, we exploit this key result to show that familiar likelihood‐based approaches to model selection, such as Akaike's information criterion (AIC) and the Bayesian information criterion (BIC), can be used to choose the imputation model that best fits the observed data. Poor choice of imputation model is known to bias inference, and while sensitivity analysis has often been used to explore the implications of different imputation models, we show that the data can be used to choose an appropriate imputation model via conventional model selection tools. We show that BIC can be consistent for selecting the correct imputation model in the presence of missing data. We verify these results empirically through simulation studies, and demonstrate their practicality on two classical missing data examples. An interesting result we saw in simulations was that not only can parameter estimates be biased by misspecifying the imputation model, but also by overfitting the imputation model. This emphasizes the importance of using model selection not just to choose the appropriate type of imputation model, but also to decide on the appropriate level of imputation model complexity.
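The selection idea can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes a deliberately simple, hypothetical setting: a covariate x is fully observed and an outcome y is missing at random given x. In that setting, the observed-data log-likelihood of an imputation model f(y | x) reduces to the Gaussian regression likelihood over the rows where y is observed, so no EM iterations are needed. The candidate imputation models (polynomial regressions of degree 1 to 5), the simulated data, the missingness mechanism and every function name are illustrative assumptions.

Each candidate is scored with AIC = -2 log L + 2p and BIC = -2 log L + p log n. Here p counts the regression coefficients plus the residual variance, and n is taken as the number of observed y values, which is also an assumption. The BIC-selected model is then used for improper imputation, meaning missing y values are drawn from the fitted model with its parameters held fixed at their maximum likelihood estimates.

```python
import numpy as np


def simulate(n=500, seed=1):
    """Hypothetical data: x fully observed, y quadratic in x, y missing at random (MAR) given x."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 + 0.5 * x - 0.8 * x**2 + rng.normal(size=n)
    # Missingness probability depends only on the observed x, so the mechanism is MAR.
    p_miss = 1.0 / (1.0 + np.exp(-(x - 0.5)))
    y_obs = np.where(rng.uniform(size=n) < p_miss, np.nan, y)
    return x, y_obs


def fit_poly(x, y, degree):
    """ML fit of the candidate imputation model y | x ~ N(poly(x), sigma^2) on rows with observed y."""
    obs = ~np.isnan(y)
    X = np.vander(x[obs], degree + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, y[obs], rcond=None)
    resid = y[obs] - X @ beta
    sigma2 = resid @ resid / obs.sum()  # ML estimate: divide by n_obs, not n_obs - p
    return beta, sigma2, obs


def information_criteria(x, y, degree):
    """AIC and BIC from the maximised observed-data log-likelihood of one candidate model."""
    _, sigma2, obs = fit_poly(x, y, degree)
    n_obs = obs.sum()
    # Maximised Gaussian log-likelihood: -n/2 * (log(2*pi*sigma2_hat) + 1)
    loglik = -0.5 * n_obs * (np.log(2 * np.pi * sigma2) + 1.0)
    p = degree + 2  # (degree + 1) regression coefficients plus sigma^2
    return -2 * loglik + 2 * p, -2 * loglik + p * np.log(n_obs)


def select_degree(x, y, degrees=(1, 2, 3, 4, 5)):
    """Return the BIC-minimising degree and the (AIC, BIC) pair for every candidate."""
    scores = {d: information_criteria(x, y, d) for d in degrees}
    best = min(scores, key=lambda d: scores[d][1])
    return best, scores


def improper_impute(x, y, degree, n_imputations=5, seed=2):
    """Improper imputation: draw missing y from the fitted model with (beta, sigma^2) fixed at their MLEs."""
    rng = np.random.default_rng(seed)
    beta, sigma2, obs = fit_poly(x, y, degree)
    X_mis = np.vander(x[~obs], degree + 1, increasing=True)
    completed = []
    for _ in range(n_imputations):
        y_full = y.copy()  # observed values are left untouched
        y_full[~obs] = X_mis @ beta + rng.normal(scale=np.sqrt(sigma2), size=X_mis.shape[0])
        completed.append(y_full)
    return completed


if __name__ == "__main__":
    x, y = simulate()
    best, scores = select_degree(x, y)
    for d, (aic, bic) in scores.items():
        print(f"degree {d}: AIC={aic:.1f}  BIC={bic:.1f}")
    print("BIC-selected degree:", best)
    imputations = improper_impute(x, y, best)
```

Because the simulated outcome is quadratic in x, the linear imputation model should score clearly worse than the quadratic one. The higher-degree candidates illustrate the overfitting side of the trade-off, which BIC penalizes by log n per extra parameter.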

Updated: 2021-04-08