Bayesian Latent Class Models for the Multiple Imputation of Categorical Data, Methodology



Methodology (IF 1.975), Pub Date: 2018-04-01, DOI: 10.1027/1614-2241/a000146
Davide Vidotto, Jeroen K. Vermunt, Katrijn Van Deun


Latent class (LC) analysis has recently been proposed for the multiple imputation (MI) of missing categorical data, using either a standard frequentist approach or a nonparametric Bayesian model called the Dirichlet process mixture of multinomial distributions (DPMM). The main advantage of using a latent class model for multiple imputation is its flexibility: provided the number of latent classes is large enough, it can capture complex relationships in the data. However, the two existing approaches also have certain disadvantages. The frequentist approach is computationally demanding because it requires estimating many LC models: first, models with different numbers of classes must be estimated to determine the required number of classes, and then the selected model is re-estimated on multiple bootstrap samples to account for parameter uncertainty during the imputation stage. Whereas the Bayesian Dirichlet process model performs model selection and handles parameter uncertainty automatically, it tends to use too few clusters during Gibbs sampling, leading to an underfitting model that yields invalid imputations. In this paper, we propose an alternative approach that combines the strengths of the two existing approaches: we use the Bayesian standard latent class model as the imputation model. We show how model selection can be performed prior to the imputation step using a single run of the Gibbs sampler and, moreover, how underfitting is prevented by using large values for the hyperparameters of the mixture weights. The results of two simulation studies and one real-data study indicate that, with a proper setting of the prior distributions, the Bayesian latent class model yields valid imputations and outperforms competing methods.
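The abstract does not spell out the sampler, so the following is only a minimal sketch of how a standard Bayesian latent class model can serve as an imputation model. It assumes the usual LC likelihood P(y_i) = sum over classes k of pi_k times the product over variables j of theta_kj(y_ij), with a symmetric Dirichlet(alpha) prior on the mixture weights pi and symmetric Dirichlet(beta) priors on the class-specific response probabilities theta. The function names, the defaults alpha=50 and beta=1, and the burn-in/thinning scheme are hypothetical illustrations, not the authors' settings. The large alpha merely mirrors the abstract's idea of large mixture-weight hyperparameters. Choosing the number of classes from a single Gibbs run is not covered here.

```python
import math
import random


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def lc_multiple_imputation(data, n_cats, n_classes, n_imputations,
                           alpha=50.0, beta=1.0, burn_in=500, thin=100, seed=0):
    """Gibbs sampler for a Bayesian latent class model that returns completed data sets.

    data: list of rows; each entry is a category code 0..n_cats[j]-1, or None if missing.
    alpha: hypothetical symmetric Dirichlet hyperparameter for the mixture weights.
    beta: hypothetical symmetric Dirichlet hyperparameter for the response probabilities.
    """
    rng = random.Random(seed)
    n_vars = len(n_cats)
    # Initialize the parameters with draws from their priors.
    pi = sample_dirichlet([alpha] * n_classes, rng)
    theta = [[sample_dirichlet([beta] * n_cats[j], rng) for j in range(n_vars)]
             for _ in range(n_classes)]
    completed = [row[:] for row in data]  # working copy; `data` is never modified
    imputations = []
    for it in range(1, burn_in + thin * n_imputations + 1):
        # Step 1: draw each class membership z_i given the observed entries only
        # (missing entries integrated out), on the log scale to avoid underflow.
        z = []
        for row in data:
            log_w = [math.log(pi[k]) + sum(math.log(theta[k][j][y])
                                           for j, y in enumerate(row) if y is not None)
                     for k in range(n_classes)]
            top = max(log_w)
            weights = [math.exp(v - top) for v in log_w]
            z.append(rng.choices(range(n_classes), weights=weights)[0])
        # Step 2: impute each missing entry from its class-specific distribution.
        for i, row in enumerate(data):
            for j, y in enumerate(row):
                if y is None:
                    completed[i][j] = rng.choices(range(n_cats[j]),
                                                  weights=theta[z[i]][j])[0]
        # Step 3: update the mixture weights from the class counts.
        counts = [0] * n_classes
        for k in z:
            counts[k] += 1
        pi = sample_dirichlet([alpha + c for c in counts], rng)
        # Step 4: update the response probabilities from the completed data.
        tallies = [[[0] * n_cats[j] for j in range(n_vars)] for _ in range(n_classes)]
        for i, row in enumerate(completed):
            for j, y in enumerate(row):
                tallies[z[i]][j][y] += 1
        theta = [[sample_dirichlet([beta + c for c in tallies[k][j]], rng)
                  for j in range(n_vars)] for k in range(n_classes)]
        # After burn-in, keep every `thin`-th completed data set as one imputation.
        if it > burn_in and (it - burn_in) % thin == 0:
            imputations.append([row[:] for row in completed])
    return imputations
```

For example, `lc_multiple_imputation(rows, n_cats=[2, 3, 2], n_classes=20, n_imputations=5)` would return five completed copies of `rows`. Because the parameters are redrawn at every iteration, parameter uncertainty propagates into the imputations without the bootstrap re-estimation that the frequentist approach requires. Spacing the stored data sets `thin` iterations apart is a simple way to reduce the dependence between successive draws.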


Updated: 2018-04-01
