A Bayesian Singular Value Decomposition Procedure for Missing Data Imputation
Journal of Computational and Graphical Statistics (IF 2.4), Pub Date: 2022-10-07, DOI: 10.1080/10618600.2022.2107534
Ruoshui Zhai, Roee Gutman

Abstract

Missing data are common in empirical studies. Multiple imputation handles missing values by replacing them with plausible values. A common imputation method is multiple imputation by chained equations (MICE), which defines a series of conditional distributions to impute missing values. Although MICE is relatively easy to implement, it may not converge to a proper joint distribution. An alternative strategy is to model the variables jointly using the general location model, but this model can become complex as the number of variables increases. Both approaches require integration of prior information when there are more variables than cases. We propose a Bayesian model that is based on the singular value decomposition components of a continuous data matrix to impute missing values. The model assumes that the matrix is of low rank by applying double exponential prior distributions on the singular values. We describe an efficient sampling algorithm to estimate the model’s parameters and impute the missing data. The performance of the model is compared to current imputation methods in simulated and real datasets. Of all the methods considered and in most of the simulated and real datasets, the proposed procedure appears to be the most accurate and precise. Supplementary materials for this article are available online.
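To make the low-rank idea concrete, here is a minimal sketch of a related but simpler technique: iterative soft-thresholded SVD imputation. It is not the authors' Bayesian sampler and produces a single point imputation rather than multiple imputations. The connection is only that a double exponential prior on the singular values corresponds, at the posterior mode, to an L1-type penalty that shrinks small singular values toward zero. The function name `soft_impute` and the parameters `lam`, `n_iter`, and `tol` are hypothetical choices for this illustration.

```python
import numpy as np


def soft_impute(X, lam=1.0, n_iter=200, tol=1e-6):
    """Fill NaN entries of X by iterating a soft-thresholded SVD.

    Illustrative only: a non-Bayesian stand-in for low-rank imputation.
    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initialise the missing cells with the observed column means.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(n_iter):
        # Decompose the current completed matrix.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)

        # Shrink singular values by lam; small ones become exactly zero,
        # which is what encourages a low-rank reconstruction.
        s_shrunk = np.maximum(s - lam, 0.0)
        low_rank = (U * s_shrunk) @ Vt

        # Keep observed cells fixed; update only the missing cells.
        new_filled = np.where(missing, low_rank, X)

        # Stop when the relative change in the completed matrix is small.
        change = np.linalg.norm(new_filled - filled) / max(
            np.linalg.norm(filled), 1e-12
        )
        filled = new_filled
        if change < tol:
            break

    return filled
```

A call such as `soft_impute(X, lam=0.5)` on a matrix with `np.nan` in the missing cells returns a completed matrix of the same shape. The shrinkage level `lam` plays a role loosely analogous to the prior scale in the paper's model, but here it is a fixed tuning constant rather than an estimated parameter.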




Updated: 2022-10-07