Imputation and low-rank estimation with Missing Not At Random data
Statistics and Computing (IF 1.6), Pub Date: 2020-07-16, DOI: 10.1007/s11222-020-09963-5
Aude Sportisse, Claire Boyer, Julie Josse

Missing values challenge data analysis because many supervised and unsupervised learning methods cannot be applied directly to incomplete data. Matrix completion based on low-rank assumptions is a very powerful solution for dealing with missing values. However, existing methods do not consider the case of informative missing values, which are widely encountered in practice. This paper proposes matrix completion methods to recover Missing Not At Random (MNAR) data. Our first contribution is a model-based estimation strategy that explicitly models the distribution of the missing-data mechanism. An EM algorithm is then implemented, involving a Fast Iterative Soft-Thresholding Algorithm (FISTA). Our second contribution is a computationally efficient surrogate estimation that implicitly takes into account the joint distribution of the data and the missing mechanism: the data matrix is concatenated with the mask coding for the missing values, and a low-rank structure for the exponential family is assumed on this new matrix in order to encode links between variables and missing mechanisms. This methodology, which has the great advantage of handling different missing-value mechanisms, is robust to model specification errors. The performance of our methods is assessed on real data collected from a trauma registry (TraumaBase®) containing clinical information about over twenty thousand severely traumatized patients in France. The aim is to predict whether doctors should administer tranexamic acid to patients with traumatic brain injury in order to limit excessive bleeding.
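The abstract names two computational ingredients: a FISTA step for low-rank estimation, and a surrogate that appends the missingness mask to the data before fitting a low-rank model. The following is a minimal NumPy sketch of both ideas under simplifying assumptions, not the authors' implementation. The FISTA routine solves nuclear-norm-penalized least squares on the observed entries (a standard low-rank completion objective, not the paper's MNAR EM algorithm), and the surrogate uses the same squared loss on the appended 0/1 mask, whereas the paper assumes a low-rank exponential-family model (for example, a Bernoulli loss for the mask block). The names `svt`, `fista_complete` and `impute_with_mask`, the penalty `lam`, and the simulated self-masking mechanism are hypothetical, chosen only for illustration.

```python
import numpy as np


def svt(X, tau):
    """Singular value soft-thresholding: the proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def fista_complete(Y, observed, lam, n_iter=200):
    """FISTA for min_X 0.5 * ||observed * (X - Y)||_F^2 + lam * ||X||_*.

    The gradient of the smooth term is observed * (X - Y), which is
    1-Lipschitz because the mask only keeps or zeroes entries, so a
    step size of 1 is used.
    """
    Y0 = np.where(observed, Y, 0.0)  # missing (NaN) entries never enter the loss
    X = np.zeros_like(Y0)
    Z = X.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = observed * (Z - Y0)          # gradient at the extrapolated point
        X_new = svt(Z - grad, lam)          # proximal (soft-thresholding) step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Z = X_new + ((t - 1.0) / t_new) * (X_new - X)  # Nesterov momentum
        X, t = X_new, t_new
    return X


def impute_with_mask(Y, lam, n_iter=200):
    """Hypothetical surrogate: complete the augmented matrix [Y | M] jointly.

    M is the missingness indicator (1 = missing); it is fully observed, so
    links between variables and the missing mechanism can enter the
    low-rank fit. Squared loss on M is a simplification of the paper's
    exponential-family model.
    """
    M = np.isnan(Y).astype(float)
    Y_aug = np.hstack([Y, M])
    observed = np.hstack([~np.isnan(Y), np.ones_like(M, dtype=bool)])
    X_aug = fista_complete(Y_aug, observed, lam, n_iter)
    X = X_aug[:, : Y.shape[1]]
    return np.where(np.isnan(Y), X, Y)  # keep observed values, fill only the gaps


# Simulated example: a rank-2 matrix with self-masking MNAR missingness,
# where larger values are more likely to be missing.
rng = np.random.default_rng(0)
n, p, r = 100, 10, 2
Y_full = rng.normal(size=(n, r)) @ rng.normal(size=(r, p))
Y = Y_full.copy()
prob_missing = 1.0 / (1.0 + np.exp(-(Y_full - 1.0)))
Y[rng.random(Y.shape) < prob_missing] = np.nan
Y_hat = impute_with_mask(Y, lam=1.0)
```

In practice the penalty `lam` would be tuned (for example by cross-validation), and the data columns would usually be scaled so that the data block and the mask block contribute comparably to the squared loss.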




Updated: 2020-07-17