Missing-value imputation using the robust singular-value decomposition: Proposals and numerical evaluation
Crop Science (IF 2.0), Pub Date: 2021-03-26, DOI: 10.1002/csc2.20508
Marisol García‐Peña 1 , Sergio Arciniegas‐Alarcón 2 , Wojtek J. Krzanowski 3 , Diego Duarte 2

A common problem in the analysis of data from multi-environment trials is imbalance caused by missing observations. To get around this problem, Yan proposed a method for imputing the missing values based on the singular-value decomposition (SVD) of a matrix. However, this SVD can be affected by outliers and produce low-quality imputations. In this article, we propose four extensions of the Yan method that are resistant to outliers, replacing the standard SVD with four robust SVD variants. We evaluate these methods using exclusively numerical criteria, both in a simulation study and in a cross-validation study based on real data. We conclude that in the presence of outliers, the standard SVD method should not be used. Instead, the best alternatives are the robust SVD methods based on sub-sampling when the percentage of contamination is less than 2% and the data are missing completely at random. In any other case, methods that either minimize the L2 norm or that involve L1 regressions are preferable.
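
As a rough illustration of the general idea behind SVD-based imputation, the sketch below fills missing cells of a two-way table by repeatedly fitting a low-rank SVD approximation and copying the fitted values into the missing positions. It is a generic iterative scheme written for this page, not the exact procedure of Yan or of any of the four robust variants; the function name `svd_impute` and its parameters `rank`, `max_iter`, and `tol` are hypothetical.

```python
import numpy as np


def svd_impute(X, rank=2, max_iter=200, tol=1e-8):
    """Fill NaN cells of X with values from an iterated rank-`rank` SVD fit.

    Hypothetical helper for illustration only; assumes every column has
    at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial guess: column means of the observed entries.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(max_iter):
        # Low-rank approximation of the currently completed matrix.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]

        # Only the missing cells are overwritten; observed data stay fixed.
        updated = np.where(missing, approx, X)
        change = np.max(np.abs(updated - filled)) if missing.any() else 0.0
        filled = updated
        if change < tol:
            break

    return filled


# Usage: a rank-1 table (rows ~ genotypes, columns ~ environments) with one gap.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([1.0, 2.0, 3.0])
table = np.outer(a, b)
table[0, 0] = np.nan
completed = svd_impute(table, rank=1)
```

A robust variant in the spirit of the abstract would replace the `np.linalg.svd` step with an outlier-resistant decomposition. That substitution is not shown here.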

Updated: 2021-03-26