Comparison of imputation and imputation-free methods for statistical analysis of mass spectrometry data with missing data
Briefings in Bioinformatics (IF 6.8), Pub Date: 2021-08-13, DOI: 10.1093/bib/bbab353
Sandra Taylor, Matthew Ponzini, Machelle Wilson, Kyoungmi Kim

Missing values are common in high-throughput mass spectrometry data. Two strategies are available to address missing values: (i) eliminate or impute the missing values and apply statistical methods that require complete data and (ii) use statistical methods that specifically account for missing values without imputation (imputation-free methods). This study reviews the effect of sample size and percentage of missing values on statistical inference for multiple methods under these two strategies. With increasing missingness, the ability of imputation and imputation-free methods to identify differentially and non-differentially regulated compounds in a two-group comparison study declined. Random forest and k-nearest neighbor imputation combined with a Wilcoxon test performed well in statistical testing for up to 50% missingness with little bias in estimating the effect size. Quantile regression imputation accompanied with a Wilcoxon test also had good statistical testing outcomes but substantially distorted the difference in means between groups. None of the imputation-free methods performed consistently better for statistical testing than imputation methods.
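
The imputation strategy described above (fill in missing values, then apply a complete-data test) can be illustrated with a small sketch. This is not the authors' code or simulation design: the function names `knn_impute`, `rank_sum_test` and `simulate`, the sample-based neighbor search, the column-mean fallback, the missing-completely-at-random masking, and all parameter values are assumptions chosen for illustration. The test is a Wilcoxon rank-sum test using the normal approximation with midranks for ties and no continuity correction.

```python
import math
import random
from statistics import NormalDist


def knn_impute(rows, k=3):
    """Fill None entries with the mean of the k nearest samples (hypothetical helper).

    rows: list of samples, each a list of compound intensities (None = missing).
    Distance is the root mean squared difference over compounds both samples observe.
    """
    def distance(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other samples in which compound j was observed.
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            if not nearest:
                # Fallback (assumption): mean of all observed values of compound j.
                nearest = [v for _, v in donors]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test: returns (rank sum of x, approximate p-value)."""
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the ranks i+1 .. j
        for t in range(i, j):
            ranks[t] = midrank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    n1, n2 = len(x), len(y)
    n = n1 + n2
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    mean = n1 * (n + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return w, 1.0  # all values tied: no evidence of a difference
    z = (w - mean) / math.sqrt(var)
    return w, 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_per_group=20, n_compounds=5, shift=2.0, missing=0.2, seed=0):
    """Two groups; only compound 0 differs; values are masked completely at random."""
    rng = random.Random(seed)
    data, labels = [], []
    for group in (0, 1):
        for _ in range(n_per_group):
            row = [rng.gauss(10.0, 1.0) for _ in range(n_compounds)]
            if group == 1:
                row[0] += shift
            row = [None if rng.random() < missing else v for v in row]
            data.append(row)
            labels.append(group)
    return data, labels


data, labels = simulate()
filled = knn_impute(data, k=3)
for j in range(len(filled[0])):
    a = [r[j] for r, g in zip(filled, labels) if g == 0]
    b = [r[j] for r, g in zip(filled, labels) if g == 1]
    w, p = rank_sum_test(a, b)
    print(f"compound {j}: rank sum = {w:.1f}, p = {p:.4f}")
```

Raising the `missing` argument toward 0.5 and repeating the run over many seeds would give a rough feel for how testing performance changes with missingness, although the masking here is purely random and may not reflect the missingness mechanisms studied in the paper.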

Updated: 2021-08-13