DIMA: Data-driven selection of a suitable imputation algorithm
bioRxiv - Bioinformatics. Pub Date: 2020-10-14, DOI: 10.1101/2020.10.13.323618
Janine Egert, Bettina Warscheid, Clemens Kreutz

Motivation: Imputation is a prominent strategy for dealing with missing values (MVs) in proteomics data analysis pipelines. However, the performance of different imputation methods is difficult to assess and varies strongly with data characteristics. To overcome this issue, we present the concept of a data-driven selection of a suitable imputation algorithm (DIMA). Results: The performance and broad applicability of DIMA are demonstrated on 121 quantitative proteomics data sets from the PRIDE database and on simulated data containing 5-50% MVs with different proportions of missing not at random and missing completely at random values. DIMA reliably suggests a high-performing imputation algorithm that is always among the three best algorithms and results in a root mean square error difference (ΔRMSE) <10% in 84% of the cases. Availability and Implementation: Source code is freely available for download at https://github.com/clemenskreutz/OmicsData.
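
The abstract does not spell out how the selection works, so the following is only a minimal Python sketch of the general idea of choosing an imputation method from the data itself: hide some observed values, impute them with each candidate, and keep the candidate with the lowest RMSE on the hidden entries. It is not the DIMA implementation from the OmicsData repository. The function names (`impute_row_mean`, `impute_global_min`, `select_imputation`), the two toy candidate methods, and the purely random masking are all assumptions made for illustration; in particular, the sketch hides values completely at random and does not model missing not at random values.

```python
import numpy as np


def impute_row_mean(x):
    """Hypothetical candidate: fill NaNs with the mean of the observed values in the same row."""
    out = x.copy()
    mask = np.isnan(out)
    counts = (~mask).sum(axis=1, keepdims=True)
    sums = np.nansum(out, axis=1, keepdims=True)
    # Fall back to the global mean for rows without any observed value.
    global_mean = np.nansum(out) / max(int((~mask).sum()), 1)
    row_means = np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)
    out[mask] = np.broadcast_to(row_means, out.shape)[mask]
    return out


def impute_global_min(x):
    """Hypothetical candidate: fill NaNs with the smallest observed value."""
    out = x.copy()
    out[np.isnan(out)] = np.nanmin(x)
    return out


def rmse(truth, imputed, mask):
    """Root mean square error, restricted to the artificially hidden entries."""
    diff = truth[mask] - imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


def select_imputation(data, candidates, frac=0.1, seed=0):
    """Hide a fraction of the observed values, impute with each candidate,
    and return the name of the lowest-RMSE candidate plus all scores.

    Assumption: values are hidden completely at random.
    """
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(data)
    hide = observed & (rng.random(data.shape) < frac)
    masked = data.copy()
    masked[hide] = np.nan
    scores = {name: rmse(data, fn(masked), hide) for name, fn in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

Calling `select_imputation(data, {"row_mean": impute_row_mean, "global_min": impute_global_min})` on a proteins-by-samples matrix returns the name of the better-scoring candidate together with the RMSE of each one. A difference between two such scores is one plausible reading of a ΔRMSE-style comparison, although the exact definition used by the authors is not given in the abstract.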

Updated: 2020-10-16