Proper imputation of missing values in proteomics datasets for differential expression analysis.
Briefings in Bioinformatics (IF 6.8), Pub Date: 2020-06-10, DOI: 10.1093/bib/bbaa112
Mingyi Liu, Ashok Dongre

Label-free shotgun proteomics is an important tool in biomedical research, where tandem mass spectrometry with data-dependent acquisition (DDA) is frequently used for protein identification and quantification. However, DDA datasets contain a significant number of missing values (MVs) that severely hinder proper analysis. Existing literature suggests that different imputation methods should be used for the two types of MVs: missing completely at random (MCAR) and missing not at random (MNAR). However, the simulated or biased datasets used in most such studies offer few clues about the composition of MVs in real-life proteomic datasets, and thus about how to impute them properly. Moreover, the impact of imputation methods on downstream differential expression analysis, a critical goal for many biomedical projects, is largely undetermined. In this study, we investigated public DDA datasets of various tissue/sample types to determine the composition of MVs in them. We then developed simulated datasets that imitate the MV profile of real-life datasets. Using such datasets, we compared the impact of various popular imputation methods on the analysis of differentially expressed proteins. Finally, we make recommendations on which imputation method(s) to use for proteomic data beyond just DDA datasets.
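
To make the distinction between the two MV types concrete, the following is a minimal Python sketch, not the authors' simulation. It builds a toy matrix of log2 intensities, removes values either completely at random or by left-censoring the lowest intensities (one common way to model values that are missing not at random), and scores two simple imputers against the known truth. All function names, the 20% missing rate, and the intensity distribution are assumptions chosen for illustration; neither imputer is claimed to be one of the methods evaluated in the paper.

```python
import math
import random
import statistics


def simulate_intensities(n_proteins=200, n_samples=6, seed=0):
    """Return a protein x sample matrix of made-up log2 intensities."""
    rng = random.Random(seed)
    matrix = []
    for _ in range(n_proteins):
        protein_mean = rng.gauss(25.0, 2.0)  # abundance differs per protein
        matrix.append([protein_mean + rng.gauss(0.0, 0.3) for _ in range(n_samples)])
    return matrix


def add_mcar(matrix, fraction, seed=0):
    """Missing completely at random: every value is equally likely to be dropped."""
    rng = random.Random(seed)
    return [[None if rng.random() < fraction else v for v in row] for row in matrix]


def add_mnar(matrix, fraction):
    """Missing not at random, modelled here as left-censoring:
    the lowest `fraction` of all values is dropped."""
    values = sorted(v for row in matrix for v in row)
    cutoff = values[int(fraction * len(values))]
    return [[None if v < cutoff else v for v in row] for row in matrix]


def impute_row_mean(matrix):
    """Fill each gap with the mean of that protein's observed values.
    Proteins with no observed value fall back to the global mean."""
    global_mean = statistics.fmean(v for row in matrix for v in row if v is not None)
    filled = []
    for row in matrix:
        present = [v for v in row if v is not None]
        fill = statistics.fmean(present) if present else global_mean
        filled.append([fill if v is None else v for v in row])
    return filled


def impute_global_min(matrix):
    """Fill every gap with the smallest observed value in the dataset."""
    lowest = min(v for row in matrix for v in row if v is not None)
    return [[lowest if v is None else v for v in row] for row in matrix]


def rmse_on_missing(truth, masked, imputed):
    """Root-mean-square error over the imputed positions only."""
    errors = [
        (t - i) ** 2
        for t_row, m_row, i_row in zip(truth, masked, imputed)
        for t, m, i in zip(t_row, m_row, i_row)
        if m is None
    ]
    return math.sqrt(statistics.fmean(errors))


if __name__ == "__main__":
    truth = simulate_intensities()
    datasets = (("MCAR", add_mcar(truth, 0.2)), ("MNAR", add_mnar(truth, 0.2)))
    for label, masked in datasets:
        for imputer in (impute_row_mean, impute_global_min):
            error = rmse_on_missing(truth, masked, imputer(masked))
            print(f"{label} {imputer.__name__}: RMSE = {error:.2f}")
```

Running the script prints one error per combination of missingness type and imputer. Real datasets mix both MV types, which is why the abstract stresses studying the MV composition of real data before choosing a method.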

Updated: 2020-06-10