ProteomeGenerator: A Framework for Comprehensive Proteomics Based on de Novo Transcriptome Assembly and High-Accuracy Peptide Mass Spectral Matching
Journal of Proteome Research (IF 4.4), Pub Date: 2018-10-19, DOI: 10.1021/acs.jproteome.8b00295
Paolo Cifani 1; Avantika Dhabaria 1; Zining Chen 1; Akihide Yoshimi; Emily Kawaler; Omar Abdel-Wahab 2; John T Poirier 1,2; Alex Kentsis 1,3

Modern mass spectrometry now permits genome-scale and quantitative measurements of biological proteomes. However, analysis of specific specimens is currently hindered by the incomplete representation of biological variability of protein sequences in canonical reference proteomes and the technical demands for their construction. Here, we report ProteomeGenerator, a framework for de novo and reference-assisted proteogenomic database construction and analysis based on sample-specific transcriptome sequencing and high-accuracy mass spectrometry proteomics. This enables the assembly of proteomes encoded by actively transcribed genes, including sample-specific protein isoforms resulting from non-canonical mRNA transcription, splicing, or editing. To improve the accuracy of protein isoform identification in non-canonical proteomes, ProteomeGenerator relies on statistical target–decoy database matching calibrated using sample-specific controls. Its current implementation includes automatic integration with MaxQuant mass spectrometry proteomics algorithms. We applied this method for the proteogenomic analysis of splicing factor SRSF2 mutant leukemia cells, demonstrating high-confidence identification of non-canonical protein isoforms arising from alternative transcriptional start sites, intron retention, and cryptic exon splicing as well as improved accuracy of genome-scale proteome discovery. Additionally, we report proteogenomic performance metrics for current state-of-the-art implementations of SEQUEST HT, MaxQuant, Byonic, and PEAKS mass spectral analysis algorithms. Finally, ProteomeGenerator is implemented as a Snakemake workflow within a Singularity container for one-step installation in diverse computing environments, thereby enabling open, scalable, and facile discovery of sample-specific, non-canonical, and neomorphic biological proteomes.
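The target–decoy matching mentioned above can be illustrated with a minimal sketch. It is not ProteomeGenerator's or MaxQuant's implementation, and it does not model the sample-specific calibration the abstract describes. It assumes a simplified peptide-spectrum match record (the hypothetical `PSM` class and `filter_at_fdr` function below) in which a higher score means a better match, and it estimates the false discovery rate at a score cutoff as the ratio of decoy to target matches at or above that cutoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """Peptide-spectrum match. All fields are hypothetical, for illustration only."""
    peptide: str
    score: float    # assumption: higher score means a better match
    is_decoy: bool  # True if the match is to a decoy (e.g. reversed) sequence


def filter_at_fdr(psms, max_fdr=0.01):
    """Return target PSMs accepted at the most permissive cutoff with FDR <= max_fdr.

    The FDR at a cutoff is estimated as (#decoys / #targets) among all PSMs
    ranked at or above that cutoff. Score ties are not treated specially.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = 0
    decoys = 0
    keep = 0  # number of top-ranked PSMs inside the accepted cutoff
    for rank, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            keep = rank
    # Decoys inside the cutoff are only used for estimation, not reported.
    return [p for p in ranked[:keep] if not p.is_decoy]


if __name__ == "__main__":
    example = [
        PSM("PEPTIDEA", 10.0, False),
        PSM("PEPTIDEB", 9.0, False),
        PSM("EDITPEPX", 8.0, True),
        PSM("PEPTIDEC", 7.0, False),
    ]
    for psm in filter_at_fdr(example, max_fdr=0.0):
        print(psm.peptide, psm.score)
```

In a sample-specific database, which may be larger or smaller than a canonical reference proteome, the same estimate applies, but the decoy set would need to be built from the sample-specific sequences so that the target and decoy search spaces stay comparable.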

Updated: 2018-10-19