Joint learning improves protein abundance prediction in cancers.
BMC Biology (IF 5.4), Pub Date: 2019-12-23, DOI: 10.1186/s12915-019-0730-9
Hongyang Li 1, Omer Siddiqui 1, Hongjiu Zhang 1, Yuanfang Guan 1,2

BACKGROUND: The classic central dogma of biology describes the flow of information from DNA to mRNA to protein, yet the complicated regulatory mechanisms underlying protein translation often lead to weak correlations between mRNA and protein abundances. This is particularly the case in cancer samples and when evaluating the same gene across multiple samples.

RESULTS: Here, we report a method for predicting the proteome from the transcriptome, using a training dataset provided by NCI-CPTAC and TCGA, consisting of transcriptome and proteome data from 77 breast and 105 ovarian cancer samples. First, we establish a generic model capturing the correlation between the mRNA and protein abundance of a single gene. Second, we build a gene-specific model capturing the interdependencies among multiple genes in a regulatory network. Third, we create a cross-tissue model by jointly learning the information of shared regulatory networks and pathways across cancer tissues. Our method ranked first in the NCI-CPTAC DREAM Proteogenomics Challenge, and its predictive performance is close to the accuracy of experimental replicates. The analysis revealed key functional pathways and network modules controlling proteomic abundance in cancers, in particular metabolism-related genes.

CONCLUSIONS: We present a method to predict the proteome from the transcriptome, leveraging data from different cancer tissues to build a trans-tissue model, and suggest how to integrate information from multiple cancers to provide a foundation for further research.
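The three modelling steps in the abstract can be illustrated with a minimal sketch. The abstract does not name the model classes, so ridge regression is used here purely as a stand-in, the data are synthetic (only the sample counts 77 and 105 come from the abstract), and averaging the three predictions is an assumption about how they might be combined. All function and variable names are hypothetical.

```python
import numpy as np

def add_bias(X):
    """Append a constant column so the linear model has an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression (the intercept is regularized too, for brevity)."""
    Xb = add_bias(X)
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)

def ridge_predict(X, W):
    return add_bias(X) @ W

def fit_generic(mrna, protein):
    """Step 1: one model shared by all genes, mapping a gene's mRNA to its own protein."""
    return ridge_fit(mrna.reshape(-1, 1), protein.reshape(-1, 1), alpha=0.0)

def predict_generic(mrna, w):
    return ridge_predict(mrna.reshape(-1, 1), w).reshape(mrna.shape)

def mean_gene_correlation(pred, truth):
    """Pearson correlation of each gene across samples, averaged over genes."""
    n_genes = truth.shape[1]
    return float(np.mean([np.corrcoef(pred[:, g], truth[:, g])[0, 1]
                          for g in range(n_genes)]))

# Synthetic stand-in data: both "tissues" share one regulatory matrix W_true.
rng = np.random.default_rng(0)
n_genes = 20
W_true = 0.6 * np.eye(n_genes) + 0.1 * rng.standard_normal((n_genes, n_genes))

def simulate(n_samples):
    mrna = rng.standard_normal((n_samples, n_genes))
    protein = mrna @ W_true + 0.3 * rng.standard_normal((n_samples, n_genes))
    return mrna, protein

mrna_breast, prot_breast = simulate(77)     # sample counts taken from the abstract
mrna_ovarian, prot_ovarian = simulate(105)
mrna_test, prot_test = simulate(40)         # held-out "breast" samples (assumed size)

# Step 1: generic single-gene model.
w_generic = fit_generic(mrna_breast, prot_breast)
# Step 2: gene-specific model, each protein predicted from the mRNA of all genes.
W_gene = ridge_fit(mrna_breast, prot_breast)
# Step 3: cross-tissue model, the same regression fitted on both tissues pooled.
W_joint = ridge_fit(np.vstack([mrna_breast, mrna_ovarian]),
                    np.vstack([prot_breast, prot_ovarian]))

# Assumed combination rule: a plain average of the three predictions.
pred = (predict_generic(mrna_test, w_generic)
        + ridge_predict(mrna_test, W_gene)
        + ridge_predict(mrna_test, W_joint)) / 3.0

score = mean_gene_correlation(pred, prot_test)
print(f"mean per-gene correlation on held-out samples: {score:.3f}")
```

Scoring by per-gene correlation across samples mirrors the setting the abstract singles out as difficult, namely evaluating the same gene across multiple samples. On real data the tissues would share only part of their regulatory structure, so the benefit of pooling would have to be checked empirically.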

Updated: 2020-04-22