PrismExp: Predicting Human Gene Function by Partitioning Massive RNA-seq Co-expression Data
bioRxiv - Bioinformatics. Pub Date: 2021-01-21, DOI: 10.1101/2021.01.20.427528
Alexander Lachmann, Kaeli Rizzo, Alon Bartal, Minji Jeon, Daniel J. B. Clarke, Avi Ma’ayan

Gene co-expression correlations from mRNA sequencing (RNA-seq) can be used to predict gene function based on the covariance structure that exists within such data. In the past, we showed that RNA-seq co-expression data is highly predictive of gene function and protein-protein interactions. We demonstrated that the performance of such predictions depends on the source of the gene expression data. Furthermore, since genes function in different cellular contexts, predictions derived from tissue-specific gene co-expression data outperform predictions derived from cross-tissue gene co-expression data. However, identifying the optimal tissue type to maximize gene function predictions for all mammalian genes is not trivial. Here we introduce and validate an approach we term Partitioning RNA-seq data Into Segments for Massive co-EXpression-based gene function Predictions (PrismExp), for improved gene function prediction based on RNA-seq co-expression data. With co-expression data from ARCHS4, we apply PrismExp to predict a wide variety of gene functions, including pathway membership, phenotypic associations, and protein-protein interactions. PrismExp outperforms the cross-tissue co-expression correlation matrix approach on all tested domains. Hence, PrismExp can enhance machine learning methods that utilize RNA-seq co-expression correlations to impute knowledge about understudied genes and proteins.
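The general idea of partitioned co-expression scoring can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how samples are partitioned or how scores are combined, so the random partitioning, the mean-correlation score, and every function name below (`partition_samples`, `gene_set_scores`, `partition_features`) are assumptions made for illustration, and the data are synthetic.

```python
import numpy as np

def partition_samples(n_samples, n_parts, rng):
    """Randomly split sample indices into n_parts groups.
    Assumption: a stand-in for whatever partitioning a real pipeline uses."""
    idx = rng.permutation(n_samples)
    return np.array_split(idx, n_parts)

def gene_set_scores(expr, set_idx):
    """Mean Pearson correlation of every gene with the members of a gene set.
    expr is a genes x samples matrix; set_idx lists the rows in the gene set."""
    corr = np.corrcoef(expr)            # genes x genes correlation matrix
    np.fill_diagonal(corr, np.nan)      # ignore each gene's self-correlation
    return np.nanmean(corr[:, set_idx], axis=1)

def partition_features(expr, set_idx, parts):
    """One score column per sample partition (genes x n_parts)."""
    return np.column_stack([gene_set_scores(expr[:, p], set_idx) for p in parts])

# Synthetic toy data: 6 genes x 40 samples; genes 0-2 share a common signal.
rng = np.random.default_rng(0)
expr = rng.normal(size=(6, 40))
signal = rng.normal(size=40)
expr[0:3] = signal + 0.1 * rng.normal(size=(3, 40))

parts = partition_samples(expr.shape[1], n_parts=4, rng=rng)
features = partition_features(expr, set_idx=[0, 1], parts=parts)
print(features.round(2))  # row 2 (co-expressed with the set) should score high
```

In a sketch like this, each gene ends up with one score per partition rather than a single cross-sample score, and those per-partition scores could then serve as input features for a downstream classifier.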

Updated: 2021-01-22