A pipeline to create predictive functional networks: application to the tumor progression of hepatocellular carcinoma.
BMC Bioinformatics (IF 3), Pub Date: 2020-01-14, DOI: 10.1186/s12859-019-3316-1
Maxime Folschette 1, 2, 3, 4, 5 , Vincent Legagneux 2 , Arnaud Poret 4 , Lokmane Chebouba 4, 6, 7 , Carito Guziolowski 4, 6 , Nathalie Théret 1, 2

BACKGROUND Integrating genome-wide gene expression patient profiles with regulatory knowledge is a challenging task because of the inherent heterogeneity, noise and incompleteness of biological data. On the computational side, several solvers for logic programs perform extremely well on decision problems in combinatorial search domains. The challenge is then how to process biological knowledge so that it can be fed to these solvers and yield insights in a biological study. This requires formalizing the biological knowledge so that it has a precise interpretation; currently, very few pathway databases offer this possibility.
RESULTS The presented work proposes an automatic pipeline that extracts regulatory knowledge from pathway databases and generates novel computational predictions about the state of expression or activity of biological molecules. We applied it in the context of hepatocellular carcinoma (HCC) progression and evaluated the precision and stability of these computational predictions. Our working base is a graph of 3383 nodes and 13,771 edges extracted from the KEGG database, into which we integrate 209 genes differentially expressed between low- and high-aggressiveness HCC across 294 patients. Our computational model predicts the shifts of expression of 146 initially non-observed biological components. Our predictions were validated at 88% using a larger experimental dataset and cross-validation techniques. In particular, we focus on the protein complex predictions and show for the first time that NFKB1/BCL-3 complexes are activated in aggressive HCC. Despite the large size of the reconstructed models, our analyses of the computational predictions reveal a well-constrained region where KEGG regulatory knowledge constrains the gene expression of several biomolecules. Such regions can offer interesting windows for perturbing these complex systems experimentally.
CONCLUSION This new pipeline allows biologists to develop their own predictive models based on a list of genes. It facilitates the identification of new regulatory biomolecules using knowledge graphs and predictive computational methods. Our workflow is implemented as an automatic Python pipeline, which is publicly available at https://github.com/LokmaneChebouba/key-pipe and includes, as test data, all the data used in this paper.
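
The abstract does not spell out the inference rule used to predict the expression shifts of non-observed components. As a rough illustration only, the sketch below assumes a simple sign-consistency rule on a signed regulatory graph: the sign of every regulated node must be explained by at least one of its regulators, and a node is "predicted" when it takes the same sign in every consistent labelling. The toy network, the node names and the functions `is_consistent` and `predict` are all hypothetical and are not taken from the published pipeline; the brute-force enumeration only works on tiny graphs, whereas the abstract mentions logic-program solvers for the real search.

```python
from itertools import product

# Hypothetical toy network: (source, target) -> +1 (activation) or -1 (inhibition).
EDGES = {
    ("A", "B"): +1,
    ("B", "C"): -1,
    ("A", "D"): +1,
    ("C", "D"): -1,
    ("A", "E"): +1,
    ("B", "E"): -1,
}

# Hypothetical observation: A is over-expressed (+1); -1 would mean under-expressed.
OBSERVED = {"A": +1}


def is_consistent(signs, edges):
    """Assumed rule: each regulated node needs at least one regulator
    whose sign, multiplied by the edge sign, equals the node's sign."""
    targets = {target for (_, target) in edges}
    for node in targets:
        explained = any(
            signs[source] * weight == signs[node]
            for (source, target), weight in edges.items()
            if target == node
        )
        if not explained:
            return False
    return True


def predict(edges, observed):
    """Enumerate all labellings of the non-observed nodes and return the
    nodes whose sign is identical in every consistent labelling."""
    nodes = sorted({n for edge in edges for n in edge})
    free = [n for n in nodes if n not in observed]
    solutions = []
    for values in product((+1, -1), repeat=len(free)):
        signs = {**observed, **dict(zip(free, values))}
        if is_consistent(signs, edges):
            solutions.append(signs)
    if not solutions:
        return {}
    first = solutions[0]
    return {n: first[n] for n in free if all(s[n] == first[n] for s in solutions)}


if __name__ == "__main__":
    print(predict(EDGES, OBSERVED))
```

In this toy example B, C and D receive a prediction, while E does not, because its two regulators pull in opposite directions and both signs remain consistent. This mirrors the idea that only a subset of the non-observed components can be predicted from the observations.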

Updated: 2020-01-14