CSN: unsupervised approach for inferring biological networks based on the genome alone.
BMC Bioinformatics (IF 2.9), Pub Date: 2020-05-15, DOI: 10.1186/s12859-020-3479-9
Maya Galili 1, 2 , Tamir Tuller 1, 3

BACKGROUND Most organisms cannot be cultivated, as they live in unique ecological conditions that cannot be mimicked in the lab. Understanding the functionality of those organisms' genes and their interactions by performing large-scale measurements of transcription levels, protein-protein interactions, or metabolism is extremely difficult and, in some cases, impossible. Thus, efficient algorithms are needed for deciphering genome functionality based only on genomic sequences, with no other experimental measurements.
RESULTS In this study, we describe a novel algorithm for inferring gene networks, which we name Common Substring Network (CSN). The algorithm enables inferring novel regulatory relations among genes based only on the genomic sequence of a given organism and partial homolog/ortholog-based functional annotation. In particular, it can infer the functional annotation of genes with unknown homology. This approach is based on the assumption that related genes, not necessarily homologs, tend to share sub-sequences, which may be related to common regulatory mechanisms, similar functionality of encoded proteins, common evolutionary history, and more. We demonstrate that CSNs based on the S. cerevisiae and E. coli genomes have properties similar to 'traditional' biological networks inferred from experiments: highly expressed genes tend to correspond to higher-degree nodes in the CSN, genes with similar protein functionality tend to be closer to one another in the network, and the CSN graph exhibits a power-law degree distribution. We also show how the CSN can be used for predicting gene interactions and functions.
CONCLUSIONS The reported results suggest that 'silent' code inside the transcript can help to predict central features of biological networks and gene function. This approach can help researchers understand the genomes of novel microorganisms, analyze metagenomic data, and decipher new gene functions.
AVAILABILITY Our MATLAB implementation of CSN is available at https://www.cs.tau.ac.il/~tamirtul/CSN-Autogen.
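ILLUSTRATION The core idea in the abstract, that genes sharing sub-sequences may be related, can be sketched in a few lines of Python. This is only a toy illustration and not the authors' MATLAB implementation: the abstract does not specify how shared substrings are scored, so the fixed substring length `k`, the `min_shared` threshold, the weighted neighbor vote used for annotation, and all gene names and sequences below are assumptions made for the example.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_network(genes, k=8, min_shared=1):
    """Connect two genes if they share at least min_shared substrings of length k.

    Assumption: a fixed-length k-mer overlap stands in for the paper's
    (unspecified here) common-substring score.
    Returns a dict mapping (gene_a, gene_b) to the number of shared k-mers.
    """
    kmer_sets = {name: kmers(seq.upper(), k) for name, seq in genes.items()}
    edges = {}
    for a, b in combinations(sorted(genes), 2):
        shared = len(kmer_sets[a] & kmer_sets[b])
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges


def degrees(genes, edges):
    """Count the neighbors of every gene in the network."""
    deg = {name: 0 for name in genes}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def predict_function(gene, edges, annotations):
    """Guess a label for gene by a weighted vote among annotated neighbors.

    Assumption: simple guilt-by-association, not the method from the paper.
    Returns None if no neighbor carries an annotation.
    """
    votes = {}
    for (a, b), weight in edges.items():
        if gene in (a, b):
            other = b if a == gene else a
            label = annotations.get(other)
            if label is not None:
                votes[label] = votes.get(label, 0) + weight
    return max(votes, key=votes.get) if votes else None


# Hypothetical toy sequences, invented for this example.
toy_genes = {
    "geneA": "ATGGCCATTGTAATGGGCCGC",
    "geneB": "TTTGCCATTGTAATCCCAAA",
    "geneC": "ATGAAACCCGGGTTTAAACCC",
    "geneD": "GGGCCATTGTAAGGGTTTAAACC",
}
toy_annotations = {"geneA": "transport", "geneB": "transport", "geneC": "metabolism"}

toy_edges = build_network(toy_genes, k=8)
toy_degrees = degrees(toy_genes, toy_edges)
toy_guess = predict_function("geneD", toy_edges, toy_annotations)
```

In this toy network the unannotated `geneD` is linked to all three annotated genes, so its label is decided by which neighbors share more substrings with it. On real genomes the pairwise comparison is quadratic in the number of genes, so an indexed structure such as a suffix array would likely be needed.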

Updated: 2020-05-15