Multi-Factored Gene-Gene Proximity Measures Exploiting Biological Knowledge Extracted from Gene Ontology: Application in Gene Clustering.
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 3.6), Pub Date: 2018-06-21, DOI: 10.1109/tcbb.2018.2849362
Sudipta Acharya, Sriparna Saha, Prasanna Pradhan

Gene Ontology (GO) is a dynamic vocabulary for describing the cellular functions of proteins and genes. It comprises three sub-ontologies: Biological Process, Cellular Component, and Molecular Function. GO has several applications in bioinformatics, such as measuring gene-gene or protein-protein semantic similarity and identifying genes or proteins by their GO annotations for disease-gene and target discovery. Several measures of semantic similarity between genes have been proposed in the literature; they rely on the information content of GO terms, on the GO tree structure, or on a combination of both. However, most existing semantic similarity measures do not consider the different topological and information-theoretic aspects of GO terms collectively. Motivated by this gap, in this article we first propose three novel semantic similarity/distance measures for genes that cover different aspects of the GO tree. These measures are then embedded in the frameworks of well-known multi-objective and single-objective clustering algorithms to identify functionally similar genes. For comparative analysis, 10 popular existing GO-based semantic similarity/distance measures and tools are also considered. Experimental results on Mouse genome, Yeast, and Human genome datasets demonstrate the superiority of multi-objective clustering algorithms used together with the proposed multi-factored similarity/distance measures. The clustering outcomes are further validated with biological and statistical significance tests. Supplementary information is available at https://www.iitp.ac.in/sriparna/journals.html.
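
As a rough illustration of the information-content family of measures mentioned in the abstract, the Python sketch below computes Resnik similarity between two terms on a toy ontology. It is not one of the three measures proposed in the article, whose definitions are not given here. The term names, the parent links, and the annotation counts are all hypothetical, and every term is assumed to have at least one annotation at or below it.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms (not real GO IDs).
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A"},
    "D": {"A"},
}

# Hypothetical number of genes annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "A": 2, "B": 4, "C": 1, "D": 1}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -log p(t); p(t) counts annotations to t and to its descendants."""
    total = sum(DIRECT_COUNTS.values())
    covered = sum(n for t, n in DIRECT_COUNTS.items() if term in ancestors(t))
    return -math.log(covered / total)


def resnik(term_a, term_b):
    """Information content of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(information_content(t) for t in common)
```

In this toy setup, `resnik("C", "D")` is the information content of their shared parent `A`, while `resnik("C", "B")` falls back to the root, which carries no information. A gene-level score would still need a rule for combining term-level scores across each gene's annotations, which this sketch leaves out.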

Updated: 2020-03-07