Floating Search Methodology for Combining Classification Models for Site Recognition in DNA Sequences
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 4.5), Pub Date: 2020-02-17, DOI: 10.1109/tcbb.2020.2974221
Javier Perez-Rodriguez, Aida de Haro-Garcia, Nicolas Garcia-Pedrajas

Recognition of the functional sites of genes, such as translation initiation sites, donor and acceptor splice sites, and stop codons, is a relevant part of many current problems in bioinformatics. The best approaches use sophisticated classifiers, such as support vector machines. However, with the rapid accumulation of sequence data, methods for combining many sources of evidence are necessary, as it is unlikely that a single classifier can solve this problem with the best possible performance. A major issue is that the number of possible models to combine is large, and using all of them is impractical. In this paper we present a methodology for combining many sources of information to recognize any functional site using “floating search”, a powerful heuristic applicable when the cost of evaluating each solution is high. We present experiments on four functional sites in the human genome, which is used as the target genome, with another 20 species used as sources of evidence. The proposed methodology shows a significant improvement over state-of-the-art methods. The results also challenge the standard assumption that only genomes neither very close to nor very far from the human genome should be used to improve the recognition of functional sites.
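
The abstract does not spell out the search procedure, so the following is only a minimal sketch of the generic sequential floating forward selection heuristic, applied to choosing which models to combine. It is not the authors' implementation: the names `floating_search`, `candidates`, `score`, and `max_size` are hypothetical, and `score` stands in for whatever (expensive) validation measure is used to evaluate a combination of models. Scores are cached because each evaluation is assumed to be costly.

```python
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

Subset = FrozenSet[str]


def floating_search(
    candidates: Sequence[str],
    score: Callable[[Subset], float],
    max_size: int,
) -> Tuple[float, Subset]:
    """Generic sequential floating forward selection (illustrative sketch).

    `candidates` are hypothetical model identifiers; `score` is an assumed,
    user-supplied function that rates a combination (higher is better).
    Returns the best (score, subset) found among sizes 1..max_size.
    """
    limit = min(max_size, len(set(candidates)))
    if limit < 1:
        raise ValueError("need at least one candidate and max_size >= 1")

    cache: Dict[Subset, float] = {}

    def cached(subset: Subset) -> float:
        # Each evaluation is assumed to be expensive, so compute it only once.
        if subset not in cache:
            cache[subset] = score(subset)
        return cache[subset]

    selected: Subset = frozenset()
    best: Dict[int, Tuple[float, Subset]] = {}  # best subset seen per size

    while len(selected) < limit:
        # Forward step: add the single candidate that helps the most.
        grown = [selected | {c} for c in candidates if c not in selected]
        selected = max(grown, key=cached)
        size = len(selected)
        if size not in best or cached(selected) > best[size][0]:
            best[size] = (cached(selected), selected)
        else:
            # A better subset of this size was found earlier; resume from it.
            selected = best[size][1]

        # Floating (backward) steps: drop a member while doing so beats the
        # best subset previously recorded at the smaller size.
        while len(selected) > 2:
            shrunk = [selected - {c} for c in candidates if c in selected]
            reduced = max(shrunk, key=cached)
            if cached(reduced) <= best[len(reduced)][0]:
                break
            best[len(reduced)] = (cached(reduced), reduced)
            selected = reduced

    return max(best.values(), key=lambda pair: pair[0])
```

In this sketch a backward step is accepted only when it strictly improves the best score recorded for the smaller size, which is what guarantees termination. A caller would pass the identifiers of the available models (for example, one per source species) and a `score` function that trains or loads the combination and returns its validation performance.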

Updated: 2020-02-17