A novel pattern matching algorithm for genomic patterns related to protein motifs
Journal of Bioinformatics and Computational Biology ( IF 0.9 ) Pub Date : 2020-01-31 , DOI: 10.1142/s0219720020500110
Mohammad-Hadi Foroughmand-Araabi 1 , Sama Goliaei 2 , Bahram Goliaei 3

Background: Patterns in protein and genomic sequences are widely analyzed, extracted, and collected in databases. Although protein patterns originate from genomic coding regions, very few studies have dealt, directly or indirectly, with the coding-region patterns induced by protein patterns. Results: In this paper, we define a new genomic pattern structure suitable for representing patterns induced from proteins. This structure, called the “Consecutive Positions Scoring Matrix (CPSSM)”, replaces protein patterns and profiles in the genomic context. CPSSMs can be identified, discovered, and searched for in genomes. We then present a novel dynamic programming algorithm for matching the defined genomic pattern against genomic sequences. In addition, we modify the algorithm to support intronic gaps and very large sequences. We have implemented the algorithm and tested it on real data. The results on the Saccharomyces cerevisiae genome show 132% more true positives and no false negatives, and the results on the human genome show no false negatives and 10 times as many true positives as those in previous works. Conclusion: CPSSMs and the provided methods could be used for open reading frame detection and gene finding. The application is available, with source code, to run and download at http://app.foroughmand.ir/cpssm/ .
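
The abstract does not give the CPSSM definition or the dynamic programming recurrence, so the following is only a minimal sketch of the general idea of searching a genome for a protein-derived pattern. It is not the authors' algorithm: it is a plain ungapped sliding-window scan with no intron support. The function name `scan_codon_motif`, the per-position score dictionaries, and the `default_score` penalty are all hypothetical choices made for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def scan_codon_motif(dna, motif_scores, threshold, default_score=-1.0):
    """Return (start, score) for every window scoring at least `threshold`.

    motif_scores is a hypothetical input format: one dict per motif
    position, mapping an amino acid letter to its score. Amino acids
    missing from a dict receive `default_score`.
    """
    dna = dna.upper()
    window = 3 * len(motif_scores)  # each motif position spans one codon
    hits = []
    # Trying every start offset covers all three forward reading frames.
    for start in range(len(dna) - window + 1):
        score = 0.0
        for k, column in enumerate(motif_scores):
            codon = dna[start + 3 * k : start + 3 * k + 3]
            aa = CODON_TABLE.get(codon, "X")  # "X" for ambiguous codons
            score += column.get(aa, default_score)
        if score >= threshold:
            hits.append((start, score))
    return hits
```

For example, `scan_codon_motif("CCATGAAATT", [{"M": 2.0}, {"K": 1.0}], threshold=3.0)` reports a single hit at offset 2, where the codons ATG and AAA encode M and K. Handling intronic gaps, as the paper describes, would require replacing the fixed-width window with a dynamic programming table that allows skipped stretches between codons.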

Updated: 2020-01-31