Search for potential reading frameshifts in cds from Arabidopsis thaliana and other genomes.
DNA Research (IF 3.9), Pub Date: 2019-02-07, DOI: 10.1093/dnares/dsy046
Y M Suvorova 1, M A Korotkova 2, K G Skryabin 1, E V Korotkov 1, 2

A new mathematical method for detecting potential reading frameshifts in protein-coding sequences (cds) was developed. The algorithm adjusts to the triplet periodicity of each analysed sequence using dynamic programming and a genetic algorithm, and requires no preliminary training. Using the developed method, cds from the Arabidopsis thaliana genome were analysed. In total, the algorithm found 9,930 sequences containing one or more potential reading frameshifts, which is ∼21% of all analysed sequences of the genome. The Type I and Type II error rates were estimated as 11% and 30%, respectively. Similar results were obtained for the genomes of Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Rattus norvegicus and Xenopus tropicalis. The algorithm was also tested on 17 bacterial genomes, and the results were compared with previously obtained data on potential reading frameshifts in these genomes. The study discusses the possibility that the reading frameshift is a relatively frequent mutation that could participate in the creation of new genes and proteins.
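
The idea of triplet periodicity can be illustrated with a small Python sketch. It is not the authors' algorithm, which relies on dynamic programming and a genetic algorithm; it is a simplified illustration under stated assumptions. It counts nucleotides at each of the three codon positions, measures the periodicity as mutual information, and checks which phase best fits a window of the sequence. All function names (`triplet_matrix`, `mutual_information`, `best_phase`) and the toy sequence are hypothetical.

```python
from collections import Counter
from math import log

def triplet_matrix(seq, offset=0):
    """Count nucleotides at each of the three codon positions."""
    counts = [Counter() for _ in range(3)]
    for i, base in enumerate(seq):
        counts[(i + offset) % 3][base] += 1
    return counts

def mutual_information(counts):
    """Mutual information (nats) between codon position and nucleotide."""
    total = sum(sum(c.values()) for c in counts)
    base_totals = Counter()
    for c in counts:
        base_totals.update(c)
    mi = 0.0
    for c in counts:
        row_total = sum(c.values())
        for base, n in c.items():
            mi += (n / total) * log(n * total / (row_total * base_totals[base]))
    return mi

def best_phase(seq, start, end, ref):
    """Return the offset (0, 1 or 2) whose codon-position assignment
    gives seq[start:end] the highest log-likelihood under the reference
    counts. A pseudocount of 1 per nucleotide avoids log(0)."""
    scores = []
    for offset in range(3):
        score = 0.0
        for i in range(start, end):
            pos = (i + offset) % 3
            row_total = sum(ref[pos].values())
            score += log((ref[pos][seq[i]] + 1) / (row_total + 4))
        scores.append(score)
    return scores.index(max(scores))

# Toy data (assumption): a perfectly periodic sequence with a single
# inserted base at index 60, which shifts the downstream phase.
reference = triplet_matrix("GCA" * 20)
shifted = "GCA" * 20 + "T" + "GCA" * 20

phase_before = best_phase(shifted, 0, 30, reference)
phase_after = best_phase(shifted, 61, 91, reference)
```

In this toy case the best-fitting phase differs between the window before the insertion and the window after it, which is the kind of signal a frameshift search looks for. Real coding sequences have much weaker periodicity, so a practical method needs a statistical significance estimate rather than a simple argmax.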

Updated: 2019-11-01