FAME: fast and memory efficient multiple sequences alignment tool through compatible chain of roots.
Bioinformatics (IF 5.8) | Pub Date: 2020-03-14 | DOI: 10.1093/bioinformatics/btaa175
Etminan Naznooshsadat 1, Parvinnia Elham 1, Sharifi-Zarchi Ali 2

Motivation
Multiple sequence alignment (MSA) is an important and challenging problem in computational biology. Most existing methods can only produce multiple alignments of short sequences in an acceptable time. When researchers need to align genome-length sequences, however, the process requires a huge amount of memory and processing time. A method that can align genome-length sequences rapidly and accurately would therefore have a great impact, especially on the analysis of very long alignments. Here, we propose an efficient method, called FAME, which vertically divides the sequences at the places where they share common regions and arranges these regions in consecutive order. The common regions are then shifted so that they lie under each other, and the subsequences between them are aligned using any existing MSA tool.
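The divide-and-align idea described above can be illustrated with a small sketch. It is not FAME's implementation: the anchor format, the greedy chaining rule (the paper's "compatible chain of roots" is likely more elaborate) and the placeholder `pad_align` aligner are all hypothetical. A real run would plug an existing MSA tool in place of `pad_align`.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical anchor format: a common substring of `length` characters that
# starts at starts[i] in sequence i.
Anchor = Tuple[Tuple[int, ...], int]


def chain_anchors(anchors: List[Anchor]) -> List[Anchor]:
    """Greedily keep anchors that start at or after the end of the previously
    kept anchor in every sequence, so they can be stacked column-wise."""
    chain: List[Anchor] = []
    ends = None
    for starts, length in sorted(anchors, key=lambda a: a[0][0]):
        if ends is None or all(s >= e for s, e in zip(starts, ends)):
            chain.append((starts, length))
            ends = tuple(s + length for s in starts)
    return chain


def pad_align(segment: List[str]) -> List[str]:
    """Placeholder aligner that only right-pads with gaps; a real MSA tool
    would be called here instead."""
    width = max((len(x) for x in segment), default=0)
    return [x + "-" * (width - len(x)) for x in segment]


def split_and_align(seqs: Sequence[str], anchors: List[Anchor],
                    align: Callable[[List[str]], List[str]] = pad_align) -> List[str]:
    """Cut every sequence at the chained anchors, align the pieces between
    consecutive anchors with `align`, and place the anchors under each other."""
    rows = [""] * len(seqs)
    prev = [0] * len(seqs)  # end of the last placed anchor in each sequence
    for starts, length in chain_anchors(anchors):
        between = [s[p:st] for s, p, st in zip(seqs, prev, starts)]
        aligned = align(between)
        rows = [r + a + s[st:st + length]
                for r, a, s, st in zip(rows, aligned, seqs, starts)]
        prev = [st + length for st in starts]
    # Align whatever follows the last anchor.
    tail = align([s[p:] for s, p in zip(seqs, prev)])
    return [r + t for r, t in zip(rows, tail)]


seqs = ["ACGTTTGCA", "ACGAGCA", "TACGTGCA"]
anchors = [((0, 0, 1), 3), ((6, 4, 5), 3)]  # "ACG" and "GCA" in all three
for row in split_and_align(seqs, anchors):
    print(row)
# -ACGTTTGCA
# -ACGA--GCA
# TACGT--GCA
```

One intuition for why splitting helps: if the aligner's cost grows roughly quadratically with segment length, as pairwise dynamic programming does, cutting a length-L problem into m roughly equal pieces lowers that term from L^2 to m·(L/m)^2 = L^2/m. The abstract does not state FAME's exact complexity, so treat this only as a rough guide.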
Results
The results demonstrate that combining FAME with existing MSA methods, together with the use of minimizers, makes it possible to run on a personal computer and to accurately align long sequences with a much higher sum-of-pairs (SP) score than the standalone MSA tools achieve. As longer genomic datasets are selected, the SP score of the combined methods gradually improves. The computational complexity calculated for the methods supports these results: combining FAME with the MSA tools leads to at least four times faster execution on the datasets.
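The two measures mentioned in the results, minimizers and the sum-of-pairs score, can be sketched briefly. This is an illustration under assumptions, not the paper's code: the abstract gives neither FAME's k-mer size, window size or k-mer ordering (real tools usually order k-mers by a hash rather than lexicographically) nor the SP scoring weights, so `k`, `w` and the match/mismatch/gap values below are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple


def minimizers(seq: str, k: int, w: int) -> Set[Tuple[int, str]]:
    """Return (position, k-mer) pairs: in every window of w consecutive k-mers,
    the lexicographically smallest k-mer is selected (leftmost on ties)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked: Set[Tuple[int, str]] = set()
    for start in range(len(kmers) - w + 1):
        best = min(range(start, start + w), key=lambda i: (kmers[i], i))
        picked.add((best, kmers[best]))
    return picked


def sp_score(rows: List[str], match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum-of-pairs score: score every pair of rows column by column.
    Gap-gap columns contribute 0; the weights are illustrative assumptions."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    for a, b in combinations(rows, 2):
        for x, y in zip(a, b):
            if x == "-" and y == "-":
                continue
            if x == "-" or y == "-":
                total += gap
            else:
                total += match if x == y else mismatch
    return total


# k-mers that are minimizers in both sequences are candidate common regions.
a, b = "ACGTTTGCA", "ACGAGCA"
shared = {m for _, m in minimizers(a, 3, 2)} & {m for _, m in minimizers(b, 3, 2)}
print(shared)  # {'ACG'}
print(sp_score(["-ACGTTTGCA", "-ACGA--GCA", "TACGT--GCA"]))  # 5
```

Sampling minimizers instead of every k-mer keeps the index small while still guaranteeing that any sufficiently long shared substring contributes at least one shared minimizer, which is why such schemes suit genome-length inputs.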
Availability
The source code, all datasets and the run parameters are freely available at http://github.com/naznoosh/msa.


Updated: 2020-03-16