HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.
Algorithms for Molecular Biology (IF 1), Pub Date: 2017-10-14, DOI: 10.1186/s13015-017-0116-x
Shixiang Wan 1 , Quan Zou 1, 2

BACKGROUND: Multiple sequence alignment (MSA) plays a key role in biological sequence analysis, especially in phylogenetic tree construction. The rapid growth of next-generation sequencing data has led to a shortage of efficient alignment approaches that can handle ultra-large sets of biological sequences and cope with different sequence types.

METHODS: Distributed and parallel computing is a crucial technique for accelerating ultra-large (e.g. files larger than 1 GB) sequence analyses. Building on HAlign and the Spark distributed computing system, we implement HAlign-II, a highly cost-efficient and time-efficient tool for ultra-large multiple biological sequence alignment and phylogenetic tree construction.

RESULTS: Experiments on large-scale DNA and protein data sets (files larger than 1 GB) showed that HAlign-II saves both time and space and outperforms current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. It shows extremely high memory efficiency and scales well as computing resources increase.

CONCLUSIONS: HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II, with open-source code and datasets, is available at http://lab.malab.cn/soft/halign.

Updated: 2019-11-01