LMAP_S: Lightweight Multigene Alignment and Phylogeny eStimation.
BMC Bioinformatics (IF 2.9), Pub Date: 2019-12-30, DOI: 10.1186/s12859-019-3292-5
Emanuel Maldonado 1 , Agostinho Antunes 1, 2

BACKGROUND Recent advances in genome sequencing technologies and the falling cost of high-throughput sequencing continue to produce a deluge of data available for downstream analyses. Evolutionary biologists, among others, often use genomic data to uncover phenotypic diversity and adaptive evolution in protein-coding genes. This requires multiple sequence alignments (MSA) and phylogenetic trees (PT) to be estimated with optimal results. However, preparing an initial dataset of multiple sequence file(s) (MSF), and the steps that follow, can be challenging when the amount of data is extensive. A tool is therefore needed that removes potential sources of error and automates the time-consuming steps of a typical workflow, delivering high-throughput and optimal MSA and PT estimations.

RESULTS We introduce LMAP_S (Lightweight Multigene Alignment and Phylogeny eStimation), a user-friendly command-line and interactive package designed to handle an improved alignment and phylogeny estimation workflow: MSF preparation, MSA estimation, outlier detection, refinement, consensus, phylogeny estimation, comparison and editing. Throughout these steps, file and directory organization, execution, and manipulation of information are automated, with minimal manual user intervention. LMAP_S was developed for the multi-core workstation environment and provides a unique advantage for processing multiple datasets. Our software proved efficient throughout the workflow, including the (unlimited) handling of more than 20 datasets.

CONCLUSIONS We have developed LMAP_S, a simple and versatile package that enables researchers to estimate MSAs and PTs for multiple datasets effectively and in a high-throughput fashion. LMAP_S integrates more than 25 software programs, which together provide more than 65 algorithm choices distributed across five stages. At minimum, one FASTA file is required within a single input directory. To our knowledge, no other software combines MSA and phylogeny estimation with as many alternatives while also providing the means to find optimal MSAs and phylogenies. Moreover, a case study comparing methodologies highlighted the usefulness of our software. LMAP_S has been developed as an open-source package, allowing its integration into more complex open-source bioinformatics pipelines. The LMAP_S package is released under the GPLv3 license and is freely available at https://lmap-s.sourceforge.io/.
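
The input requirement stated in the abstract (at least one FASTA file within a single input directory) can be checked before any workflow is started. The following is a minimal Python sketch of such a check, not part of LMAP_S and not its interface. The function names (`find_fasta_files`, `count_sequences`, `check_input_directory`) and the list of accepted file extensions are assumptions made purely for illustration.

```python
from pathlib import Path

# Assumed extensions for illustration only; LMAP_S may accept a different set.
FASTA_EXTENSIONS = {".fasta", ".fas", ".fa", ".fna", ".faa"}


def find_fasta_files(input_dir):
    """Return the sorted FASTA-like files found directly inside input_dir."""
    directory = Path(input_dir)
    if not directory.is_dir():
        raise NotADirectoryError(f"{input_dir} is not a directory")
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in FASTA_EXTENSIONS
    )


def count_sequences(fasta_path):
    """Count records in a FASTA file by counting header lines (starting with '>')."""
    with open(fasta_path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def check_input_directory(input_dir):
    """Map each FASTA file name to its sequence count; fail if no file is found."""
    files = find_fasta_files(input_dir)
    if not files:
        raise ValueError(f"no FASTA file found in {input_dir}")
    return {p.name: count_sequences(p) for p in files}
```

Calling `check_input_directory` on a directory of multigene datasets would return one entry per FASTA file, which gives a quick way to spot empty or unexpectedly small files before alignment.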

Updated: 2019-12-31