GeneRax: A Tool for Species-Tree-Aware Maximum Likelihood-Based Gene Family Tree Inference under Gene Duplication, Transfer, and Loss.
Molecular Biology and Evolution (IF 11.0), Pub Date: 2020-06-05, DOI: 10.1093/molbev/msaa141
Benoit Morel 1 , Alexey M Kozlov 1 , Alexandros Stamatakis 1, 2 , Gergely J Szöllősi 3, 4, 5

Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species tree-aware methods also leverage information from a putative species tree. However, only a few of the available methods implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data pre-processing (e.g., computing bootstrap trees) and rely on approximations and heuristics that limit the degree of tree space exploration. Here we present GeneRax, the first maximum likelihood species tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene-level events, such as duplication, transfer, and loss, relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that, compared to competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson-Foulds distance. On empirical datasets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1099 Cyanobacteria families in eight minutes on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax
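
The accuracy metric named in the abstract, relative Robinson-Foulds (RF) distance, can be illustrated with a minimal sketch. This is not GeneRax code and not the authors' evaluation pipeline. It assumes trees are given as nested Python tuples of leaf names, compares them as unrooted topologies, and normalizes the RF distance by the total number of non-trivial bipartitions in both trees (one common convention, assumed here). The names `leaf_set`, `splits`, and `relative_rf` are hypothetical.

```python
from typing import FrozenSet, Set, Tuple, Union

# Illustrative tree encoding (an assumption): a leaf is a string,
# an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the non-trivial bipartitions of the tree, viewed as unrooted."""
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)  # reference leaf used to pick a canonical side
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        # Store the side of the split that does not contain the reference leaf.
        side = all_leaves - clade if ref in clade else clade
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found


def relative_rf(a: Tree, b: Tree) -> float:
    """RF distance divided by the total number of non-trivial splits."""
    if leaf_set(a) != leaf_set(b):
        raise ValueError("trees must have identical leaf sets")
    sa, sb = splits(a), splits(b)
    total = len(sa) + len(sb)
    return len(sa ^ sb) / total if total else 0.0


true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred_tree = ((("A", "C"), "B"), ("D", "E"))
print(relative_rf(true_tree, inferred_tree))
```

A value of 0 means the two topologies agree on every internal branch, and a value of 1 means they share none. For real analyses an established phylogenetics library is a better choice than this sketch.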

Updated: 2020-06-05