Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge. Algorithms for Molecular Biology



Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge.
Algorithms for Molecular Biology ( IF 1.5 ) Pub Date : 2019-07-19 , DOI: 10.1186/s13015-019-0151-x
Erin K Molloy, Tandy Warnow


BACKGROUND: Divide-and-conquer methods, which divide the species set into overlapping subsets, construct a tree on each subset, and then combine the subset trees using a supertree method, provide a key algorithmic framework for boosting the scalability of phylogeny estimation methods to large datasets. Yet the use of supertree methods, which typically attempt to solve NP-hard optimization problems, limits the scalability of such approaches.

RESULTS: In this paper, we introduce a divide-and-conquer approach that does not require supertree estimation: we divide the species set into pairwise disjoint subsets, construct a tree on each subset using a base method, and then combine the subset trees using a distance matrix. For this merger step, we present a new method, called NJMerge, which is a polynomial-time extension of Neighbor Joining (NJ); thus, NJMerge can be viewed either as a method for improving traditional NJ or as a method for scaling the base method to larger datasets. We prove that NJMerge can be used to create divide-and-conquer pipelines that are statistically consistent under some models of evolution. We also report the results of an extensive simulation study evaluating NJMerge on multi-locus datasets with up to 1000 species. We found that NJMerge sometimes improved the accuracy of traditional NJ and substantially reduced the running time of three popular species tree methods (ASTRAL-III, SVDquartets, and "concatenation" using RAxML) without sacrificing accuracy. Finally, although NJMerge can fail to return a tree, in our experiments, NJMerge failed on only 11 out of 2560 test cases.

CONCLUSIONS: Theoretical and empirical results suggest that NJMerge is a valuable technique for large-scale phylogeny estimation, especially when computational resources are limited. NJMerge is freely available on GitHub (http://github.com/ekmolloy/njmerge).
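For readers unfamiliar with the method that NJMerge extends, the following is a minimal sketch of classic Neighbor Joining on a distance matrix. It is not NJMerge: the abstract does not describe how NJMerge uses the subset trees during merging, so that part is deliberately not shown. The function name `neighbor_joining`, the dict-of-dicts input format, and the `internalN` node labels are illustrative assumptions and are not taken from the NJMerge code base.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Classic Neighbor Joining (not NJMerge).

    names: list of leaf labels.
    dist:  symmetric dict of dicts, dist[a][b] = distance between a and b.
    Returns a list of (node, neighbor, branch_length) edges of an unrooted tree.
    """
    # Copy the input so the caller's matrix is left untouched.
    d = {a: dict(dist[a]) for a in names}
    active = list(names)
    edges = []
    next_id = 0

    while len(active) > 2:
        n = len(active)
        # Row sums over the currently active nodes.
        totals = {a: sum(d[a][b] for b in active if b != a) for a in active}

        # Pick the pair minimizing the standard NJ criterion
        # Q(i, j) = (n - 2) * d(i, j) - total(i) - total(j).
        i, j = min(
            combinations(active, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - totals[p[0]] - totals[p[1]],
        )

        # Hypothetical label for the new internal node joining i and j.
        new = f"internal{next_id}"
        next_id += 1

        # Branch lengths from i and j to the new node.
        len_i = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        len_j = d[i][j] - len_i
        edges.append((i, new, len_i))
        edges.append((j, new, len_j))

        # Distances from the new node to every remaining node.
        d[new] = {}
        for k in active:
            if k not in (i, j):
                dk = 0.5 * (d[i][k] + d[j][k] - d[i][j])
                d[new][k] = dk
                d[k][new] = dk

        active = [k for k in active if k not in (i, j)] + [new]

    # Connect the last two nodes with the final edge.
    a, b = active
    edges.append((a, b, d[a][b]))
    return edges
```

Calling `neighbor_joining(names, dist)` with distances that fit a tree exactly should recover that tree's branch lengths. In the pipeline the abstract describes, a matrix of this kind over the full species set is what the merger step operates on, alongside the trees built on the disjoint subsets.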


Updated: 2019-11-01
