Constrained incremental tree building: new absolute fast converging phylogeny estimation methods with improved scalability and accuracy.
Algorithms for Molecular Biology ( IF 1 ) Pub Date : 2019-02-06 , DOI: 10.1186/s13015-019-0136-9
Qiuyi Zhang 1 , Satish Rao 2 , Tandy Warnow 3

BACKGROUND Absolute fast converging (AFC) phylogeny estimation methods are methods that have been proven to recover the true tree with high probability given sequences whose lengths are polynomial in the number of leaves in the tree (once the shortest and longest branch weights are fixed). While there is a large literature on AFC methods, the best in terms of empirical performance was DCM NJ, published in SODA 2001. The main empirical advantage of DCM NJ over other AFC methods is its use of neighbor joining (NJ) to construct trees on smaller taxon subsets, which are then combined into a tree on the full set of species using a supertree method; in contrast, the other AFC methods in essence depend on quartet trees that are computed independently of each other, which reduces accuracy compared to neighbor joining. However, DCM NJ is unlikely to scale to large datasets because it relies on supertree methods, and no current supertree method scales to large datasets with high accuracy.

RESULTS In this study we present a new approach to large-scale phylogeny estimation that shares some of the features of DCM NJ but bypasses the use of supertree methods. We prove that this new approach is AFC and uses polynomial time and space. Furthermore, we describe variations on this basic approach that can be used with leaf-disjoint constraint trees (computed using methods such as maximum likelihood) to produce other methods that are likely to provide even better accuracy. Thus, we present a new generalizable technique for large-scale tree estimation that is designed to improve the scalability of phylogeny estimation methods to ultra-large datasets, and that can be used in a variety of settings (including tree estimation from unaligned sequences, and species tree estimation from gene trees).
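
The AFC property described in the background can be written out as a formal statement. The following is a sketch of one common formalization, not a quotation from the paper. The symbols are labels assumed here for illustration: \Phi for the estimation method, (T, w) for a model tree with edge weights w, f and g for the fixed lower and upper bounds on branch weights, n for the number of leaves, k for the sequence length, and S_k for the sequences of length k generated on (T, w).

```latex
% Sketch of the AFC condition (notation assumed, not taken from the abstract).
% The polynomial p may depend on f, g and \varepsilon, but not on the tree itself.
\forall\, \varepsilon > 0,\;\; \forall\, 0 < f \le g < \infty,\;\;
\exists \text{ a polynomial } p \text{ such that }
\forall\, (T, w) \text{ on } n \text{ leaves with } f \le w(e) \le g \text{ for every edge } e:
\qquad
k \ge p(n) \;\Longrightarrow\; \Pr\bigl[\Phi(S_k) = T\bigr] \ge 1 - \varepsilon
```

Read this way, "absolute" refers to the fact that a single polynomial bound on sequence length works for every model tree whose branch weights lie between f and g.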

Updated: 2019-11-01