DISCO: Species Tree Inference using Multicopy Gene Family Tree Decomposition
Systematic Biology (IF 6.1), Pub Date: 2021-08-27, DOI: 10.1093/sysbio/syab070
James Willson, Mrinmoy Saha Roddur, Baqiao Liu, Paul Zaharias, Tandy Warnow

Species tree inference from gene family trees is a significant problem in computational biology. However, gene tree heterogeneity, which can be caused by several factors including gene duplication and loss (GDL), makes the estimation of species trees very challenging. Several species tree estimation methods have been introduced in recent years to specifically address gene tree heterogeneity due to gene duplication and loss (such as DupTree, FastMulRFS, ASTRAL-Pro, and SpeciesRax), but many incur high costs in both running time and memory. We introduce a new approach, DISCO, that decomposes multicopy gene family trees into many single-copy trees, which allows methods previously designed for species tree inference from single-copy gene trees to be used. We prove that using DISCO with ASTRAL (i.e., ASTRAL-DISCO) is statistically consistent under the GDL model, provided that ASTRAL-Pro correctly roots and tags each gene family tree. We evaluate DISCO paired with different methods for estimating species trees from single-copy genes (e.g., ASTRAL, ASTRID, and IQ-TREE) under a wide range of model conditions, and establish that high accuracy can be obtained even when ASTRAL-Pro is not able to correctly root and tag the gene family trees. We also compare results using MI, an alternative decomposition strategy from Yang Y. and Smith S.A. (2014), and find that DISCO provides better accuracy, most likely because its output decomposition covers more of the gene family tree leafset. [Concatenation analysis; gene duplication and loss; species tree inference; summary method.]
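
The idea of decomposing a multicopy gene family tree into single-copy trees can be illustrated with a small Python sketch. It rests on several assumptions that the abstract does not specify. The tree is rooted and binary, and it is represented by a hypothetical `Node` class. Duplication nodes are tagged with the common species-overlap rule, under which a node counts as a duplication when its two child clades share a species. The abstract says only that ASTRAL-Pro performs the rooting and tagging, so this rule is assumed here. The splitting rule keeps one child at every duplication node and copies everything above it. It is one simple way to obtain single-copy trees and is not claimed to be DISCO's exact published procedure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Rooted, binary gene-tree node (hypothetical representation).

    Leaves carry a gene-copy name and the species it was sampled from;
    internal nodes carry only their two children.
    """
    name: str = ""
    species: str = ""
    children: list[Node] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaf(name: str, species: str) -> Node:
    return Node(name=name, species=species)


def leaf_species(node: Node) -> list[str]:
    """Species labels of all leaves below `node`, one entry per gene copy."""
    if node.is_leaf():
        return [node.species]
    return [s for child in node.children for s in leaf_species(child)]


def is_duplication(node: Node) -> bool:
    """Assumed tagging rule: duplication iff the two child clades share a species."""
    if node.is_leaf():
        return False
    left, right = node.children
    return bool(set(leaf_species(left)) & set(leaf_species(right)))


def decompose(node: Node) -> list[Node]:
    """Enumerate single-copy trees by keeping exactly one child at each duplication node."""
    if node.is_leaf():
        return [node]
    left, right = node.children
    if is_duplication(node):
        # Suppress the duplication node: each output keeps the left OR the right copy.
        return decompose(left) + decompose(right)
    # Speciation node: the two sides have disjoint species sets, so pairing any
    # left decomposition with any right decomposition stays single-copy.
    return [Node(children=[a, b]) for a in decompose(left) for b in decompose(right)]


def is_single_copy(node: Node) -> bool:
    species = leaf_species(node)
    return len(species) == len(set(species))


def to_newick(node: Node) -> str:
    """Topology-only Newick string, without the trailing semicolon."""
    if node.is_leaf():
        return node.name
    return "(" + ",".join(to_newick(c) for c in node.children) + ")"


if __name__ == "__main__":
    # ((A1,B1),(A2,B2)) is a duplication; its parent joins it with species C.
    family = Node(children=[
        Node(children=[
            Node(children=[leaf("A1", "A"), leaf("B1", "B")]),
            Node(children=[leaf("A2", "A"), leaf("B2", "B")]),
        ]),
        leaf("C1", "C"),
    ])
    for tree in decompose(family):
        print(to_newick(tree), is_single_copy(tree))
    # ((A1,B1),C1) True
    # ((A2,B2),C1) True
```

In the example, the clade above the duplication (here just `C1`) appears in both output trees. Each output therefore keeps as much of the family's leafset as a single-copy tree can hold. This naive version enumerates every combination of choices, so the number of output trees can grow exponentially with nested duplications. The output trees also share leaf objects. A practical tool would likely need a more economical strategy, or a filter such as dropping trees with too few leaves.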

Updated: 2021-08-27