Non-parametric correction of estimated gene trees using TRACTION
Algorithms for Molecular Biology ( IF 1 ) Pub Date : 2020-01-04 , DOI: 10.1186/s13015-019-0161-8
Sarah Christensen 1 , Erin K Molloy 1 , Pranjal Vachaspati 1 , Ananya Yammanuru 1 , Tandy Warnow 1

Motivation
Estimated gene trees are often inaccurate, due to insufficient phylogenetic signal in the single gene alignment, among other causes. Gene tree correction aims to improve the accuracy of an estimated gene tree by using computational techniques along with auxiliary information, such as a reference species tree or sequencing data. However, gene trees and species trees can differ as a result of gene duplication and loss (GDL), incomplete lineage sorting (ILS), and other biological processes. Thus gene tree correction methods need to take estimation error as well as gene tree heterogeneity into account. Many prior gene tree correction methods have been developed for the case where GDL is present.

Results
Here, we study the problem of gene tree correction where gene tree heterogeneity is instead due to ILS and/or horizontal gene transfer (HGT). We introduce TRACTION, a simple polynomial-time method that provably finds an optimal solution to the RF-optimal tree refinement and completion (RF-OTRC) problem, which seeks a refinement and completion of a singly-labeled gene tree with respect to a given singly-labeled species tree so as to minimize the Robinson-Foulds (RF) distance. Our extensive simulation study on 68,000 estimated gene trees shows that TRACTION matches or improves on the accuracy of well-established methods from the GDL literature when HGT and ILS are both present, and ties for best under the ILS-only conditions. Furthermore, TRACTION ties for fastest on these datasets. We also show that a naive generalization of the RF-OTRC problem to multi-labeled trees is possible, but can produce misleading results where gene tree heterogeneity is due to GDL.
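
To make the objective concrete, the following is a minimal Python sketch of the RF distance that RF-OTRC minimizes. It assumes the standard definition (the number of non-trivial bipartitions found in one unrooted tree but not the other). It is not the TRACTION algorithm itself: the nested-tuple tree encoding and the names `leaf_set`, `bipartitions`, and `rf_distance` are hypothetical, chosen only for illustration.

```python
def leaf_set(tree):
    """Return the leaf labels of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions of the tree, viewed as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference leaf, so the same split always has the same representation.
    """
    leaves = leaf_set(tree)
    ref = min(leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = leaves - clade if ref in clade else clade
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(leaves) - 2:
            splits.add(side)
        return clade

    visit(tree)
    return splits


def rf_distance(t1, t2):
    """RF distance: size of the symmetric difference of the bipartition sets."""
    if leaf_set(t1) != leaf_set(t2):
        raise ValueError("trees must have the same leaf set")
    return len(bipartitions(t1) ^ bipartitions(t2))


# Hypothetical example: a species tree, an unresolved gene tree, and one
# refinement of that gene tree (all on the same five leaves).
species = ((("A", "B"), "C"), ("D", "E"))
gene_unresolved = (("A", "B", "C"), ("D", "E"))
gene_refined = ((("A", "B"), "C"), ("D", "E"))
```

In this toy case the unresolved gene tree lacks one bipartition of the species tree, so `rf_distance(species, gene_unresolved)` is 1, while the refined gene tree matches the species tree and has distance 0. The sketch covers only trees on identical leaf sets; the completion step (adding species missing from the gene tree) is not shown.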




Updated: 2020-01-04