OCTAL: Optimal Completion of gene trees in polynomial time.
Algorithms for Molecular Biology (IF 1), published 2018-03-15, DOI: 10.1186/s13015-018-0124-5
Sarah Christensen, Erin K Molloy, Pranjal Vachaspati, Tandy Warnow

BACKGROUND For a combination of reasons (including data generation protocols, approaches to taxon and gene sampling, and gene birth and loss), estimated gene trees are often incomplete, meaning that they do not contain all of the species of interest. As incomplete gene trees can impact downstream analyses, accurate completion of gene trees is desirable.

RESULTS We introduce the Optimal Tree Completion problem, a general optimization problem that involves completing an unrooted binary tree (i.e., adding missing leaves) so as to minimize its distance from a reference tree on a superset of the leaves. We present OCTAL, an algorithm that finds an optimal solution to this problem when the distance between trees is defined using the Robinson-Foulds (RF) distance, and we prove that OCTAL runs in [Formula: see text] time, where n is the total number of species. We report on a simulation study in which gene trees can differ from the species tree due to incomplete lineage sorting, and estimated gene trees are completed using OCTAL with a reference tree based on a species tree estimated from the multi-locus dataset. OCTAL produces completed gene trees that are closer to the true gene trees than those produced by an existing heuristic approach in ASTRAL-II. However, the accuracy of a gene tree completed by OCTAL depends on how topologically similar the reference tree (typically an estimated species tree) is to the true gene tree.

CONCLUSIONS OCTAL is a useful technique for adding missing taxa to incomplete gene trees and provides good accuracy under a wide range of model conditions. However, the results show that OCTAL's accuracy can be reduced when incomplete lineage sorting is high, because the reference tree can then be far from the true gene tree. Hence, this study suggests that OCTAL would benefit from using other types of reference trees instead of species trees when true gene trees and species trees are topologically far apart.
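To make the Optimal Tree Completion problem concrete, here is a minimal Python sketch under stated assumptions. It represents an unrooted binary tree as an adjacency map in which leaves are the degree-1 nodes. It computes the RF distance as the number of nontrivial bipartitions found in exactly one of two trees on the same leaf set (some authors halve this count). It then solves only the smallest case of the problem, a gene tree missing a single leaf, by exhaustively trying every edge as the attachment point. This brute force is not OCTAL, and all function names (`rf_distance`, `complete_one_leaf`, etc.) and the toy trees are hypothetical.

```python
# Sketch: Robinson-Foulds (RF) distance via bipartitions, plus an exhaustive
# (brute-force) completion of a gene tree that is missing exactly one leaf.
# This is NOT the OCTAL algorithm; all names here are hypothetical.

def from_edges(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    tree = {}
    for a, b in edges:
        tree.setdefault(a, set()).add(b)
        tree.setdefault(b, set()).add(a)
    return tree


def leaves_of(tree):
    """Leaves are the degree-1 nodes."""
    return {x for x, nbrs in tree.items() if len(nbrs) == 1}


def side(tree, u, v, leaves):
    """Leaves reachable from v without crossing the edge (u, v)."""
    seen, stack, found = {u, v}, [v], set()
    while stack:
        x = stack.pop()
        if x in leaves:
            found.add(x)
        for y in tree[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return found


def splits(tree):
    """Nontrivial bipartitions, each stored as the half that lacks a fixed anchor leaf."""
    leaves = leaves_of(tree)
    anchor = min(leaves)
    result = set()
    for u in tree:
        for v in tree[u]:
            part = side(tree, u, v, leaves)
            if 2 <= len(part) <= len(leaves) - 2:  # skip trivial (leaf) edges
                half = part if anchor not in part else leaves - part
                result.add(frozenset(half))
    return result


def rf_distance(t1, t2):
    """Number of bipartitions present in exactly one of the two trees."""
    if leaves_of(t1) != leaves_of(t2):
        raise ValueError("RF distance needs trees on the same leaf set")
    return len(splits(t1) ^ splits(t2))


def insert_leaf(tree, u, v, leaf, new_node):
    """Return a copy of tree with `leaf` attached to a new node subdividing (u, v)."""
    t = {x: set(nbrs) for x, nbrs in tree.items()}
    t[u].discard(v)
    t[v].discard(u)
    t[u].add(new_node)
    t[v].add(new_node)
    t[new_node] = {u, v, leaf}
    t[leaf] = {new_node}
    return t


def complete_one_leaf(tree, reference, leaf):
    """Try every edge of `tree`; keep the attachment with the smallest RF distance."""
    best = None
    edges = {frozenset((u, v)) for u in tree for v in tree[u]}
    for edge in sorted(edges, key=sorted):  # deterministic tie-breaking
        u, v = tuple(edge)
        candidate = insert_leaf(tree, u, v, leaf, f"new_{leaf}")
        d = rf_distance(candidate, reference)
        if best is None or d < best[0]:
            best = (d, candidate)
    return best


# Reference tree on A..E with bipartitions AB|CDE and ABC|DE.
reference = from_edges([("A", "x1"), ("B", "x1"), ("x1", "x2"), ("C", "x2"),
                        ("x2", "x3"), ("D", "x3"), ("E", "x3")])
# Incomplete gene tree on A..D (bipartition AB|CD); leaf E is missing.
gene_tree = from_edges([("A", "y1"), ("B", "y1"), ("y1", "y2"),
                        ("C", "y2"), ("D", "y2")])
dist, completed = complete_one_leaf(gene_tree, reference, "E")
print(dist)  # 0: attaching E next to D reproduces both reference bipartitions
```

Each candidate attachment costs one RF computation, so this search is only practical for a single missing leaf. With several missing leaves, the number of possible completions grows rapidly, which is why an exact polynomial-time method such as OCTAL matters.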

Last updated: 2019-11-01