A Bayesian Supertree Model for Genome-Wide Species Tree Reconstruction
Systematic Biology (IF 6.1), Pub Date: 2014-10-03, DOI: 10.1093/sysbio/syu082
Leonardo De Oliveira Martins, Diego Mallo, David Posada

Current phylogenomic data sets highlight the need for species tree methods able to deal with several sources of gene tree/species tree incongruence. At the same time, we need to make the most of all available data. Most species tree methods deal with a single process of phylogenetic discordance, namely gene duplication and loss, incomplete lineage sorting (ILS), or horizontal gene transfer. In this manuscript, we address the problem of species tree inference from multilocus, genome-wide data sets regardless of the presence of gene duplication and loss and ILS, and therefore without the need to identify orthologs or to use a single individual per species. We do this by extending the idea of Maximum Likelihood (ML) supertrees to a hierarchical Bayesian model in which several sources of gene tree/species tree disagreement can be accounted for in a modular manner. We implemented this model in a computer program called guenomu, whose inputs are posterior distributions of unrooted gene tree topologies for multiple gene families and whose output is the posterior distribution of rooted species tree topologies. We conducted extensive simulations to evaluate the performance of our approach in comparison with other species tree approaches able to deal with more than one leaf from the same species. Our method ranked best on simulated data sets, despite ignoring branch lengths, and performed well on empirical data, while being fast enough to analyze relatively large data sets. Our Bayesian supertree method was also very successful in obtaining better estimates of gene trees by reducing the uncertainty in their distributions. In addition, our results show that under complex simulation scenarios, gene tree parsimony is also a competitive alternative to more sophisticated models once its speed is taken into account.
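To make the supertree idea in the abstract concrete, the toy sketch below scores a handful of candidate species trees against per-family posterior samples of gene trees. It is not guenomu's implementation. It assumes an exponential kernel exp(-beta * d) on a gene tree/species tree distance, in the spirit of ML supertrees, a uniform prior over candidates, and no tree-dependent normalising constant. The function name `species_tree_posterior`, the parameter `beta`, the tree labels, and the distance table are all hypothetical.

```python
import math

def species_tree_posterior(gene_posteriors, distance, candidates, beta=1.0):
    """Toy posterior over candidate species trees (illustrative only).

    gene_posteriors: one dict per gene family, {gene_topology: posterior prob}
    distance: callable (gene_topology, species_topology) -> non-negative number
    candidates: iterable of candidate species tree labels
    beta: assumed rate of the exponential kernel exp(-beta * distance)
    """
    log_scores = {}
    for s in candidates:
        total = 0.0
        for family in gene_posteriors:
            # Average the kernel over this family's gene tree uncertainty.
            lik = sum(p * math.exp(-beta * distance(g, s))
                      for g, p in family.items())
            total += math.log(lik)
        log_scores[s] = total
    # Normalise across candidates (uniform prior) with a log-sum-exp shift.
    m = max(log_scores.values())
    z = sum(math.exp(v - m) for v in log_scores.values())
    return {s: math.exp(v - m) / z for s, v in log_scores.items()}

# Hypothetical data: two gene families, two gene topologies, two species trees.
toy_distances = {("g1", "A"): 0, ("g2", "A"): 2,
                 ("g1", "B"): 2, ("g2", "B"): 0}
families = [{"g1": 0.9, "g2": 0.1}, {"g1": 0.6, "g2": 0.4}]
post = species_tree_posterior(
    families, lambda g, s: toy_distances[(g, s)], ["A", "B"])
```

In this toy setting species tree "A" receives more posterior mass than "B" because both families favour the gene topology that agrees with it. A real analysis would sample species trees by MCMC rather than enumerate them, and would use reconciliation-based distances that reflect duplication, loss, and ILS.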

Updated: 2014-10-03