Divergence and support among slightly suboptimal likelihood gene trees, Cladistics



Divergence and support among slightly suboptimal likelihood gene trees
Cladistics (IF 3.9) Pub Date: 2019-11-13, DOI: 10.1111/cla.12404
Mark P Simmons, John Kessenich


Contemporary phylogenomic studies frequently incorporate two‐step coalescent analyses wherein the first step is to infer individual‐gene trees, generally using maximum‐likelihood implemented in the popular programs PhyML or RAxML. Four concerns with this approach are that these programs only present a single fully resolved gene tree to the user despite potential for ambiguous support, insufficient phylogenetic signal to fully resolve each gene tree, inexact computer arithmetic affecting the reported likelihood of gene trees, and an exclusive focus on the most likely tree while ignoring trees that are only slightly suboptimal or within the error tolerance. Taken together, these four concerns are sufficient for RAxML and PhyML users to be suspicious of the resulting (perhaps over‐resolved) gene‐tree topologies and (perhaps unjustifiably high) bootstrap support for individual clades. In this study, we sought to determine how frequently these concerns apply in practice to contemporary phylogenomic studies that use RAxML for gene‐tree inference. We did so by re‐analyzing 100 genes from each of ten studies that, taken together, are representative of many empirical phylogenomic studies. Our seven findings are as follows. First, the few search replicates that are frequently applied in phylogenomic studies are generally insufficient to find the optimal gene‐tree topology. Second, there is often more topological variation among slightly suboptimal gene trees relative to the best‐reported tree than can be safely ignored. Third, the Shimodaira–Hasegawa‐like approximate likelihood ratio test is highly effective at identifying dubiously supported clades and outperforms the alternative approaches of relying on bootstrap support or collapsing minimum‐length branches. Fourth, the bootstrap can, but rarely does, indicate high support for clades that are not supported amongst slightly suboptimal trees. Fifth, increasing the accuracy by which RAxML optimizes model‐parameter values generally has a nominal effect on selection of optimal trees. Sixth, tree searches using the GTRCAT model were generally less effective at finding optimal known trees than those using the GTRGAMMA model. Seventh, choice of gene‐tree sampling strategy can affect inferred coalescent branch lengths, species‐tree topology and branch support.
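
The idea of checking clade support among trees that are "only slightly suboptimal or within the error tolerance" can be illustrated with a minimal sketch. This is not the authors' procedure: the tolerance value, the tree data, and the function names `near_optimal` and `clades_in_all` are all hypothetical, and trees are represented simply as sets of rooted clades rather than parsed from RAxML output (unrooted gene trees would use bipartitions instead).

```python
from typing import FrozenSet, List, Set, Tuple

Clade = FrozenSet[str]
# A tree is modeled as (log-likelihood, set of clades); this is an
# illustrative simplification, not a RAxML or PhyML data structure.
Tree = Tuple[float, Set[Clade]]


def near_optimal(trees: List[Tree], tolerance: float) -> List[Tree]:
    """Keep every tree whose log-likelihood is within `tolerance` of the best."""
    best = max(lnl for lnl, _ in trees)
    return [tree for tree in trees if best - tree[0] <= tolerance]


def clades_in_all(trees: List[Tree]) -> Set[Clade]:
    """Return the clades shared by every tree (a strict-consensus view)."""
    return set.intersection(*(clades for _, clades in trees))


# Hypothetical search replicates for one gene with taxa A, B, C (plus outgroup).
trees: List[Tree] = [
    (-1000.00, {frozenset("AB"), frozenset("ABC")}),
    (-1000.05, {frozenset("AC"), frozenset("ABC")}),
    (-1012.30, {frozenset("BC"), frozenset("ABC")}),
]

# Assumed tolerance of 0.1 log-likelihood units; the appropriate value is a
# judgment call and is not taken from the paper.
kept = near_optimal(trees, tolerance=0.1)
stable = clades_in_all(kept)
```

In this toy data, two trees fall within the tolerance and disagree on the sister relationship of A, so only the clade containing A, B and C survives in `stable`. A clade that is resolved in the single best tree but absent from `stable` is the kind of dubiously supported clade the abstract describes.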


Updated: 2019-11-13
