Parsimony analysis of phylogenomic datasets (II): evaluation of PAUP*, MEGA and MPBoot



Cladistics (IF 3.6), Pub Date: 2021-07-14, DOI: 10.1111/cla.12476
Pablo A Goloboff (1, 2), Santiago A Catalano (1, 3), Ambrosio Torres (1)


This paper examines the implementation of parsimony methods in the programs PAUP*, MEGA and MPBoot, and compares them with TNT. PAUP* implements standard, well-tested algorithms, along with flexible search strategies and options for handling trees; its main drawback is the lack of advanced search algorithms, which makes it difficult to find most parsimonious trees for large and complex datasets. In addition, branch-swapping can be much slower than in TNT for datasets with large numbers of taxa, although this is only occasionally a problem for phylogenomic datasets, which typically have small numbers of taxa. The parsimony implementation of MEGA has major drawbacks. MEGA often fails to find parsimonious trees because it does not perform all possible branch-swapping rearrangements, whether subtree pruning-regrafting (SPR) or tree bisection-reconnection (TBR). It also fails to properly handle ambiguity or multiple equally parsimonious trees, and it uses the same addition sequence for all bootstrap replicates, so that the resulting group support values depend on the order in which taxa are listed in the dataset. In addition, tree searches are very slow and do not facilitate the exploration of different starting points (as the random seed is fixed). MPBoot searches for optimal trees using the ratchet, but it is based on SPR instead of TBR (and by default evaluates only a subset of the SPR rearrangements). MPBoot approximates bootstrap frequencies by first finding a sample of trees and then selecting from those trees for every replicate, without performing a tree search. The approximation is too rough in many cases, producing serious under- or overestimations of the correct support values and, for most kinds of datasets, slower estimations than can be obtained with TNT.
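The search-free approximation described for MPBoot can be illustrated with a toy sketch. This is not MPBoot's code: the function names, the nested-tuple tree encoding, and the equal splitting of credit among tied trees are all assumptions made for illustration. Per-site Fitch lengths are computed once for each candidate tree; each bootstrap replicate then resamples sites and picks the shortest candidate, with no new tree search.

```python
import random

def fitch_site_length(tree, states):
    """Fitch pass for one site; tree is a taxon name or a (left, right) tuple."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_len = fitch_site_length(tree[0], states)
    right_set, right_len = fitch_site_length(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_len + right_len
    # Disjoint state sets cost one extra step.
    return left_set | right_set, left_len + right_len + 1

def site_lengths(tree, alignment):
    """Parsimony length of every site of the alignment on one tree."""
    n_sites = len(next(iter(alignment.values())))
    return [
        fitch_site_length(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    ]

def approximate_bootstrap(candidates, alignment, n_reps, seed=0):
    """Fraction of replicates won by each candidate tree (hypothetical helper)."""
    rng = random.Random(seed)
    per_site = [site_lengths(tree, alignment) for tree in candidates]
    n_sites = len(per_site[0])
    wins = [0.0] * len(candidates)
    for _ in range(n_reps):
        # Resample sites with replacement; no tree search is performed.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(lengths[i] for i in sample) for lengths in per_site]
        best = min(totals)
        winners = [k for k, total in enumerate(totals) if total == best]
        for k in winners:
            # Assumption: tied trees share the replicate equally.
            wins[k] += 1 / len(winners)
    return [w / n_reps for w in wins]

alignment = {"A": "AAAA", "B": "AAAA", "C": "CCCC", "D": "CCCC"}
candidates = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))]
print(approximate_bootstrap(candidates, alignment, n_reps=20))
```

The quality of such an estimate depends entirely on the candidate sample: a tree that would be optimal for some replicate but was never sampled cannot win that replicate, which is one way the approximation can drift from frequencies obtained by a full search.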
In addition, bootstrapping with PAUP*, MEGA or MPBoot can attribute strong support to groups that have no support at all under any meaningful concept of support, such as likelihood ratios or Bremer supports. In TNT, this problem is reduced by using the strict consensus tree to represent each replicate, or eliminated entirely by using different approximations of the Bremer support.
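A toy sketch can show why the way each replicate is summarized matters. It is an illustration under stated assumptions, not any program's actual procedure: trees are encoded as sets of clades, the helper names are hypothetical, and the data are assumed to be completely uninformative, so every replicate returns all three resolutions of taxa A, B and C as equally parsimonious.

```python
from collections import Counter

def strict_consensus(trees):
    """Clades shared by every equally parsimonious tree of a replicate."""
    return set.intersection(*(set(tree) for tree in trees))

def first_tree_only(trees):
    """Keep a single tree per replicate and ignore the ties."""
    return set(trees[0])

def group_frequencies(replicates, summarize):
    """Frequency of each clade across replicates under a summary rule."""
    counts = Counter()
    for trees in replicates:
        counts.update(summarize(trees))
    return {clade: n / len(replicates) for clade, n in counts.items()}

# Each tree is the set of its clades; all three resolutions tie.
tied_trees = [
    {frozenset("AB")},
    {frozenset("AC")},
    {frozenset("BC")},
]
replicates = [tied_trees] * 10

print(group_frequencies(replicates, first_tree_only))
print(group_frequencies(replicates, strict_consensus))
```

Keeping only the first tree gives the group A+B a frequency of 1.0 even though nothing in the data favors it, whereas summarizing each replicate by its strict consensus leaves no group with any frequency.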


Updated: 2021-07-14
