Relative Model Fit Does Not Predict Topological Accuracy in Single-Gene Protein Phylogenetics. Molecular Biology and Evolution



Molecular Biology and Evolution (IF 11.0), Pub Date: 2020-04-02, DOI: 10.1093/molbev/msaa075
Stephanie J Spielman

It is regarded as best practice in phylogenetic reconstruction to perform relative model selection to determine an appropriate evolutionary model for the data. This procedure ranks a set of candidate models according to their goodness of fit to the data, commonly using an information-theoretic criterion. Users then specify the best-ranking model for inference. Although it is often assumed that better-fitting models translate to increased accuracy, recent studies have shown that the specific model employed may not substantially affect inferences. We examine whether there is a systematic relationship between relative model fit and topological inference accuracy in protein phylogenetics, using simulations and real sequences. Simulations employed site-heterogeneous mechanistic codon models that are distinct from protein-level phylogenetic inference models, allowing us to investigate how protein models perform when they are misspecified for the data, as will be the case for any real sequence analysis. We broadly find that phylogenies inferred across models with vastly different fits to the data produce highly consistent topologies. We additionally find that all models infer similar proportions of false-positive splits, raising the possibility that all available models of protein evolution are similarly misspecified. Moreover, we find that the parameter-rich GTR (general time reversible) model, whose amino acid exchangeabilities are free parameters, performs similarly to models with fixed exchangeabilities, although the inference precision associated with GTR models was not examined. We conclude that, although relative model selection may not hinder phylogenetic analysis on protein data, it may not offer specific predictable improvements and is not a reliable proxy for accuracy.
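The two quantities contrasted in the abstract, relative model fit and the proportion of false-positive splits, can be illustrated with a minimal Python sketch. This is not the paper's pipeline: the function names, the representation of a split as a set of taxon labels, and all numbers in the usage example are assumptions made for illustration. AIC and BIC stand in for the "information-theoretic criterion" mentioned above, and a split counts as a false positive when it appears in the inferred tree but not in the true tree.

```python
import math
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion; lower means better relative fit."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion; lower means better relative fit."""
    return n_params * math.log(n_sites) - 2 * log_lik


def rank_models(
    fits: Dict[str, Tuple[float, int]], n_sites: int, criterion: str = "bic"
) -> List[Tuple[str, float]]:
    """Rank candidate models from best to worst fit.

    `fits` maps a model name to (log-likelihood, number of free parameters).
    """
    scores = {}
    for name, (log_lik, n_params) in fits.items():
        if criterion == "bic":
            scores[name] = bic(log_lik, n_params, n_sites)
        else:
            scores[name] = aic(log_lik, n_params)
    return sorted(scores.items(), key=lambda item: item[1])


def canonical_split(side: Iterable[str], taxa: Iterable[str]) -> FrozenSet[str]:
    """Return the side of a bipartition that excludes a fixed reference taxon.

    Both sides describe the same split, so one must be chosen consistently
    before splits from different trees can be compared.
    """
    side, taxa = frozenset(side), frozenset(taxa)
    reference = min(taxa)  # arbitrary but deterministic choice
    return taxa - side if reference in side else side


def false_positive_fraction(
    inferred: Set[FrozenSet[str]], true: Set[FrozenSet[str]]
) -> float:
    """Fraction of inferred splits that are absent from the true tree."""
    if not inferred:
        return 0.0
    return len(inferred - true) / len(inferred)


# Hypothetical log-likelihoods and parameter counts, for illustration only.
example_fits = {"LG": (-10250.0, 10), "WAG": (-10300.0, 10), "GTR": (-10200.0, 199)}
ranking = rank_models(example_fits, n_sites=300, criterion="bic")

# Hypothetical five-taxon trees, each given by its non-trivial splits.
taxa = {"A", "B", "C", "D", "E"}
true_splits = {canonical_split(s, taxa) for s in ({"A", "B"}, {"D", "E"})}
inferred_splits = {canonical_split(s, taxa) for s in ({"A", "B"}, {"C", "D"})}
fp_fraction = false_positive_fraction(inferred_splits, true_splits)
```

In this made-up example the model with the highest likelihood (GTR) ranks last under BIC because of its parameter penalty. The ranking and the false-positive fraction are computed independently of one another, which is why a good rank on the first need not imply a low value on the second.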


Updated: 2020-04-02
