The Multispecies Coalescent Model Outperforms Concatenation across Diverse Phylogenomic Data Sets
Systematic Biology (IF 6.5), Pub Date: 2020-02-03, DOI: 10.1093/sysbio/syaa008
Xiaodong Jiang 1, Scott V Edwards 2, Liang Liu 1,3

Abstract: A statistical framework of model comparison and model validation is essential to resolving the debates over concatenation and coalescent models in phylogenomic data analysis. Here, a set of statistical tests is developed and applied to evaluate and compare the adequacy of substitution, concatenation, and multispecies coalescent (MSC) models across 47 phylogenomic data sets collected across the tree of life. Tests for substitution models and for the concatenation assumption of topologically congruent gene trees suggest that poor fit is widespread: substitution models are rejected by 44% of loci and concatenation models by 38% of loci. Logistic regression shows that the proportions of GC content and informative sites are both negatively correlated with the fit of substitution models across loci. Moreover, a substantial violation of the concatenation assumption of congruent gene trees is consistently observed across six major groups (birds, mammals, fish, insects, reptiles, and others, including other invertebrates). In contrast, among those loci adequately described by a given substitution model, the proportion of loci rejecting the MSC model is 11%, significantly lower than the proportions rejecting the substitution and concatenation models. Although conducted on reduced data sets due to computational constraints, Bayesian model validation and comparison both strongly favor the MSC over concatenation across all data sets; the concatenation assumption of congruent gene trees rarely holds for phylogenomic data sets with more than 10 loci. Thus, for large phylogenomic data sets, model comparisons are expected to consistently and more strongly favor the coalescent model over the concatenation model. We also found that loci rejecting the MSC have little effect on species tree estimation. Our study reveals the value of model validation and comparison in phylogenomic data analysis, as well as the need for further improvements of multilocus models and computational tools for phylogenetic inference. [Bayes factor; Bayesian model validation; coalescent prior; congruent gene trees; independent prior; Metazoa; posterior predictive simulation.]

Updated: 2020-02-03