Scalable empirical mixture models that account for across-site compositional heterogeneity.
Molecular Biology and Evolution (IF 11.0), Pub Date: 2020-09-08, DOI: 10.1093/molbev/msaa145
Dominik Schrempf 1 , Nicolas Lartillot 2 , Gergely Szöllősi 1, 3, 4

Biochemical demands constrain the range of amino acids acceptable at specific sites, resulting in across-site compositional heterogeneity of the amino acid replacement process. Phylogenetic models that disregard this heterogeneity are prone to systematic errors, which can lead to severe long-branch attraction artifacts. State-of-the-art models accounting for across-site compositional heterogeneity include the CAT model, which is computationally expensive, and empirical distribution mixture models estimated via maximum likelihood (C10–C60 models). Here, we present EDCluster, a new, scalable method for finding empirical distribution mixture models by means of a simple cluster analysis. The cluster analysis uses specific coordinate transformations that allow the detection of specialized amino acid distributions either from curated databases or from the alignment at hand. We apply EDCluster to the HOGENOM and HSSP databases in order to provide universal distribution mixture (UDM) models comprising up to 4,096 components. Detailed analyses of the UDM models demonstrate the removal of various long-branch attraction artifacts and improved performance compared with the C10–C60 models. Ready-to-use implementations of the UDM models are provided for three established software packages (IQ-TREE, Phylobayes, and RevBayes).
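The idea of deriving mixture components from a cluster analysis of site-wise amino acid distributions can be illustrated with a minimal sketch. The abstract does not say which coordinate transformation or clustering algorithm EDCluster uses, so everything below is an assumption made for illustration: a centered log-ratio transform (`clr`), a plain k-means with farthest-point initialization (`kmeans`), and a hypothetical helper `mixture_components` that turns clusters into weighted stationary distributions. The toy data use a four-letter alphabet rather than the 20 amino acids. This is not the authors' implementation.

```python
import math

def clr(freqs, pseudo=1e-4):
    """Centered log-ratio transform of one site distribution (assumed transform)."""
    smoothed = [f + pseudo for f in freqs]   # pseudocount avoids log(0)
    total = sum(smoothed)
    logs = [math.log(f / total) for f in smoothed]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:   # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:            # an empty cluster keeps its old center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def mixture_components(site_freqs, k):
    """Cluster in transformed space; return (weight, distribution) per cluster."""
    labels = kmeans([clr(f) for f in site_freqs], k)
    components = []
    for c in range(k):
        members = [f for f, l in zip(site_freqs, labels) if l == c]
        if not members:
            continue
        weight = len(members) / len(site_freqs)
        mean = [sum(col) / len(members) for col in zip(*members)]
        norm = sum(mean)
        components.append((weight, [m / norm for m in mean]))
    return components

# Toy site distributions: three sites favor state 0, three favor state 3.
toy_sites = [
    [0.90, 0.05, 0.03, 0.02],
    [0.85, 0.08, 0.04, 0.03],
    [0.88, 0.06, 0.03, 0.03],
    [0.02, 0.03, 0.05, 0.90],
    [0.03, 0.04, 0.08, 0.85],
    [0.03, 0.03, 0.06, 0.88],
]
components = mixture_components(toy_sites, k=2)
for weight, dist in components:
    print(round(weight, 3), [round(x, 3) for x in dist])
```

Clustering in a log-ratio space rather than on raw frequencies puts more emphasis on differences among rare states, which is one plausible reason a coordinate transformation helps detect specialized distributions. Each resulting component pairs a weight (the fraction of sites in the cluster) with a stationary distribution (the cluster mean in the original frequency space).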

Updated: 2020-09-08