Concordance-Based Approaches for the Inference of Relationships and Molecular Rates with Phylogenomic Data Sets
Systematic Biology (IF 6.1), Pub Date: 2021-07-01, DOI: 10.1093/sysbio/syab052
Joseph F Walker 1,2, Stephen A Smith 3, Richard G J Hodel 4, Edwige Moyroud 1,5

Gene tree conflict is common, and finding methods to analyze and alleviate its negative effects on species tree analysis is a crucial part of phylogenomics. This study aims to expand the discussion of inferring species trees and molecular branch lengths when conflict is present. Conflict is typically examined in two ways: inferring its prevalence and inferring the influence of individual genes (how strongly one gene supports a given topology compared to an alternative topology). Here, we examine a procedure that incorporates both conflict and the influence of genes in order to infer evolutionary relationships. All supported relationships in the gene trees are analyzed, and the likelihoods of the genes constrained to each relationship are summed to provide a likelihood for that relationship. A consensus tree is then assembled from these summed likelihoods: relationships are chosen in order of likelihood, and each is accepted provided it does not conflict with a relationship that has a higher likelihood score. If the most likely relationships cannot all be combined into a single bifurcating tree, multiple trees are produced and a consensus tree with a polytomy is created. This procedure allows more influential genes to have a greater influence on an inferred relationship, does not assume conflict has arisen from any one source, and does not force the data set to produce a single bifurcating tree. Using this approach on three empirical data sets, we examine and discuss the relationship between the influence and the prevalence of gene tree conflict. We find that in one of the data sets, assembling a bifurcating consensus tree composed solely of the most likely relationships is impossible. To account for conflict in molecular rate analysis, we also introduce a concordance-based approach to the summary and estimation of branch lengths suitable for downstream comparative analyses.
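The consensus-assembly step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes rooted trees, represents each relationship as a clade (a set of taxon names), assumes every relationship has been scored on the log-likelihood scale, and covers only the greedy acceptance step, not the case where multiple trees and a polytomy are produced. The names `compatible`, `summed_scores`, and `greedy_consensus`, and all numeric values, are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence

Clade = FrozenSet[str]


def compatible(a: Clade, b: Clade) -> bool:
    """Two rooted clades fit in one tree if one contains the other or they share no taxa."""
    return a <= b or b <= a or a.isdisjoint(b)


def summed_scores(per_gene: Dict[Clade, Sequence[float]]) -> Dict[Clade, float]:
    """Sum the per-gene scores (assumed log-likelihoods) for each relationship."""
    return {clade: sum(values) for clade, values in per_gene.items()}


def greedy_consensus(scores: Dict[Clade, float]) -> List[Clade]:
    """Accept relationships from highest to lowest score, skipping any that
    conflict with an already accepted (higher scoring) relationship."""
    accepted: List[Clade] = []
    for clade in sorted(scores, key=scores.get, reverse=True):
        if all(compatible(clade, kept) for kept in accepted):
            accepted.append(clade)
    return accepted


# Made-up scores for two genes under three candidate relationships.
per_gene = {
    frozenset("AB"): [-100.0, -120.0],
    frozenset("BC"): [-105.0, -130.0],
    frozenset("ABC"): [-101.0, -121.0],
}
totals = summed_scores(per_gene)
consensus = greedy_consensus(totals)
```

In this toy input, the clade {B, C} overlaps {A, B} without being nested in it, so it is skipped once {A, B} has been accepted. Unrooted trees would need a bipartition-based compatibility test instead.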
We demonstrate through simulation that, even under high levels of stochastic conflict, the mean and median of the concordant rates recapitulate the true molecular rate better than a supermatrix approach. Using a large phylogenomic data set, we examine rate heterogeneity across concordant genes, with a focus on the branch subtending crown angiosperms. Notably, we find highly variable rates of evolution along this branch. The approaches outlined here have several limitations, but they also represent alternative methods for harnessing the complexity of phylogenomic data sets and for enriching our inferences of both species relationships and evolutionary processes. [Branch length estimation; consensus tree; gene tree conflict; gene tree filtering; phylogenetics; phylogenomics.]
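As a rough illustration of the concordance-based branch length summary, the sketch below collects the length of one species-tree branch from only those gene trees that contain the same clade, then reports the mean and median. It is a simplified sketch under stated assumptions, not the authors' method: each gene tree is assumed to be pre-parsed into a mapping from rooted clade to branch length, all genes are assumed to share the same taxon sampling, and the function names and values are hypothetical.

```python
from statistics import mean, median
from typing import Dict, FrozenSet, List, Optional

Clade = FrozenSet[str]
GeneTree = Dict[Clade, float]  # clade -> length of the branch subtending it


def concordant_lengths(clade: Clade, gene_trees: List[GeneTree]) -> List[float]:
    """Branch lengths from gene trees that recover the clade (concordant genes only)."""
    return [tree[clade] for tree in gene_trees if clade in tree]


def summarize_branch(clade: Clade, gene_trees: List[GeneTree]) -> Optional[dict]:
    """Mean and median of concordant branch lengths, or None if no gene is concordant."""
    lengths = concordant_lengths(clade, gene_trees)
    if not lengths:
        return None
    return {"n": len(lengths), "mean": mean(lengths), "median": median(lengths)}


# Made-up gene trees; the third one conflicts with clade {A, B}.
AB, BC, ABC = frozenset("AB"), frozenset("BC"), frozenset("ABC")
gene_trees = [
    {AB: 0.10, ABC: 0.05},
    {AB: 0.20, ABC: 0.07},
    {BC: 0.02, ABC: 0.06},
    {AB: 0.60},
]
summary = summarize_branch(AB, gene_trees)
```

Reporting the median alongside the mean is useful here because a single gene with an unusually long branch (0.60 in the toy data) pulls the mean upward but leaves the median unchanged.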

Updated: 2021-07-01