Incorporating compositional heterogeneity into Lie Markov models for phylogenetic inference
Annals of Applied Statistics (IF 1.8), Pub Date: 2020-12-19, DOI: 10.1214/20-aoas1369
Naomi E. Hannaford, Sarah E. Heaps, Tom M. W. Nye, Tom A. Williams, T. Martin Embley

Phylogenetics uses alignments of molecular sequence data to learn about evolutionary trees. Substitutions in sequences are modelled through a continuous-time Markov process, characterised by an instantaneous rate matrix, which standard models assume is time-reversible and stationary. These assumptions are biologically questionable and induce a likelihood function which is invariant to a tree’s root position. This hampers inference because a tree’s biological interpretation depends critically on where it is rooted. Relaxing both assumptions, we introduce a model whose likelihood can distinguish between rooted trees. The model is nonstationary with step changes in the instantaneous rate matrix at each speciation event. Exploiting recent theoretical work, each rate matrix belongs to a nonreversible family of Lie Markov models. These models are closed under matrix multiplication, so our extension offers the conceptually appealing property that a tree and all its subtrees could have arisen from the same family of nonstationary models. We adopt a Bayesian approach, describe an MCMC algorithm for posterior inference and provide software. The biological insight that our model can provide is illustrated through an analysis in which nonreversible but stationary and nonstationary but reversible models cannot identify a plausible root.
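
The root-invariance claim in the abstract can be checked numerically on a two-leaf tree. The sketch below is not the authors' model or software. It assumes two hypothetical 4-state rate matrices chosen only for illustration: an F81-style reversible matrix, and a circulant nonreversible matrix whose stationary distribution is uniform. It also uses a simple pure-Python matrix exponential in place of a library routine. For a fixed total path length between the leaves, moving the root along the path leaves the site likelihood unchanged under the reversible stationary model but changes it under the nonreversible one.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(q, t, squarings=10, terms=20):
    """Approximate exp(Q t) by a truncated Taylor series with scaling and squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def site_likelihood(q, root_dist, t1, t2, x, y):
    """Likelihood of states x and y at the two leaves of a rooted two-leaf tree.

    The root state is drawn from root_dist; branches of length t1 and t2
    lead to the leaves showing x and y respectively.
    """
    p1, p2 = expm(q, t1), expm(q, t2)
    return sum(root_dist[r] * p1[r][x] * p2[r][y] for r in range(len(q)))


# Illustrative reversible model (F81-style): q_ij = pi_j for i != j.
pi = [0.1, 0.2, 0.3, 0.4]
q_rev = [[pi[j] if i != j else pi[i] - 1.0 for j in range(4)] for i in range(4)]

# Illustrative nonreversible model: circulant rates, faster "forward" than
# "backward" around a 4-cycle; its stationary distribution is uniform.
uniform = [0.25] * 4
q_nonrev = [[0.0] * 4 for _ in range(4)]
for i in range(4):
    q_nonrev[i][(i + 1) % 4] = 1.0
    q_nonrev[i][(i - 1) % 4] = 0.2
    q_nonrev[i][i] = -1.2

# Same total path length (0.6), two different root positions.
rev_a = site_likelihood(q_rev, pi, 0.1, 0.5, 0, 1)
rev_b = site_likelihood(q_rev, pi, 0.3, 0.3, 0, 1)
nonrev_a = site_likelihood(q_nonrev, uniform, 0.1, 0.5, 0, 1)
nonrev_b = site_likelihood(q_nonrev, uniform, 0.3, 0.3, 0, 1)

print("reversible:   ", rev_a, rev_b)
print("nonreversible:", nonrev_a, nonrev_b)
```

Both models here are stationary, so the example isolates the effect of dropping reversibility only. The abstract's model additionally lets the rate matrix change at each speciation event, which this sketch does not attempt.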

Updated: 2020-12-20