Efficient inference, potential, and limitations of site-specific substitution models
Virus Evolution (IF 5.3), Pub Date: 2020-07-01, DOI: 10.1093/ve/veaa066
Vadim Puller 1, 2 , Pavel Sagulenko 3 , Richard A Neher 1, 2

Abstract: Natural selection imposes a complex filter on which variants persist in a population, resulting in evolutionary patterns that vary greatly along the genome. Some sites evolve close to neutrally, while others are highly conserved, allow only specific states, or only change in concert with other sites. On the one hand, such constraints on sequence evolution can be used to infer biological function; on the other hand, they need to be accounted for in phylogenetic reconstruction. Phylogenetic models often account for this complexity by partitioning sites into a small number of discrete classes with different rates and/or state preferences. Appropriate model complexity is typically determined by model selection procedures. Here, we present an efficient algorithm to estimate more complex models that allow for different preferences at every site, and we explore the accuracy with which such models can be estimated from simulated data. Our iterative approximate maximum likelihood scheme uses information in the data efficiently and accurately estimates site-specific preferences from large data sets with moderately diverged sequences and known topology. However, the joint estimation of site-specific rates, site-specific preferences, and phylogenetic branch lengths can suffer from identifiability problems, while ignoring variation in preferences across sites results in branch length underestimates. Site-specific preferences estimated from large HIV pol alignments show qualitative concordance with intra-host estimates of fitness costs. Analysis of these substitution models suggests near saturation of divergence after a few hundred years. Such saturation can explain the inability to infer deep divergence times of HIV and SIVs using molecular clock approaches and time-dependent rate estimates.

Updated: 2020-07-01