Limits and convergence properties of the sequentially Markovian coalescent
Molecular Ecology Resources (IF 5.5), Pub Date: 2021-05-12, DOI: 10.1111/1755-0998.13416
Thibaut Paul Patrick Sellinger, Diala Abu-Awad, Aurélien Tellier

Several methods based on the sequentially Markovian coalescent (SMC) make use of full genome sequence data from samples to infer population demographic history including past changes in population size, admixture, migration events and population structure. More recently, the original theoretical framework has been extended to allow the simultaneous estimation of population size changes along with other life history traits such as selfing or seed banking. The latter developments enhance the applicability of SMC methods to nonmodel species. Although convergence proofs have been given using simulated data in a few specific cases, an in-depth investigation of the limitations of SMC methods is lacking. In order to explore such limits, we first develop a tool inferring the best case convergence of SMC methods assuming the true underlying coalescent genealogies are known. This tool can be used to quantify the amount and type of information that can be confidently retrieved from given data sets prior to the analysis of the real data. Second, we assess the inference accuracy when the assumptions of SMC approaches are violated due to departures from the model, namely the presence of transposable elements, variable recombination and mutation rates along the sequence, and SNP calling errors. Third, we deliver a new interpretation of SMC methods by highlighting the importance of the transition matrix, which we argue can be used as a set of summary statistics in other statistical inference methods, uncoupling the SMC from hidden Markov models (HMMs). We finally offer recommendations to better apply SMC methods and build adequate data sets under budget constraints.
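The idea of treating the transition matrix as a set of summary statistics can be illustrated with a minimal sketch. It assumes the coalescence time (TMRCA) at each position along a sequence is already known, as it would be in a simulation where the true genealogies are recorded. The function names, the time bins and the toy input are hypothetical and are not taken from the authors' tool; the sketch only shows how transitions between discretized coalescence-time states at adjacent positions could be counted and normalized.

```python
import numpy as np

def transition_counts(tmrca, bin_edges):
    """Count transitions between discretized coalescence-time states
    at adjacent positions along a sequence (hypothetical helper)."""
    tmrca = np.asarray(tmrca, dtype=float)
    n_states = len(bin_edges) - 1
    # np.digitize returns 1..n_states for values inside the edges;
    # shift to 0-based indices and clip anything outside the range.
    states = np.clip(np.digitize(tmrca, bin_edges) - 1, 0, n_states - 1)
    counts = np.zeros((n_states, n_states), dtype=int)
    # Accumulate one count per (state at i, state at i+1) pair.
    np.add.at(counts, (states[:-1], states[1:]), 1)
    return counts

def row_normalize(counts):
    """Turn counts into empirical transition probabilities;
    rows with no observations stay at zero."""
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals,
                     out=np.zeros(counts.shape, dtype=float),
                     where=totals > 0)

# Toy input: assumed TMRCA values at six consecutive positions,
# discretized into three time intervals.
tmrca = [0.2, 0.2, 0.7, 1.5, 1.5, 0.3]
bin_edges = [0.0, 0.5, 1.0, float("inf")]
counts = transition_counts(tmrca, bin_edges)
probs = row_normalize(counts)
print(counts)
print(probs)
```

In this sketch the matrix is built directly from known coalescence times, so it involves no hidden Markov model; with real data the times are not observed and would have to be inferred.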

Updated: 2021-05-12