Sequential importance sampling for multiresolution Kingman–Tajima coalescent counting
Annals of Applied Statistics (IF 1.8), Pub Date: 2020-06-29, DOI: 10.1214/19-aoas1313
Lorenzo Cappello, Julia A. Palacios

Statistical inference of evolutionary parameters from molecular sequence data relies on coalescent models to account for the shared genealogical ancestry of the samples. However, inferential algorithms do not scale to available data sets. A strategy to improve computational efficiency is to rely on simpler coalescent and mutation models, resulting in smaller hidden state spaces. An estimate of the cardinality of the state space of genealogical trees at different resolutions is essential to decide the best modeling strategy for a given data set. To our knowledge, there is neither an exact nor an approximate method to determine these cardinalities. We propose a sequential importance sampling algorithm to estimate the cardinality of the sample space of genealogical trees under different coalescent resolutions. Our sampling scheme proceeds sequentially across the set of combinatorial constraints imposed by the data, which in this work are completely linked DNA sequences at a nonrecombining segment. We analyze the cardinality of different genealogical tree spaces on simulations to study the settings that favor coarser resolutions. We apply our method to estimate the cardinality of genealogical tree spaces from mtDNA data from the 1000 Genomes Project and from a sample from a Melanesian population at the $\beta$-globin locus.
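
The counting idea behind sequential importance sampling can be illustrated with a toy sketch. This is not the authors' algorithm: it ignores the data constraints entirely and only counts unconstrained ranked unlabeled tree shapes (the Tajima resolution) with n leaves, a quantity known to follow the Euler zigzag numbers (1, 1, 2, 5, 16, 61, ...). A tree is grown from the root down; at each step one distinguishable lineage is chosen uniformly to split, and the product of the numbers of available choices is an unbiased estimate of the number of trees. The function names `sample_weight`, `sis_count`, and `exact_count` are hypothetical and introduced only for this illustration.

```python
import random


def sample_weight(n, rng):
    """One SIS draw: grow a ranked tree shape with n >= 2 leaves from the root.

    leaf_children[i] is the number of current leaves attached to internal
    node i (nodes are indexed in order of creation, i.e. by rank).
    Returns 1/q(tree), the inverse proposal probability of the sampled tree.
    """
    leaf_children = [2]  # the root starts with two leaf children
    weight = 1
    for _ in range(n - 2):  # each split adds exactly one leaf
        # Leaves sharing a parent are indistinguishable, so the distinct
        # moves are the internal nodes that still have a leaf child.
        options = [i for i, c in enumerate(leaf_children) if c > 0]
        weight *= len(options)  # uniform proposal over distinct moves
        parent = rng.choice(options)
        leaf_children[parent] -= 1
        leaf_children.append(2)  # the new internal node has two leaves
    return weight


def sis_count(n, draws=20000, seed=0):
    """Average of the importance weights: an unbiased estimate of the count."""
    rng = random.Random(seed)
    return sum(sample_weight(n, rng) for _ in range(draws)) / draws


def exact_count(n):
    """Brute-force enumeration of the same moves, feasible only for small n."""
    def rec(state, remaining):
        if remaining == 0:
            return 1
        total = 0
        for i, c in enumerate(state):
            if c > 0:
                nxt = state[:i] + (c - 1,) + state[i + 1:] + (2,)
                total += rec(nxt, remaining - 1)
        return total
    return rec((2,), n - 2)


if __name__ == "__main__":
    n = 7
    print("exact:", exact_count(n))
    print("SIS estimate:", sis_count(n))
```

In this unconstrained toy the exact count is available for comparison; the setting described in the abstract is harder because the observed sequences restrict which trees are admissible, and no exact count is available there.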

Updated: 2020-06-29