Maximum likelihood estimation of natural selection and allele age from time series data of allele frequencies
bioRxiv - Genetics Pub Date : 2020-07-28 , DOI: 10.1101/837310
Zhangyi He , Xiaoyang Dai , Mark Beaumont , Feng Yu

Temporally spaced genetic data allow for more accurate inference of population genetic parameters and hypothesis testing on the recent action of natural selection. In this work, we develop a novel likelihood-based method for jointly estimating the selection coefficient and the allele age from time series data of allele frequencies. Our approach is based on a hidden Markov model where the underlying process is a Wright-Fisher diffusion conditioned to survive until the time of the most recent sample. This formulation circumvents the assumption required in existing methods that the allele is created by mutation at a certain low frequency. We calculate the likelihood by numerically solving the resulting Kolmogorov backward equation backwards in time while re-weighting the solution with the emission probabilities of the observations at each sampling time point. This procedure reduces the two-dimensional numerical search for the maximum of the likelihood surface for both the selection coefficient and the allele age to a one-dimensional search over the selection coefficient only. We illustrate through extensive simulations that our method can produce accurate estimates of the selection coefficient and the allele age under both constant and non-constant demographic histories. We apply our approach to re-analyse ancient DNA data associated with horse base coat colours. We find that ignoring demographic histories or grouping raw samples can significantly bias the inference results.
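
To make the likelihood recursion described above more concrete, here is a minimal Python sketch of a hidden Markov model likelihood computed backwards in time and re-weighted by binomial emission probabilities at each sampling point. It is not the authors' implementation: it swaps the conditioned Wright-Fisher diffusion and the Kolmogorov backward equation for a plain discrete Wright-Fisher Markov chain, assumes genic selection and a uniform distribution over the allele frequency at the first sampling time, and does not estimate the allele age. All function names (`wf_transition`, `emission`, `log_likelihood`, `estimate_s`) and the example data are hypothetical.

```python
import numpy as np
from math import comb


def wf_transition(N, s):
    """One-generation Wright-Fisher transition matrix over 2N gene copies.

    Assumes genic selection with coefficient s (an illustrative choice).
    Only suitable for small N, since binomial coefficients are held as floats.
    """
    n = 2 * N
    x = np.arange(n + 1) / n                 # allele frequency grid
    p = x * (1 + s) / (1 + s * x)            # frequency after selection
    k = np.arange(n + 1)
    binom = np.array([comb(n, j) for j in k], dtype=float)
    # P[i, j] = Binomial(j; n, p_i)
    return binom[None, :] * p[:, None] ** k[None, :] * (1 - p[:, None]) ** (n - k[None, :])


def emission(c, m, x):
    """Probability of seeing c mutant copies among m sampled chromosomes."""
    return comb(m, c) * x ** c * (1 - x) ** (m - c)


def log_likelihood(s, N, times, counts, sizes):
    """Log-likelihood of the samples, computed backwards in time."""
    P = wf_transition(N, s)
    x = np.arange(2 * N + 1) / (2 * N)
    # Start at the most recent sample and move towards the oldest one.
    beta = emission(counts[-1], sizes[-1], x)
    logL = 0.0
    for i in range(len(times) - 2, -1, -1):
        step = np.linalg.matrix_power(P, times[i + 1] - times[i])
        # Propagate back one sampling interval, then re-weight by the emission.
        beta = emission(counts[i], sizes[i], x) * (step @ beta)
        scale = beta.sum()                   # rescale to avoid underflow
        logL += np.log(scale)
        beta = beta / scale
    prior = np.full(x.size, 1.0 / x.size)    # assumed uniform initial frequency
    return logL + np.log(prior @ beta)


def estimate_s(N, times, counts, sizes, grid):
    """One-dimensional grid search for the selection coefficient."""
    ll = [log_likelihood(s, N, times, counts, sizes) for s in grid]
    return float(grid[int(np.argmax(ll))])


# Hypothetical example: sampling times in generations, mutant counts, sample sizes.
times = [0, 10, 20, 30]
counts = [2, 6, 12, 18]
sizes = [20, 20, 20, 20]
grid = np.linspace(-0.2, 0.4, 13)
s_hat = estimate_s(50, times, counts, sizes, grid)
```

The grid search over `s` mirrors the one-dimensional search mentioned in the abstract only loosely. In the paper, that reduction comes from how the allele age is handled within the backward solution, which this sketch does not reproduce.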

Updated: 2020-07-30