Fast and accurate approximation of the joint site frequency spectrum of multiple populations
bioRxiv - Genetics Pub Date : 2020-05-28 , DOI: 10.1101/2020.05.01.073213
Ethan M. Jewett

The site frequency spectrum (SFS) is a statistic that summarizes the distribution of derived allele frequencies in a sample of DNA sequences. The SFS provides useful information about genetic variation within and among populations, and it can be used to make population genetic inferences. Methods for computing the SFS based on the diffusion approximation are computationally efficient when computing all terms of the SFS simultaneously, and they can handle complicated demographic scenarios. However, in practice it is sometimes only necessary to compute a subset of terms of the SFS, in which case coalescent-based methods can achieve greater computational efficiency. Here, we present simple and accurate approximate formulas for the expected joint SFS for multiple populations connected by migration. Compared with existing exact approaches, our approximate formulas greatly reduce the complexity of computing each entry of the SFS and have simple forms. The computational complexity of our method depends on the index of the entry to be computed, rather than on the sample size, and the accuracy of our approximation improves as the sample size increases.
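
To make the definition of the SFS concrete, the following is a minimal sketch of how an observed (empirical) SFS and a two-population joint SFS could be tallied from a small 0/1 haplotype matrix, where 1 marks the derived allele. It is not the approximation method described in the abstract, which concerns the expected joint SFS under migration. The function names `site_frequency_spectrum` and `joint_sfs` and the toy data in `sample` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Return the unfolded SFS as a list [xi_1, ..., xi_{n-1}].

    haplotypes: list of n equal-length 0/1 sequences (1 = derived allele).
    xi_i is the number of sites where exactly i of the n sequences
    carry the derived allele. Sites with 0 or n copies are not counted.
    """
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    # Derived allele count at each site, tallied across all sequences.
    counts = Counter(sum(h[s] for h in haplotypes) for s in range(num_sites))
    return [counts.get(i, 0) for i in range(1, n)]


def joint_sfs(pop_a, pop_b):
    """Return the joint SFS of two populations as a Counter.

    Keys are pairs (i, j): i derived copies in pop_a, j in pop_b.
    Values are the number of sites with that configuration.
    """
    num_sites = len(pop_a[0])
    return Counter(
        (sum(h[s] for h in pop_a), sum(h[s] for h in pop_b))
        for s in range(num_sites)
    )


# Hypothetical toy data: 4 sequences (rows) by 4 sites (columns).
sample = [
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
]

sfs = site_frequency_spectrum(sample)
# Treat the first two rows as one population and the last two as another.
jsfs = joint_sfs(sample[:2], sample[2:])
```

In this toy sample the derived allele appears 3, 0, 2, and 1 times at the four sites, so each of the three SFS entries equals 1 and the monomorphic site is left out. A dictionary-like `Counter` is used for the joint SFS because only the observed configurations need to be stored.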

Updated: 2020-05-28