Efficient Computation of the Joint Sample Frequency Spectra for Multiple Populations
Journal of Computational and Graphical Statistics (IF 2.4). Pub Date: 2017-01-02. DOI: 10.1080/10618600.2016.1159212
John A. Kamm, Jonathan Terhorst, Yun S. Song

Abstract: A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic that describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences and provides a highly efficient dimensional reduction of large-scale population genomic variation data. Recently, there has been much interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on. SFS-based inference methods require accurate computation of the expected SFS under a given demographic model. Although much methodological progress has been made, existing methods suffer from numerical instability and high computational complexity when multiple populations are involved and the sample size is large. In this article, we present new analytic formulas and algorithms that enable accurate, efficient computation of the expected joint SFS for thousands of individuals sampled from hundreds of populations related by a complex demographic model with arbitrary population size histories (including piecewise-exponential growth). Our results are implemented in a new software package called momi (MOran Models for Inference). Through an empirical study, we demonstrate our improvements to numerical stability and computational complexity.
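To make the abstract's central object concrete, here is a minimal Python/NumPy sketch that tabulates an observed joint SFS from a 0/1 haplotype matrix. It is not momi's API or the paper's algorithm: the function names `joint_sfs` and `expected_sfs_constant` are hypothetical, and the sketch assumes derived alleles are coded as 1, ancestral alleles as 0, and that each sample's population label is known. For comparison it also gives the textbook expected SFS for a single constant-size population under the infinite-sites model, E[ξ_i] = θ/i. Computing the expected joint SFS under complex multi-population histories is the hard problem the paper addresses, and this sketch does not attempt it.

```python
import numpy as np


def joint_sfs(genotypes, pop_of_sample):
    """Tabulate the observed joint SFS from a 0/1 haplotype matrix.

    genotypes: (n_samples, n_sites) array, 0 = ancestral, 1 = derived.
    pop_of_sample: length-n_samples array of population labels 0..P-1.
    Returns an integer array of shape (n_0 + 1, ..., n_{P-1} + 1) whose
    entry [d_0, ..., d_{P-1}] counts the sites carrying d_k derived
    alleles among the n_k samples from population k.
    Monomorphic sites land in the all-zero or all-derived corner.
    """
    genotypes = np.asarray(genotypes, dtype=int)
    pop_of_sample = np.asarray(pop_of_sample)
    n_pops = int(pop_of_sample.max()) + 1
    # Derived-allele count per population at every site: shape (P, n_sites).
    counts = np.stack(
        [genotypes[pop_of_sample == k].sum(axis=0) for k in range(n_pops)]
    )
    sizes = [int(np.sum(pop_of_sample == k)) for k in range(n_pops)]
    sfs = np.zeros([n + 1 for n in sizes], dtype=int)
    # Unbuffered add so that sites sharing the same count vector all register.
    np.add.at(sfs, tuple(counts), 1)
    return sfs


def expected_sfs_constant(n, theta):
    """Expected SFS for one constant-size population (infinite sites):
    E[xi_i] = theta / i for i = 1, ..., n - 1."""
    return theta / np.arange(1, n)


# Toy example (hypothetical data): 4 haplotypes, 2 per population, 4 sites.
G = np.array([
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
])
pops = np.array([0, 0, 1, 1])
sfs = joint_sfs(G, pops)
print(sfs)
print(expected_sfs_constant(4, theta=1.0))
```

In the toy output, `sfs[2, 0] == 1` records the one site where both population-0 haplotypes carry the derived allele and neither population-1 haplotype does. The array has one axis per population, so its size grows as the product of (n_k + 1) over populations; this is one reason the joint SFS becomes expensive to handle with many populations and large samples.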

Updated: 2017-01-02