Non-parametric estimation of population size changes from the site frequency spectrum
Statistical Applications in Genetics and Molecular Biology ( IF 0.8 ) Pub Date : 2018-06-10 , DOI: 10.1515/sagmb-2017-0061
Berit Lindum Waltoft 1, 2, 3 , Asger Hobolth 4

Changes in population size are informative about the evolutionary history of a species. Genetic variation within a species can be summarized by the site frequency spectrum (SFS). For a sample of size n, the SFS is a vector of length n − 1 where entry i is the number of sites where the mutant base appears i times and the ancestral base appears n − i times. We present a new method, CubSFS, for estimating the changes in population size of a panmictic population from an observed SFS. First, we provide a straightforward proof for the expression of the expected site frequency spectrum depending only on the population size. Our derivation is based on an eigenvalue decomposition of the instantaneous coalescent rate matrix. Second, we solve the inverse problem of determining the changes in population size from an observed SFS. Our solution is based on a cubic spline for the population size. The cubic spline is determined by minimizing the weighted average of two terms, namely (i) the goodness of fit to the observed SFS, and (ii) a penalty term based on the smoothness of the changes. The weight is determined by cross-validation. The new method is validated on simulated demographic histories and applied to unfolded and folded SFS from 26 different human populations from the 1000 Genomes Project.
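To make the SFS definition above concrete, here is a minimal Python sketch. It is not part of CubSFS. It assumes the sample is given as a hypothetical 0/1 matrix `haps` with one row per sampled sequence and one column per site, where 1 marks the mutant base. The function names `unfolded_sfs` and `fold_sfs` are illustrative, and the folding step uses the usual minor-allele-count convention, which the abstract does not spell out.

```python
import numpy as np


def unfolded_sfs(haplotypes: np.ndarray) -> np.ndarray:
    """Unfolded SFS of an (n, L) 0/1 matrix (1 = mutant base).

    Entry i - 1 of the result is the number of sites where the mutant
    base appears exactly i times, for i = 1, ..., n - 1.
    """
    haplotypes = np.asarray(haplotypes, dtype=np.int64)
    n = haplotypes.shape[0]
    mutant_counts = haplotypes.sum(axis=0)           # mutant copies per site
    counts = np.bincount(mutant_counts, minlength=n + 1)
    return counts[1:n]                               # drop counts 0 and n


def fold_sfs(sfs: np.ndarray) -> np.ndarray:
    """Fold an unfolded SFS by pooling entries i and n - i.

    Assumption: the folded spectrum is indexed by the minor allele
    count min(i, n - i), giving floor(n / 2) entries.
    """
    sfs = np.asarray(sfs)
    n = len(sfs) + 1
    folded = np.zeros(n // 2, dtype=sfs.dtype)
    for i in range(1, n):
        folded[min(i, n - i) - 1] += sfs[i - 1]
    return folded


# Toy sample: n = 4 sequences, 5 sites (made-up data for illustration).
haps = np.array([
    [1, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
])
sfs = unfolded_sfs(haps)
folded = fold_sfs(sfs)
print(sfs)     # [3 0 1]
print(folded)  # [4 0]
```

In the toy sample the last site carries the mutant base in all four sequences, so it falls outside the n − 1 entries of the spectrum and is not counted.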

Updated: 2018-06-10