Why are rare variants hard to impute? Coalescent models reveal theoretical limits in existing algorithms.
bioRxiv - Genetics Pub Date : 2021-01-14 , DOI: 10.1101/2020.08.10.245043
Yichen Si , Sebastian Zöllner

Genotype imputation is an indispensable step in human genetic studies. Large reference panels with deeply sequenced genomes now allow interrogating variants with minor allele frequency <1% without sequencing. While it is critical to consider the limits of this approach, imputation methods for rare variants have only done so empirically; the theoretical basis of their imputation accuracy has not been explored. To provide a theoretical treatment of imputation accuracy under the current imputation framework, we develop a coalescent model of imputing rare variants, leveraging the joint genealogy of the sample to be imputed and the reference individuals. We show that broadly used imputation algorithms include model misspecifications about this joint genealogy that limit the ability to correctly impute rare variants. We develop closed-form solutions for the probability distribution of this joint genealogy and quantify the inevitable error rate resulting from the model misspecification across a range of allele frequencies and reference sample sizes. We show that the probability of a falsely imputed minor allele decreases with reference sample size, but the proportion of falsely imputed minor alleles mostly depends on the allele count in the reference sample. We summarize the impact of this imputation error on association tests by calculating the r² between imputed and true genotypes and show that, even when other sources of error are modeled, the model misspecification has a significant impact on the r² of rare variants. These results provide a framework for developing new imputation algorithms and for interpreting rare variant association analyses.
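
The r² summary mentioned in the abstract can be illustrated with a minimal sketch. The code below assumes that r² means the squared Pearson correlation between true and imputed genotypes, and it uses a deliberately simple error model that is not the coalescent model of the paper: each true minor allele is missed with probability `miss`, and each major allele is falsely imputed as minor with probability `false_pos`. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def imputation_r2(true_genotypes, imputed_genotypes):
    """Squared Pearson correlation between true and imputed genotypes."""
    t = np.asarray(true_genotypes, dtype=float)
    g = np.asarray(imputed_genotypes, dtype=float)
    # The correlation is undefined if either vector has no variation.
    if t.std() == 0 or g.std() == 0:
        return float("nan")
    r = np.corrcoef(t, g)[0, 1]
    return float(r * r)


def simulate_toy_r2(maf, miss, false_pos, n_individuals=50_000, seed=0):
    """Toy error model (an assumption, not the paper's model).

    Each haplotype carries the minor allele with probability `maf`.
    A true minor allele is imputed as minor with probability 1 - miss;
    a true major allele is imputed as minor with probability false_pos.
    """
    rng = np.random.default_rng(seed)
    n_hap = 2 * n_individuals
    true_hap = rng.random(n_hap) < maf
    u = rng.random(n_hap)
    imputed_hap = np.where(true_hap, u < 1.0 - miss, u < false_pos)
    # Sum the two haplotypes of each individual into a 0/1/2 genotype.
    true_geno = true_hap.reshape(-1, 2).sum(axis=1)
    imputed_geno = imputed_hap.reshape(-1, 2).sum(axis=1)
    return imputation_r2(true_geno, imputed_geno)


if __name__ == "__main__":
    for maf in (0.05, 0.01, 0.001):
        r2 = simulate_toy_r2(maf=maf, miss=0.2, false_pos=0.0005)
        print(f"maf={maf}: r2={r2:.3f}")
```

In this toy setting a fixed false-positive rate hurts r² more as the variant gets rarer, because falsely imputed minor alleles make up a larger share of all imputed minor alleles. The paper derives the corresponding error rates from the joint genealogy rather than assuming them.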

Updated: 2021-01-14