False positive findings during genome-wide association studies with imputation: influence of allele frequency and imputation accuracy
Human Molecular Genetics (IF 3.1), Pub Date: 2021-08-05, DOI: 10.1093/hmg/ddab203
Zhihui Zhang 1, 2 , Xiangjun Xiao 1 , Wen Zhou 1 , Dakai Zhu 1 , Christopher I Amos 1, 2

Genotype imputation is widely used in genetic studies to boost the power of GWAS, to combine multiple studies for meta-analysis and to perform fine mapping. With advances in imputation tools and large reference panels, genotype imputation has become mature and accurate. However, the uncertain nature of imputed genotypes can bias downstream analysis. Many studies have compared the performance of popular imputation approaches, but few have investigated the bias characteristics of downstream association analyses. Herein, we show that imputation accuracy is diminished when the real genotypes contain minor alleles. Although these genotypes are less common, particularly at loci with low minor allele frequency, a large discordance between imputed and observed genotypes significantly inflated the association results, especially in data with a large proportion of uncertain SNPs. Significant discordance of P-values occurred as the P-value approached 0 or when the imputation quality was poor. Although eliminating poorly imputed SNPs can remove false positive (FP) SNPs, it sometimes sacrificed more than 80% of true positive (TP) SNPs. For top-ranked SNPs, removing variants with moderate imputation quality could not reduce the proportion of FP SNPs, and increasing the sample size of reference panels did not greatly benefit the results either. Additionally, samples with a balanced ratio of cases to controls can dramatically increase the number of TP SNPs observed in imputation-based GWAS. These results raise concerns about the results of association studies when rare variants are studied, particularly when case–control studies are unbalanced.
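
The trade-off between removing FP SNPs and losing TP SNPs when filtering on imputation quality can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that a SNP counts as TP when it is significant with both imputed and observed genotypes, and as FP when it is significant with imputed genotypes only. The `Snp` fields, the `info` quality score, the thresholds and the toy values are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    name: str
    info: float        # hypothetical imputation quality score in [0, 1]
    p_imputed: float   # association P-value using imputed genotypes
    p_observed: float  # association P-value using observed genotypes


def classify(snp, alpha):
    """Return "TP" or "FP" for an imputed-significant SNP, else None.

    Assumed definition: TP if also significant with observed genotypes.
    """
    if snp.p_imputed >= alpha:
        return None
    return "TP" if snp.p_observed < alpha else "FP"


def filter_tradeoff(snps, alpha, min_info):
    """Count TP/FP SNPs before and after dropping SNPs with info < min_info."""
    before = {"TP": 0, "FP": 0}
    after = {"TP": 0, "FP": 0}
    for snp in snps:
        label = classify(snp, alpha)
        if label is None:
            continue
        before[label] += 1
        if snp.info >= min_info:
            after[label] += 1
    return before, after


# Toy values invented for illustration only; they are not results from the paper.
toy = [
    Snp("snp_a", 0.95, 1e-9, 1e-8),   # well imputed, significant both ways
    Snp("snp_b", 0.40, 1e-9, 0.20),   # poorly imputed, significant only when imputed
    Snp("snp_c", 0.35, 1e-10, 1e-9),  # poorly imputed, but significant both ways
    Snp("snp_d", 0.90, 0.30, 0.25),   # not significant
]
before, after = filter_tradeoff(toy, alpha=5e-8, min_info=0.5)
print("before filter:", before)
print("after filter: ", after)
```

In this toy set the quality filter removes the single FP SNP but also discards one of the two TP SNPs, which is the kind of loss the abstract describes.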

Updated: 2021-08-05