Comprehensive Assessment of Genotype Imputation Performance.
Human Heredity (IF 1.8), Pub Date: 2019-01-23, DOI: 10.1159/000489758
Shuo Shi 1,2,3; Na Yuan 2; Ming Yang 4; Zhenglin Du 2; Jinyue Wang 1,2,3; Xin Sheng 1,2,3; Jiayan Wu 1; Jingfa Xiao 5,6,7

Genotype imputation is the process of estimating missing genotypes from a haplotype or genotype reference panel. It can effectively boost the power to detect single nucleotide polymorphism (SNP) associations in genome-wide association studies, enable the integration of multiple studies for meta-analysis, and be applied in fine-mapping studies. The performance of genotype imputation is affected by many factors, including the software, the choice of reference panel, sample size, and SNP density or sequencing coverage. A systematic evaluation of the imputation performance of currently popular software will benefit future studies. Here, we evaluate the imputation performance of Beagle4.1, IMPUTE2, MACH+Minimac3, and SHAPEIT2+IMPUTE2 using test samples of East Asian ancestry and reference panels from the 1000 Genomes Project. The results indicated that the accuracy of IMPUTE2 (99.18%) is slightly higher than that of the others (Beagle4.1: 98.94%, MACH+Minimac3: 98.51%, and SHAPEIT2+IMPUTE2: 99.08%). To achieve good and stable imputation quality, SNP density must exceed 200 SNPs per Mb. The imputation accuracies of IMPUTE2 and Beagle4.1 were only slightly influenced by study sample size. The extent to which the reference panel contributed to imputation performance depended on the choice of software. We also assessed imputation performance on SNPs generated by next-generation whole-genome sequencing and found that most of the SNPs detected by sequencing at 15× depth could be recovered by imputation from SNP data detected by sequencing at 4× depth, using the haplotype reference panel of the 1000 Genomes Project. All of the imputation software performed worse on SNPs with low minor allele frequency (MAF), owing to bias in the reference panels or the software. In the future, more comprehensive reference panels or new algorithms may address this challenge.
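The accuracy percentages, the SNP-density threshold, and the low-MAF weakness described above can be made concrete with a small sketch. It assumes accuracy is measured as genotype concordance between imputed calls and held-out true genotypes coded as alternate-allele counts (0, 1, 2); the abstract does not define the metric, and all function names and the MAF bin edges below are hypothetical, not the authors' code.

```python
"""Hypothetical imputation-evaluation helpers (illustrative sketch only).

Assumption: diploid genotypes are coded as alternate-allele counts 0, 1, 2.
"""
from bisect import bisect_right


def concordance(true_gt, imputed_gt):
    """Fraction of genotype calls where the imputed call equals the true call."""
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("need two non-empty genotype lists of equal length")
    matches = sum(t == i for t, i in zip(true_gt, imputed_gt))
    return matches / len(true_gt)


def snp_density_per_mb(positions_bp):
    """Number of SNPs per megabase across the span of the given positions."""
    span_mb = (max(positions_bp) - min(positions_bp)) / 1_000_000
    if span_mb <= 0:
        raise ValueError("positions must span a positive distance")
    return len(positions_bp) / span_mb


def minor_allele_frequency(genotypes):
    """MAF of one SNP computed from per-sample alternate-allele counts."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)


def concordance_by_maf(true_by_snp, imputed_by_snp, edges=(0.01, 0.05)):
    """Concordance per MAF bin, with MAF taken from the true genotypes.

    true_by_snp / imputed_by_snp: one genotype list per SNP, samples in the
    same order. With the default (hypothetical) edges the bins are
    [0, 0.01), [0.01, 0.05) and [0.05, 0.5]. Empty bins return None.
    """
    matched = [0] * (len(edges) + 1)
    total = [0] * (len(edges) + 1)
    for true_gt, imp_gt in zip(true_by_snp, imputed_by_snp):
        b = bisect_right(edges, minor_allele_frequency(true_gt))
        matched[b] += sum(t == i for t, i in zip(true_gt, imp_gt))
        total[b] += len(true_gt)
    return [m / t if t else None for m, t in zip(matched, total)]


if __name__ == "__main__":
    # 5 of 6 imputed calls agree with the truth.
    print(concordance([0, 1, 2, 1, 0, 0], [0, 1, 1, 1, 0, 0]))
    # One SNP every 4 kb over ~1 Mb is about 250 SNPs/Mb, above 200/Mb.
    print(snp_density_per_mb(list(range(0, 1_000_000, 4000))) > 200)
```

Reporting concordance per MAF bin, rather than only as one overall figure, is one way to expose the weaker performance on low-frequency SNPs, since common homozygous calls can dominate an overall percentage.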

Updated: 2019-11-01