Assessment of Imputation Quality: Comparison of Phasing and Imputation Algorithms in Real Data
Frontiers in Genetics (IF 3.7), Pub Date: 2021-09-22, DOI: 10.3389/fgene.2021.724037
Katharina Stahl 1,2, Damian Gola 2, Inke R König 2,3

Despite the widespread use of genotype imputation tools and the availability of different approaches, recent developments of currently used programs have not been compared comprehensively. We therefore assessed the performance of 35 combinations of phasing and imputation programs, including versions of SHAPEIT, Eagle, Beagle, minimac, PBWT, and IMPUTE, in terms of quality and speed when imputing completely missing SNPs with an HRC reference panel. We used a data set comprising 1,149 fully sequenced individuals from the German population, subsetting the SNPs to approximate the Illumina Infinium-Omni5 array. A total of 553,234 SNPs across two selected chromosomes were used to compare imputed and sequenced genotypes. We found that all tested programs except PBWT impute genotypes with very high accuracy (mean error rate < 0.005). PBWT hardly ever imputes the less frequent allele correctly (mean concordance for genotypes including the minor allele < 0.0002). For all programs, imputation accuracy drops for rare alleles with a frequency < 0.05. Although overall concordance is high, concordance decreases as the genotype probability decreases, which indicates that low genotype probabilities are rare. The mean concordance of SNPs with a genotype probability < 95% drops below 0.9, at which point disregarding imputed genotypes might prove favorable. For fast and accurate imputation, a combination of Eagle2.4.1 using a reference panel for phasing and Beagle5.1 for imputation performs best. Replacing Beagle5.1 with minimac3, minimac4, Beagle4.1, or IMPUTE4 results in a small gain in accuracy at a high cost of speed.
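To make the accuracy measures in the abstract concrete, the following is a minimal Python sketch of how concordance between sequenced and imputed genotypes might be computed, overall, restricted to genotypes carrying the minor allele, and split at a genotype-probability threshold of 0.95. It is an illustration under stated assumptions, not the authors' pipeline: genotypes are assumed to be coded as minor-allele counts (0, 1, 2), the function names are hypothetical, the restriction is assumed to apply to the sequenced call, and the toy data are invented.

```python
from typing import List, Optional, Sequence, Tuple


def concordance(sequenced: Sequence[int], imputed: Sequence[int]) -> Optional[float]:
    """Fraction of genotypes whose imputed call equals the sequenced call."""
    if len(sequenced) != len(imputed):
        raise ValueError("genotype vectors must have the same length")
    if len(sequenced) == 0:
        return None  # concordance is undefined without any genotypes
    matches = sum(s == i for s, i in zip(sequenced, imputed))
    return matches / len(sequenced)


def minor_allele_concordance(
    sequenced: Sequence[int], imputed: Sequence[int]
) -> Optional[float]:
    """Concordance over genotypes whose sequenced call carries the minor allele.

    Assumption: restricting on the sequenced call (count > 0) is one plausible
    reading of "genotypes including the minor allele".
    """
    kept: List[Tuple[int, int]] = [(s, i) for s, i in zip(sequenced, imputed) if s > 0]
    return concordance([s for s, _ in kept], [i for _, i in kept])


def concordance_by_probability(
    sequenced: Sequence[int],
    imputed: Sequence[int],
    probability: Sequence[float],
    threshold: float = 0.95,
) -> Tuple[Optional[float], Optional[float]]:
    """Concordance for calls at or above the threshold, and for calls below it."""
    high = [(s, i) for s, i, p in zip(sequenced, imputed, probability) if p >= threshold]
    low = [(s, i) for s, i, p in zip(sequenced, imputed, probability) if p < threshold]
    return (
        concordance([s for s, _ in high], [i for _, i in high]),
        concordance([s for s, _ in low], [i for _, i in low]),
    )


# Invented toy data: eight genotypes coded as minor-allele counts.
sequenced = [0, 0, 1, 2, 0, 1, 0, 0]
imputed = [0, 0, 1, 1, 0, 0, 0, 0]
probability = [0.99, 0.98, 0.97, 0.60, 0.99, 0.70, 0.96, 0.99]

overall = concordance(sequenced, imputed)
print("overall concordance:", overall)
print("error rate:", 1 - overall)
print("minor-allele concordance:", minor_allele_concordance(sequenced, imputed))
print("high/low probability concordance:",
      concordance_by_probability(sequenced, imputed, probability))
```

In this toy example the discordant calls are exactly the ones carrying the minor allele and having low genotype probability, which mirrors the pattern the abstract describes: a high overall concordance can hide poor performance on rare alleles, and filtering on genotype probability removes many of the errors.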




Updated: 2021-09-22