Genotype imputation using the Positional Burrows Wheeler Transform
PLOS Genetics (IF 4.5), Pub Date: 2020-11-16, DOI: 10.1371/journal.pgen.1009049
Simone Rubinacci 1,2, Olivier Delaneau 1,2, Jonathan Marchini 3

Genotype imputation is the process of predicting unobserved genotypes in a sample of individuals using a reference panel of haplotypes. In the last 10 years, reference panels have increased in size by more than 100-fold. Increasing reference panel size improves the imputation accuracy of markers with low minor allele frequencies but poses ever-increasing computational challenges for imputation methods. Here we present IMPUTE5, a genotype imputation method that can scale to reference panels with millions of samples. This method continues to refine the observation made in the IMPUTE2 method, that accuracy is optimized via use of a custom subset of haplotypes when imputing each individual. It achieves fast, accurate, and memory-efficient imputation by selecting haplotypes using the Positional Burrows Wheeler Transform (PBWT). By using the PBWT data structure at genotyped markers, IMPUTE5 identifies locally best-matching haplotypes and long identical-by-state segments. The method then uses the selected haplotypes as conditioning states within the IMPUTE model. Using the HRC reference panel, which has ∼65,000 haplotypes, we show that IMPUTE5 is up to 30x faster than MINIMAC4 and up to 3x faster than BEAGLE5.1, and uses less memory than both of these methods. Using simulated reference panels, we show that IMPUTE5 scales sub-linearly with reference panel size. For example, keeping the number of imputed markers constant, increasing the reference panel size from 10,000 to 1 million haplotypes requires less than twice the computation time. As the reference panel increases in size, IMPUTE5 is able to utilize a smaller number of reference haplotypes, thus reducing computational cost.
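
The PBWT-based selection described in the abstract can be illustrated with a small sketch. This is not the IMPUTE5 implementation: it only builds the positional prefix ordering (a stable partition by allele at each marker, as in the standard PBWT construction) and, at every genotyped marker, keeps the reference haplotypes that sit next to the target in that ordering. The function names `pbwt_prefix_orders` and `select_states`, the parameter `n_neighbors`, and the toy data are assumptions made for illustration. A realistic method would also track divergence (match-length) values, since an adjacent haplotype is only the best available match and may still share a short segment with the target.

```python
from typing import List, Set


def pbwt_prefix_orders(haps: List[List[int]]) -> List[List[int]]:
    """For each marker k, return haplotype indices sorted by their
    reversed prefix ending at marker k (biallelic 0/1 alleles assumed)."""
    order = list(range(len(haps)))
    orders = []
    for k in range(len(haps[0])):
        # Stable partition: haplotypes carrying allele 0 first, then allele 1.
        # Stability preserves the ordering by earlier markers within each group.
        zeros = [h for h in order if haps[h][k] == 0]
        ones = [h for h in order if haps[h][k] == 1]
        order = zeros + ones
        orders.append(order)
    return orders


def select_states(ref: List[List[int]], target: List[int],
                  n_neighbors: int = 1) -> Set[int]:
    """Hypothetical selection rule: insert the target as an extra row,
    then at every marker collect up to n_neighbors reference haplotypes
    on each side of the target in the PBWT ordering."""
    haps = ref + [target]
    t = len(ref)  # index of the target row
    selected: Set[int] = set()
    for order in pbwt_prefix_orders(haps):
        pos = order.index(t)
        lo = max(0, pos - n_neighbors)
        hi = min(len(order), pos + n_neighbors + 1)
        selected.update(h for h in order[lo:hi] if h != t)
    return selected


# Toy panel: 4 reference haplotypes typed at 4 markers.
ref = [
    [0, 0, 0, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 1, 0],
]
# The target copies haplotype 1 at the first two markers and haplotype 2 afterwards.
target = [0, 1, 1, 1]

print(sorted(select_states(ref, target)))  # [1, 2, 3]
```

In this toy panel, haplotype 0 is never adjacent to the target and is dropped, so the downstream model would condition on three states instead of four. With a much larger panel the selected subset would be a small fraction of the reference haplotypes, which is the intuition behind the sub-linear scaling reported in the abstract.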




Updated: 2020-11-17