A Fast Data-Driven Method for Genotype Imputation, Phasing, and Local Ancestry Inference: MendelImpute.jl
bioRxiv - Genetics, Pub Date: 2021-02-27, DOI: 10.1101/2020.10.24.353755
Benjamin B. Chu, Eric M. Sobel, Rory Wasiolek, Janet S. Sinsheimer, Hua Zhou, Kenneth Lange

Current methods for genotype imputation and phasing exploit the sheer volume of data in haplotype reference panels and rely on hidden Markov models. Existing programs all have essentially the same imputation accuracy, are computationally intensive, and generally require pre-phasing the typed markers. We propose a novel data-mining method for genotype imputation and phasing that substitutes highly efficient linear algebra routines for hidden Markov model calculations. This strategy, embodied in our Julia program MendelImpute.jl, avoids explicit assumptions about recombination and population structure while delivering similar prediction accuracy, better memory usage, and an order of magnitude or better run-times compared to the fastest competing method. MendelImpute operates on both dosage data and unphased genotype data and simultaneously imputes missing genotypes and phase at both the typed and untyped SNPs. Finally, MendelImpute naturally extends to global and local ancestry estimation and lends itself to new strategies for data compression and hence faster data transport and sharing.
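
The abstract does not spell out the linear algebra involved, so the following is only a minimal sketch of one way dense matrix products can stand in for a hidden Markov model when matching a sample to a reference panel. It assumes a least-squares criterion: pick the pair of reference haplotypes whose sum is closest to the observed genotype vector, then fill missing entries from that pair. The criterion, the function names `best_haplotype_pair` and `impute`, and the toy panel are all illustrative assumptions, and none of this is the MendelImpute.jl API. Python with NumPy is used here purely for illustration; the program itself is written in Julia.

```python
import numpy as np

def best_haplotype_pair(H, x):
    """Hypothetical helper: find (i, j), i <= j, minimizing
    ||x - H[:, i] - H[:, j]||^2 over the observed (non-NaN) entries of x.

    H is a markers-by-haplotypes 0/1 matrix; x holds genotypes 0/1/2 or NaN.
    """
    observed = ~np.isnan(x)
    Ho = H[observed].astype(float)   # reference panel at observed markers
    xo = x[observed]
    gram = Ho.T @ Ho                 # gram[i, j] = h_i . h_j
    corr = Ho.T @ xo                 # corr[i]    = h_i . x
    sq = np.diag(gram)               # sq[i]      = ||h_i||^2
    # Expanding the squared norm (the constant ||x||^2 is dropped):
    # ||h_i||^2 + ||h_j||^2 + 2 h_i.h_j - 2 x.h_i - 2 x.h_j
    cost = (sq[:, None] + sq[None, :] + 2 * gram
            - 2 * corr[:, None] - 2 * corr[None, :])
    i, j = np.unravel_index(np.argmin(cost), cost.shape)
    return int(min(i, j)), int(max(i, j))

def impute(H, x):
    """Hypothetical helper: fill missing genotypes from the best-matching pair."""
    i, j = best_haplotype_pair(H, x)
    filled = x.copy()
    missing = np.isnan(x)
    filled[missing] = H[missing, i] + H[missing, j]
    return filled, (i, j)

# Toy panel: 6 markers (rows) by 4 reference haplotypes (columns).
H = np.array([[0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
# Unphased genotype built from haplotypes 1 and 2, with two entries missing.
x = np.array([np.nan, 1.0, 0.0, 1.0, np.nan, 1.0])
filled, pair = impute(H, x)
```

All pairwise costs come from two matrix products, which is the kind of operation that optimized linear algebra libraries handle efficiently. A practical tool would presumably work window by window along the chromosome and resolve phase across windows; this sketch does neither.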

Updated: 2021-02-28