Fast and robust identity-by-descent inference with the templated positional Burrows-Wheeler transform
Molecular Biology and Evolution (IF 10.7), Pub Date: 2020-12-23, DOI: 10.1093/molbev/msaa328
William A Freyman, Kimberly F McManus, Suyash S Shringarpure, Ethan M Jewett, Katarzyna Bryc, Adam Auton

Estimating the genomic location and length of identical-by-descent (IBD) segments among individuals is a crucial step in many genetic analyses. However, the exponential growth in the size of biobank and direct-to-consumer (DTC) genetic data sets makes accurate IBD inference a significant computational challenge. Here we present the templated positional Burrows-Wheeler transform (TPBWT), which makes fast IBD estimates robust to genotype and phasing errors. Using haplotype data simulated over pedigrees with realistic genotyping and phasing errors, we show that the TPBWT outperforms other state-of-the-art IBD inference algorithms in both speed and accuracy. For each phase-aware method, we explore the false positive and false negative rates of inferring IBD by segment length and characterize the types of error commonly found. Our results highlight the fragility of most phased IBD inference methods: the accuracy of IBD estimates can be highly sensitive to the quality of haplotype phasing. Additionally, we compare the performance of the TPBWT against a widely used phase-free IBD inference approach that is robust to phasing errors. We introduce both in-sample and out-of-sample TPBWT-based IBD inference algorithms and demonstrate their computational efficiency on massive-scale data sets with millions of samples. Furthermore, we describe the binary file format for TPBWT-compressed haplotypes, which enables fast and efficient out-of-sample IBD computation against very large cohort panels. Finally, we demonstrate the utility of the TPBWT in a brief empirical analysis exploring geographic patterns of haplotype sharing within Mexico. Hierarchical clustering of IBD shared across regions within Mexico reveals geographically structured haplotype sharing and a strong signal of isolation by distance. Our software implementation of the TPBWT is freely available for non-commercial use in the code repository https://github.com/23andMe/phasedibd.
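
For readers unfamiliar with the underlying data structure, the following is a minimal sketch of the plain positional Burrows-Wheeler transform (PBWT) that the TPBWT is named after. It is not the TPBWT and not the authors' implementation: it assumes error-free, fully phased binary haplotypes and reports only exact matches, so it has none of the robustness to genotype and phasing errors described in the abstract. The function name `pbwt_long_matches` and its arguments are hypothetical, chosen for illustration only.

```python
def pbwt_long_matches(haps, min_len):
    """Report exact shared segments of at least min_len sites.

    haps: list of equal-length sequences of 0/1 alleles (one per haplotype).
    Returns tuples (i, j, start, end) with i < j, meaning
    haps[i][start:end] == haps[j][start:end] and the match cannot be
    extended in either direction.
    """
    n_haps, n_sites = len(haps), len(haps[0])
    order = list(range(n_haps))  # haplotypes sorted by reversed prefix
    div = [0] * n_haps           # div[p]: first site of the match between
                                 # order[p] and order[p - 1]
    found = []

    def report(block, k):
        # Every pair in block matches over at least min_len sites ending
        # just before site k; keep the pairs whose match stops at k.
        for x in range(len(block)):
            for y in range(x + 1, len(block)):
                p, q = block[x], block[y]
                a, b = order[p], order[q]
                if k < n_sites and haps[a][k] == haps[b][k]:
                    continue  # match continues past site k
                start = max(div[p + 1:q + 1])
                found.append((min(a, b), max(a, b), start, k))

    for k in range(n_sites + 1):
        # Group neighbours whose match with the predecessor is long enough.
        block = []
        for pos in range(n_haps):
            if div[pos] > k - min_len:
                report(block, k)
                block = []
            block.append(pos)
        report(block, k)
        if k == n_sites:
            break

        # Stable partition by the allele at site k, updating divergences.
        zeros, ones, div_zeros, div_ones = [], [], [], []
        p = q = k + 1
        for pos in range(n_haps):
            p, q = max(p, div[pos]), max(q, div[pos])
            if haps[order[pos]][k] == 0:
                zeros.append(order[pos])
                div_zeros.append(p)
                p = 0
            else:
                ones.append(order[pos])
                div_ones.append(q)
                q = 0
        order, div = zeros + ones, div_zeros + div_ones
    return found


# Toy input: three haplotypes over six sites.
haps = [
    [0, 1, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 0],
]
matches = sorted(pbwt_long_matches(haps, min_len=3))
```

On this toy input, `matches` contains three segments: haplotypes 0 and 1 share sites 0 to 3, haplotypes 0 and 2 share sites 1 to 5, and haplotypes 1 and 2 share sites 1 to 3. A single allele flipped by a genotyping error, or a phase switch partway along a haplotype, would split or truncate such an exact match, which is the fragility the abstract says the TPBWT is designed to address.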

Updated: 2020-12-23