Haplotype-aware variant calling enables high accuracy in nanopore long-reads using deep neural networks
bioRxiv - Bioinformatics Pub Date : 2021-03-05 , DOI: 10.1101/2021.03.04.433952
Kishwar Shafin , Trevor Pesout , Pi-Chuan Chang , Maria Nattestad , Alexey Kolesnikov , Sidharth Goel , Gunjan Baid , Jordan M. Eizenga , Karen H. Miga , Paolo Carnevali , Miten Jain , Andrew Carroll , Benedict Paten

Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data has demonstrated a long read length, but current interpretation methods for its novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single nucleotide variant identification method at the whole-genome scale and produces high-quality single nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with better performance than the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio-HiFi-polished).
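
The Q35+ and Q40+ figures are easier to read as error rates. The abstract does not define the scale; the sketch below assumes the conventional Phred scaling, Q = -10 log10(p), where p is the per-base error probability. The function names are illustrative and do not come from the paper or its software.

```python
import math


def phred_to_error_rate(q: float) -> float:
    """Per-base error probability implied by a Phred-scaled quality value."""
    return 10 ** (-q / 10)


def error_rate_to_phred(p: float) -> float:
    """Phred-scaled quality value for a per-base error probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Assumption: the assembly Q values quoted in the abstract are Phred-scaled.
    for q in (35, 40):
        p = phred_to_error_rate(q)
        print(f"Q{q}: about one error per {1 / p:,.0f} bases")
```

Under that assumption, Q35 corresponds to roughly one error per 3,200 bases and Q40 to one error per 10,000 bases.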

Updated: 2021-03-05