HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads.
Genome Research (IF 6.2), Pub Date: 2020-09-01, DOI: 10.1101/gr.263566.120
Sergey Nurk 1, Brian P Walenz 1, Arang Rhie 1, Mitchell R Vollger 2, Glennis A Logsdon 2, Robert Grothe 3, Karen H Miga 4, Evan E Eichler 2, 5, Adam M Phillippy 1, Sergey Koren 1

Complete and accurate genome assemblies form the basis of most downstream genomic analyses and are of critical importance. Recent genome assembly projects have relied on a combination of noisy long-read sequencing and accurate short-read sequencing, with the former offering greater assembly continuity and the latter providing higher consensus accuracy. The recently introduced Pacific Biosciences (PacBio) HiFi sequencing technology bridges this divide by delivering long reads (>10 kbp) with high per-base accuracy (>99.9%). Here we present HiCanu, a modification of the Canu assembler designed to leverage the full potential of HiFi reads via homopolymer compression, overlap-based error correction, and aggressive false overlap filtering. We benchmark HiCanu with a focus on the recovery of haplotype diversity, major histocompatibility complex (MHC) variants, satellite DNAs, and segmental duplications. For diploid human genomes sequenced to 30× HiFi coverage, HiCanu achieved superior accuracy and allele recovery compared to the current state of the art. On the effectively haploid CHM13 human cell line, HiCanu achieved an NG50 contig size of 77 Mbp with a per-base consensus accuracy of 99.999% (QV50), surpassing recent assemblies of high-coverage, ultralong Oxford Nanopore Technologies (ONT) reads in terms of both accuracy and continuity. This HiCanu assembly correctly resolves 337 out of 341 validation BACs sampled from known segmental duplications and provides the first preliminary assemblies of nine complete human centromeric regions. Although gaps and errors still remain within the most challenging regions of the genome, these results represent a significant advance toward the complete assembly of human genomes.
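Three quantities in the abstract can be illustrated with a few lines of Python: homopolymer compression, the relationship between per-base accuracy and QV, and the NG50 statistic. This is a minimal sketch of the general definitions, not HiCanu's implementation; the function names and the toy inputs are assumptions made for illustration only.

```python
import math
from itertools import groupby


def homopolymer_compress(seq: str) -> str:
    """Collapse every run of identical bases into a single base."""
    return "".join(base for base, _run in groupby(seq))


def accuracy_to_qv(accuracy: float) -> float:
    """Phred-scaled quality value: QV = -10 * log10(error rate)."""
    return -10.0 * math.log10(1.0 - accuracy)


def ng50(contig_lengths, genome_size):
    """Length of the contig at which the cumulative length of contigs,
    taken longest first, reaches half of the estimated genome size.
    Returns 0 if the contigs never cover half of the genome."""
    total = 0
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if total * 2 >= genome_size:
            return length
    return 0


# Toy inputs (hypothetical, not data from the paper)
print(homopolymer_compress("GGATTTACCA"))
print(round(accuracy_to_qv(0.99999)))
print(ng50([50, 30, 20, 10], genome_size=120))
```

Under these definitions, a per-base consensus accuracy of 99.999% corresponds to an error rate of 10^-5, which is where the QV50 figure comes from. NG50 differs from N50 only in that the halfway point is measured against the estimated genome size rather than the total assembly length.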

Updated: 2020-09-15