Improved assembly and variant detection of a haploid human genome using single‐molecule, high‐fidelity long reads
Annals of Human Genetics (IF 1.9), Pub Date: 2019-11-11, DOI: 10.1111/ahg.12364
Mitchell R Vollger 1 , Glennis A Logsdon 1 , Peter A Audano 1 , Arvis Sulovari 1 , David Porubsky 1 , Paul Peluso 2 , Aaron M Wenger 2 , Gregory T Concepcion 2 , Zev N Kronenberg 2 , Katherine M Munson 1 , Carl Baker 1 , Ashley D Sanders 3 , Diana C J Spierings 4 , Peter M Lansdorp 4, 5, 6 , Urvashi Surti 7 , Michael W Hunkapiller 2 , Evan E Eichler 1, 8

The sequencing and assembly of human genomes using long‐read sequencing technologies have revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high‐fidelity (HiFi) or continuous long‐read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5‐fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time and with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, existing assemblers still cannot assemble centromeric DNA and the largest regions of segmental duplication from HiFi data. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes.
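
For readers unfamiliar with the continuity metric quoted above, the following is a minimal sketch of how an NG50 value is commonly computed: contigs are sorted from longest to shortest, and the NG50 is the length of the contig at which the running total first reaches half of a reference genome (or region) size. The function name `ng50`, the toy contig lengths, and the toy genome size are illustrative assumptions, not data or code from the study, and the abstract does not say how the authors restricted the calculation to within 1 Mbp of the centromere. Only the two NG50 values used in the fold-change line come from the abstract.

```python
from typing import Iterable, Optional


def ng50(contig_lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """Return the NG50 of an assembly, or None if it is undefined.

    NG50 is the length of the contig at which the cumulative length of
    contigs, taken from longest to shortest, first reaches half of
    genome_size. Unlike N50, the denominator is the expected genome (or
    region) size rather than the total assembly length.
    """
    half = genome_size / 2
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= half:
            return length
    # The assembly covers less than half of the genome size.
    return None


# Toy example with made-up lengths (hypothetical, not from the paper):
# half of 120 is 60; 40 + 30 = 70 reaches it, so the NG50 is 30.
toy_ng50 = ng50([40, 30, 20, 10], genome_size=120)

# Fold change between the two pericentromeric NG50 values reported in the
# abstract (HiFi 480.6 kbp versus CLR 191.5 kbp).
fold_change = 480.6 / 191.5

print(toy_ng50, round(fold_change, 1))
```

Using the genome size rather than the assembly length as the denominator keeps the metric comparable between two assemblies of the same genome, which is the situation described in the abstract.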

Updated: 2019-11-11