Efficient hybrid de novo assembly of human genomes with WENGAN
Nature Biotechnology (IF 33.1), Pub Date: 2020-12-14, DOI: 10.1038/s41587-020-00747-w
Alex Di Genova, Elena Buena-Atienza, Stephan Ossowski, Marie-France Sagot

Generating accurate genome assemblies of large, repeat-rich human genomes has proved difficult using only long, error-prone reads, and most human genomes assembled from long reads add accurate short reads to polish the consensus sequence. Here we report an algorithm for hybrid assembly, WENGAN, that provides very high quality at low computational cost. We demonstrate de novo assembly of four human genomes using a combination of sequencing data generated on ONT PromethION, PacBio Sequel, Illumina and MGI technology. WENGAN implements efficient algorithms to improve assembly contiguity as well as consensus quality. The resulting genome assemblies have high contiguity (contig NG50: 17.24–80.64 Mb), few assembly errors (contig NGA50: 11.8–59.59 Mb), good consensus quality (QV: 27.84–42.88) and high gene completeness (BUSCO complete: 94.6–95.2%), while consuming low computational resources (CPU hours: 187–1,200). In particular, the WENGAN assembly of the haploid CHM13 sample achieved a contig NG50 of 80.64 Mb (NGA50: 59.59 Mb), which surpasses the contiguity of the current human reference genome (GRCh38 contig NG50: 57.88 Mb).
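The contiguity and quality figures quoted above can be read with the help of a short sketch. It assumes the standard definitions: contig NG50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of an estimated genome size, and QV is a Phred-scaled consensus error rate. The function names `ng50` and `qv_to_error_rate` are illustrative only and are not part of WENGAN or of any evaluation tool used in the paper.

```python
from typing import Iterable


def ng50(contig_lengths: Iterable[int], genome_size: int) -> int:
    """Return the contig NG50 for the given contig lengths.

    Contigs are visited from longest to shortest; the NG50 is the length
    of the contig at which the running total first reaches half of
    genome_size. Returns 0 if the contigs never cover half the genome.
    """
    target = genome_size / 2
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return 0


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


if __name__ == "__main__":
    # Toy example (made-up lengths, not data from the paper).
    print(ng50([40, 30, 20, 10], genome_size=100))
    print(qv_to_error_rate(30))
```

NGA50 is computed in the same way, but over aligned blocks obtained after breaking contigs at misassemblies relative to a reference, which is why it is never larger than NG50. Under the Phred assumption, a QV of 30 corresponds to about one consensus error per 1,000 bases.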




Updated: 2020-12-14