Towards complete and error-free genome assemblies of all vertebrate species
Nature (IF 50.5), Pub Date: 2021-04-28, DOI: 10.1038/s41586-021-03451-0
Arang Rhie 1 , Shane A McCarthy 2, 3 , Olivier Fedrigo 4 , Joana Damas 5 , Giulio Formenti 4, 6 , Sergey Koren 1 , Marcela Uliano-Silva 7, 8 , William Chow 3 , Arkarachai Fungtammasan 9 , Juwan Kim 10 , Chul Lee 10 , Byung June Ko 11 , Mark Chaisson 12 , Gregory L Gedman 6 , Lindsey J Cantin 6 , Francoise Thibaud-Nissen 13 , Leanne Haggerty 14 , Iliana Bista 2, 3 , Michelle Smith 3 , Bettina Haase 4 , Jacquelyn Mountcastle 4 , Sylke Winkler 15, 16 , Sadye Paez 4, 6 , Jason Howard 17 , Sonja C Vernes 18, 19, 20 , Tanya M Lama 21 , Frank Grutzner 22 , Wesley C Warren 23 , Christopher N Balakrishnan 24 , Dave Burt 25 , Julia M George 26 , Matthew T Biegler 6 , David Iorns 27 , Andrew Digby 28 , Daryl Eason 28 , Bruce Robertson 29 , Taylor Edwards 30 , Mark Wilkinson 31 , George Turner 32 , Axel Meyer 33 , Andreas F Kautt 33, 34 , Paolo Franchini 33 , H William Detrich 35 , Hannes Svardal 36, 37 , Maximilian Wagner 38 , Gavin J P Naylor 39 , Martin Pippel 15, 40 , Milan Malinsky 3, 41 , Mark Mooney 42 , Maria Simbirsky 9 , Brett T Hannigan 9 , Trevor Pesout 43 , Marlys Houck 44 , Ann Misuraca 44 , Sarah B Kingan 45 , Richard Hall 45 , Zev Kronenberg 45 , Ivan Sović 45, 46 , Christopher Dunn 45 , Zemin Ning 3 , Alex Hastie 47 , Joyce Lee 47 , Siddarth Selvaraj 48 , Richard E Green 43, 49 , Nicholas H Putnam 50 , Ivo Gut 51, 52 , Jay Ghurye 49, 53 , Erik Garrison 43 , Ying Sims 3 , Joanna Collins 3 , Sarah Pelan 3 , James Torrance 3 , Alan Tracey 3 , Jonathan Wood 3 , Robel E Dagnew 12 , Dengfeng Guan 2, 54 , Sarah E London 55 , David F Clayton 56 , Claudio V Mello 57 , Samantha R Friedrich 57 , Peter V Lovell 57 , Ekaterina Osipova 15, 40, 58 , Farooq O Al-Ajli 59, 60, 61 , Simona Secomandi 62 , Heebal Kim 10, 11, 63 , Constantina Theofanopoulou 6 , Michael Hiller 64, 65, 66 , Yang Zhou 67 , Robert S Harris 68 , Kateryna D Makova 68, 69, 70 , Paul Medvedev 69, 70, 71, 72 , Jinna Hoffman 13 , Patrick Masterson 13 , Karen Clark 13 , Fergal Martin 14 , Kevin Howe 14 , Paul Flicek 14 , Brian P Walenz 1 , Woori Kwak 63, 73 , Hiram Clawson 43 , Mark Diekhans 43 , Luis Nassar 43 , Benedict Paten 43 , Robert H S Kraus 33, 74 , Andrew J Crawford 75 , M Thomas P Gilbert 76, 77 , Guojie Zhang 78, 79, 80, 81 , Byrappa Venkatesh 82 , Robert W Murphy 83 , Klaus-Peter Koepfli 84 , Beth Shapiro 85, 86 , Warren E Johnson 84, 87, 88 , Federica Di Palma 89 , Tomas Marques-Bonet 90, 91, 92, 93 , Emma C Teeling 94 , Tandy Warnow 95 , Jennifer Marshall Graves 96 , Oliver A Ryder 44, 97 , David Haussler 43, 85 , Stephen J O'Brien 98, 99 , Jonas Korlach 45 , Harris A Lewin 5, 100, 101 , Kerstin Howe 3 , Eugene W Myers 15, 40, 102 , Richard Durbin 2, 3 , Adam M Phillippy 1 , Erich D Jarvis 4, 6, 86

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1,2,3,4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.




Updated: 2021-04-28