Progressive Cactus is a multiple-genome aligner for the thousand-genome era
Nature (IF 50.5), Pub Date: 2020-11-11, DOI: 10.1038/s41586-020-2871-y
Joel Armstrong 1 , Glenn Hickey 1 , Mark Diekhans 1 , Ian T Fiddes 1 , Adam M Novak 1 , Alden Deran 1 , Qi Fang 2, 3 , Duo Xie 2, 4 , Shaohong Feng 2, 5 , Josefin Stiller 3 , Diane Genereux 6 , Jeremy Johnson 6 , Voichita Dana Marinescu 7 , Jessica Alföldi 6 , Robert S Harris 8 , Kerstin Lindblad-Toh 6, 7 , David Haussler 1, 9 , Elinor Karlsson 6, 10, 11 , Erich D Jarvis 9, 12 , Guojie Zhang 3, 5, 13, 14 , Benedict Paten 1

New genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies [1-3]. For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database [4] increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. In addition to this influx of assemblies from different species, new human de novo assemblies [5] are being produced, which enable the analysis of not only small polymorphisms, but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution but also present challenges in how to adapt current analysis methods to meet the increased scale. Cactus [6], a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequences. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far.

Updated: 2020-11-11