Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise.
Molecular Ecology Resources (IF 5.5), Pub Date: 2020-09-16, DOI: 10.1111/1755-0998.13252
Valentina Peona 1, 2 , Mozes P K Blom 3, 4 , Luohao Xu 5 , Reto Burri 6 , Shawn Sullivan 7 , Ignas Bunikis 8 , Ivan Liachko 7 , Tri Haryoko 9 , Knud A Jønsson 10 , Qi Zhou 5, 11, 12 , Martin Irestedt 3 , Alexander Suh 1, 2, 13

Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat‐rich and GC‐rich regions (genomic “dark matter”) limits insights into the evolution of genome structure and regulatory networks. Here, we compare the efficiency of currently available sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter. By adopting different de novo assembly strategies, we compare individual draft assemblies to a curated multiplatform reference assembly and identify the genomic features that cause gaps within each assembly. We show that a multiplatform assembly implementing long‐read, linked‐read and proximity sequencing technologies performs best at recovering transposable elements, multicopy MHC genes, GC‐rich microchromosomes and the repeat‐rich W chromosome. Telomere‐to‐telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is now possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects for optimized completeness of both the coding and noncoding parts of nonmodel genomes.

Updated: 2020-09-16