High-quality Arabidopsis thaliana Genome Assembly with Nanopore and HiFi Long Reads
Genomics, Proteomics & Bioinformatics (IF 9.5), Pub Date: 2021-09-03, DOI: 10.1016/j.gpb.2021.08.003
Bo Wang 1 , Xiaofei Yang 2 , Yanyan Jia 1 , Yu Xu 3 , Peng Jia 4 , Ningxin Dang 5 , Songbo Wang 4 , Tun Xu 4 , Xixi Zhao 6 , Shenghan Gao 4 , Quanbin Dong 6 , Kai Ye 7

Arabidopsis thaliana is an important and long-established model species for plant molecular biology, genetics, epigenetics, and genomics. However, the latest version of the reference genome still contains a significant number of missing segments. Here, we report a high-quality and almost complete Col-0 genome assembly (named Col-XJTU) with only two gaps, produced by combining Oxford Nanopore Technologies ultra-long reads, Pacific Biosciences high-fidelity (HiFi) long reads, and Hi-C data. The total genome assembly size is 133,725,193 bp, introducing 14.6 Mb of novel sequences compared to the TAIR10.1 reference genome. All five chromosomes of the Col-XJTU assembly are highly accurate, with consensus quality (QV) scores > 60 (ranging from 62 to 68), which are higher than those of the TAIR10.1 reference (ranging from 45 to 52). We completely resolved chromosome (Chr) 3 and Chr5 in a telomere-to-telomere manner. Chr4 was completely resolved except for the nucleolar organizing regions, which comprise long repetitive DNA fragments. The Chr1 centromere (CEN1), reportedly around 9 Mb in length, is particularly challenging to assemble due to the presence of tens of thousands of CEN180 satellite repeats. Using cutting-edge sequencing data and novel computational approaches, we assembled a 3.8-Mb-long CEN1 and a 3.5-Mb-long CEN2. We also investigated the structure and epigenetics of the centromeres. Four clusters of CEN180 monomers were detected, and the centromere-specific histone H3-like protein (CENH3) exhibited a strong preference for CEN180 Cluster 3. Moreover, we observed hypomethylation patterns in CENH3-enriched regions. We believe that this high-quality genome assembly, Col-XJTU, will serve as a valuable reference for better understanding the global pattern of centromeric polymorphisms, as well as genetic and epigenetic features in plants.
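
To put the QV figures above in perspective, here is a minimal sketch that assumes QV follows the usual Phred scaling, QV = -10 * log10(per-base error probability). The abstract does not state how QV was computed, so this scaling is an assumption, and the function names are hypothetical. Applying a per-chromosome QV to the whole assembly length is also only illustrative.

```python
import math

# Assumption: QV is Phred-scaled, i.e. QV = -10 * log10(per-base error probability).
def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled QV into a per-base error probability."""
    return 10 ** (-qv / 10)

def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error probability back into a Phred-scaled QV."""
    return -10 * math.log10(error_rate)

def expected_errors(qv: float, length_bp: int) -> float:
    """Expected number of erroneous bases in a sequence of length_bp at a given QV."""
    return qv_to_error_rate(qv) * length_bp

# Assembly size reported in the abstract.
COL_XJTU_SIZE_BP = 133_725_193

if __name__ == "__main__":
    # QV ranges quoted in the abstract; using the full assembly length for
    # every row is an illustrative simplification, not a reported result.
    for label, qv in [("TAIR10.1 low", 45), ("TAIR10.1 high", 52),
                      ("Col-XJTU low", 62), ("Col-XJTU high", 68)]:
        rate = qv_to_error_rate(qv)
        errors = expected_errors(qv, COL_XJTU_SIZE_BP)
        print(f"{label}: QV {qv} -> error rate {rate:.2e}, "
              f"about {errors:.0f} expected errors per {COL_XJTU_SIZE_BP:,} bp")
```

Under this scaling, QV 60 corresponds to roughly one error per million bases, and each 10-point increase in QV is a tenfold reduction in the error rate.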




Updated: 2021-09-03