Long-read sequencing of 111 rice genomes reveals significantly larger pan-genomes
Genome Research (IF 6.2), Pub Date: 2022-05-01, DOI: 10.1101/gr.276015.121
Fan Zhang 1,2, Hongzhang Xue 3, Xiaorui Dong 3, Min Li 2, Xiaoming Zheng 1, Zhikang Li 1,2, Jianlong Xu 1,4, Wensheng Wang 1,2,5, Chaochun Wei 3,6

A pan-genome, the collection of all genomes from a population, has shown great potential in genomics studies, especially in crop science. The rice pan-genome constructed from second-generation sequencing (SGS) data is about 270 Mb larger than Nipponbare, the rice reference genome (NipRG), but it still suffers from incompleteness and loss of genomic context. Third-generation sequencing (TGS) with long reads can help to construct better pan-genomes. In this paper, we report a method for constructing a high-quality rice pan-genome that introduces a series of new steps for handling long-read data, including unmapped sequence block filtering, redundancy removal, and sequence block elongation. The long-read sequencing-based pan-genome constructed from 105 rice accessions contains 604 Mb of novel sequences relative to NipRG and is much more comprehensive than the one constructed from ∼3000 rice genomes sequenced with short reads. Repetitive sequences are the main component of the novel sequences, which partially explains the differences between the TGS-based and SGS-based pan-genomes. With six wild rice accessions added, the rice pan-genome contains about 879 Mb of novel sequences and 19,000 novel genes in total. In addition, we have created high-quality reference genomes for all representative rice populations, including five gapless reference genomes. This study makes significant progress in our understanding of the rice pan-genome, and the pan-genome construction method for long-read data can be applied to accelerate a broad range of genomics studies.
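
The abstract names the steps of the method without describing how they work. The toy sketch below is only meant to make two of the ideas concrete: collecting sequence blocks that do not map to a reference, and filtering them by length and redundancy. It is not the authors' pipeline. The function names, the `min_len` threshold, the half-open interval format, and the exact-match redundancy test are all assumptions made for illustration, and the sequence block elongation step is not modeled.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a contig (assumed format)


def unmapped_blocks(contig_len: int, aligned: List[Interval]) -> List[Interval]:
    """Return the parts of a contig not covered by any aligned interval."""
    blocks: List[Interval] = []
    cursor = 0
    for start, end in sorted(aligned):
        if start > cursor:
            blocks.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < contig_len:
        blocks.append((cursor, contig_len))
    return blocks


def novel_sequences(
    contigs: Dict[str, str],
    alignments: Dict[str, List[Interval]],
    min_len: int,
    seen: Set[str],
) -> List[str]:
    """Collect unmapped blocks that pass a length filter and are not yet seen.

    `seen` is shared across accessions so that a block already contributed
    by an earlier accession is not counted again (exact-match toy criterion).
    """
    novel: List[str] = []
    for name, seq in contigs.items():
        for start, end in unmapped_blocks(len(seq), alignments.get(name, [])):
            block = seq[start:end]
            if len(block) < min_len:
                continue  # hypothetical filter: drop short blocks
            if block in seen:
                continue  # hypothetical redundancy removal: exact duplicates
            seen.add(block)
            novel.append(block)
    return novel
```

Calling `novel_sequences` once per accession with the same `seen` set accumulates a non-redundant set of novel blocks, whose total length is the toy analogue of the "novel sequences" figures quoted in the abstract. A real pipeline would use alignment-based similarity in place of exact string matches.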

Updated: 2022-05-01