RTRIP: a comprehensive profile of transposon insertion polymorphisms in rice.
Plant Biotechnology Journal (IF 10.1), Pub Date: 2020-05-30, DOI: 10.1111/pbi.13425
Zhen Liu 1, Tingzhang Wang 2, Lin Wang 3, Han Zhao 4, Erkui Yue 1, Yan Yan 1, Faiza Irshad 1, Ling Zhou 4, Ming-Hua Duan 5, Jian-Hong Xu 1

Transposable elements (TEs), also known as transposons, are mobile genetic elements found in all eukaryotic organisms investigated so far. They typically make up a major portion of the genome, especially in grasses, where they can account for up to 90% of the genome (Vitte et al., 2014). They not only alter gene structure and regulate gene expression, but have also played a profound role in reshaping genomic architecture and maintaining genomic stability (Lisch, 2013). Beyond these biological functions, TEs have been widely exploited as gene tags and molecular markers for gene function and genetic research (Kumar and Hirochika, 2001). Their active transposition introduces abundant presence/absence polymorphisms among individuals, which have been shown to contribute to genome evolution and differentiation between populations (Gonzalez et al., 2008; Studer et al., 2011). A comprehensive profile of transposon insertion polymorphisms (TIPs) is critical to TE family characterization, genetic evolution research and molecular marker‐assisted breeding. A variety of sequencing strategies and bioinformatics algorithms have therefore been developed to identify TE loci efficiently from next‐generation sequencing (NGS) data, but only a few profiles have been constructed, and only in well‐studied model organisms such as Drosophila melanogaster, Caenorhabditis elegans and Homo sapiens (Kofler et al., 2012; Laricchia et al., 2017; Rishishwar et al., 2015). No such profile has been reported for rice or for most other plants until now.

With this problem in mind, we obtained a comprehensive TIP profile of 60 743 TE loci by analysing the resequencing data from 3000 diverse rice accessions using a pipeline we developed (Figure 1a; http://ibi.zju.edu.cn/Rtrip/method.html). About 75% of the loci are shared by two or more rice accessions and show abundant presence/absence variation, while the remainder are private to a single accession. Each accession carries 6304 TE loci on average, but the number varies widely among accessions, from 4898 to 10 155. Moreover, 19 160 TE loci are inserted within genes or in their 200 bp flanking regions, and may therefore affect gene function. To facilitate querying and retrieval of these data, we established a database named RTRIP (Rice Transposon Insertion Polymorphism; Figure 1b; http://ibi.zju.edu.cn/Rtrip/index.html), which contains information on the 3000 rice varieties, the 60 743 TE loci and the genotype of each variety, and provides versatile searching and browsing functions through intuitive web‐based interfaces.
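The summary statistics in this paragraph (shared versus private loci, loci per accession) are the kind of figures that can be derived from a presence/absence matrix. The following is a minimal sketch on toy data, not the authors' pipeline; the function name `summarize_tips`, the locus IDs and the 0/1 encoding are assumptions made for illustration.

```python
from typing import Dict, List


def summarize_tips(matrix: Dict[str, List[int]]) -> dict:
    """Summarize a toy presence/absence matrix.

    `matrix` maps a locus ID to a list of 0/1 calls, one per accession
    (1 = insertion present). This encoding is assumed, not taken from RTRIP.
    """
    n_accessions = len(next(iter(matrix.values())))
    # Number of accessions carrying each locus.
    carriers = {locus: sum(calls) for locus, calls in matrix.items()}
    shared = [locus for locus, n in carriers.items() if n >= 2]
    private = [locus for locus, n in carriers.items() if n == 1]
    # Number of loci present in each accession (column sums).
    per_accession = [
        sum(calls[i] for calls in matrix.values()) for i in range(n_accessions)
    ]
    return {
        "shared": shared,
        "private": private,
        "per_accession": per_accession,
        "mean_per_accession": sum(per_accession) / n_accessions,
    }


# Hypothetical data: four loci genotyped in three accessions.
toy = {
    "TE001": [1, 1, 0],
    "TE002": [0, 1, 0],
    "TE003": [1, 1, 1],
    "TE004": [0, 0, 1],
}
summary = summarize_tips(toy)
print(summary["shared"])         # loci carried by two or more accessions
print(summary["private"])        # loci carried by exactly one accession
print(summary["per_accession"])  # insertion count per accession
```

In this toy matrix, two of the four loci are shared and two are private, and the per-accession counts range from 2 to 3.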

Figure 1
The pipeline for data analysis and screenshots of representative resources in RTRIP. (a) The general schematic view of the procedure to identify TE insertion polymorphism. (b) The home page of RTRIP. (c) The varieties module for detailed information of rice varieties. (d) The TE loci module showing information of TE loci identified in this study. (e) An example of annotated genes carrying TE loci. (f) The TE genotyping module providing the presence/absence status of identified TE loci in rice population. (g) The genome browser page for integrating TE variations with other omics data. (h) The BLAST search page.

The varieties module holds information on a core collection of 3000 rice accessions, which represents the genetic diversity of the species to a large extent (The 3000 rice genomes project, 2014). These accessions come from 89 countries/regions and are classified into five varietal groups: indica, aus/boro, basmati/sadri, tropical japonica and temperate japonica. In addition to the country and varietal group, the sample name, source, variety name, designation, genetic stock accession ID, DNA accession ID, BioSample and SRA sample are listed for each accession where available (Figure 1c). Users can browse the entire list of varieties or retrieve a subset by applying one or more filtering conditions through the search function on the interface.

The TE loci module incorporates all 60 743 TE loci identified in the rice population, which cover 496 TE families and show abundant insertion polymorphisms among accessions. For each locus, we provide detailed information, including the locus ID, genomic position, insertion orientation, the TE family to which it belongs and its reference length, population frequency, presence/absence status in the reference genome and a description (Figure 1d). If a TE locus is detected from both the forward and reverse directions of the insertion site, the locus ID corresponding to the other direction is shown in the ‘Mates’ column. To facilitate the use of these loci as molecular markers, the 200 bp sequences flanking each TE insertion have been extracted and stored under ‘Flanking Sequence’. Users can click on the name of a TE family to view its details, including its hierarchical classification and consensus sequence. A search function also lets users extract the TE loci for a given chromosome region or TE characteristic. Furthermore, we have developed a dynamic mapping tool that graphically displays the distribution of the filtered loci across the rice genome (Figure 1d).
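Extracting flanking sequence around an insertion site is a simple slicing operation. The sketch below assumes a 0‐based insertion coordinate on an in‐memory sequence string and takes up to 200 bp on each side; whether RTRIP stores 200 bp per side or in total is not stated here, and the function name `flanking_sequences` is hypothetical.

```python
from typing import Tuple


def flanking_sequences(genome: str, insertion_pos: int, flank: int = 200) -> Tuple[str, str]:
    """Return the (left, right) flanks around an insertion point.

    The insertion is assumed to sit between genome[insertion_pos - 1] and
    genome[insertion_pos] (0-based). Flanks are clipped at the sequence ends.
    """
    if not 0 <= insertion_pos <= len(genome):
        raise ValueError("insertion_pos lies outside the sequence")
    left = genome[max(0, insertion_pos - flank):insertion_pos]
    right = genome[insertion_pos:insertion_pos + flank]
    return left, right


# Hypothetical 400 bp sequence; the right flank is clipped at the end.
chrom = "ACGT" * 100
left, right = flanking_sequences(chrom, 250)
print(len(left), len(right))  # 200 150

# A smaller example that is easy to check by eye.
print(flanking_sequences("ACGTACGTAC", 6, flank=4))  # ('GTAC', 'GTAC')
```

Primers for a PCR‐based marker would then be designed within these two flanks, so that the amplicon spans the insertion site.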

Of the 60 743 TE loci, 19 160 are located within annotated genes or within 200 bp upstream or downstream of them, and may affect the function of the corresponding gene through insertional mutation or epigenetic regulation (Lisch, 2009). Given their potential importance in functional genomics, these gene‐associated loci have been deposited separately in the ‘TE in Genes’ submodule. Here users can check whether a gene of interest carries a TE insertion by entering keywords. The entry IDs in the table are clickable and direct users to the gene information page, which contains comprehensive gene annotation and graphical expression data (Figure 1e).
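The "within or 200 bp up‐ and down‐stream" criterion amounts to an interval test against each gene extended by the flank size. A minimal sketch follows, assuming 1‐based inclusive gene coordinates on a single chromosome; the `Gene` type, the gene IDs and the function name are hypothetical and do not reflect the RTRIP implementation.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    gene_id: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_near_insertion(pos: int, genes: List[Gene], flank: int = 200) -> List[str]:
    """Return IDs of genes whose span, extended by `flank` bp on each side, contains `pos`."""
    return [g.gene_id for g in genes if g.start - flank <= pos <= g.end + flank]


# Hypothetical annotation for one chromosome.
genes = [Gene("geneA", 1000, 2000), Gene("geneB", 5000, 6000)]

inside = genes_near_insertion(1500, genes)      # insertion inside geneA
downstream = genes_near_insertion(2200, genes)  # exactly 200 bp past geneA's end
intergenic = genes_near_insertion(3000, genes)  # no gene within 200 bp
print(inside, downstream, intergenic)
```

A linear scan is enough for illustration; for tens of thousands of loci and genes, a sorted list with binary search or an interval tree would be the more usual choice.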

The TE genotyping module hosts the allele information of TE loci in each resequenced accession and consists of three submodules, ‘Presence/Absence Matrix’, ‘Search by varieties’ and ‘Search by Genes’, which give users several ways to access the information. First, ‘Presence/Absence Matrix’ provides an overview of the entire data set and allows users to browse and query the presence/absence status of large sets of loci in the rice population. The allele information is shown as a table in which rows and columns represent TE loci and rice accessions, respectively, and a numeric code indicates the presence or absence of the TE insertion (Figure 1f). Because of the large volume of data to be transferred, we developed a grid strategy that loads the information block by block to improve the responsiveness of the web page. Two sets of search options, for TE locus and for variety, allow users to obtain a subset of interest.
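The block‐by‐block loading described above can be pictured as tiling the matrix and serving one tile at a time. The generator below is only an illustration of that idea under assumed names (`iter_blocks`, `block_rows`, `block_cols`) and an assumed 1/0 coding for presence/absence; the actual RTRIP grid implementation is not described in the text.

```python
from typing import Iterator, List, Tuple

Block = Tuple[int, int, List[List[int]]]


def iter_blocks(matrix: List[List[int]], block_rows: int, block_cols: int) -> Iterator[Block]:
    """Yield (row_offset, col_offset, tile) for each tile of the matrix.

    Rows are TE loci and columns are accessions. Tiles on the right and
    bottom edges may be smaller than the requested block size.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    for r in range(0, n_rows, block_rows):
        for c in range(0, n_cols, block_cols):
            tile = [row[c:c + block_cols] for row in matrix[r:r + block_rows]]
            yield r, c, tile


# Hypothetical matrix: 3 loci x 5 accessions, 1 = present, 0 = absent.
matrix = [
    [1, 0, 1, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0],
]
blocks = list(iter_blocks(matrix, block_rows=2, block_cols=2))
for row_offset, col_offset, tile in blocks:
    print(row_offset, col_offset, tile)
```

A client would request only the tiles currently visible, so the cost of a page view depends on the block size rather than on the full 60 743 × 3000 matrix.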

Second, ‘Search by varieties’ is designed to facilitate the use of TE loci as molecular markers for gene mapping and molecular‐assisted breeding. Users submit two or more varieties and specify a chromosome region on the search interface, and the server returns the allele information of the TE loci located in that region. Detailed information on the TE loci is also integrated into the output, so that users can select the desired loci directly. Because short fragments are more likely to be amplified successfully by PCR, the list of candidate loci can be further refined by specifying a threshold on TE length.
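The marker‐selection workflow in this paragraph can be sketched as a filter over locus records. The example assumes a hypothetical `Locus` record and a 1/0 presence/absence call per variety, and it additionally keeps only loci that differ among the chosen varieties; that polymorphism filter is an assumption about how a user would pick markers, not a documented behaviour of the RTRIP server.

```python
from typing import Dict, List, NamedTuple


class Locus(NamedTuple):
    locus_id: str
    chrom: str
    pos: int
    te_length: int         # TE length in bp
    calls: Dict[str, int]  # variety name -> 1 (present) or 0 (absent)


def candidate_markers(
    loci: List[Locus],
    varieties: List[str],
    chrom: str,
    start: int,
    end: int,
    max_te_length: int,
) -> List[str]:
    """Return IDs of loci in the region that are short enough and polymorphic."""
    selected = []
    for locus in loci:
        if locus.chrom != chrom or not start <= locus.pos <= end:
            continue  # outside the requested region
        if locus.te_length > max_te_length:
            continue  # amplicon likely too long for routine PCR
        states = {locus.calls[v] for v in varieties}
        if len(states) > 1:  # the chosen varieties differ at this locus
            selected.append(locus.locus_id)
    return selected


# Hypothetical loci and varieties.
loci = [
    Locus("TE_a", "chr1", 1200, 350, {"V1": 1, "V2": 0}),
    Locus("TE_b", "chr1", 1500, 5000, {"V1": 1, "V2": 0}),  # too long
    Locus("TE_c", "chr1", 1800, 300, {"V1": 1, "V2": 1}),   # monomorphic
    Locus("TE_d", "chr2", 1300, 300, {"V1": 0, "V2": 1}),   # other chromosome
]
markers = candidate_markers(loci, ["V1", "V2"], "chr1", 1000, 2000, max_te_length=1000)
print(markers)  # ['TE_a']
```

The length threshold of 1000 bp is arbitrary here; a suitable value depends on the polymerase and the PCR protocol in use.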

Third, the ‘Search by Genes’ submodule is tailored to gene‐associated TE loci and is intended for studies of gene function. For a given gene query, if the corresponding gene(s) carry TE insertions in our data set, the search results present the genotypes of those loci in the selected accessions. Detailed information on the TE loci and genes is also integrated into the returned tables, so that users can view the relevant information in a single interface.

Several popular bioinformatics tools are also available in RTRIP for browsing, searching and downloading. A generic genome browser (GBrowse) has been embedded as a platform for integrating TE variations with other omics data (Figure 1g). Here, users can visualize the distribution of TE loci in the rice genome and their relationship to annotated genes in the reference genome more intuitively. A BLAST search tool has been deployed to determine whether query sequences submitted by users encompass identified TE loci (Figure 1h). On the BLAST results page, each hit is linked to the TE locus interface, with its coordinates filled in automatically in the corresponding fields of the new page. In this way, users can find out which TE loci fall within the matched segment of the query sequence. We also provide download and help modules so that users can obtain data in batches and become familiar with the database quickly.

In conclusion, we have established a comprehensive bioinformatics platform for TE variation data from a rice population. As far as we know, it is the first public database dedicated to sharing genetic variations introduced by TEs. These polymorphic TE loci, as molecular markers and gene tags, will serve as a valuable resource for genetic mapping and gene function research, and may assist rice breeding. This resource will also contribute to the investigation of rice TEs. RTRIP will be updated with more TE variation data as new high‐quality resequencing data are generated for more rice varieties, and other omics resources, such as epigenomic and miRNA data, will be integrated into the database when available.




Updated: 2020-05-30