eRice: a refined epigenomic platform for japonica and indica rice.
Plant Biotechnology Journal (IF 10.1). Pub Date: 2020-01-09. DOI: 10.1111/pbi.13329
Pingxian Zhang, Yifan Wang, Sadaruddin Chachar, Jian Tian, Xiaofeng Gu

Epigenetic modifications, including histone modifications and DNA methylation, influence various biological processes in multicellular organisms. DNA 5‐methylcytosine (5mC) is the most widely studied DNA modification in eukaryotes and is involved in modulating the activities and functions of developmental signals (Wu and Zhang, 2017). Recent studies have shown that the previously unknown epigenetic mark DNA N6‐methyldeoxyadenosine (6mA) is widely distributed in the genomes of model animals such as Drosophila (Zhang et al., 2015) and in the human genome (Xiao et al., 2018). In land plants, the distribution patterns and potential functions of 6mA sites remained largely unknown until recent papers reported genome‐wide 6mA sites in the dicot Arabidopsis and the monocot rice (Liang et al., 2018; Zhang et al., 2018). A previous study profiled the 6mA methylome of the two main rice cultivars, japonica Nipponbare (Nip) and indica 93‐11, at single‐nucleotide resolution using single‐molecule real‐time (SMRT) sequencing (Zhang et al., 2018). Analysis of the genomic distribution and biological functions of 6mA in these rice genomes showed that 6mA is associated with gene expression, plant development and stress responses. Until now, however, no epigenomic database dedicated to rice, and in particular to rice DNA methylation, has been available. Here, we describe eRice (http://www.elabcaas.cn/rice/index.html), a species‐specific epigenomic annotation database that facilitates efficient annotation of epigenomic data for both japonica and indica rice. eRice integrates single‐base‐resolution DNA methylation data for 6mA and 5mC, artificial intelligence (AI)‐based 6mA predictions, histone modifications, and genomic and transcriptomic resources for the Nip and 93‐11 reference cultivars.
This information on the distribution of epigenetic marks, especially 6mA, across rice genomes will enhance our understanding of the epigenetic regulation of complex biological processes in plant development and support future breeding by molecular design.

The eRice database is dedicated to providing efficient and reliable epigenomic and genomic resources for both japonica (Nip) and indica (93‐11) rice cultivars. Briefly, we have stored refined, publicly available genomic and epigenomic resources, including the latest reference genomes; 6mA, 5mC and transcriptome data from 3‐week‐old rice seedlings grown under short‐day (SD) conditions (Zhang et al., 2018); and various histone modification profiles from young rice seedlings or leaves grown under SD conditions (Lu et al., 2015; Zhang et al., 2012). eRice also provides 6mA sites predicted with deep‐learning approaches (Washburn et al., 2019). The schematic structure (Figure 1a) and the home page (Figure 1b) of eRice are shown; the site includes gene annotation, DNA methylation, multi‐JBrowse and BLAST pages for retrieving gene information and analysing DNA methylation.

Figure 1
Structure of the eRice database. (a) Structure of the eRice website. The eRice database includes the rice reference genomes, SMRT‐seq data, bisulfite‐seq data, ChIP‐seq data and transcriptomes; the 6mA AI‐prediction data were generated from the 6mA methylation resources using AI methods. (b) eRice home page. (c) Gene annotation page. (d) DNA methylation page. (e) 6mA AI‐prediction page. (f, g) Multi‐JBrowse page, using the Hd3a locus (LOC_Os06g06320) as an example to illustrate various epigenomic and genomic information (f) and detailed sequence information (g).

The eRice genomic annotation page provides a flexible interface for efficient retrieval and graphical visualization of genome‐wide annotation data and sequence information (Figure 1c). On this page, a keyword‐based search engine lets users find relevant genes by first selecting the reference genome (Nip or 93‐11) and then entering either a locus identifier (currently only full locus IDs are supported) or a functional keyword (e.g. demethylase). The results link to pages with detailed information on gene location, gene structure, gene annotations, nucleotide and amino acid sequences, and gene expression data. These pages also show details of the 6mA and 5mC sites associated with the candidate gene; the 6mA and 5mC information available on the eRice DNA methylation page is described in more detail below.
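The lookup behaviour described above (choose a genome, then match either a full locus ID or a functional keyword) can be illustrated with a small in‐memory sketch. The `Gene` record, the `search_genes` function and the "exact ID first, then keyword" rule are assumptions made for illustration; the text does not describe how eRice implements its search.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    genome: str       # reference genome, e.g. "Nip" or "93-11"
    locus_id: str     # full locus identifier
    description: str  # functional annotation text


def search_genes(genes: List[Gene], genome: str, query: str) -> List[Gene]:
    """Hypothetical lookup: restrict to the selected genome, return an exact
    full-locus-ID match if there is one, otherwise fall back to a
    case-insensitive keyword match on the functional description."""
    in_genome = [g for g in genes if g.genome == genome]
    query = query.strip()
    exact = [g for g in in_genome if g.locus_id == query]
    if exact:
        return exact
    keyword = query.lower()
    return [g for g in in_genome if keyword in g.description.lower()]
```

Because only exact IDs are matched, a partial ID such as a locus prefix falls through to the keyword search and normally returns nothing, which mirrors the "full‐name locus ID only" restriction mentioned above.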

The eRice DNA methylation page provides the first genome‐wide epigenomic data resource for the distributions of 6mA and 5mC in rice (Figure 1d). Users can search these data by selecting a chromosome and entering a region of interest (start and end positions), which returns links to more detailed pages. The detailed page for a 6mA site shows methylation parameters such as chromosome position, DNA strand, fraction score and the 20‐bp reference sequences upstream and downstream of the site. In contrast, the detailed page for a 5mC site shows the methylation context (CG, CHG or CHH) but not detailed methylation parameters. In addition, the genomic context of each methylation site, for instance whether it lies within a gene or in an intergenic region, is also available.
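The region search and the per‐site details can be sketched in the same spirit. Everything here is hypothetical: the `Site` record, its 1‐based inclusive coordinates, the `(gene_id, chrom, start, end)` gene tuples, and the choice to take the flanking sequence from the forward strand of the reference, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

FLANK = 20  # bp shown on each side of a 6mA site on its detail page


@dataclass
class Site:
    chrom: str
    pos: int         # 1-based position of the methylated base (assumed convention)
    strand: str      # "+" or "-"
    fraction: float  # fraction score as stored for the site


def query_region(sites: List[Site], chrom: str, start: int, end: int) -> List[Site]:
    """Sites on `chrom` with start <= pos <= end, sorted by position."""
    hits = [s for s in sites if s.chrom == chrom and start <= s.pos <= end]
    return sorted(hits, key=lambda s: s.pos)


def context(genome: Dict[str, str], site: Site) -> str:
    """Forward-strand reference sequence with up to FLANK bp on each side
    (shorter when the site is near a chromosome end)."""
    i = site.pos - 1  # convert to a 0-based string index
    seq = genome[site.chrom]
    return seq[max(0, i - FLANK): i + FLANK + 1]


def locate(site: Site, genes: List[Tuple[str, str, int, int]]) -> str:
    """Return the ID of the first gene (gene_id, chrom, start, end; 1-based,
    inclusive) that contains the site, or 'intergenic' if none does."""
    for gene_id, chrom, start, end in genes:
        if chrom == site.chrom and start <= site.pos <= end:
            return gene_id
    return "intergenic"
```

The linear scans keep the sketch short; a genome‐wide table would more likely keep positions sorted per chromosome and use binary search or an interval index.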

Deep‐learning approaches, or AI methods, have been largely responsible for a recent paradigm shift in image and natural language processing, and are now being applied to biological problems in agriculture and genetics (Washburn et al., 2019). Building on previous deep‐learning methods and open‐source code (Washburn et al., 2019), we developed AI models, trained on previously released 6mA data, that learn both the sequence motifs and the functional evolutionary history of 6mA in rice (Figure 1e). Briefly, we first retrieved 41‐bp sequences from the whole rice genome, each consisting of an A base plus 20 bp of reference sequence upstream and downstream of it, and then encoded the sequences with one‐hot encoding (A = 1000, G = 0100, T = 0010, C = 0001). Our prediction models were built in Python 2.7 using Keras 2.2.4 with a TensorFlow back end. The final architecture consists of two convolutional layers, each followed by a max‐pooling layer and a dropout layer, and a final prediction layer. A ReLU activation function was used for every layer except the final prediction layer, which used a softmax activation depending on the model (Washburn et al., 2019). We applied these trained models to predict 6mA sites across the whole genomes of the two cultivars, Nip and 93‐11 (loss = 0.125, accuracy = 0.958 and area under the curve (AUC) = 0.989, calculated with 10‐fold cross‐validation). These predictions allow rice researchers to query the eRice database efficiently and identify potential 6mA sites in targeted genes or regions.
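The input encoding described above is straightforward to reproduce. The sketch below extracts 41‐bp windows centred on each A and one‐hot encodes them in the A/G/T/C order given in the text. It assumes, without support from the text, that candidates are A bases on the forward strand only and that ambiguous bases such as N become all‐zero rows. It also includes a rank‐based AUC, the metric reported for the models; the Keras training code itself is not reproduced here.

```python
from typing import Iterator, List, Tuple

# Encoding order taken from the text: A = 1000, G = 0100, T = 0010, C = 0001.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "G": (0, 1, 0, 0),
    "T": (0, 0, 1, 0),
    "C": (0, 0, 0, 1),
}
FLANK = 20               # bases kept on each side of the candidate A
WINDOW = 2 * FLANK + 1   # 41 bp in total


def candidate_windows(seq: str) -> Iterator[Tuple[int, str]]:
    """Yield (0-based position, 41-bp window) for every forward-strand A
    that has a full 20-bp flank on both sides."""
    seq = seq.upper()  # reference FASTA files may be soft-masked (lowercase)
    for i in range(FLANK, len(seq) - FLANK):
        if seq[i] == "A":
            yield i, seq[i - FLANK:i + FLANK + 1]


def one_hot(window: str) -> List[Tuple[int, int, int, int]]:
    """Encode a window as a WINDOW x 4 matrix; unknown bases (e.g. N)
    become all-zero rows (an assumption, not stated in the text)."""
    if len(window) != WINDOW:
        raise ValueError(f"expected {WINDOW} bp, got {len(window)}")
    return [ONE_HOT.get(base, (0, 0, 0, 0)) for base in window.upper()]


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """ROC AUC as the probability that a random positive scores higher than
    a random negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of examples, which is fine for illustration; for genome‐scale evaluation a sort‐based rank computation gives the same value more cheaply.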

The eRice database runs JBrowse (Buels et al., 2016), a powerful genome browser, for visualizing the Nip and 93‐11 genomes. Users can select different tracks to view genome‐wide reference sequences and annotation information, 6mA methylation data and AI‐based 6mA predictions for different genomic regions (Figure 1f and g). We have also added genome‐wide epigenomic resources for single‐base‐resolution 5mC and various histone modifications, making it easier to track associations between these epigenomic data and other ‘omics data. In addition, the ViroBLAST tool (Deng et al., 2007) lets users search nucleotide or amino acid query sequences against the reference genome and CDS (coding DNA sequence) data for candidate homologs, and reports epigenetic modification sites together with the alignment results in text format. Advanced searches are supported by setting the desired parameters of the BLAST tool. All genomic and epigenomic data in the database are freely accessible, and links to other useful databases have also been added.
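Per‐site methylation calls are commonly loaded into genome browsers such as JBrowse as interval tracks. As a minimal sketch, assuming a hypothetical `(chrom, pos, strand, fraction)` tuple with 1‐based positions, the function below writes 6mA sites as BED6 lines (0‐based, half‐open intervals) and scales the fraction score to BED's 0 to 1000 score range. How eRice actually builds its tracks is not described in the text.

```python
from typing import Iterable, List, Tuple


def sites_to_bed(sites: Iterable[Tuple[str, int, str, float]], name: str = "6mA") -> List[str]:
    """Convert (chrom, 1-based pos, strand, fraction) tuples into BED6 lines.

    BED intervals are 0-based and half-open, so a single base at 1-based
    position p becomes [p - 1, p). The fraction is scaled to an integer
    score and clamped to BED's 0-1000 range.
    """
    lines = []
    for chrom, pos, strand, fraction in sorted(sites):  # by chrom, then position
        score = max(0, min(1000, round(fraction * 1000)))
        lines.append(f"{chrom}\t{pos - 1}\t{pos}\t{name}\t{score}\t{strand}")
    return lines
```

Joining the returned lines with newlines gives a plain BED file; keeping one site per line preserves strand information, which the browser can display alongside the reference and annotation tracks.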

Methylation of adenine at the N6 position of the purine ring has been regarded as a sixth base that could be important in plant development and environmental responses (Liang et al., 2018; Zhang et al., 2018). eRice has been established as an extensive bioinformatics platform providing epigenomic resources (especially for 6mA) and genomic annotations for the reference cultivars Nip and 93‐11. Genomic annotations and epigenomic resources can be downloaded from the eRice download page. Using deep‐learning methods, which are increasingly applied to questions in agricultural and genetic sciences (Washburn et al., 2019), we were able to predict the genome‐wide distribution of 6mA sites in rice genomes. With multiple epigenomic datasets, other ‘omics data and AI‐prediction resources available through a user‐friendly website, we hope eRice will serve as an efficient tool for the epigenetic design of rice traits related to developmental cues (such as heading date, yield and quality) and to responses to environmental stimuli (such as drought, ambient temperature and salt stress). To keep up with the latest advances and offer more analysis tools for rice research, we will continue to supplement eRice with additional epigenetic data, including RNA methylation, non‐coding RNA and various histone modifications, as well as other ‘omics data, to extend its functionality and make it a convenient electronic platform for the rice research community.



Updated: 2020-01-09