MPOD: Applications of integrated multi-omics database for medicinal plants
Plant Biotechnology Journal ( IF 10.1 ) Pub Date : 2021-12-25 , DOI: 10.1111/pbi.13769
Simei He 1, 2 , Ling Yang 1, 3 , Shuang Ye 1, 2 , Yuan Lin 1, 2 , Xiaobo Li 1, 2 , Yina Wang 1, 2 , Geng Chen 1, 2 , Guanze Liu 1, 2 , Ming Zhao 1, 2 , Xiu Zhao 4 , Kunhua Wei 5 , Guanghui Zhang 1, 2 , Jianhua Miao 5 , Yang Dong 1, 5, 6 , Shengchao Yang 1, 2

Plant natural products (PNPs) have long been an important source of human nutrition, industrial raw materials, and medicinal ingredients, and half of anticancer drugs are derived from PNPs such as paclitaxel, vinblastine, and ginsenoside (Caputi et al., ; Luo et al., 2019; Yang et al., 2020). Biosynthesis is one of the key routes for producing PNPs, and the growing body of medicinal phyto-omics data helps to decode PNP biosynthetic pathways (Liu et al., 2017). Genetic resources also provide the basis for molecular breeding of medicinal plants (MPs).

To integrate the genome and transcriptome data of MPs, we completed the first omics database for herbal medicine (HMOD) in December 2017 (Wang et al., 2018). As data have accumulated, the limited genomic data and simple metabolite information on that website have made it necessary to comprehensively optimize and upgrade the database in terms of data, interface, tools, and management. We therefore constructed an integrated multi-omics database for MPs (MPOD; http://medicinalplants.ynau.edu.cn/).

MPOD collects genomes and transcriptomes of MPs published since January 2018. In addition, we sequenced six genomes, 28 transcriptomes, and five metabolomes in this study. All genomic and transcriptomic sequences in MPOD can be queried for candidate orthologous genes and compared by BLAST for homology between gene families from different species. More importantly, we performed correlation analyses between metabolite distribution and gene expression, covering metabolite content in different tissues, Pearson correlation analyses of genes involved in metabolic pathways, and expression profiles. Compared with HMOD, MPOD details the metabolic pathways of flavonoids, alkaloids, and terpenoids separately. To facilitate synthetic biology, a ‘biosynthetic tools’ module has been added to MPOD, together with some popular bioinformatics tools including SynVisio, heatmap, and enrichment.

The framework of MPOD is built with MySQL, ThinkPHP, and FastAdmin and has four main modules: genomics, transcriptomics, pathways, and biosynthetic tools (Figure 1a, b). In brief, the genomics module consists of four sections: genomes, genome size, re-sequencing, and gene (Figure 1c). This module contains 154 published genomes and 6 unpublished genome assemblies from this project (Synsepalum dulcificum, Antirrhinum majus, Platycodon grandiflorus, Codonopsis pilosula, Panax vietnamensis, Gynostemma pentaphyllum). The web page for each species comprises a species introduction, sequencing data, assembly results, data source links, and references. For published genomic data, the GCA records deposited at NCBI are linked from MPOD; for unpublished data, FASTA-formatted files of the assembly, CDS, and protein sequences can be downloaded from the database. The genome size section provides 50 plant genome size estimates, predicted by flow cytometry. The re-sequencing section contains single nucleotide polymorphism (SNP) information for Erigeron breviscapus and P. notoginseng (He et al., 2021) from our team, as well as published re-sequencing data for 19 other plants. The gene section provides gene assembly, annotation, and expression profiles for E. breviscapus and Acanthopanax senticosus.
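
As an illustration of working with the downloadable FASTA files, the following minimal Python sketch reads FASTA-formatted text into a dictionary and reports sequence lengths. It is not part of MPOD; the function name `parse_fasta` and the two example records (IDs and sequences) are invented for demonstration.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} mapping."""
    records: Dict[str, List[str]] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = []
        elif current_id is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            records[current_id].append(line)
    return {rid: "".join(chunks) for rid, chunks in records.items()}


# Hypothetical two-record CDS file; IDs and sequences are invented.
example = """>gene001 hypothetical CDS
ATGGCT
TTTTAA
>gene002
ATGCCCTGA
"""

cds = parse_fasta(example.splitlines())
lengths = {rid: len(seq) for rid, seq in cds.items()}
print(lengths)
```

For a real download, the same function can be applied to an open file handle, since a file object iterates over its lines.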

Figure 1. Schematic of the Database for Medicinal Plants. (a) The flow diagram showing design and construction of MPOD. (b) The home page of MPOD. (c) The ‘genomics’ module providing summary of genomes, genome size, and re-sequencing. (d) The ‘transcriptomics’ module showing sequencing, assembly result and expression profiles. (e) The ‘pathways’ module. (f) The ‘biosynthetic tools’ module providing detailed information of catalytic components, chassis cells, and regulatory elements. (g) A case study for the application of MPOD.

The transcriptomics module contains three sections: transcriptomes, expression, and Pearson. The transcriptomes section collects 200 published datasets and 28 datasets sequenced de novo in this project (Figure 1d). Each entry consists of a species introduction, sample information, sequencing data, assembly results, annotation methods, data source links, and references. Transcriptome data are uploaded and linked in the same way as genomes. More importantly, for the 28 unpublished transcriptomes, we provide gene expression profiles across different experimental conditions or tissues as heatmaps for easy visualization. We also performed Pearson correlation analyses of genes involved in metabolic pathways using some of our transcript expression data.
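
To make the Pearson analysis concrete, here is a minimal Python sketch of the correlation coefficient between two expression profiles measured across the same samples. It illustrates the statistic only and is not MPOD's implementation; the gene names and expression values are invented.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


# Hypothetical expression values in four tissues (root, stem, leaf, flower).
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]  # rises with gene_a
gene_c = [4.0, 3.0, 2.0, 1.0]  # falls as gene_a rises

r_ab = pearson(gene_a, gene_b)
r_ac = pearson(gene_a, gene_c)
print(round(r_ab, 3), round(r_ac, 3))
```

A coefficient near 1 indicates that two genes rise and fall together across samples, and a coefficient near -1 indicates opposite trends.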

The pathways module collects 85 typical compounds whose biosynthetic pathways have been deciphered, including 28 flavonoids, 28 terpenoids, 20 alkaloids, and 9 other compounds. For each compound, the module lists the compound name, molecular formula, molecular weight, function, basic organisms, precursor, host, synthesis type, downstream gene, pathway, and reference (Figure 1e). The module also collects 7 important compounds whose biosynthetic pathways are not yet completely deciphered. For these, it likewise gives the compound type, distribution, and proposed pathway, and provides the sequences and expression profiles of candidate genes potentially involved in biosynthesis. It also provides five metabolomes, showing metabolite content in different tissues as heatmaps.

The biosynthetic tools module lists chassis cells, catalytic components, and regulatory elements (Figure 1f). The chassis cells section presents 46 strains of Escherichia coli and Saccharomyces cerevisiae commonly used in biosynthesis, along with Nicotiana benthamiana and Solanum lycopersicum as heterologous expression platforms for reconstituting PNP pathways. The catalytic components section summarizes 629 enzymes from 8 major gene families that play key roles in the biosynthesis of natural products: 21 acyltransferases (ACT), 7 C-glycosyltransferases (CGT), 159 cytochrome P450s (CYP), 75 O-methyltransferases (OMT), 163 oxidosqualene cyclases (OSC), 25 squalene epoxidases (SE), 65 terpene synthases (TPS), and 114 UDP-glycosyltransferases (UGT). For each enzyme, the accession number, gene length, sequence, reaction equation, and references are listed. The regulatory elements section presents 196 microbial promoter and terminator sequences commonly used in biosynthesis.
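
The per-family enzyme counts can be checked against the stated total with a few lines of Python. The counts are taken from the text; the variable names are only illustrative.

```python
# Enzyme counts per gene family, as stated in the text.
enzyme_counts = {
    "ACT": 21,
    "CGT": 7,
    "CYP": 159,
    "OMT": 75,
    "OSC": 163,
    "SE": 25,
    "TPS": 65,
    "UGT": 114,
}

total = sum(enzyme_counts.values())
largest = max(enzyme_counts, key=enzyme_counts.get)
print(total, largest)  # the eight families sum to 629; OSC is the largest
```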

In addition to the main modules, MPOD provides some popular bioinformatics tools including ‘BLAST’, ‘Search’, ‘Heatmap’, and ‘JBrowse’ (Dong et al., 2020). All available MPOD genomes and gene models are incorporated into JBrowse. ‘SynVisio’ shows gene synteny relationships among chromosome-level reference genomes. ‘Co-expression analysis’ creates networks comprising sets of genes whose expression is highly correlated.
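
One common way to build such a co-expression network is to compute pairwise Pearson correlations and keep gene pairs above a cutoff as edges. The sketch below assumes that approach; the method and threshold used by MPOD's tool are not stated in the text, and the cutoff of 0.9, the gene names, and the expression values are all hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_edges(
    expression: Dict[str, Sequence[float]], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Return gene pairs whose absolute correlation meets the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        # Keeping strong negative correlations too is a design choice here.
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression profiles across four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 7.8],
    "geneC": [5.0, 1.0, 4.0, 2.0],
}

edges = coexpression_edges(expression, threshold=0.9)
for g1, g2, r in edges:
    print(g1, g2, round(r, 3))
```

With these invented values only the geneA and geneB pair passes the cutoff, so the network has a single edge. Lowering the threshold adds weaker links and makes the network denser.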

A typical use case of the website is shown in Figure 1g. Gypenoside A is the main active component of G. pentaphyllum, and the metabolome data show that its content is highest in leaves. The biosynthesis of gypenoside A begins with 2,3-oxidosqualene, but the key downstream enzymes (OSC, CYP, and UGT) have not been identified. A total of 235 CYPs from G. pentaphyllum (GpCYPs) were found by BLAST. A phylogenetic tree was constructed from the deduced amino acid sequences of the GpCYPs and other plant CYPs; the GpCYPs were distributed across eight subfamilies, including 144 CYP71, 34 CYP85, 28 CYP72, 20 CYP86, and 4 CYP74. We also examined the expression of GpCYPs in different tissues and present it as a heatmap. Furthermore, we performed Pearson correlation analyses of our transcript expression data among GpOSCs, GpCYPs, and GpUGTs, using GpOSCs as the query genes (Figure 1g). These results facilitate the discovery of unknown genes involved in gypenoside A biosynthesis.
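
The query-gene step of this case study can be sketched as ranking candidate genes by their Pearson correlation with a query expression profile. This is a minimal illustration under stated assumptions, not the authors' pipeline: the gene names (`GpCYP_x`, `GpCYP_y`, `GpUGT_z`) and all expression values are invented.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Sort candidate genes by correlation with the query, highest first."""
    scored = [(name, pearson(query, profile)) for name, profile in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical query OSC profile in root, stem, leaf, flower (highest in leaf).
query_osc = [2.0, 3.0, 9.0, 4.0]

# Hypothetical candidate profiles in the same four tissues.
candidates = {
    "GpCYP_x": [1.0, 1.5, 4.5, 2.0],
    "GpCYP_y": [5.0, 4.0, 1.0, 3.0],
    "GpUGT_z": [2.0, 2.0, 2.5, 2.0],
}

ranking = rank_by_correlation(query_osc, candidates)
for name, r in ranking:
    print(name, round(r, 3))
```

Candidates near the top of the list share the query's tissue pattern and are natural first choices for functional testing. Correlation alone does not establish that a gene acts in the pathway.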

In summary, from the gene level to the metabolite level, MPOD integrates genomics, transcriptomics, and metabolomics data of MPs, both published in recent years and sequenced in this study. These datasets provide a rich genetic resource for mining functional genes, screening molecular markers, and developing biological elements. Combining them with the pathways and catalytic components greatly facilitates decoding the biosynthetic pathways of medicinal ingredients. MPOD will be continuously updated as multi-omics data increase and new bioinformatics tools emerge, so that it provides long-term support for research on molecular-assisted breeding and synthetic biology of MPs.



Updated: 2021-12-25