Full-length sequencing of Ginkgo biloba L. reveals the synthesis of terpenoids during seed development

https://doi.org/10.1016/j.indcrop.2021.113714

Highlights

  • Obtained a high-quality transcript dataset related to ginkgo seed development.

  • Revealed the diversity of TTL regulation modes in ginkgo seed development.

  • Further optimized the structural and functional annotation of the ginkgo genome.

Abstract

Full-length transcriptome sequencing on the PacBio platform can significantly improve the annotation of gene structures. As an ancient relict gymnosperm in the monotypic order Ginkgoales, Ginkgo biloba L. is rich in medicinally valuable terpenoids. Its seeds have abundant edible endosperm, which is palatable and of high nutritional value. However, molecular studies of ginkgo seed development remain limited, and the biosynthesis of terpenoids in seeds has received little attention. Therefore, single-molecule real-time (SMRT) technology and Illumina sequencing were combined to sequence six tissues related to the reproductive growth and development of ginkgo in order to generate a high-quality full-length transcript database. In total, 20.98 Gb of clean reads containing 178,548 full-length non-chimeric (FLNC) sequences were obtained. From these data, 4019 novel genes and 22,845 novel isoforms were predicted, 52.32 % of the novel genes were annotated, and three novel isoforms were annotated in pathways related to terpene synthesis. Enrichment analysis of differentially expressed genes (DEGs) showed that 95 genes were enriched in 21 categories related to seed development, and 47 DEGs were enriched in the backbone pathway of terpene synthesis. Combined with real-time quantitative reverse transcription PCR (qRT-PCR), these results show that the phosphosynthase family members that synthesize terpene precursors have diverse and complex expression trends during seed development. Our findings confirm the advantages of SMRT, which facilitated the construction of a rich transcript dataset for research on ginkgo seed development, enriched the annotation of the ginkgo genome, and enhanced our understanding of the regulation of terpene biosynthesis genes in ginkgo seeds.

Introduction

Ginkgo biloba L., known as a "living fossil", is a relict plant that originated about 280 million years ago (Gong et al., 2008). G. biloba, a long-lived dioecious gymnosperm, is now the sole surviving member of the Ginkgopsida. Because of its special evolutionary status, this species is considered an important link between non-flowering plants and angiosperms (Wu et al., 2018). Originally from China, G. biloba has considerable horticultural and economic value (Guan et al., 2016), and its medicinal use dates back thousands of years (Goh and Barlow, 2002). Ginkgo biloba extract (GBE) is currently a popular health care product, and its numerous chemical components and pharmacological effects have been extensively studied and reported. Ginkgo fruits are composed of a fleshy episperm, hard mesosperm, membranous endopleura, endosperm, and embryo. Although the endosperm and embryo are edible, they are somewhat allergenic (Sado et al., 2019). Ginkgo fruits have antioxidant, antibacterial, and insecticidal activities, and are used in Chinese medicine to treat asthma, cough, and other diseases (Khalil et al., 2020; Wang and Zhang, 2019).

Plant secondary metabolites are a class of small-molecule organic compounds produced by plant secondary metabolism. These substances are stored in particular organs or tissues, are species-specific, and participate in plant stress resistance, interaction, and information transfer (Wang et al., 2013). The main terpenoids in ginkgo are bilobalide (a sesquiterpene) and the ginkgolides (diterpenes), which are the only natural substances known to carry a tert-butyl functional group [-C(CH3)3] (Geng et al., 2018). They play an important role in neuroprotection (Hua et al., 2017; Sui et al., 2019) and in the treatment of cardiovascular and cerebrovascular diseases (Cao and Li, 2019; Liu et al., 2019). All parts of the ginkgo seed contain bilobalide and ginkgolides. The part with the highest total terpenoid content is the embryo, followed by the endosperm (Zhang et al., 2015). It is generally believed that ginkgo seeds undergo dormancy after maturity (Singh et al., 2008), although this is controversial (Feng et al., 2018). The immature embryo and endosperm of ginkgo seeds gradually mature as physiological dormancy is released, up to the point of germination. Therefore, the synthesis and metabolism of terpenoids during seed development deserve attention.

High-throughput sequencing on the Illumina platform is an effective method for gene annotation and expression quantification (Wang et al., 2016). However, the reads produced by next-generation sequencing (NGS) technology are relatively short, at only 100–150 bp, and usually cannot span an entire transcript, which reduces the accuracy of sequence assembly. Single-molecule real-time (SMRT) sequencing is a recently developed third-generation sequencing (TGS) technology characterized by longer reads and the ability to detect base modifications. The average length of SMRT reads reaches 10–15 kb, which is sufficient for obtaining full-length transcripts (Roberts et al., 2013). This method can be used to accurately detect transcript structure, to analyze transcript homology, and to identify alternative splicing events, alternative polyadenylation events, and fusion genes. A variety of plants have been studied using SMRT, such as sorghum (Sorghum bicolor L., Abdel-Ghany et al., 2016), wheat (Triticum aestivum L., Dong et al., 2015), Salvia miltiorrhiza Bunge (Xu et al., 2015), Ananas comosus var. bracteatus (Ma et al., 2019), and ornamental crabapple (Malus spp., Huang et al., 2020). Because SMRT sequencing has a high error rate, Illumina reads are often used to correct full-length transcripts. Combining SMRT and NGS makes the resulting transcripts more accurate and allows expression to be quantified from the NGS data (Koren et al., 2012; Chen et al., 2019).
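The read-length argument above can be made concrete with a small calculation. The sketch below is only a toy illustration, not part of the study's pipeline: the function name `min_reads_to_span` is hypothetical, reads are assumed to tile a transcript end to end with no overlap, and the transcript length is the average FLNC length (1311 bp) reported in the results of this article.

```python
import math


def min_reads_to_span(transcript_len_bp: int, read_len_bp: int) -> int:
    """Smallest number of non-overlapping reads needed to cover a transcript.

    Hypothetical helper for illustration; real assemblers need overlapping
    reads, so this is a lower bound.
    """
    if transcript_len_bp <= 0 or read_len_bp <= 0:
        raise ValueError("lengths must be positive")
    return math.ceil(transcript_len_bp / read_len_bp)


# Average FLNC length reported in this article (assumed here as a
# representative transcript length).
transcript_len = 1311

# Read lengths taken from the ranges quoted in the text above.
for label, read_len in [("NGS, 150 bp", 150), ("SMRT, 10 kb", 10_000)]:
    n = min_reads_to_span(transcript_len, read_len)
    print(f"{label}: at least {n} read(s) to cover {transcript_len} bp")
```

Under these assumptions a short-read library needs several reads, and therefore an assembly step, to reconstruct one average transcript, whereas a single long read can cover it from end to end.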

In this study, the buds, leaves, and seeds (kernels) of G. biloba were sequenced using SMRT and NGS, yielding a full-length transcript dataset for ginkgo seed development. The structure and function of the novel genes and isoforms were also annotated, which expanded the annotation of genes related to ginkgo seed development. This work aims to lay a foundation for further molecular research on ginkgo seed development. In addition, enrichment analysis of the DEGs in terpenoid-related pathways improved the understanding of terpenoid anabolism during ginkgo seed development.

Section snippets

Plant materials

Ginkgo grows on the campus of Nanjing Forestry University (32°4′43.45″ N, 118°48′52.01″ E), located in Nanjing, Jiangsu Province, China. Ginkgo seeds collected in June (early development period), October (fruit ripening, fertilization process completed), and January of the following year (after stratification treatment, physiological post-ripening completed) were selected as the research materials for differential analysis related to seed development. Based on the above plant tissues, male

PacBio sequencing and error correction

A total of 252,049 circular consensus sequences (CCS) were obtained from the mixed library, of which 70.84 % (178,548) were FLNC sequences with an average length of 1311 bp (Table 1). The 12 samples used in RNA-seq sequencing produced more than 12,100 million clean reads, with an average of 10 Gb per sample. Quality control filtering of the RNA-seq data showed that the base content distribution, base sequencing quality, and GC content distribution were all normal. The

PacBio sequencing enriched the genomic annotations of G. biloba

G. biloba is one of the five major gymnosperms. It has no close relatives and has a unique evolutionary status. Fossil evidence shows that modern ginkgo has hardly changed since the age of the dinosaurs, and it is therefore an important tree species for the study of plant evolution (Zhou and Zheng, 2003). G. biloba has become a focus of scientific research because of its high utilization and its medicinal, ornamental, and edible value. Compared with the huge genome size of G. biloba, the existing

Conclusions

In this study, PacBio SMRT and Illumina RNA sequencing were used to sequence the full-length transcriptome of ginkgo flowers, leaves, and seeds at different developmental stages. Some novel ginkgo isoforms, together with the corresponding lncRNAs, were identified and were shown through functional annotation to be related to seed development and terpenoid biosynthesis. Furthermore, the DEGs at the developmental stages of seeds were analyzed, and it was found that in ginkgo seeds, synthase family

Funding

This research was funded by the National Natural Science Foundation of China (31971689), the Guangdong Basic and Applied Basic Research Foundation (2019A1515111150), and the China Postdoctoral Science Foundation (2015T80557).

Author contributions

L.X. and M.X. conceived and designed the project; X.H. and Y.X. performed the molecular biology experiments; X.H. and B.H. participated in the data analysis; X.H. drafted the manuscript; L.X., B.H., and M.X. revised the manuscript. All authors have read and approved the manuscript for publication.

CRediT authorship contribution statement

Xin Han: Conceptualization, Investigation, Writing - original draft, Visualization. Bing He: Validation, Formal analysis, Writing - review & editing. Yue Xin: Resources, Data curation. Meng Xu: Conceptualization, Methodology. Li-an Xu: Conceptualization, Validation, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no competing interests.

References (47)

  • S.E. Abdel-Ghany et al. A survey of the sorghum transcriptome using single-molecule long reads. Nat. Commun. (2016)
  • D. An et al. Isoform Sequencing and State-of-Art Applications for Unravelling Complexity of Plant Transcriptomes. Genes (Basel) (2018)
  • B. Buchfink et al. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat. Methods (2021)
  • A. Cao et al. Bilobalide protects H9c2 cell from oxygen-glucose-deprivation-caused damage through upregulation of miR-27a. Artif. Cells Nanomed. Biotechnol. (2019)
  • Q. Chao et al. The developmental dynamics of the Populus stem transcriptome. Plant Biotechnol. J. (2019)
  • S. Chen et al. Fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics (2018)
  • Z. Chen et al. Transcriptome analysis based on a combination of sequencing platforms provides insights into leaf pigmentation in Acer rubrum. BMC Plant Biol. (2019)
  • A. Conesa et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics (2005)
  • L. Dong et al. Single-molecule real-time transcript sequencing facilitates common wheat genome annotation and grain transcriptome research. BMC Genomics (2015)
  • J. Feng et al. Embryo Development, Seed Germination, and the Kind of Dormancy of Ginkgo biloba L. Forests (2018)
  • R.D. Finn et al. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. (2011)
  • T. Geng et al. Research development of ginkgo terpene lactones. Zhongguo Zhong Yao Za Zhi (2018)
  • R. Guan et al. Draft genome of the living fossil Ginkgo biloba. Gigascience (2016)

    1 These authors contributed equally.
