Introduction

The modification of metabolism by biotechnological techniques is often utilized for the optimal production of plant metabolites, which directly benefit human health and plant growth. For example, “golden rice” is a transgenic line of Oryza sativa that was genetically engineered to biosynthesize β-carotene, a pro-vitamin A, in the edible parts of rice (Ye et al. 2000; Yonekura-Sakakibara and Saito 2009). The introduction of transcription factors from snapdragon into tomato increased the production of anthocyanins, which have health-protective properties (Butelli et al. 2008). However, such approaches do not always lead to the expected results. For example, overexpression of a foreign S-linalool synthase in transgenic petunia did not result in the expected accumulation of free linalool, but led to the accumulation of S-linalyl-β-D-glucoside (Lucker et al. 2001). These unexpected results suggest that highly complex regulatory systems control plant metabolism and also indicate the need for more precise information on plant metabolism. In this context, metabolomics plays a key role in the field of molecular biotechnology, where plant cells are modified by the expression of engineered genes. Metabolomic analysis provides us with in-depth information on cellular metabolism via a snapshot of the metabolome, often combined with data from other “omics” (Oksman-Caldentey and Saito 2005; Saito and Matsuda 2010).

Metabolomics is one of the omics approaches that can be used to acquire comprehensive information on metabolites. It aims to grasp the global state of metabolism in measured samples. Among the omics studies used in plant sciences, genomics was the first to emerge, and uncovered the genome sequences of several organisms, including Arabidopsis (Arabidopsis Genome Initiative 2000) and rice (Goff et al. 2002; Sasaki et al. 2002; Yu et al. 2002; International Rice Genome Sequencing Project 2005). There is little doubt that the development of the automated DNA sequencer led to the current progress of genomics. Other omics studies have also been developed as a result of technical innovations. Microarrays made high-throughput analysis of mRNA expression feasible and led to the emergence of transcriptomics. Two-dimensional electrophoresis and mass spectrometry (MS) significantly contributed to the development of proteomics. Similarly, MS and nuclear magnetic resonance (NMR) spectroscopy have facilitated metabolomic studies. However, metabolomics is not as advanced as the other omics because there is a critical difference between metabolites and other molecules. DNA, RNA, and proteins are linear polymers consisting of a limited number of monomers, and the interpretation of RNA and protein sequences can be facilitated by genome information according to the central dogma of molecular biology. Metabolites, in contrast, comprise a more heterogeneous group than these polymers in terms of their physical and chemical properties, varying widely with respect to size, polarity, quantity, and stability. In addition, there are an estimated 200,000 plant metabolites (Fiehn 2002b; Dixon and Strack 2003), and many of these metabolites remain unknown. Thus, no single method has yet been developed that can capture the entire plant metabolome, and researchers who want to acquire comprehensive metabolome information have to employ several methodologies according to the chemical properties of the metabolites. In spite of these difficulties, metabolomics, metabolic profiling, and metabolic fingerprinting have been employed in many biological studies. These techniques have been applied to the functional identification of unknown genes through the metabolic profiling of plants in which some genes are up- or down-regulated (Bino et al. 2004; Hirai et al. 2005; Oksman-Caldentey and Saito 2005; Tohge et al. 2005; Watanabe et al. 2008; Yonekura-Sakakibara et al. 2008; Okazaki et al. 2009; Matsuda et al. 2010), the discovery of biomarkers associated with disease phenotypes (Soga et al. 2006; Sreekumar et al. 2009), the safety assessment of genetically modified organisms (GMOs) (Baker et al. 2006; Beale et al. 2009; Kusano et al. 2011a), and the discovery of compounds involved in plant resistance to biotic and abiotic stresses (Leiss et al. 2009; Ward et al. 2010b; Kusano et al. 2011c). When combined with genomics, transcriptomics, and/or proteomics, metabolomics can also help to interpret and understand many complex biological processes; indeed, metabolomics is now widely recognized as a cornerstone of systems biology (Quackenbush 2002; Hall 2006; Saito and Matsuda 2010). In this review, we introduce the basic analytical protocols for plant metabolomics and bioinformatics and the practical application of metabolomics to the biological study of plants.

Analytical technologies for the plant metabolome

Mass spectrometry

MS is the most frequently used technique in metabolomic studies. MS provides mass-to-charge ratio (m/z) information, which helps to determine the structure of metabolites. The main advantage of MS is its high sensitivity. In addition, the combination of chromatographic separation with MS increases the number of compounds that can be detected by reducing the complexity of the mass spectra and the matrix effect. There are several chromatographic techniques that can be combined with MS.

Gas chromatography (GC)–MS is used for metabolite profiling. Capillary GC uses a carrier gas to move analytes through a coated, fused silica capillary. GC–MS requires the analyte to be vaporized so that it can migrate through the capillary; therefore, analytes must be volatile or amenable to chemical derivatization to render them volatile. Certain types of samples (terpenoids and essential oils) are particularly well suited for GC–MS analysis. With appropriate derivatization, more polar metabolites or metabolites with highly polar functional groups can also be analyzed, e.g., amino acids, sugars, organic acids, fatty acids, and amines. This means that many compounds associated with primary metabolism can be analyzed by GC–MS. Electron ionization (EI) is the most commonly used GC–MS ionization technique; it is robust, highly reproducible, and considered to be less affected by the matrix effect, e.g., ion suppression. In addition, EI generates informative and characteristic mass spectra due to the relatively high degree of fragmentation, which is useful for compound identification. However, molecular ions, which effectively reduce the number of candidate structures or elemental compositions, are often undetected, so GC–EI–MS is mainly suited to the targeted analysis of known primary metabolites. Commercial and non-commercial EI–MS mass spectral libraries are available, and these can be routinely used for peak annotation. To date, GC–MS-based metabolic profiling has been frequently used in many metabolomic studies of plants, animals, and microorganisms (Fiehn et al. 2000; Roessner et al. 2001; Fernie et al. 2004; Shellie et al. 2005; Keurentjes et al. 2006; Schauer et al. 2006, 2008; Kusano et al. 2007, 2011a, c; Ralston-Hooper et al. 2008).

Liquid chromatography (LC)–MS is also used for metabolomic studies. The major ionization method for LC–MS is atmospheric pressure ionization (API), which includes electrospray ionization (ESI) and atmospheric pressure chemical ionization (APCI) (Codrea et al. 2007; Dunn 2008). These methods do not require the compound to be volatile, and are thus suited for compounds with heat-labile functional groups, chemically unstable substructures, high boiling points, and high molecular weights. In plant metabolomics, LC–MS is frequently used to profile secondary metabolites (e.g., phenylpropanoids, alkaloids, and saponins) and complex lipids (e.g., glycerolipids and sphingolipids), often with a reversed-phase column (e.g., C18) for separation (De Vos et al. 2007; Naoumkina et al. 2007; Bottcher et al. 2008; Farag et al. 2008; Matsuda et al. 2009, 2010; Okazaki et al. 2011). Many of these compounds are not amenable to volatilization by GC. Secondary metabolites are not directly involved in the normal growth or development of plants, but they have important physiological roles, e.g., as toxic compounds against predators and attractants for pollinators. In addition, they have historically been used as pharmaceutical compounds and as food additives for flavor, indicating the special importance of LC–MS-based analysis in plant metabolomics. In contrast to EI–MS, API–MS usually affords [M+H]+ and [M−H]− as the main signals, which are useful for narrowing down the candidate structures of the detected compounds. By using collision-induced dissociation in tandem MS, i.e., MS/MS, more informative mass spectra with many fragments are obtained. Although various compounds can be analyzed by LC–MS, peak annotation is still troublesome due to the shortage of mass spectral libraries for API–MS and of commercially available authentic compounds. Much effort is still required to identify metabolites more comprehensively, through the production of additional reference compounds (Nakabayashi et al. 2009, 2010) and the generation of more comprehensive databases (Kind and Fiehn 2006; Moco et al. 2006; Shinbo et al. 2006; Bocker and Rasche 2008; Horai et al. 2010). In addition to untargeted metabolomics, widely targeted metabolomic studies use a triple quadrupole mass spectrometer equipped with LC (Sawada et al. 2009; Albinsky et al. 2010b). In a targeted metabolomics strategy, predefined metabolite-specific signals (obtained by selected reaction monitoring with tandem MS or by selected ion monitoring) are often used to determine precisely and accurately the relative abundances and concentrations of a limited number of known and expected endogenous metabolites (Griffiths et al. 2010).
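As a minimal illustration of why the [M+H]+ and [M−H]− signals are informative, the following Python sketch computes the expected m/z of both ions from the monoisotopic mass of a neutral molecule. The function names and the example compound (quercetin, C15H10O7) are assumptions chosen for illustration; they are not taken from any of the studies cited above.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "P": 30.97376200,
    "S": 31.97207117,
}
PROTON_MASS = 1.00727647  # mass of H+ (hydrogen atom minus one electron)


def neutral_mass(formula):
    """Monoisotopic mass of a neutral molecule given as {element: count}."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in formula.items())


def protonated_mz(mass):
    """m/z of the singly charged [M+H]+ ion (positive ion mode)."""
    return mass + PROTON_MASS


def deprotonated_mz(mass):
    """m/z of the singly charged [M-H]- ion (negative ion mode)."""
    return mass - PROTON_MASS


# Illustrative example only: quercetin, a common plant flavonoid.
quercetin = {"C": 15, "H": 10, "O": 7}
m = neutral_mass(quercetin)
print(f"M = {m:.4f}")
print(f"[M+H]+ = {protonated_mz(m):.4f}")
print(f"[M-H]- = {deprotonated_mz(m):.4f}")
```

Observing both ions roughly 2.0146 m/z units apart in positive and negative mode supports the assignment of the neutral mass, which in turn restricts the candidate structures.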

Capillary electrophoresis (CE)–MS is utilized to analyze a wide spectrum of ionic metabolites. As with LC–MS, API is the most suitable ionization method for CE–MS. In capillary zone electrophoresis, ions migrate according to the electrostatic force acting on them, which depends on the charge and size of the ions, in addition to the electro-osmotic flow derived from the capillary and the type of electrolyte used. Ionic compounds are separated at a high resolution in a narrow capillary. In metabolomic analyses using CE–MS, samples are often divided for cation and anion analyses. For cation analysis, Soga’s method, using formic acid as an electrolyte, is experimentally convenient and provides excellent separation of metabolites with good reproducibility (Soga et al. 2006). In contrast, the routine analysis of anions at a high resolution is more difficult than that of cations, although various approaches, including the use of coated capillaries, have been assessed (Soga et al. 2002; Harada et al. 2006). Recently, a platinum ESI needle has been developed that significantly improved the analysis of anions by CE–MS (Soga et al. 2009). Target ionic metabolites in CE–MS analyses include amino acids, organic acids, nucleotides, and sugar phosphates. The compounds detectable by CE–MS are rather similar to those detected by GC–MS, but CE–MS can analyze these compounds without derivatization. In addition, as with LC–MS, molecular-related ions are also detectable with CE–MS. Since the metabolites suited for CE–MS analysis are physiologically important and common to all organisms, CE–MS has been used in a variety of metabolomic studies (Monton and Soga 2007; Oikawa et al. 2008; Sato et al. 2008; Ramautar et al. 2009; Urano et al. 2009; Ishikawa et al. 2010), including the identification of biomarkers for the progression of prostate cancer (Sreekumar et al. 2009) and for oxidative stress (Soga et al. 2006), the measurement of internal body time (Minami et al. 2009), and the functional analysis of unknown genes in Arabidopsis (Ohkama-Ohtsu et al. 2008, 2011; Watanabe et al. 2008).

As described above, many of the analytes in metabolomic studies are still unknown, and even the known compounds are difficult to obtain from commercial sources. Under such unfavorable circumstances, the accurate m/z values with ultra-high resolution generated by Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) are very important. FT-ICR MS is of value in metabolomic studies for the following two reasons. Firstly, different compounds with very similar molecular masses can be detected separately, even by direct infusion without any chromatographic steps, and this can result in the rapid detection of a number of metabolites. Secondly, an accurate estimate of the chemical formula of the detected peaks can be acquired, which can facilitate annotation procedures for unknown compounds. MS/MS analysis coupled with FT-ICR MS is also helpful to estimate the chemical formula because fragment ions are also detected with high resolution and accuracy. In fact, several metabolites that accumulated in Arabidopsis following treatment with herbicides were identified by MS/MS analyses using FT-ICR MS (Oikawa et al. 2006). However, because of difficulties in hardware handling and the extremely large amount of data generated, the number of reports on metabolomic studies with FT-ICR MS is rather limited compared with other types of MS. The non-targeted metabolite analysis of strawberry fruit was the first published metabolomic study using FT-ICR MS (Aharoni et al. 2002). This technique has also been applied to Arabidopsis functional genomics (Hirai et al. 2004, 2005; Tohge et al. 2005), transgenic tobacco (Mungur et al. 2005), and the identification of metabolic biomarkers in Crohn’s disease (Jansson et al. 2009). Hyphenation of FT-ICR MS and LC has increased the range of applications for FT-ICR MS in metabolomic studies of tomato (Iijima et al. 2008) and Lotus japonicus (Suzuki et al. 2008), although the real power of this MS approach has been difficult to demonstrate because of the lack of suitable informatics tools. A recent report described a fine structural elucidation strategy with FT-ICR MS that uses all of the isotopic peaks derived from 13C-, 15N-, 34S-, and 18O-substituted forms recorded under high magnetic field FT-ICR MS (Miura et al. 2010). In addition, the combination of FT-ICR MS and the labeling of metabolites with stable isotopes (e.g., 13C and 15N) has been introduced to improve the quality of plant metabolite annotation (Hegeman et al. 2007; Giavalisco et al. 2008, 2009). According to these reports, the comparison of MS data from 12C/13C labeling experiments enabled the removal of background noise peaks and the positive identification of compounds with a true biological origin, thus eliminating the ambiguity in chemical formula assignment and resulting in the clear association of one measured mass with one chemical formula.
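The link between mass accuracy and chemical formula estimation can be sketched in a few lines of Python. The example below enumerates C/H/N/O compositions whose monoisotopic mass lies within a given ppm tolerance of a measured neutral mass. It is a simplified sketch under stated assumptions: the element set, the upper bounds, the crude hydrogen-count filter, and the function name `candidate_formulas` are all illustrative, and real annotation tools additionally use isotope patterns and chemical rules.

```python
# Monoisotopic masses (Da); only C, H, N and O are considered in this sketch.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}


def candidate_formulas(mass, ppm=1.0, max_c=40, max_n=6, max_o=20):
    """Return [((C, H, N, O), error_ppm), ...] within +/- ppm of `mass`."""
    tol = mass * ppm * 1e-6
    hits = []
    for c in range(1, max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                rest = mass - (c * MASS["C"] + n * MASS["N"] + o * MASS["O"])
                if rest < -tol:
                    break  # adding more oxygen only overshoots further
                h = round(rest / MASS["H"])  # hydrogens that best fill the gap
                if h < 0 or h > 2 * c + n + 2:
                    continue  # crude valence-based upper bound on hydrogens
                error = h * MASS["H"] - rest
                if abs(error) <= tol:
                    hits.append(((c, h, n, o), error / mass * 1e6))
    return hits


# Illustrative neutral mass: C15H10O7 (the molecular formula of quercetin).
measured = 15 * MASS["C"] + 10 * MASS["H"] + 7 * MASS["O"]
for tolerance in (1.0, 20.0):
    n_hits = len(candidate_formulas(measured, ppm=tolerance))
    print(f"{tolerance:>4} ppm: {n_hits} candidate formula(s)")
```

Tightening the tolerance can only shrink the candidate list, which is the practical benefit of the high mass accuracy described above.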

Since the metabolome consists of a vast array of compounds, most current metabolomic analyses using a single analytical protocol can only detect a fraction of the metabolites in a complex biological sample; thus, the combination of multiple MS systems, considering the scope of each MS system, can reveal the overall metabolic status of a sample. For example, we established a multi-MS platform for metabolomic analysis that consisted of several MS techniques (GC–MS, CE–MS, and LC–MS). This platform has been applied to the functional identification of an unknown gene (Watanabe et al. 2008) and the evaluation of a genetically modified tomato (Kusano et al. 2011a), as described later. Such a multi-MS system provides a large amount of metabolite information, but the analysis of these data depends on the availability of well-structured databases that describe the assayed metabolites. Chemical identifiers are notoriously incoherent and encompass a wide range of different referencing schemes with varying scope and coverage, since chemical databases use multiple types of identifiers in parallel, but lack a common primary key for reliable database consolidation. Connecting the identifiers of analytes found in experimental data with the identifiers of their parent metabolites in public databases is, therefore, very important. Recently, a tool for consolidating metabolite identifiers was developed to enable contextual and multi-platform metabolomic data analysis (Redestig et al. 2010), and we utilized this tool in practice to prepare a single consensus dataset for our multi-MS platform.

Nuclear magnetic resonance spectroscopy

NMR spectroscopy is another technology that is frequently employed in metabolomic studies since it provides structural information at the atomic level and enables the identification of metabolites that are otherwise unidentifiable by MS analysis (Ward et al. 2007, 2010a; Beale et al. 2009; Kim et al. 2011). NMR spectroscopy can yield detailed information on the quantities and identities of the metabolites present in extracts or in vivo (Kikuchi et al. 2004; Lindon et al. 2004; Wang et al. 2004; Krishnan et al. 2005; Clayton et al. 2006; Ratcliffe and Shachar-Hill 2006; Sekiyama and Kikuchi 2007; Tian et al. 2007; Hagel et al. 2008; Sekiyama et al. 2010). Metabolome analysis by NMR usually does not require any pretreatment such as column chromatography or derivatization, which makes standardization straightforward and points to the great potential of NMR spectroscopy in metabolomics. Attention should be paid, however, to the pH of samples because chemical shifts can change depending on pH. For metabolomic analysis by NMR, buffered solutions are frequently utilized to stabilize the pH of sample solutions. Among the advantages of NMR over MS-based methods are that it is non-destructive, non-biased, highly quantitative, and enables the identification of complex unidentified metabolites. Even compounds that are bound to insoluble polymers can be analyzed by high-resolution solid-state NMR, which can afford direct and intact structural information. The major disadvantage of NMR, relative to MS, is its low sensitivity. Signal overlap from many similar molecules in biological samples is another major problem that inhibits the accurate assignment of NMR signals. However, these disadvantages, i.e., the lack of sensitivity and resolution, are gradually being overcome by the development of cryogenic probes (Kovacs et al. 2005), higher-strength superconducting magnets, miniaturized radiofrequency coils (Van and Veenstra 2009), and multidimensional NMR techniques (Kikuchi et al. 2004; Sekiyama and Kikuchi 2007; Chikayama et al. 2010). The dynamic nuclear polarization (DNP)-enhanced NMR technique, developed by a combination of NMR and electron paramagnetic resonance (EPR) (Frydman and Blazina 2007), may provide a new approach to acquire hyper-sensitive NMR spectra for metabolomics. Labeling the samples with stable isotopes, e.g., 13C and 15N, is also a useful technique that selectively enhances the sensitivity of NMR and improves signal sharpness (Kikuchi and Hirayama 2007; Chikayama et al. 2010). Labeling of metabolites with NMR-active isotopes is also useful for metabolic flux analysis and fluxomics by tracking the selective signal enhancement of isotopologues (Mesnard and Ratcliffe 2005; Eisenreich and Bacher 2007; Sekiyama and Kikuchi 2007; Coquin et al. 2008; Bothwell and Griffin 2011; Fan and Lane 2011; Moseley et al. 2011; Ward et al. 2011). These features suggest the importance of the complementary use of MS and NMR in metabolomics.

Informatics for metabolomic studies

Extremely large amounts of data are generated by instrumental analysis, particularly by the high-performance instruments frequently used for metabolome analysis that can detect tiny signals with high resolution. To handle the large datasets generated and comprehend the metabolome data, automated software is needed that can identify peaks from raw MS or NMR data, align the peaks among samples, and identify and quantify each metabolite; therefore, informatics is an essential tool for processing large metabolomic datasets (Fukushima et al. 2009b; Tohge and Fernie 2009, 2010; Go 2010). For GC–MS data, an automated spectral deconvolution and identification system using the NIST mass spectral library is available (Ausloos et al. 1999). To process raw MS data acquired by LC–MS or CE–MS, several open source programs are available, including MetAlign (Vorst et al. 2005), MZmine2 (Katajamaa et al. 2006), XCMS (Smith et al. 2006; Benton et al. 2008), MET-IDEA (Broeckling et al. 2006), and Mass++ (http://groups.google.com/group/massplusplus). To analyze FT-ICR MS datasets, DrDmassPlus (Oikawa et al. 2006) can be used for peak picking, peak alignment, and some statistical analysis. These programs facilitate comprehensive data analysis for non-targeted metabolomic approaches.
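The peak alignment step mentioned above can be illustrated with a deliberately simple sketch. The Python code below groups peaks from different samples into shared features when their m/z and retention time agree within fixed tolerances. This greedy approach, the tolerance values, and the function name `align_peaks` are assumptions made for illustration; it is not the algorithm implemented in any of the programs listed above, which use more elaborate retention time correction and grouping.

```python
def align_peaks(samples, mz_tol=0.01, rt_tol=0.2):
    """Greedily align peaks across samples.

    samples: {sample_name: [(mz, retention_time_min, intensity), ...]}
    Returns a list of features, each a dict with a reference m/z, a
    reference retention time, and the intensity observed in each sample.
    """
    features = []
    for name, peaks in samples.items():
        for mz, rt, intensity in peaks:
            for feature in features:
                same_mz = abs(feature["mz"] - mz) <= mz_tol
                same_rt = abs(feature["rt"] - rt) <= rt_tol
                # Each sample may contribute at most one peak per feature.
                if same_mz and same_rt and name not in feature["intensities"]:
                    feature["intensities"][name] = intensity
                    break
            else:
                # No matching feature was found, so start a new one.
                features.append(
                    {"mz": mz, "rt": rt, "intensities": {name: intensity}}
                )
    return features


# Hypothetical peak lists for two samples.
peak_lists = {
    "wild_type": [(303.050, 5.10, 1000.0), (175.119, 1.20, 500.0)],
    "mutant": [(303.052, 5.15, 200.0), (611.161, 7.80, 50.0)],
}
for feature in align_peaks(peak_lists):
    print(feature)
```

The resulting feature table (one row per feature, one column per sample) is the usual starting point for the annotation and statistical steps described in the following paragraphs.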

As with any omics discipline, metabolite annotation and identification are highly dependent on the availability and quality of electronic databases. To annotate the peaks processed by the programs described above, several mass spectral databases can be used, including MassBank (Horai et al. 2010), METLIN (Smith et al. 2005), the Golm Metabolite Database (Kopka et al. 2005), FiehnLib (Kind et al. 2009), and Lipid Search (Taguchi et al. 2007). The Human Metabolome Database (Wishart et al. 2009) and the Madison Metabolomics Consortium Database (Markley et al. 2007; Cui et al. 2008) provide mass spectral and NMR data for human metabolome studies; they are also useful for annotating signals derived from primary metabolites in plants. Recently, “ReSpect,” a repository of API–MSn data from phytochemicals drawn from the literature, has been developed (http://spectra.psc.riken.jp/) as one of the web applications of PRIMe, a platform of RIKEN Metabolomics (http://prime.psc.riken.jp/; Akiyama et al. 2008). Users can access MSn information at this website, and they can freely download the manually refined MSn data. Some of these databases are integrated with software for processing raw MS data, e.g., METLIN with XCMS, and MassBank with Mass++. In addition to these spectral databases, several compound databases, including KNApSAcK (Shinbo et al. 2006), Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa et al. 2008), PubChem (Wheeler et al. 2008), LipidBank (Yasugi and Watanabe 2002), and LIPIDMAPS (Fahy et al. 2007), are readily accessible. The information on compound name, chemical formula, and molecular weight that is deposited in these databases is also helpful for peak identification when authentic compounds and standard mass and NMR spectra are not available. Several databases that focus mainly on the mass spectra or metabolite information from particular plant species can also be used. In MS2T (Matsuda et al. 2009), LC–MS data acquired from various plant species, including Arabidopsis, rice, wheat, and several non-model plant species, e.g., pear, are deposited. KOMICS (Iijima et al. 2008) and MotoDB (Moco et al. 2006) provide information on the ions that are detected in tomato, while ARMeC (http://www.armec.org/MetaboliteLibrary/) is a database of metabolites from Arabidopsis and potato. These data can be used to validate metabolome data across laboratories, and as a reference source for researchers who have to set up metabolome analytical platforms.

To comprehend the biochemical processes involved in metabolism from metabolome data, metabolic pathway databases are very useful. There are several pathway databases available for plant metabolomics, including PlantCyc (http://www.plantcyc.org/), MapMan (Thimm et al. 2004), KaPPA-View (Tokimatsu et al. 2005; Sakurai et al. 2011), Arabidopsis Reactome (http://www.arabidopsisreactome.org/about.html), and KEGG (Kanehisa et al. 2008). KEGG PLANT (http://www.kegg.jp/kegg/plant/) is a new interface of KEGG that focuses on phytochemicals, especially secondary metabolites. Some of these databases import the abundance information of metabolites (and transcriptome information), and display an integrated overview of the metabolic state by projecting the abundance of each metabolite onto the putative metabolic pathway in plants. These databases also include information on the enzymatic reactions underlying these metabolic pathways and information on the proteins or genes involved in each reaction step. Although these pathway databases are helpful, when attempting to understand the metabolic state of plants from metabolome data it is necessary to bear in mind that many metabolic pathways in plants are divided by subcellular and intercellular compartmentalization and are highly redundant.

The deposition of metabolomic data in publicly available databases is also an important issue that needs to be considered. These databases enable data sharing with researchers in different laboratories and facilitate the meta-analysis of the original data. These depositories include AtMetExpress (http://prime.psc.riken.jp/lcms/AtMetExpress/) (Matsuda et al. 2010), PlantMetabolomics.org (http://plantmetabolomics.vrac.iastate.edu/) (Bais et al. 2010), and MetabolomeExpress (https://www.metabolome-express.org/) (Carroll et al. 2010).

In metabolomic studies, statistical analysis is often employed to evaluate significant differences in the metabolites detected in different samples. The methodology typically employs multivariate analysis to statistically process the massive amount of analytical chemistry data generated by high-throughput and simultaneous metabolite analyses (Fiehn 2002a; Fukusaki and Kobayashi 2005; Hall 2006; Okada et al. 2010). Various statistical methods used in conventional genetic studies are applicable to metabolomic data by considering the amount of each metabolite as a trait value. Principal component analysis (PCA), a multivariate analysis method, is commonly used in metabolomic studies. The PCA model can provide an overview of all observations or samples in a data table by projecting and clustering each sample and highlighting any holistic differences in the complex metabolic state of each sample. Many reports have described the application of PCA to metabolomic data (Catchpole et al. 2005; Takahashi et al. 2005; Tarpley et al. 2005; Tohge et al. 2005; Baker et al. 2006; Dixon et al. 2006; Oikawa et al. 2006; Kim et al. 2007; Kusano et al. 2007; Moco et al. 2007; Ku et al. 2010b). In addition, several statistical analytical methods have been used to analyze metabolomic datasets, e.g., hierarchical cluster analysis (HCA) (Grata et al. 2007; Parveen et al. 2007), partial least squares discriminant analysis (PLS-DA) (Jonsson et al. 2004; Kusano et al. 2007; Ku et al. 2010a), and batch-learning self-organizing map analysis (BL-SOM) (Hirai et al. 2004, 2005; Matsuda et al. 2010). Depending on the objective of each study, the most appropriate statistical analytical method should be exploited to evaluate the available metabolomic data.
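To make the idea of projecting samples onto a principal component concrete, the following Python sketch computes the first principal component of a small samples-by-metabolites matrix using power iteration on the covariance matrix. It is a minimal, dependency-free sketch: the data are hypothetical, the function name `first_principal_component` is an assumption, and practical analyses would normally use an established statistics package and appropriate scaling of the metabolite levels.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) of PC1 for a samples x metabolites matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the metabolites (p x p).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    # Start from the covariance row of the most variable metabolite.
    start = max(range(p), key=lambda i: cov[i][i])
    v = cov[start][:]
    for _ in range(iterations):
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break  # all metabolites are constant; no direction to find
        v = [x / norm for x in v]
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        if math.sqrt(sum(x * x for x in w)) == 0.0:
            break
        v = w
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    loadings = [x / norm for x in v]
    scores = [sum(r[j] * loadings[j] for j in range(p)) for r in centered]
    return loadings, scores


# Hypothetical levels of three metabolites in two wild-type and two mutant samples.
data = [
    [10.0, 5.0, 1.0],   # wild type 1
    [11.0, 5.2, 1.1],   # wild type 2
    [2.0, 5.1, 9.0],    # mutant 1
    [1.5, 4.9, 9.5],    # mutant 2
]
loadings, scores = first_principal_component(data)
print("PC1 loadings:", [round(x, 3) for x in loadings])
print("PC1 scores:  ", [round(x, 3) for x in scores])
```

In this toy example the two genotypes receive PC1 scores of opposite sign, which is the kind of sample clustering that a PCA score plot is used to reveal; the sign of a principal component itself is arbitrary.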

Biotechnological applications

There have been many biological applications of plant metabolomics. In this section, we introduce example case studies, some of which are summarized in Table 1.

Table 1 Application of metabolomics in plant biological studies

Application of metabolomics to functional genomics

The characterization of gene function on the genome scale is one of the most important tasks in the post-genomic era. In modern Arabidopsis research, loss-of-function or gain-of-function mutant lines play an important role in deciphering gene function. A combination of the metabolomic approach with these bioresources has been demonstrated to be an effective strategy to uncover the role of genes with unknown functions. Using GC–MS and CE–MS, Watanabe et al. (2008) obtained a metabolome dataset from T-DNA insertion mutants of Arabidopsis with immature β-substituted alanine synthase (bsas) genes and their corresponding wild-type. Statistical analysis revealed that one unknown metabolite was found in the wild-type plant, but did not accumulate in one of the bsas mutants, bsas3;1. This compound was eventually identified as γ-glutamyl-β-cyanoalanine, and the bsas3;1 gene was shown to be involved in the biosynthesis of the precursor of this dipeptide. Metabolomics is also effective in high-throughput mutant screening. High-throughput GC–MS analysis of Arabidopsis gain-of-function mutant lines overexpressing full-length cDNAs of Arabidopsis led to the finding of a new transcription factor regulating nitrogen metabolism (Albinsky et al. 2010a). The functional characterization of unknown genes using a metabolomics approach can also be greatly accelerated by the combined use of other omics, such as transcriptomics. Several groups have used transcriptome co-expression analysis with metabolome data to find new metabolic genes, starting from a limited number of known genes, on the basis of the assumption that genes involved in the same metabolic pathway are co-expressed by a shared regulatory mechanism (Saito et al. 2008). This strategy, using the “gene-to-metabolite” correlation, successfully revealed the function of genes involved in plant primary metabolism (Okazaki et al. 2009; Kusano et al. 2011b), secondary metabolism (Hirai et al. 2005; Tohge et al. 2005; Hirai et al. 2007; Yonekura-Sakakibara et al. 2007; Farag et al. 2008, 2009; Yonekura-Sakakibara et al. 2008; Kuzina et al. 2009; Matsuda et al. 2010), and the circadian clock (Fukushima et al. 2009a).
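The “gene-to-metabolite” correlation can be illustrated with a short Python sketch that ranks genes by the Pearson correlation between their expression profile and the abundance of a metabolite across the same set of samples. The gene names, the data, and the function names are hypothetical, and the cited studies used their own statistical procedures; this is only a sketch of the underlying idea.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def rank_genes_by_correlation(expression, metabolite):
    """Sort genes by the absolute correlation with a metabolite profile.

    expression: {gene_name: [expression level in each sample]}
    metabolite: [metabolite abundance in the same samples, same order]
    """
    ranked = [(gene, pearson(levels, metabolite))
              for gene, levels in expression.items()]
    return sorted(ranked, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical expression profiles over four conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [4.0, 1.0, 3.0, 2.0],
}
metabolite_level = [2.0, 4.0, 6.0, 8.0]
for gene, r in rank_genes_by_correlation(expression, metabolite_level):
    print(f"{gene}: r = {r:+.2f}")
```

Genes at the top of such a ranking are candidates for involvement in the biosynthesis or regulation of the metabolite, but correlation alone does not establish function, which is why the cited studies followed up with mutant or biochemical analyses.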

Application of metabolomics to quantitative genetics

Natural variation provides a valuable resource to study the genetic regulation of quantitative traits. Since the level of a metabolite in plant tissues (m-trait) is also a quantitative trait, quantitative trait loci (QTL) analysis of m-traits is useful to enhance our understanding of the genetic architecture underlying naturally variable phenotypes with respect to metabolism. For example, QTL analysis of the levels of vitamin E in Arabidopsis seeds revealed several QTLs responsible for the control of the levels of this compound (Gilliland et al. 2006). QTL analysis of m-traits of Arabidopsis also revealed several loci regulating the levels of the secondary metabolites flavonoids and glucosinolates (Keurentjes et al. 2006; Chan et al. 2010). Recently, metabolome QTL (mQTL) analysis has made a comprehensive understanding of the genetic background of m-traits possible (Keurentjes et al. 2006; Schauer et al. 2006, 2008; Wentzell et al. 2007; Lisec et al. 2008; Rowe et al. 2008; Kliebenstein 2009; Chan et al. 2010). These reports revealed that the mQTLs are unevenly distributed throughout the genome, and there are several mQTL hot-spot regions (Keurentjes et al. 2006; Lisec et al. 2008; Rowe et al. 2008), suggesting the possibility that the overall metabolic state of plant cells could be controlled by modification of small genomic regions. The relationship between m-traits and other important biological traits, such as yield, taste, and biomass, has attracted attention since these traits are likely to closely interact with plant metabolism (Schauer et al. 2006; Meyer et al. 2007; Lisec et al. 2008). The analysis of tomato fruit traits with metabolome data indicated that there are weak correlations among these traits (Schauer et al. 2006, 2008). Regression analysis of metabolome data for Arabidopsis biomass traits demonstrated that the growth rate of Arabidopsis seedlings is, to some extent, predictable from the metabolome signature (Meyer et al. 2007). In addition, combined analyses of mQTL and naturally variable loci that altered circadian clock output suggested possible links between clock function and metabolism in Arabidopsis (Kerwin et al. 2011). These pioneering studies suggest that the interaction between metabolite composition and other traits may be predictable through future advanced metabolome analysis. The use of natural variation and mapping populations is also useful to discover new metabolic genes because transformation is still highly difficult in many plant species other than Arabidopsis (Morreel et al. 2006; Schilmiller et al. 2010). The accumulation of metabolome data from these lines may be useful for crop improvement through metabolomics-assisted breeding (Fernie and Schauer 2009).

Evaluation of the substantial equivalence of genetically modified organisms

To improve crop properties at the genetic level, two different strategies can be employed: traditional breeding and genetic modification (GM); however, both strategies can generate “unintended phenotypic effects” in the metabolome. The public often feels uneasy about the use of these strategies, particularly with genetically modified organisms (GMOs). This is not surprising, considering that we currently cannot target transgenes to specific genomic sites of the host plant and that little is known about the mechanisms that determine the sites at which integration occurs. In addition, we do not fully understand the function of the non-coding regions of the genome. There is an increasing number of GM crops, such as barley, maize, pea, rice, soybean, and wheat (Ricroch et al. 2011); thus, we have to provide a suitable protocol to evaluate the safety of these GMOs.

A major principle and guiding tool for the food safety assessment of GMOs is the concept of “substantial equivalence” (SE), as outlined in the Organization for Economic Cooperation and Development (OECD) consensus documents (OECD 2006) and further elaborated by the Food and Agriculture Organization of the United Nations/World Health Organization. The internationally accepted SE framework proposes that risk assessment of GMOs should begin by comparing them with traditional varieties. The goal of SE evaluation is not to draw a conclusion about the novel organism’s safety status, because that would require the testing of all compounds, which is impossible; instead, by examining a broad set of traits, it aims to obtain a picture of the magnitude of changes, which serves as a screen for possibly problematic changes and as a starting point for further investigations. Since metabolome analysis can provide a vast amount of metabolite information, several metabolomic studies have performed SE evaluations of GM tomato (Le Gall et al. 2003), potato (Catchpole et al. 2005), and wheat (Baker et al. 2006). Recently, we analyzed and evaluated the SE of a transgenic tomato expressing a foreign gene encoding the taste-modifying protein miraculin using a multiple-MS-based metabolomics platform (Kusano et al. 2011a). The chosen platform (GC–MS, CE–MS, and LC–MS) detected compounds that represent 86% of the estimated chemical diversity of the metabolites listed in the LycoCyc database and showed that >92% had an acceptable range of variation, while simultaneously indicating a reproducible transformation-related metabolic signature.
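
The screening logic of an SE evaluation can be sketched as follows. This is only an illustration: it assumes that a metabolite level counts as acceptable when it lies within the minimum-to-maximum range observed in conventional cultivars (optionally widened by a tolerance), which is a simplifying assumption and not the criterion of the cited studies. The function name, metabolite names, and values are hypothetical.

```python
def fraction_within_reference_range(gm_levels, reference_levels, tolerance=0.0):
    """Screen a GM line against the natural variation of reference cultivars.

    gm_levels:        {metabolite: level measured in the GM line}
    reference_levels: {metabolite: [levels in conventional cultivars]}
    tolerance:        fraction of the reference range added on both sides
    Returns (fraction of compared metabolites in range, flagged metabolites).
    """
    flagged = []
    compared = 0
    for metabolite, level in gm_levels.items():
        refs = reference_levels.get(metabolite)
        if not refs:
            continue  # no reference data, so no comparison is possible
        compared += 1
        low, high = min(refs), max(refs)
        margin = tolerance * (high - low)
        if not (low - margin <= level <= high + margin):
            flagged.append(metabolite)  # candidate for further investigation
    if compared == 0:
        return float("nan"), flagged
    return (compared - len(flagged)) / compared, flagged


# Hypothetical toy data in arbitrary units.
gm_line = {"met_1": 1.0, "met_2": 5.0, "met_3": 0.3, "met_4": 9.0}
cultivars = {
    "met_1": [0.8, 1.2, 1.1],
    "met_2": [4.0, 6.0, 5.5],
    "met_3": [0.2, 0.5, 0.4],
    "met_4": [2.0, 3.0, 2.5],
}

fraction, flagged = fraction_within_reference_range(gm_line, cultivars)
print(f"{fraction:.0%} of metabolites within the reference range")
print("flagged for follow-up:", flagged)
```

In keeping with the SE concept, the flagged metabolites are not treated as evidence of a hazard; they only mark where further investigation would start.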

Application of metabolomics to the investigation of adaptive responses against stresses

Plants possess defense mechanisms against various stresses so that they can adapt to unfavorable surroundings. Since the synthesis of specific metabolites is one of the best-known defense reactions, investigating defense reactions with a metabolomics approach can contribute to a holistic understanding of plant strategies against stresses. Sunlight is indispensable to all plants, except for several parasitic species, but ultraviolet (UV) light is harmful because it causes the oxidation of DNA and cellular components through the production of free radicals. Metabolome analysis of the UV-B response of wild-type Arabidopsis and mutants defective in flavonoid or sinapoyl malate biosynthesis revealed that the early metabolic responses only occurred at the level of primary metabolites, suggesting that these effectively prime the cell to facilitate the later production of UV-B-absorbing secondary metabolites, such as flavonoids and sinapoyl malate (Kusano et al. 2011c). In addition to abiotic stress, biotic stress is also a serious problem for plants. Ward and coworkers investigated the metabolic changes in Arabidopsis associated with the establishment of disease and demonstrated clear differences between the metabolome of Arabidopsis leaves infected with virulent Pseudomonas syringae and that of uninfected leaves (Ward et al. 2010b). In addition to confirming changes in phenolic and indolic compounds, they identified rapid alterations in the abundance of amino acids and other nitrogenous compounds, specific classes of glucosinolates, disaccharides, and molecules that influence the prevalence of reactive oxygen species. These reports suggest that comprehensive reprogramming of metabolic pathways is involved in the interaction of plants with stresses. Recently, many studies have examined the overall metabolic changes associated with biotic or abiotic stresses using metabolome analysis (Urano et al. 2009; Widodo et al. 2009; Chaouch et al. 2010; Consonni et al. 2010; Mhamdi et al. 2010; Simon et al. 2010).

The discovery of biomarkers is also important for understanding the defense reactions of plants, and it could help to determine disease status at the metabolite level, leading to the development of new drugs, early disease screening, and improved pest control. Biomarker discovery by metabolomics has been actively performed in the field of human metabolomics (Soga et al. 2006; Jansson et al. 2009; Koulman et al. 2009; Sreekumar et al. 2009). In such studies, the unambiguous identification or annotation of most of the metabolites is not necessarily required as long as the discovered biomarkers are truly useful for early disease screening. Thus, the power of the technologies used in metabolomics may play a direct and effective role in this field, since we still spend considerable time processing the information derived from the many unknown metabolites detected by metabolome analysis.
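
The point that biomarker candidates can be selected without identifying them can be illustrated with a small sketch. It ranks unannotated signals by how strongly their intensity separates healthy from diseased samples, using Welch's t-statistic as one possible score. The choice of score, the feature labels, and all intensities are assumptions made for illustration and do not reproduce the procedures of the cited studies.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two groups (each needs at least two values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0.0:
        # No within-group spread: any difference separates the groups perfectly.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def rank_features(healthy, diseased):
    """Order features by decreasing |t|; no metabolite identity is needed."""
    scores = {
        feature: abs(welch_t(healthy[feature], diseased[feature]))
        for feature in healthy
        if feature in diseased
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical peak intensities for two unidentified signals.
healthy = {
    "unknown_peak_1": [1.0, 1.1, 0.9, 1.0],
    "unknown_peak_2": [2.0, 2.5, 1.5, 2.2],
}
diseased = {
    "unknown_peak_1": [3.0, 3.2, 2.9, 3.1],
    "unknown_peak_2": [2.1, 2.4, 1.6, 2.0],
}

print("candidate biomarkers, best first:", rank_features(healthy, diseased))
```

With many features, such a ranking would need correction for multiple testing and validation in independent samples before any candidate is used for screening.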