RMet: An automated R based software for analyzing GC-MS and GC×GC-MS untargeted metabolomic data
Introduction
Metabolomics is the comprehensive study of all metabolites in a cell, tissue, or organism in order to produce a metabolic snapshot of a biological system [1]. Metabolomic samples are usually highly complex because the metabolome contains numerous metabolites with differing physicochemical properties. This complexity is illustrated by the number of metabolites and phytochemicals in the plant kingdom, which is estimated to be greater than 200,000 [2]. As a result, measuring all metabolites in such complex sample matrices requires multiple sophisticated analytical instruments and remains an analytical challenge. Among the different analytical platforms, gas chromatography-mass spectrometry (GC-MS) is the most frequently used technique for separation and identification of metabolites. However, the complexity of most biological samples pushes this technique to its limits. Comprehensive two-dimensional gas chromatography-mass spectrometry (GC×GC-MS) is an effective way to overcome this challenge [3]. The GC×GC instrument separates a large number of metabolites with excellent sensitivity and, when combined with a fast mass spectrometry detector such as time-of-flight (TOF), provides exceptional metabolite identification power [4]. The GC×GC technique has several advantages over GC, such as improved resolution, increased separation capacity, better signal-to-noise ratios that enhance analyte detectability, and chemical class ordering in the 2D total ion chromatogram (TIC) [3]. However, extracting the desired information from GC×GC-MS data poses significant challenges, mostly because of the large volume of data produced (typically gigabytes (GB) per sample) [4]. This issue is particularly acute in untargeted metabolomics studies, which require running many samples from at least two sample classes with a minimum of three replicate runs for each sample.
Chemometric techniques based on multivariate data analysis can properly tackle the problems surrounding large metabolomics data sets [5]. Multivariate curve resolution-alternating least squares (MCR-ALS) is a frequently used multivariate resolution method that decomposes measured mixed analytical signals into the contribution profiles of the pure constituents using a bilinear data decomposition. Combining MCR-ALS with an appropriate compression strategy, such as the wavelet transform, is an effective way to address the problems caused by the volume of metabolomics data [5].
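As a rough illustration of the compression idea, the sketch below applies a one-level Haar wavelet approximation along the elution-time axis of a small time × m/z matrix. The Haar wavelet, the function names, and the toy numbers are assumptions made for illustration only; the text above does not state which wavelet family or decomposition level RMet uses.

```python
import math


def haar_approximation(signal):
    """Return the one-level Haar approximation coefficients of a 1-D signal."""
    values = list(signal)
    if len(values) % 2:
        # Pad odd-length input by repeating the last point (an illustrative choice).
        values.append(values[-1])
    scale = math.sqrt(2.0)
    # Each coefficient is the scaled sum of two neighbouring points.
    return [(values[i] + values[i + 1]) / scale for i in range(0, len(values), 2)]


def compress_time_axis(data, levels=1):
    """Compress the rows (elution-time axis) of a time x m/z matrix.

    Every m/z channel (column) is treated as a 1-D chromatographic signal,
    and only the Haar approximation coefficients are kept at each level.
    """
    columns = [list(col) for col in zip(*data)]
    for _ in range(levels):
        columns = [haar_approximation(col) for col in columns]
    return [list(row) for row in zip(*columns)]


# Hypothetical toy data: 4 elution times x 2 m/z channels.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
compressed = compress_time_axis(toy)
print(len(toy), "rows ->", len(compressed), "rows")
```

Each level halves the number of time points while leaving the m/z dimension untouched. Because the transform is linear along the time axis, a bilinear model of the form D = C Sᵀ + E keeps the same spectral profiles S after compression, which is why this kind of compression can precede a bilinear resolution step.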
Various software tools have been developed for analyzing metabolomic data and obtaining qualitative and quantitative information. Currently available omics tools such as Metabolyzer [6], PlantMat [7], MetExpert [8], MetExtract II [9,10], Lipostar [10], IMMA [11], FlavonQ-2.0v [12], and MetaboliteDetector [13] can only analyze liquid chromatography-mass spectrometry (LC-MS) and GC-MS metabolomic data. Although a number of data analysis methods have been developed for processing GC×GC-MS data, to date there is no software that combines all the steps required for analyzing GC×GC-MS metabolomics data. Thus, there is a clear need for comprehensive, user-friendly software designed specifically for omics studies that facilitates and speeds up the time-consuming data mining of complex GC-MS and GC×GC-MS metabolomic data. Such software can popularize GC-MS and GC×GC-MS techniques, combined with novel chemometric algorithms and modern statistical approaches, among researchers in the field of metabolomics, and can provide them with a large amount of new, useful information about the biological systems they study.
To meet this demand, we have developed RMet, an automated, R-based software tool with a user-friendly graphical user interface (GUI) that aims to overcome the challenges of analyzing large and complex GC-MS and GC×GC-MS metabolomic data sets and transforming them into meaningful biological information. The software includes all steps of a complete untargeted metabolomic data analysis workflow: preprocessing, segmentation, data compression, multivariate curve resolution (MCR), identification of important metabolites, and metabolite classification.
Briefly, RMet applies MCR-ALS [14–17] as its resolution algorithm, which is one of the most efficient ways to handle the fundamental challenges that occur during GC×GC analyses, such as elution time shifts, baseline/background contributions, peak overlap, and peak shape changes [3]. It then performs metabolite classification by building a partial least squares-discriminant analysis (PLS-DA) model [18,19] and finally reports the significantly affected metabolites using variable importance in projection (VIP) [20] scores. Afterward, metabolic pathway analysis should be performed using metabolic pathway databases in order to identify the affected metabolic pathways.
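To make the VIP step more concrete, the following sketch computes VIP scores from the weight vectors of an already fitted PLS model, using the commonly cited formulation VIP_j = sqrt(p · Σ_a SS_a (w_ja / ‖w_a‖)² / Σ_a SS_a), where p is the number of variables and SS_a is the response variance explained by component a. The function name, argument layout, and numbers are hypothetical, and whether RMet computes VIP in exactly this form is not stated in the text above.

```python
import math


def vip_scores(weights, explained_ss):
    """Compute VIP scores for the variables of a fitted PLS model.

    weights      : list of A weight vectors, each with p entries
                   (one vector per latent component).
    explained_ss : list of A values, the response sum of squares
                   explained by each component.
    """
    n_vars = len(weights[0])
    total_ss = sum(explained_ss)
    # Squared norm of each weight vector, used to normalise the weights.
    norms_sq = [sum(v * v for v in w) for w in weights]
    scores = []
    for j in range(n_vars):
        weighted = sum(
            ss * (w[j] ** 2) / norm_sq
            for w, ss, norm_sq in zip(weights, explained_ss, norms_sq)
        )
        scores.append(math.sqrt(n_vars * weighted / total_ss))
    return scores


# Hypothetical model: 3 variables (resolved metabolites), 2 latent components.
example_weights = [[3.0, 4.0, 0.0], [0.0, 0.0, 1.0]]
example_ss = [8.0, 2.0]
scores = vip_scores(example_weights, example_ss)

# Flag variables with VIP > 1, a widely used rule of thumb.
important = [j for j, score in enumerate(scores) if score > 1.0]
print(scores, important)
```

With this formulation the squared VIP scores average to one across variables, which is why a threshold of 1 is often used to flag variables that contribute more than average to the model.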
The workflow is designed so that users can perform a complete analysis and obtain appropriate results without any knowledge of chemometrics, while still providing detailed statistics for experts who want to customize and optimize their analysis. Preprocessing and segmenting large GC×GC-MS data sets is a tedious process: segmenting the data or removing redundant regions of the TIC, such as column bleeding and derivatization agents, requires a great deal of attention when the matrices are modified manually, whereas in RMet all of these operations take just a few clicks. RMet is also lightweight software written in the R programming language, an open-source language that is highly popular among statisticians and data scientists [21]. These features make RMet a powerful automated computational tool for analyzing GC×GC-MS metabolomics data. RMet can also be used to analyze GC-MS metabolomic data. In the following sections, RMet's functionality is demonstrated on the data from a previous environmental metabolomics study, in which lettuce samples exposed to contaminants of emerging concern (CECs) were analyzed by GC×GC-TOFMS to investigate the effect of CEC exposure on the metabolic pathways of lettuce [3].
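The manual operation that the GUI replaces can be pictured with a small sketch: compute the TIC by summing intensities over all m/z channels at each time point, then drop the rows that fall inside regions marked as redundant. The function names, the half-open index ranges, and the toy matrix are hypothetical and are not taken from RMet's source code.

```python
def total_ion_chromatogram(data):
    """Sum the intensities over all m/z channels for each time point."""
    return [sum(row) for row in data]


def remove_regions(data, regions):
    """Drop the rows of a time x m/z matrix that fall inside given regions.

    regions : list of (start, stop) row-index pairs, half-open like range().
    """
    drop = set()
    for start, stop in regions:
        drop.update(range(start, stop))
    return [row for index, row in enumerate(data) if index not in drop]


# Hypothetical toy data: row 2 mimics an intense artefact such as column bleed.
toy = [[1, 2], [3, 4], [100, 100], [5, 6]]
tic = total_ion_chromatogram(toy)
cleaned = remove_regions(toy, [(2, 3)])
print(tic)
print(cleaned)
```

Keeping the regions as explicit index ranges makes the removal step reproducible, since the same list can be reapplied to every sample in a batch.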
Software development
RMet was developed in RStudio version 1.1.383 using R core version 3.4.3. Its executable file runs on Windows without any restriction on the version, and it is available free of charge at https://github.com/SUTChemometricsGroup/RMet along with a manual, source code, and sample data sets. To use RMet on Linux and Macintosh operating systems, one should install R core (freely available at https://cran.r-project.org) and run the RMet.R code which is available at the
Results and discussion
RMet’s data processing strategy is shown in Fig. 1. It is a data processing platform designed specifically for analyzing both GC×GC-MS and GC-MS metabolomic data sets. In the following, the approaches and algorithms applied in each step of the RMet workflow are discussed in detail through the analysis of GC×GC-MS data from control and CEC-exposed lettuce samples. These data sets were used to demonstrate the functionality and output of each data mining step.
Conclusion
The development of new integrated software for fast and accurate analysis of large GC-MS and GC×GC-MS untargeted metabolomic data sets can greatly help researchers in the field of metabolomics. To meet this need, we have developed RMet, a novel automated R-based software tool for analyzing both GC-MS and GC×GC-MS untargeted metabolomic data sets in a simple, quick, and reliable manner. All required steps for completing a metabolomics data analysis workflow including data preprocessing
Declaration of competing interest
The authors declare no conflict of interest.
Acknowledgment
The authors would like to thank the Research Council of Sharif University of Technology (SUT) for the financial support of this research with grant no. G960613. They would also like to thank Prof. Josep M. Bayona of the IDAEA-CSIC institute in Barcelona (Spain) for providing access to the metabolomic data sets used in this work.
References
- et al. A tutorial review: metabolomics and partial least squares-discriminant analysis–a marriage of convenience or a shotgun wedding. Anal. Chim. Acta (2015)
- et al. Comprehensive two-dimensional gas chromatography (GC×GC) retention time shift correction and modeling using bilinear peak alignment, correlation optimized shifting and multivariate curve resolution. Chemometr. Intell. Lab. Syst. (2012)
- et al. Performance of some variable selection methods when multicollinearity is present. Chemometr. Intell. Lab. Syst. (2005)
- et al. Wavelets—something for analytical chemistry? Trends Anal. Chem. (1997)
- et al. Wavelet transform and its applications in high performance liquid chromatography (HPLC) analysis. Chemometr. Intell. Lab. Syst. (1999)
- Metabolomics and Systems Biology in Human Health and Medicine (2014)
- et al. Linking the morphological and metabolomic response of Lactuca sativa L exposed to emerging contaminants using GC×GC-MS and chemometric tools. Sci. Rep. (2017)
- et al. Gas Chromatography and Comprehensive Two-Dimensional Gas Chromatography Hyphenated with Mass Spectrometry for Targeted and Nontargeted Metabolomics, in "Metabolomics in Practice: Successful Strategies to Generate and Analyze Metabolic Data" (2013)
- et al. Big (Bio)chemical data mining using chemometric methods: a need for chemists. Angew. Chem. Int. Ed. (2018)
- et al. Metabolyzer: a novel statistical workflow for analyzing postprocessed LC–MS metabolomics data. Anal. Chem. (2013)
- PlantMAT: a metabolomics tool for predicting the specialized metabolic potential of a system and for large-scale metabolite identifications. Anal. Chem.
- MetExpert: an expert system to enhance gas chromatography‒mass spectrometry-based metabolite identifications. Anal. Chim. Acta
- MetExtract II: a software suite for stable isotope-assisted untargeted metabolomics. Anal. Chem.
- Lipostar, a comprehensive platform-neutral cheminformatics tool for lipidomics. Anal. Chem.