RMet: An automated R based software for analyzing GC-MS and GC×GC-MS untargeted metabolomic data

https://doi.org/10.1016/j.chemolab.2019.103866

Highlights

  • Development of an R-based software tool for nontargeted metabolomics.

  • RMet is developed to overcome the challenges in the metabolomic analysis workflow.

  • RMet is dedicated to the analysis of GC-MS and GC×GC-MS metabolomic data.

  • RMet includes all steps of a complete untargeted metabolomic data analysis workflow.

  • Preprocessing, multivariate curve resolution and classification are included in RMet.

Abstract

Gas chromatography-mass spectrometry (GC-MS) and comprehensive two-dimensional gas chromatography-mass spectrometry (GC×GC-MS) are powerful techniques for measuring the wide range of metabolites present in complex metabolic samples. However, analyzing GC-MS and especially GC×GC-MS metabolomic data remains a major challenge for researchers in metabolomics, mainly because of the complexity and large size of the data. To address this, an automated R-based software package named RMet has been developed to overcome the challenges in the metabolomic analysis workflow of GC-MS and GC×GC-MS data sets. It also facilitates the complex process of extracting reliable and useful biological information from these data sets, and it can greatly accelerate the time-consuming analysis of large GC-MS and GC×GC-MS data sets by means of modern chemometric methods. In practice, RMet transforms raw GC-MS and GC×GC-MS data files into the elution profiles and mass spectra of important (i.e., significantly affected) metabolites, which can then be imported into the NIST MS Search software for final identification. To demonstrate the performance of the software, large GC×GC-MS data sets from a previously reported environmental metabolomics study on lettuce samples exposed to contaminants of emerging concern (CECs) were analyzed with RMet. The procedure for analyzing GC-MS metabolic data with RMet is the same as for GC×GC-MS data sets, except that some steps can be skipped because GC-MS data sets are smaller. The software, its manual, sample data sets and source code are freely available at https://github.com/SUTChemometricsGroup/RMet.
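The abstract states that the resolved mass spectra can be imported into NIST MS Search for identification. The export format RMet uses is not specified here, so the sketch below assumes the plain-text MSP format, in which each record starts with `Name:` and `Num Peaks:` lines followed by one `m/z intensity` pair per line. The function names `to_msp` and `write_msp`, the 999 base-peak scaling and the `min_rel` cutoff are illustrative assumptions, not RMet's code.

```python
def to_msp(name, spectrum, min_rel=0.0):
    """Format one resolved mass spectrum as an MSP-style text record (hypothetical helper).

    `spectrum` maps integer m/z values to intensities, e.g. one spectrum resolved
    by MCR-ALS. Intensities are rescaled so that the base peak is 999, and peaks
    whose relative intensity is not above `min_rel` (a fraction of the base peak)
    are dropped.
    """
    base = max(spectrum.values())
    if base <= 0:
        raise ValueError("spectrum has no positive intensity")
    peaks = [
        (mz, round(999 * spectrum[mz] / base))
        for mz in sorted(spectrum)
        if spectrum[mz] / base > min_rel
    ]
    lines = [f"Name: {name}", f"Num Peaks: {len(peaks)}"]
    lines += [f"{mz} {intensity}" for mz, intensity in peaks]
    return "\n".join(lines) + "\n"


def write_msp(path, spectra):
    """Write several named spectra to one .msp file, separated by blank lines."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(to_msp(name, spec) for name, spec in spectra.items()))


# Usage: a toy spectrum with a zero-intensity channel that gets dropped.
print(to_msp("Component 1", {73: 250.0, 147: 1000.0, 45: 0.0}), end="")
```

The toy record contains two peaks, m/z 73 at 250 and m/z 147 at 999; the empty m/z 45 channel is omitted because its relative intensity is not above the default cutoff.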

Introduction

Metabolomics is the comprehensive study of all metabolites in a cell, tissue, or organism in order to produce a metabolic snapshot of a biological system [1]. Metabolomic samples are usually highly complex because the metabolome contains numerous metabolites with diverse physicochemical properties. This complexity is illustrated by the number of metabolites and phytochemicals in the plant kingdom, which is estimated to exceed 200,000 [2]. As a result, measuring all metabolites in such complex sample matrices requires multiple sophisticated analytical instruments and remains an analytical challenge. Among the different analytical platforms, gas chromatography-mass spectrometry (GC-MS) is the most frequently used technique for separating and identifying metabolites. However, the complexity of most biological samples pushes this technique to its limits. Comprehensive two-dimensional gas chromatography-mass spectrometry (GC×GC-MS) is an effective way to overcome this challenge [3]. GC×GC separates a larger number of metabolites with excellent sensitivity and, when combined with a fast mass spectrometric detector such as a time-of-flight (TOF) analyzer, provides exceptional metabolite identification power [4]. Compared with one-dimensional GC, GC×GC offers improved resolution, increased separation capacity, and better signal-to-noise ratios, which enhance analyte detectability, as well as the ordering of chemical classes in the 2D total ion chromatogram (TIC) [3]. However, obtaining the desired information from GC×GC-MS data is challenging, mostly because of the large volume of data produced (typically gigabytes (GB) per sample) [4]. This issue inevitably arises in untargeted metabolomics studies, which require running many samples from at least two sample classes, with a minimum of three replicate runs for each sample.
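To make the data-volume argument concrete, the arithmetic below estimates the size of one run stored as a dense scans × m/z intensity matrix and the number of injections in a minimal two-class design with three replicates. All acquisition settings (60 min run, 200 spectra/s, 500 m/z channels, 4-byte intensities, five samples per class) are hypothetical illustration values, not those of the lettuce study, and vendor files are often compressed, so real file sizes will differ.

```python
def run_size_gb(run_min, spectra_per_s, mz_channels, bytes_per_value=4):
    """Approximate size in GB (1 GB = 1e9 bytes) of one run stored as a dense
    scans x m/z intensity matrix."""
    n_scans = run_min * 60 * spectra_per_s
    return n_scans * mz_channels * bytes_per_value / 1e9


def study_runs(n_classes, samples_per_class, replicates):
    """Number of injections in a simple class x sample x replicate design."""
    return n_classes * samples_per_class * replicates


# Hypothetical settings, for illustration only.
per_run = run_size_gb(run_min=60, spectra_per_s=200, mz_channels=500)
runs = study_runs(n_classes=2, samples_per_class=5, replicates=3)
print(f"{per_run:.2f} GB per run, {runs} runs, {per_run * runs:.1f} GB in total")
```

With these assumed settings the script prints `1.44 GB per run, 30 runs, 43.2 GB in total`, which shows how even a small untargeted design quickly reaches tens of gigabytes.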

Chemometric techniques based on multivariate data analysis can properly tackle the problems associated with large metabolomics data sets [5]. Multivariate curve resolution-alternating least squares (MCR-ALS) is a widely used method that decomposes measured mixed analytical signals into the contribution profiles of the pure constituents using a bilinear data decomposition. Combining an appropriate compression strategy, such as the wavelet transform, with MCR-ALS is an effective way to handle the large volume of metabolomics data [5].
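In MCR-ALS the data matrix D (scans × m/z channels) is modeled bilinearly as D = C Sᵀ + E, where C holds the elution profiles, Sᵀ the pure mass spectra and E the residuals; each iteration solves for C with Sᵀ fixed, then for Sᵀ with C fixed. The dependency-free sketch below shows that alternating loop with a crude non-negativity constraint (clipping negatives to zero) instead of proper non-negative least squares. It has no convergence test, no initial-estimate method and no other constraints, and it is not RMet's implementation, whose settings are not described in this section; all helper names are hypothetical.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(M, B):
    """Solve M X = B (M square, non-singular) by Gauss-Jordan elimination."""
    n = len(M)
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def clip_negative(A):
    """Crude non-negativity constraint: set negative entries to zero."""
    return [[max(v, 0.0) for v in row] for row in A]


def mcr_als(D, St, n_iter=50):
    """Alternating least-squares updates of C and S^T for D ~ C S^T.

    D: I x J data matrix (scans x m/z); St: N x J initial spectra estimate.
    Returns C (I x N elution profiles) and St (N x J spectra).
    """
    for _ in range(n_iter):
        # C-step: solve (S^T S) C^T = S^T D^T, then constrain C >= 0.
        Ct = solve(matmul(St, transpose(St)), matmul(St, transpose(D)))
        C = clip_negative(transpose(Ct))
        # S-step: solve (C^T C) S^T = C^T D, then constrain S^T >= 0.
        Ctr = transpose(C)
        St = clip_negative(solve(matmul(Ctr, C), matmul(Ctr, D)))
    return C, St


def lack_of_fit(D, C, St):
    """Lack of fit in percent: 100 * sqrt(sum of squared residuals / sum of squared data)."""
    R = matmul(C, St)
    ss_res = sum((d - r) ** 2 for drow, rrow in zip(D, R) for d, r in zip(drow, rrow))
    ss_tot = sum(d ** 2 for row in D for d in row)
    return 100.0 * (ss_res / ss_tot) ** 0.5


# Synthetic, noise-free example: two overlapping elution profiles, six m/z channels.
C_true = [[max(0.0, 1 - abs(i - 10) / 6), max(0.0, 1 - abs(i - 15) / 6)] for i in range(30)]
St_true = [[1.0, 0.5, 0.0, 0.2, 0.0, 0.1],
           [0.0, 0.3, 1.0, 0.0, 0.6, 0.2]]
D = matmul(C_true, St_true)
start = [[2.0 * v for v in row] for row in St_true]  # deliberately mis-scaled start
C, St = mcr_als(D, start)
print(f"lack of fit: {lack_of_fit(D, C, St):.2e} %")
```

Because any pair (C T, T⁻¹ Sᵀ) fits the data equally well, the mis-scaled start comes back with the elution profiles halved and the spectra doubled while the lack of fit stays essentially zero; this scale ambiguity is one reason resolved profiles are often normalized before they are compared.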

Different software tools have been developed to analyze metabolomic data and obtain qualitative and quantitative information. Currently available omics tools such as Metabolyzer [6], PlantMAT [7], MetExpert [8], MetExtract II [9,10], Lipostar [10], IMMA [11], FlavonQ-2.0v [12], and MetaboliteDetector [13] can only analyze liquid chromatography-mass spectrometry (LC-MS) and GC-MS metabolomic data. Although a number of data analysis methods have been developed for processing GC×GC-MS data, to date no software combines all the steps required for analyzing GC×GC-MS metabolomics data. Thus, there is a clear need for comprehensive, user-friendly software designed specifically for omics studies, to facilitate and speed up the time-consuming data mining of complex GC-MS and GC×GC-MS metabolomic data. Such software could popularize the use of GC-MS and GC×GC-MS combined with novel chemometric algorithms and modern statistical approaches among metabolomics researchers, and thereby provide them with a large amount of new, useful information about the biological systems they study.

To meet this demand, we have developed RMet, an automated, user-friendly R-based software package with a graphical user interface (GUI) that aims to overcome the challenges of analyzing large, complex GC-MS and GC×GC-MS metabolomic data sets and to transform them into meaningful biological information. The software includes all steps of a complete untargeted metabolomic data analysis workflow: preprocessing, segmentation, data compression, multivariate curve resolution (MCR), identification of important metabolites, and metabolite classification.
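The data compression step can be illustrated with the wavelet transform mentioned for MCR-ALS. Which wavelet, how many decomposition levels and which axis RMet compresses are not stated here, so the sketch below assumes the Haar wavelet, keeps only its approximation coefficients, pads odd-length signals by repeating the last point, and compresses along the scan (time) axis only. The function names are hypothetical.

```python
import math


def haar_approx(signal):
    """One level of the Haar discrete wavelet transform, approximation part only.

    Adjacent points (a, b) are replaced by (a + b) / sqrt(2), halving the length.
    An odd-length signal is padded by repeating its last point.
    """
    x = list(signal)
    if len(x) % 2:
        x.append(x[-1])
    s = 1.0 / math.sqrt(2.0)
    return [(x[k] + x[k + 1]) * s for k in range(0, len(x), 2)]


def compress_time_axis(D, levels=1):
    """Compress every m/z channel (column) of a scans x m/z matrix along time.

    Each level roughly halves the number of scans; the m/z axis is left untouched.
    """
    columns = [list(col) for col in zip(*D)]
    for _ in range(levels):
        columns = [haar_approx(col) for col in columns]
    return [list(row) for row in zip(*columns)]


D = [[float(i + j) for j in range(3)] for i in range(8)]  # 8 scans x 3 m/z channels
Dc = compress_time_axis(D, levels=2)
print(len(D), "->", len(Dc), "scans;", len(Dc[0]), "m/z channels")
```

Compressing only the time axis keeps every m/z channel, so spectra resolved from the compressed matrix retain full spectral resolution for library matching, while the resolved elution profiles come out at reduced time resolution.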

Briefly, RMet uses MCR-ALS [[14], [15], [16], [17]] as its resolution algorithm; MCR-ALS is one of the most efficient ways to handle fundamental challenges that arise in GC×GC analyses, such as elution time shifts, baseline/background contributions, peak overlap, and peak shape changes [3]. RMet then performs classification by building a partial least squares-discriminant analysis (PLS-DA) model [18,19] and finally reports the significantly affected metabolites based on their variable importance in projection (VIP) [20] scores. Afterward, metabolic pathway analysis should be performed using metabolic pathway databases to identify the affected metabolic pathways.
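VIP summarizes how much each variable contributes to a PLS model. A commonly used single-response form (as in a two-class PLS-DA coded with one dummy response) is VIP_j = sqrt(J · Σ_a SS_a (w_ja / ‖w_a‖)² / Σ_a SS_a), where J is the number of variables, w_a the weight vector of component a, and SS_a = q_a² t_aᵀt_a the response variance explained by that component. The sketch computes it from already fitted weights, scores and y-loadings; how RMet fits its PLS-DA model, how many components it keeps and which VIP cutoff it applies are not stated here, so the VIP > 1 threshold is an assumption (a common rule of thumb) and the function names are hypothetical.

```python
def explained_ss(T, q):
    """Per-component response variance explained by a single-response PLS model:
    SS_a = q_a**2 * (t_a . t_a), with T the I x A score matrix and q the y-loadings."""
    return [q[a] ** 2 * sum(row[a] ** 2 for row in T) for a in range(len(q))]


def vip_scores(W, ss):
    """VIP_j = sqrt(J * sum_a SS_a (w_ja / ||w_a||)^2 / sum_a SS_a).

    W: J x A weight matrix (one row per variable, one column per component).
    ss: explained sum of squares SS_a for each of the A components.
    """
    J, A = len(W), len(ss)
    norms = [sum(W[j][a] ** 2 for j in range(J)) ** 0.5 for a in range(A)]
    total = sum(ss)
    return [
        (J * sum(ss[a] * (W[j][a] / norms[a]) ** 2 for a in range(A)) / total) ** 0.5
        for j in range(J)
    ]


def important_variables(vip, threshold=1.0):
    """Indices of variables whose VIP exceeds the (assumed) threshold."""
    return [j for j, v in enumerate(vip) if v > threshold]


# One-component toy model with two variables.
vip = vip_scores([[3.0], [4.0]], [1.0])
print([round(v, 4) for v in vip], important_variables(vip))
```

Because each normalized weight vector has unit length, the squared VIP scores always sum to J, so their mean is 1; this is the usual motivation for treating VIP > 1 as "above-average importance". In the toy model only the second variable (VIP ≈ 1.13) passes the cutoff.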

The workflow is designed so that users can perform a complete analysis and obtain appropriate results without any knowledge of chemometrics, while still providing detailed statistics for experts who want to customize and optimize their analysis. Preprocessing and segmenting large GC×GC-MS data is tedious: manually segmenting the data or removing redundant regions of the TIC, such as those caused by column bleeding and derivatization agents, requires a great deal of attention, whereas in RMet all of these operations take just a few clicks. In addition, RMet is lightweight software written in R, an open-source programming language that is popular among statisticians and data scientists [21]. These features make RMet a convenient automated computational tool for analyzing GC×GC-MS metabolomic data. RMet can also be used to analyze GC-MS metabolomic data. In the following sections, RMet's functionality is demonstrated by processing the data of a previous environmental metabolomics study, in which lettuce samples exposed to contaminants of emerging concern (CECs) were analyzed by GC×GC-TOFMS to investigate the effect of CEC exposure on the lettuce's metabolic pathways [3].
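The preprocessing described above starts from the TIC, which is the sum of the intensities over all m/z channels for each scan. The sketch below computes it and then splits the scan axis into consecutive windows while skipping excluded ranges, such as regions dominated by column bleed or derivatization agents. RMet's actual segmentation scheme is not described here (for GC×GC data, segments may well be defined on the 2D plane), so this one-dimensional version, its window length and the excluded scan ranges are hypothetical.

```python
def total_ion_current(D):
    """TIC: sum of the intensities over all m/z channels for each scan (row)."""
    return [sum(row) for row in D]


def segment_scans(n_scans, seg_len, exclude=()):
    """Split scans 0..n_scans-1 into consecutive (start, stop) windows of at most
    seg_len scans, skipping the half-open scan ranges listed in `exclude`."""
    excluded = set()
    for start, stop in exclude:
        excluded.update(range(start, stop))
    segments, first, count = [], None, 0
    for i in range(n_scans):
        if i in excluded:
            if first is not None:  # close the window that runs into the excluded region
                segments.append((first, i))
                first, count = None, 0
            continue
        if first is None:
            first = i
        count += 1
        if count == seg_len:  # window is full
            segments.append((first, i + 1))
            first, count = None, 0
    if first is not None:
        segments.append((first, n_scans))
    return segments


D = [[1.0, 2.0], [3.0, 4.0], [0.5, 0.5]]
print(total_ion_current(D))                    # [3.0, 7.0, 1.0]
print(segment_scans(10, 4, exclude=[(3, 5)]))  # [(0, 3), (5, 9), (9, 10)]
```

Returning half-open (start, stop) pairs keeps each segment directly usable as a slice of the data matrix, so every window can then be compressed and resolved independently.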

Software development

RMet was developed in RStudio version 1.1.383 with R core version 3.4.3. Its executable file runs on Windows without any version restriction, and it is available free of charge at https://github.com/SUTChemometricsGroup/RMet along with a manual, source code and sample data sets. To use RMet on Linux and macOS, one should install R core (freely available at https://cran.r-project.org) and run the RMet.R code, which is available at the

Results and discussion

RMet’s data processing strategy is shown in Fig. 1. RMet is a data processing platform specifically designed for analyzing both GC×GC-MS and GC-MS metabolomic data sets. In the following, the approaches and algorithms applied in each step of the RMet workflow are discussed in detail using the GC×GC-MS data of control and CEC-exposed lettuce samples, which were used to demonstrate the functionality and output of each data mining step.

Conclusion

Developing new integrated software for fast and accurate analysis of large GC-MS and GC×GC-MS untargeted metabolomic data sets can greatly help researchers in the field of metabolomics. To meet this crucial need, we have developed RMet, a novel automated R-based software package for analyzing both GC-MS and GC×GC-MS untargeted metabolomic data sets in a simple, quick, and reliable manner. All required steps for completing a metabolomics data analysis workflow including data preprocessing

Declaration of competing interest

The authors declare no conflict of interest.

Acknowledgment

The authors would like to thank the Research Council of Sharif University of Technology (SUT) for financial support of this research under grant no. G960613. They would also like to thank Prof. Josep M. Bayona from the IDAEA-CSIC institute in Barcelona (Spain) for providing access to the metabolomic data sets used in this work.

References (29)

  • F. Qiu et al.

    PlantMAT: a metabolomics tool for predicting the specialized metabolic potential of a system and for large-scale metabolite identifications

    Anal. Chem.

    (2016)
  • F. Qiu et al.

    MetExpert: an expert system to enhance gas chromatography‒mass spectrometry-based metabolite identifications

    Anal. Chim. Acta

    (2018)
  • C. Bueschl et al.

    MetExtract II: a software suite for stable Isotope-assisted untargeted metabolomics

    Anal. Chem.

    (2017)
  • L. Goracci et al.

    Lipostar, a comprehensive platform-neutral cheminformatics tool for lipidomics

    Anal. Chem.

    (2017)