Elsevier

Atmospheric Environment

Volume 259, 15 August 2021, 118538

Quantifying wintertime O3 and NOx formation with relevance vector machines

https://doi.org/10.1016/j.atmosenv.2021.118538

Highlights

  • Eleven gas-phase chemicals were measured during a wintertime field study in Utah.

  • A high-resolution time-of-flight chemical ionization mass spectrometer (CIMS) was used.

  • A relevance vector machine (RVM) was used to predict O3 and NO2 concentrations.

Abstract

This paper uses a machine learning model called a relevance vector machine (RVM) to quantify ozone (O3) and nitrogen oxides (NOx) formation under wintertime conditions. Field study measurements were based on previous work described by Olson et al. (2019), where continuous measurements were reported from a wintertime field study in Utah. RVMs were formulated using either O3 or nitrogen dioxide (NO2) as the output variable. Values of the squared correlation coefficient (r2) between predicted and measured concentrations were 0.944 for O3 and 0.931 for NO2. RVMs are constructed from the observed measurements and result in sparse model formulations, meaning that only a subset of the data is used to approximate the entire dataset. For this study, the RVM with O3 as the output variable used only 20% of the measurement data while the RVM with NO2 used 16%. RVMs were then used as predictive models to assess the importance of individual precursors. Using O3 as the output variable, increases in three species resulted in increased O3 concentrations: hydrogen peroxide (H2O2), dinitrogen pentoxide (N2O5), and molecular chlorine (Cl2). For the two termination products measured during the study, nitric acid (HNO3) and formic acid (CH2O2), no change in O3 concentration was observed. Using NO2 as the output variable, only increases in N2O5 resulted in increased NO2 concentrations.

Introduction

Machine learning and pattern recognition are rapidly growing fields of study, and have been widely applied in engineering, business, and science (Mjolsness and DeCoste, 2001). Originating from a variety of mathematical formulations and optimization techniques, numerous learning algorithms have been developed, many of which are capable of complex analyses (Jordan and Mitchell, 2015). One such machine learning model is called the relevance vector machine (RVM). Previous researchers (e.g., Bishop, 2006) have noted that RVMs, along with the closely related support vector machines, have been used in a variety of analyses involving both classification and regression. Recent studies using relevance vector machines include topics ranging from price forecasting (Agrawal et al., 2019) to fatigue analysis (Caesarendra et al., 2010; Widodo and Yang, 2011; Zio and Di Maio, 2012). While RVMs have also been applied to environmental systems (e.g., Ghosh and Mujumdar, 2008; Samui and Dixon, 2012), this paper is to our knowledge the first to apply RVMs to atmospheric chemistry.
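
For readers unfamiliar with RVMs, the standard regression form is summarized below as background. The notation follows the general textbook treatment in Bishop (2006) and is not taken from this paper; the kernel k, the noise precision β, and the hyperparameters α_i are generic symbols, and the specific kernel and settings used in the study are not stated in this excerpt.

```latex
% Standard RVM regression model (textbook form; symbols are generic, not the paper's).
\begin{align}
  y(\mathbf{x}) &= \sum_{n=1}^{N} w_n \, k(\mathbf{x}, \mathbf{x}_n) + b, \\
  p(t \mid \mathbf{x}, \mathbf{w}, \beta) &= \mathcal{N}\!\left(t \mid y(\mathbf{x}), \beta^{-1}\right), \\
  p(\mathbf{w} \mid \boldsymbol{\alpha}) &= \prod_{i} \mathcal{N}\!\left(w_i \mid 0, \alpha_i^{-1}\right).
\end{align}
```

Each weight w_i has its own precision α_i. When the hyperparameters are estimated by maximizing the marginal likelihood, many α_i grow without bound, which drives the corresponding weights to zero. The training points whose weights remain nonzero are the relevance vectors, and this is the sense in which an RVM uses only a subset of the data.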

The chemistry of ozone (O3) and nitrogen oxides (NOx) formation is well suited to analyses using machine learning techniques. The sheer number of chemical reactions leading to O3 and NOx formation, coupled with the time and location dependence of individual species, is indicative of the complexity and high dimensionality that often warrant more advanced modeling approaches. Previous articles have reviewed the reaction chemistry, experimental measurements, or computational simulations associated with various aspects of O3 and NOx formation (e.g., Sillman, 1999; Atkinson, 2000; Russell and Dennis, 2000; Atkinson and Arey, 2003; Stockwell et al., 2012). However, numerous uncertainties still exist in identifying and prioritizing chemical intermediates of importance, especially under wintertime conditions, where ambient conditions typically differ markedly from those in other seasons. Recent studies have reported measurements of ozone and ozone precursors under wintertime conditions (Oltmans et al., 2014; Rappenglück et al., 2014; Koss et al., 2015). Other recent studies have considered seasonal variations of ground-level ozone formation (Clifton et al., 2014; Shen and Mickley, 2017; Balram et al., 2020).

Real-time measurements are now commonly used to better understand atmospheric processes. These measurements have included elemental and organic carbon (e.g., Bae et al., 2004), inorganic compounds (e.g., Harrison et al., 2004), particle-phase organic compounds (e.g., Jimenez et al., 2003; Canagaratna et al., 2007), and gas-phase organic compounds (e.g., Lee et al., 2014, 2018). While such measurements often provide detailed information, one significant challenge is developing approaches to summarize the data without loss of critical information. Simple averaging schemes could be used, though such data reductions are user-defined and unrelated to the underlying chemical processes. Thus, a formal mathematical approach would be valuable in reducing the redundancies that are inherent to highly time-resolved data.
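
The point about user-defined averaging can be illustrated with a small sketch. The function name `block_average` and the readings are hypothetical and are not data from the study; the example only shows that the same short-lived spike looks different depending on the averaging window an analyst happens to choose.

```python
def block_average(values, window):
    """Average consecutive, non-overlapping blocks of `window` samples.

    A trailing partial block is averaged over the samples it contains.
    """
    if window < 1:
        raise ValueError("window must be a positive integer")
    return [
        sum(values[i:i + window]) / len(values[i:i + window])
        for i in range(0, len(values), window)
    ]


# Hypothetical 1-min readings (arbitrary units) with one short-lived spike.
readings = [1.0, 1.0, 1.0, 9.0, 1.0, 1.0]

two_min = block_average(readings, 2)    # spike shows up as a 5.0 block
three_min = block_average(readings, 3)  # same spike diluted to about 3.67
```

Neither window is tied to a reaction time scale, which is why a data-driven reduction such as the sparse subset selected by an RVM may be preferable.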

The objective of this research is to use relevance vector machines to quantify the formation of O3 and NOx under wintertime conditions. An additional aspect of this validation effort is to examine a machine learning approach despite clear differences in formulation vis-à-vis traditional deterministic models. Machine learning models, typically through a series of latent variables, are often constructed such that a given set of input values corresponds directly to an output. As such, machine learning models have been applied extensively for analyses involving pattern recognition (e.g., image and voice recognition). This approach is in stark contrast with traditional deterministic models, where the values of output variables are propagated over time as is often the case for mass balance models quantifying chemical reactions in the atmosphere. This direct correspondence between inputs and outputs—a key feature of many machine learning algorithms—is likely a challenge for real-time measurements that can be poorly scaled relative to the underlying formation processes (e.g., chemical reactions in the atmosphere). This work is an extension of a previous analysis on O3 and NOx formation (Olson et al., 2019), where real-time measurements of 11 different gas-phase organic and inorganic species were completed in Utah during winter. The previous analysis was completed using vector autoregressions and focused on describing O3 and NOx formation. RVMs are used in this work to both explain and predict continuous measurements collected under wintertime conditions.
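
The contrast drawn above between direct input-to-output models and time-propagated deterministic models can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the relevance vectors, the weights, and the single-species balance dC/dt = P - kC are all hypothetical choices, and nothing is fitted to the field data.

```python
import math


def rbf_kernel(x, z, length_scale=1.0):
    """Gaussian (RBF) kernel between two equal-length input vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def direct_prediction(x, relevance_vectors, weights, bias=0.0):
    """Kernel-sum predictor: y = bias + sum_i w_i * k(x, x_i).

    The output depends only on the current inputs x, not on earlier outputs.
    """
    return bias + sum(
        w * rbf_kernel(x, rv) for w, rv in zip(weights, relevance_vectors)
    )


def propagate_mass_balance(c0, production, loss_rate, dt, n_steps):
    """Explicit Euler stepping of dC/dt = P - k*C.

    Each concentration is computed from the previous one, so values are
    propagated over time rather than mapped directly from inputs.
    """
    series = [c0]
    for _ in range(n_steps):
        c = series[-1]
        series.append(c + dt * (production - loss_rate * c))
    return series


# Hypothetical values for illustration only (not from the study).
relevance_vectors = [[0.0, 0.0], [1.0, 1.0]]
weights = [2.0, -1.0]

y = direct_prediction([0.0, 0.0], relevance_vectors, weights, bias=1.0)
series = propagate_mass_balance(c0=0.0, production=1.0, loss_rate=0.5,
                                dt=1.0, n_steps=2)
```

In the first function, two identical input vectors always give the same output regardless of when they occur. In the second, the same production and loss terms give different concentrations depending on the history of the series.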

Section snippets

Site description

A single stationary outdoor site on the Utah State University campus in Logan, UT (41°45′31.95″N, 111°48′54.44″W) was used for gas-phase measurements. A temperature-controlled shelter housed all instrumentation during the sampling period. The sampling campaign ran from January 20, 2017 through February 15, 2017. As described in more detail below, a total of 11 species were measured throughout the study.

Analytical methods

A 2B Technologies Model 211 UV photometric analyzer was used to measure

Results and discussion

The overall aim of this research is to better understand O3 and NOx formation using RVMs. This work builds on a previous autoregression analysis (Olson et al., 2019) that used the same wintertime sampling campaign reporting real-time measurements of the 11 gas-phase organic and inorganic species described in Section 2.2. One limitation of the previous paper is that autoregressions, though well-suited to describing continuous measurements, are less effective at predicting future values. In this

Conclusions

The intent of this research is to better understand wintertime formation of O3 and NO2 using relevance vector machines. The results from this study show that relevance vector machines—using only a fraction of the available data—can accurately predict measured concentrations of O3 and NO2. Additionally, this work addresses a key discrepancy between machine learning models and traditional deterministic models. For machine learning models like RVMs a given set of input values corresponds directly

CRediT authorship contribution statement

David A. Olson: Conceptualization, Data curation, Formal analysis, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. Theran P. Riedel: Investigation, Writing – original draft, Writing – review & editing. John H. Offenberg: Conceptualization, Data curation, Formal analysis, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. Michael Lewandowski: Conceptualization, Data curation, Funding

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The U.S. Environmental Protection Agency through its Office of Research and Development funded and collaborated in the research described here under Contract EP-C-15-008 to Jacobs Technology. The manuscript has been subjected to internal review and has been cleared for publication. The views expressed in this article are those of the authors, and do not necessarily represent the views or policies of the U.S. Environmental Protection Agency. Mention of organization names and trademarks does not

References (44)

  • A. Widodo et al.

    Application of relevance vector machine and survival probability to machine degradation assessment

    Expert Syst. Appl.

    (2011)
  • E. Zio et al.

    Fatigue crack growth estimation by relevance vector machine

    Expert Syst. Appl.

    (2012)
  • R. Atkinson et al.

    Atmospheric degradation of volatile organic compounds

    Chem. Rev.

    (2003)
  • M. Baasandorj et al.

    Coupling between chemical and meteorological processes under persistent cold-air pool conditions: evolution of wintertime PM2.5 pollution events and N2O5 observations in Utah's Salt Lake Valley

    Environ. Sci. Technol.

    (2017)
  • M.S. Bae et al.

    Hourly and daily patterns of particle-phase organic and elemental carbon concentrations in the urban atmosphere

    J. Air Waste Manag. Assoc.

    (2004)
  • C.M. Bishop

    Pattern Recognition and Machine Learning

    (2006)
  • M. Canagaratna et al.

    Chemical and microphysical characterization of ambient aerosols with the aerodyne aerosol mass spectrometer

    Mass Spectrom. Rev.

    (2007)
  • O.E. Clifton et al.

    Twenty-first century reversal of the surface ozone seasonal cycle over the northeastern United States

    Geophys. Res. Lett.

    (2014)
  • 40 Code of Federal Regulations § 50 Appendix D

    Reference Measurement Principle and Calibration Procedure for the Measurement of Ozone in the Atmosphere (Chemiluminescence Method)

    (2016)
  • 40 Code of Federal Regulations § 50 Appendix F

    Measurement Principle and Calibration Procedure for the Measurement of Nitrogen Dioxide in the Atmosphere (Gas Phase Chemiluminescence)

    (2012)
  • C.J. Gaston et al.

    Reactive uptake of N2O5 to internally mixed inorganic and organic particles: the role of organic carbon oxidation state and inferred organic phase separations

    Atmos. Chem. Phys.

    (2014)
  • T. Hastie et al.

    The Elements of Statistical Learning: Data Mining, Inference, and Prediction

    (2009)