Quantifying wintertime O3 and NOx formation with relevance vector machines
Introduction
Machine learning and pattern recognition are rapidly growing fields of study, and have been widely applied in engineering, business, and science (Mjolsness and DeCoste, 2001). Originating from a variety of mathematical formulations and optimization techniques, numerous learning algorithms have been developed, many of which are capable of complex analyses (Jordan and Mitchell, 2015). One such machine learning model is called the relevance vector machine (RVM). Previous researchers (e.g., Bishop, 2006) have noted that RVMs, along with the closely related support vector machines, have been used in a variety of analyses involving both classification and regression. Recent studies using relevance vector machines include topics ranging from price forecasting (Agrawal et al., 2019) to fatigue analysis (Caesarendra et al., 2010; Widodo and Yang, 2011; Zio and Di Maio, 2012). While RVMs have also been applied to environmental systems (e.g., Ghosh and Mujumdar, 2008; Samui and Dixon, 2012), this paper is to our knowledge the first to apply RVMs to atmospheric chemistry.
The chemistry of ozone (O3) and nitrogen oxides (NOx) formation is well-suited to analyses using machine learning techniques. The sheer number of chemical reactions leading to O3 and NOx formation, coupled with the time and location dependence of individual species, is indicative of the complexity and high dimensionality that often warrant more advanced modeling approaches. Previous articles have reviewed the reaction chemistry, experimental measurements, or computational simulations associated with various aspects of O3 and NOx formation (e.g., Sillman, 1999; Atkinson, 2000; Russell and Dennis, 2000; Atkinson and Arey, 2003; Stockwell et al., 2012). However, numerous uncertainties still exist in identifying and prioritizing chemical intermediates of importance, especially under wintertime conditions, when ambient conditions typically differ markedly from those in other seasons. Recent studies have reported measurements of ozone and ozone precursors under wintertime conditions (Oltmans et al., 2014; Rappenglück et al., 2014; Koss et al., 2015), and others have considered seasonal variations of ground-level ozone formation (Clifton et al., 2014; Shen and Mickley, 2017; Balram et al., 2020).
Real-time measurements are now commonly used to better understand atmospheric processes. These measurements have included elemental and organic carbon (e.g., Bae et al., 2004), inorganic compounds (e.g., Harrison et al., 2004), particle-phase organic compounds (e.g., Jimenez et al., 2003; Canagaratna et al., 2007), and gas-phase organic compounds (e.g., Lee et al., 2014, 2018). While such measurements often provide detailed information, one significant challenge is developing approaches to summarize the data without loss of critical information. Simple averaging schemes could be used, though such data reductions are user-defined and unrelated to the underlying chemical processes. Thus, a formal mathematical approach would be valuable in reducing the redundancies that are inherent to highly time-resolved data.
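As an illustration of why simple averaging is a user-defined reduction, the following minimal sketch averages the same series with two block lengths. The readings are hypothetical values chosen for illustration, not measurements from this study, and the helper name block_average is likewise an assumption.

```python
def block_average(values, window):
    """Average consecutive, non-overlapping blocks of `window` samples."""
    averages = []
    for start in range(0, len(values), window):
        block = values[start:start + window]
        averages.append(sum(block) / len(block))
    return averages


# Hypothetical 1-minute ozone readings (ppb) containing a brief 3-minute peak.
ozone_ppb = [30.0] * 6 + [60.0] * 3 + [30.0] * 3

avg_3 = block_average(ozone_ppb, 3)  # 3-minute blocks keep the peak at 60 ppb
avg_6 = block_average(ozone_ppb, 6)  # 6-minute blocks dilute the peak to 45 ppb
```

The choice of window alone changes the apparent peak concentration, which is the sense in which such reductions are unrelated to the underlying chemistry.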
The objective of this research is to use relevance vector machines to quantify the formation of O3 and NOx under wintertime conditions. A second aim is to examine how a machine learning approach performs given that its formulation differs clearly from that of traditional deterministic models. Machine learning models, typically through a series of latent variables, are often constructed such that a given set of input values corresponds directly to an output. As such, machine learning models have been applied extensively for analyses involving pattern recognition (e.g., image and voice recognition). This approach is in stark contrast with traditional deterministic models, where the values of output variables are propagated over time, as is often the case for mass balance models quantifying chemical reactions in the atmosphere. This direct correspondence between inputs and outputs, a key feature of many machine learning algorithms, is likely a challenge for real-time measurements that can be poorly scaled relative to the underlying formation processes (e.g., chemical reactions in the atmosphere). This work extends a previous analysis of O3 and NOx formation (Olson et al., 2019), in which real-time measurements of 11 different gas-phase organic and inorganic species were collected in Utah during winter. That analysis used vector autoregressions and focused on describing O3 and NOx formation. RVMs are used in this work to both explain and predict continuous measurements collected under wintertime conditions.
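To make the direct input-to-output correspondence concrete, the following is a minimal sketch of RVM-style sparse Bayesian regression. It is not the implementation used in this study. It assumes NumPy is available, a Gaussian kernel, synthetic sine-wave data in place of measured concentrations, and the iterative hyperparameter re-estimation commonly associated with RVMs. The names rbf_kernel, fit_rvm, width, and prune_at are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, width):
    """Gaussian kernel between two 1-D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-(diff ** 2) / (2.0 * width ** 2))


def fit_rvm(x, t, width=1.0, n_iter=100, prune_at=1e6):
    """Sparse Bayesian regression by iterative re-estimation (a sketch)."""
    phi = rbf_kernel(x, x, width)   # one basis function per training point
    keep = np.arange(len(x))        # indices of basis functions still active
    alpha = np.ones(len(x))         # prior precision of each weight
    beta = 1.0 / np.var(t)          # initial guess for the noise precision
    for _ in range(n_iter):
        p = phi[:, keep]
        sigma = np.linalg.inv(beta * p.T @ p + np.diag(alpha))
        mu = beta * sigma @ p.T @ t
        # gamma measures how well the data determine each weight
        gamma = np.clip(1.0 - alpha * np.diag(sigma), 1e-12, 1.0)
        alpha = gamma / (mu ** 2 + 1e-30)
        resid = t - p @ mu
        beta = max(len(t) - gamma.sum(), 1e-6) / (resid @ resid + 1e-12)
        # basis functions whose precision diverges are pruned
        active = alpha < prune_at
        keep, alpha = keep[active], alpha[active]
    # final posterior mean of the weights for the surviving basis functions
    p = phi[:, keep]
    sigma = np.linalg.inv(beta * p.T @ p + np.diag(alpha))
    mu = beta * sigma @ p.T @ t
    return keep, mu


# Synthetic stand-in for a measured time series (an assumption, not real data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
t = np.sin(x) + 0.1 * rng.standard_normal(50)

keep, mu = fit_rvm(x, t, width=1.0)
pred = rbf_kernel(x, x[keep], 1.0) @ mu       # each input maps directly to an output
rmse = float(np.sqrt(np.mean((pred - t) ** 2)))
```

Only the training points indexed by keep (the relevance vectors) are needed for prediction, and each prediction depends solely on its own input rather than on a state propagated from earlier time steps.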
Site description
Gas-phase measurements were made at one stationary outdoor location on the Utah State University campus in Logan, UT (41°45′31.95″N, 111°48′54.44″W). A temperature-controlled shelter housed all instrumentation during the sampling period. The sampling campaign ran from January 20, 2017 through February 15, 2017. As described in more detail below, a total of 11 species were measured throughout the study.
Analytical methods
A 2B Technologies Model 211 UV photometric analyzer was used to measure
Results and discussion
The overall aim of this research is to better understand O3 and NOx formation using RVMs. This work builds on a previous autoregression analysis (Olson et al., 2019) that used the same wintertime sampling campaign reporting real-time measurements of the 11 gas-phase organic and inorganic species described in Section 2.2. One limitation of the previous paper is that autoregressions, though well-suited to describing continuous measurements, are less effective at predicting future values. In this
Conclusions
The intent of this research is to better understand wintertime formation of O3 and NO2 using relevance vector machines. The results from this study show that relevance vector machines—using only a fraction of the available data—can accurately predict measured concentrations of O3 and NO2. Additionally, this work addresses a key discrepancy between machine learning models and traditional deterministic models. For machine learning models like RVMs a given set of input values corresponds directly
CRediT authorship contribution statement
David A. Olson: Conceptualization, Data curation, Formal analysis, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. Theran P. Riedel: Investigation, Writing – original draft, Writing – review & editing. John H. Offenberg: Conceptualization, Data curation, Formal analysis, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. Michael Lewandowski: Conceptualization, Data curation, Funding
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The U.S. Environmental Protection Agency through its Office of Research and Development funded and collaborated in the research described here under Contract EP-C-15-008 to Jacobs Technology. The manuscript has been subjected to internal review and has been cleared for publication. The views expressed in this article are those of the authors, and do not necessarily represent the views or policies of the U.S. Environmental Protection Agency. Mention of organization names and trademarks does not
References (44)
- et al. Ensemble of relevance vector machines and boosted trees for electricity price forecasting. Appl. Energy (2019)
- Atmospheric chemistry of VOCs and NOx. Atmos. Environ. (2000)
- et al. A novel soft sensor based warning system for hazardous ground-level ozone using advanced damped least squares neural network. Ecotoxicol. Environ. Saf. (2020)
- et al. Application of relevance vector machine and logistic regression for machine degradation assessment. Mech. Syst. Signal Process. (2010)
- et al. Statistical downscaling of GCM simulations to streamflow using relevance vector machine. Adv. Water Resour. (2008)
- et al. Major component composition of PM10 and PM2.5 from roadside and urban background sites. Atmos. Environ. (2004)
- et al. Time series analysis of wintertime O3 and NOx formation using vector autoregressions. Atmos. Environ. (2019)
- et al. NARSTO critical review of photochemical models and modeling. Atmos. Environ. (2000)
- The relation between ozone, NOx and hydrocarbons in urban and polluted rural environments. Atmos. Environ. (1999)
- et al. Measurements of atmospheric hydroperoxides over a rural site in central Japan during summers using a helicopter. Atmos. Environ. (2016)
- Application of relevance vector machine and survival probability to machine degradation assessment. Expert Syst. Appl.
- Fatigue crack growth estimation by relevance vector machine. Expert Syst. Appl.
- Atmospheric degradation of volatile organic compounds. Chem. Rev.
- Coupling between chemical and meteorological processes under persistent cold-air pool conditions: evolution of wintertime PM2.5 pollution events and N2O5 observations in Utah's Salt Lake Valley. Environ. Sci. Technol.
- Hourly and daily patterns of particle-phase organic and elemental carbon concentrations in the urban atmosphere. J. Air Waste Manag. Assoc.
- Pattern Recognition and Machine Learning
- Chemical and microphysical characterization of ambient aerosols with the aerodyne aerosol mass spectrometer. Mass Spectrom. Rev.
- Twenty-first century reversal of the surface ozone seasonal cycle over the northeastern United States. Geophys. Res. Lett.
- Reference Measurement Principle and Calibration Procedure for the Measurement of Ozone in the Atmosphere (Chemiluminescence Method)
- Measurement Principle and Calibration Procedure for the Measurement of Nitrogen Dioxide in the Atmosphere (Gas Phase Chemiluminescence)
- Reactive uptake of N2O5 to internally mixed inorganic and organic particles: the role of organic carbon oxidation state and inferred organic phase separations. Atmos. Chem. Phys.
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction