A deep learning approach to real-time CO concentration prediction at signalized intersection

https://doi.org/10.1016/j.apr.2020.05.007

Highlights

  • A hybrid machine learning framework is proposed to predict the CO concentration at a signalized intersection.

  • Portable monitoring equipment is used to record CO concentrations at a fine scale.

  • The random forest algorithm is used to rank the importance of variables contributing to the CO concentration.

  • A multivariate LSTM neural network improves the prediction accuracy of real-time CO concentration.
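A multivariate LSTM consumes several aligned time series at once, so the raw records are usually cut into fixed-length input windows before training. The sketch below shows one common way to do this with NumPy. The function name `make_windows`, the window length `lookback`, the forecast `horizon`, and the toy feature set are assumptions for illustration, not settings reported in the paper.

```python
import numpy as np


def make_windows(features, target, lookback, horizon=1):
    """Cut aligned multivariate series into supervised-learning samples.

    features: (T, n_features) array, e.g. CO plus selected traffic variables.
    target:   (T,) array of the quantity to predict (CO concentration).
    Returns X of shape (N, lookback, n_features) and y of shape (N,),
    with N = T - lookback - horizon + 1 and y[i] = target value `horizon`
    steps after the last time step of window i.
    """
    features = np.asarray(features, float)
    target = np.asarray(target, float)
    n = features.shape[0] - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series too short for this lookback/horizon")
    # Window i covers time steps i .. i + lookback - 1.
    X = np.stack([features[i:i + lookback] for i in range(n)])
    start = lookback + horizon - 1
    y = target[start:start + n]
    return X, y


# Toy example: 10 time steps, two variables (CO and one hypothetical flow column).
co = np.arange(10, dtype=float)
flow = 10.0 * np.arange(10)
X, y = make_windows(np.column_stack([co, flow]), co, lookback=3)
print(X.shape, y[:3])
```

For this toy input, `X` has shape (7, 3, 2) and the first target is the CO value at the fourth time step, which is the layout (samples, time steps, features) that recurrent layers in common deep learning libraries expect.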

Abstract

Vehicle exhaust emissions at signalized intersections are a major source of traffic-related pollution exposure for pedestrians. It is therefore critical to predict traffic emissions, especially hazardous CO, with practical and accurate methods. However, CO emissions and concentrations at crosswalks are influenced by complex traffic conditions in complicated ways, which makes predicting CO concentration a challenging task for traditional statistical models. To this end, this study proposes a hybrid machine learning framework to predict CO concentrations at pedestrian crosswalks. The proposed method first ranks the key influencing factors with a random forest approach. A prediction model based on multivariate Long Short-Term Memory (LSTM) neural networks is then developed using the selected factors. Data were collected at a field intersection for model training and validation. The autoregressive integrated moving average (ARIMA), support vector machine (SVM), radial basis function network (RBFN), nonlinear vector autoregressive (VAR) and gated recurrent unit (GRU) neural network models are selected as benchmarks to verify the performance of the proposed model. The root mean square error (RMSE), mean absolute error (MAE) and coefficient of determination (R²) are calculated to evaluate model performance comprehensively. The results indicate that the proposed model outperforms the benchmark models in prediction accuracy.
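The three evaluation metrics named in the abstract have standard definitions, sketched below in NumPy. R² is computed here as 1 − SS_res/SS_tot; this is an assumption, since some studies instead report the squared correlation between observed and predicted values, and the paper's exact formula is not shown in this section. The data are toy numbers, not results from the study.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error."""
    e = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.sqrt(np.mean(e ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    e = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.mean(np.abs(e)))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)


observed = [1.0, 2.0, 3.0, 4.0]    # toy CO readings, not data from the study
predicted = [1.0, 2.0, 3.0, 5.0]
print(rmse(observed, predicted), mae(observed, predicted), r_squared(observed, predicted))
```

With a single error of 1 over four points, RMSE is 0.5 and MAE is 0.25, and since SS_tot = 5 the R² is 0.8. RMSE penalizes large errors more heavily than MAE, which is why the two are usually reported together.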

Introduction

With urbanization and industrialization around the world, air pollution is becoming an urgent problem (Liu et al., 2016; Zou et al., 2009). Vehicle exhaust is one of the major sources of air pollution in urban areas (Pan et al., 2019; Wang et al., 2020, 2018), and many studies have shown that motor vehicle air pollution induces cardiovascular and respiratory diseases (Roosbroeck et al., 2008; Zhang and Peng, 2014; Zhao et al., 2013). Recent studies indicate that complex traffic conditions at road intersections, such as traffic flows and vehicle states, increase exhaust emissions as well as pedestrians' exposure to vehicle pollution (Gokhale, 2011; Wang et al., 2015). Therefore, appropriate methods for predicting the concentration of traffic-related pollution at road intersections are essential.

In the past decades, there have been many attempts in this regard. Liu et al. (2016) applied a land-use regression model to assess the relationship between land use and air pollution. Xie et al. (2020) developed a kernel-based multivariate nonlinear grey model to forecast traffic-related emissions at the national level. Xu et al. (2019) applied a geographically weighted regression method to analyze the relationship between air pollutant emissions and traffic conditions. Niu et al. (2017) established an early warning system that forecasts day-ahead air pollution concentrations by ensembling a least squares support vector machine with empirical mode decomposition. However, these studies focus on intra-urban air pollution, and their results cannot be applied at the fine scale of individual intersections. Moreover, a lack of data limits the wider use of these models. Analyzing fine-scale variations of air pollution in microenvironments is crucial because these variations are more directly associated with pedestrians' exposure to traffic pollutants. Several dispersion models, such as CALINE4, CAL3QHC, and AERMOD, have been used to estimate near-road pollutant concentrations (Chen et al., 2009). However, these deterministic models are limited in revealing the dispersion characteristics of vehicle pollutants, which are highly nonlinear and strongly associated with many factors such as traffic emissions, geographical conditions, and meteorological conditions (Cai et al., 2009). Statistical distribution models have also been applied to investigate the relationship between air pollution concentrations and meteorological variables, but these models cannot predict the entire concentration range (Gokhale and Khare, 2004; Wang et al., 2015). Deterministic and statistical distribution models have subsequently been combined to predict vehicle pollutant concentrations, but such hybrid models require large amounts of input data that are usually not well known (Inal, 2010).

With advances in computation, artificial neural networks (ANNs) have been used extensively to estimate air pollution concentrations because they are well suited to modeling nonlinear and uncertain problems. These studies indicate that ANNs can overcome the limitations above and perform well in air pollution prediction (Elangasinghe et al., 2014; He et al., 2014; Liu and Chen, 2020; Singh et al., 2012; Zhang and Peng, 2014). Compared with traditional statistical models and shallow machine learning models such as ANNs, however, deep neural networks handle time-series prediction problems better and have achieved remarkable prediction accuracy (Lv et al., 2014; Tian and Pan, 2015); they have also been widely used in transportation (Liu et al., 2019a). The Long Short-Term Memory (LSTM) neural network is among the most suitable of these models because it can learn time series with long time spans and automatically determine the optimal time lags for prediction (Hochreiter and Schmidhuber, 1997). It can address spatial and temporal dependence simultaneously and has been successfully used in many fields (Bao et al., 2019; Ma et al., 2015; Xu et al., 2018).
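The ability to retain information over long time spans comes from the LSTM cell's gated, additive cell-state update. The NumPy sketch below shows one forward step of the widely used variant with a forget gate (a gate added to the original 1997 design in later work). The gate ordering inside `W` and `b`, the sizes, and the random weights are illustrative assumptions; this is not the trained network from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, b):
    """One step of an LSTM cell with input, forget and output gates.

    x: (n_in,) input at time t; h_prev, c_prev: (n_hidden,) previous states.
    W: (4 * n_hidden, n_in + n_hidden) weights for the four blocks stacked
       in the order [input gate, forget gate, candidate, output gate].
    b: (4 * n_hidden,) biases in the same order.
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:n])            # input gate: how much new information to write
    f = sigmoid(z[n:2 * n])        # forget gate: how much old memory to keep
    g = np.tanh(z[2 * n:3 * n])    # candidate values for the cell state
    o = sigmoid(z[3 * n:4 * n])    # output gate: how much memory to expose
    c = f * c_prev + i * g         # additive cell-state update
    h = o * np.tanh(c)             # new hidden state
    return h, c


def lstm_forward(xs, W, b, n_hidden):
    """Run the cell over a sequence xs of shape (T, n_in); return the final hidden state."""
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for x in xs:
        h, c = lstm_step(x, h, c, W, b)
    return h


rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = 0.1 * rng.normal(size=(4 * n_hidden, n_in + n_hidden))
b = np.zeros(4 * n_hidden)
h_T = lstm_forward(rng.normal(size=(24, n_in)), W, b, n_hidden)

# With the forget gate saturated open and the input gate closed,
# the cell state is carried through unchanged: the mechanism behind long memory.
n = 2
b_keep = np.concatenate([np.full(n, -50.0), np.full(n, 50.0), np.zeros(2 * n)])
_, c_next = lstm_step(np.ones(3), np.zeros(n), np.array([0.7, -0.3]),
                      np.zeros((4 * n, 3 + n)), b_keep)
```

Because the cell state is updated additively rather than repeatedly squashed, information (and gradient) can pass through many time steps when the forget gate stays near 1, which is the property the paragraph above relies on.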

Near-road traffic pollutants exhibit high spatial and temporal variability, which implies that fixed monitoring stations can hardly capture actual traffic pollution concentrations across space and time. CO is one of the major pollutants released by vehicles (Linna Sengkey et al., 2011), yet few studies have focused on predicting CO concentrations at road intersections. To address this research gap, portable equipment was used in this study to measure CO concentrations at a fine scale, and a hybrid model combining random forest and Long Short-Term Memory networks is proposed to predict the CO concentration at a road intersection. The results can provide decision-makers with insights for reducing pedestrians' exposure to vehicle pollutants through real-time traffic control measures. The main contributions of this paper are summarized as follows:

  • A hybrid framework based on LSTM is developed to predict CO concentrations at a road intersection.

  • Noise in the time-series data is removed through data preprocessing, and a random forest is used to rank the importance of traffic flows in different directions.
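The random forest ranking step can be sketched with scikit-learn's `RandomForestRegressor`, whose `feature_importances_` attribute gives impurity-based importances that sum to 1. The variable names, the synthetic response, and the hyperparameters below are hypothetical stand-ins; the paper may use a different importance measure, and `sklearn.inspection.permutation_importance` is a common, less biased alternative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rank_features(X, y, names, n_estimators=200, seed=0):
    """Fit a random forest regressor and return (name, importance) pairs,
    sorted from most to least important (impurity-based importance)."""
    rf = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    rf.fit(X, y)
    pairs = zip(names, rf.feature_importances_)
    return sorted(pairs, key=lambda p: p[1], reverse=True)


# Synthetic stand-in data; variable names are hypothetical, not the paper's.
rng = np.random.default_rng(42)
names = ["flow_east_left", "flow_west_through", "flow_north_right", "queued_vehicles"]
X = rng.uniform(0.0, 1.0, size=(500, len(names)))
# Toy "CO" response: a strong linear effect, a moderate nonlinear effect,
# a weak effect, and one irrelevant column (flow_north_right), plus noise.
y = 3.0 * X[:, 1] + 1.5 * X[:, 3] ** 2 + 0.2 * X[:, 0] + rng.normal(0.0, 0.05, 500)

ranking = rank_features(X, y, names)
for name, score in ranking:
    print(f"{name:18s} {score:.3f}")
```

The top-ranked variables would then be kept as LSTM inputs; where to draw the cut-off is a modeling choice that this sketch does not decide.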

The remainder of this paper is organized as follows. Section 2 introduces the field test and data collection and summarizes the components of the proposed hybrid framework, including data preprocessing, the random forest, and the LSTM neural network. Section 3 discusses the results of the LSTM-based hybrid framework and presents the conclusions of this paper.

Section snippets

Field test and data collection

Field experiments were conducted at the intersection of Shuanglong Ave. and Jiyin Ave. (31.900402° N, 118.836858° E; Fig. 1(a)). Shuanglong Ave. is a five-lane road (both directions) with a total width of 39 m. Jiyin Ave. is an arterial road in Jiangning District with six lanes in total. Traffic volume and population density are relatively high because the intersection is near Jiulonghu Campus railway station and Tongren Hospital. There are no buildings or barriers around the intersection.

Data processing results

The procedure described above was applied to the data from the three observation sites, and the data preprocessing results are shown in Fig. 5. The detailed procedure of outlier detection and replacement is as follows:

  (1) The first step was to determine appropriate values of eps and MinPts for the DBSCAN algorithm. MinPts was determined empirically. Eps was determined from the k-distance graph, which plots the sorted distances from each observation to its k-th (k = MinPts) nearest neighbor. The appropriate value of eps …
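The snippet stops before the chosen eps value and the replacement rule are given. The NumPy sketch below illustrates the steps it does describe: a k-distance curve for choosing eps, DBSCAN labelling of noise points, and replacement of those points. Clustering on (time, CO) pairs, MinPts = 3, eps = 3.0, the toy series, and linear interpolation as the replacement rule (`replace_noise`) are all assumptions; the paper's actual replacement procedure is not shown in this section.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance matrix for an (n, d) array."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def k_distances(points, k):
    """Sorted distances from each point to its k-th nearest point, counting the
    point itself as the first (so k = MinPts matches DBSCAN's core-point test)."""
    d = np.sort(pairwise_distances(points), axis=1)  # column 0 is the zero self-distance
    return np.sort(d[:, k - 1])


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns labels: -1 for noise, 0..C-1 for clusters."""
    d = pairwise_distances(points)
    neighbours = [np.flatnonzero(row <= eps) for row in d]   # includes the point itself
    is_core = np.array([len(nb) >= min_pts for nb in neighbours])
    labels = np.full(len(points), -1)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue                      # border points join but do not expand
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


def replace_noise(t, values, labels):
    """Hypothetical replacement rule: linearly interpolate noise points in time."""
    keep = labels != -1
    cleaned = np.asarray(values, float).copy()
    cleaned[~keep] = np.interp(t[~keep], t[keep], cleaned[keep])
    return cleaned


# Toy CO series with one spike at t = 10 (illustrative values only).
t = np.arange(20, dtype=float)
co = 1.0 + 0.01 * t
co[10] = 5.0
pts = np.column_stack([t, co])

kd = k_distances(pts, k=3)   # the curve jumps from about 2 to about 4, so eps = 3.0 is used
labels = dbscan(pts, eps=3.0, min_pts=3)
co_clean = replace_noise(t, co, labels)
```

In practice the time axis and the concentration would usually be scaled to comparable ranges before computing distances, since otherwise the time spacing dominates the choice of eps.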

Conclusion

In this study, a hybrid framework based on an LSTM neural network was developed to predict the CO concentration at an intersection. To improve prediction accuracy, data preprocessing was first conducted to remove outliers and reduce the influence of unavoidable noise on prediction. The status of vehicles at the intersection was also taken into consideration. Moreover, the random forest was applied to rank the importance of these variables. Finally, the variables are added …

Data statement

The data used in this paper are available from the corresponding author upon request.

CRediT authorship contribution statement

Yuxuan Wang: Conceptualization, Data curation, Formal analysis, Writing - original draft. Pan Liu: Methodology, Writing - review & editing. Chengcheng Xu: Conceptualization, Methodology, Validation. Chang Peng: Data curation, Validation. Jiaming Wu: Writing - review & editing.

Declaration of competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was supported by the National Key Research and Development Program of China (No. 2018YFB1600900 and SQ2018YFGH000413), Natural Science Foundation of Jiangsu Province (BK20171358) and Fundamental Research Funds for the Central Universities. The authors would like to thank the editor and the reviewers for their constructive comments and valuable suggestions to improve the quality of this article.

References (49)

  • M. Kühnlein et al.

    Improving the accuracy of rainfall rates from optical satellite sensors with machine learning — a random forests-based approach applied to MSG SEVIRI

    Rem. Sens. Environ.

    (2014)
  • M. Li et al.

    Short-term prediction of safety and operation impacts of lane changes in oscillations with empirical vehicle trajectories

    Accid. Anal. Prev.

    (2020)
  • H. Liu et al.

    Prediction of outdoor PM2.5 concentrations based on a three-stage hybrid neural network model

    Atmos. Pollut. Res.

    (2020)
  • C. Liu et al.

    A land use regression application into assessing spatial variation of intra-urban fine particulate matter (PM2.5) and nitrogen dioxide (NO2) concentrations in City of Shanghai, China

    Sci. Total Environ.

    (2016)
  • Y. Liu et al.

    DeepPF: a deep learning based architecture for metro passenger flow prediction

    Transport. Res. C Emerg. Technol.

    (2019)
  • Z. Liu et al.

    A tailored machine learning approach for urban transport network flow estimation

    Transport. Res. C Emerg. Technol.

    (2019)
  • X. Ma et al.

    Long short-term memory neural network for traffic speed prediction using remote microwave sensor data

    Transport. Res. C Emerg. Technol.

    (2015)
  • M. Niu et al.

    Application of decomposition-ensemble learning paradigm with phase space reconstruction for day-ahead PM2.5 concentration forecasting

    J. Environ. Manag.

    (2017)
  • J. Nowotarski et al.

    An empirical comparison of alternative schemes for combining electricity spot price forecasts

    Energy Econ.

    (2014)
  • T. Nyitrai et al.

    The effects of handling outliers on the performance of bankruptcy prediction models

    Soc. Econ. Plann. Sci.

    (2019)
  • Y. Pan et al.

    Estimation of real-driving emissions for buses fueled with liquefied natural gas based on gradient boosted regression trees

    Sci. Total Environ.

    (2019)
  • R.K. Pearson

    Exploring process data

    J. Process Contr.

    (2001)
  • K.P. Singh et al.

    Linear and nonlinear modeling approaches for urban air quality prediction

    Sci. Total Environ.

    (2012)
  • Z. Wang et al.

    Fine-scale estimation of carbon monoxide and fine particulate matter concentrations in proximity to a road intersection by using wavelet neural network with genetic algorithm

    Atmos. Environ.

    (2015)
Peer review under responsibility of Turkish National Committee for Air Pollution Research and Control.