An LSTM-based neural network method of particulate pollution forecast in China

Yarong Chen; Shuhang Cui; Panyi Chen; Qiangqiang Yuan; Ping Kang; Liye Zhu

doi:10.1088/1748-9326/abe1f5

1. Introduction

Epidemiological investigations, animal toxicology tests and human clinical observations show that PM₁₀ has obvious and direct toxic effects on human health, and can cause extensive damage to the respiratory system (Leng et al 2017), the heart and blood system, the immune system and the endocrine system (Li et al 2002). In a study of the Utah River Valley, mortality increased by 4%–5% on average for every 50 μg m⁻³ increase in the daily average PM₁₀ concentration (Maynard 1997). The exposure level is also important: when the mass concentration of PM₁₀ is greater than 100 μg m⁻³, the mortality rate is 11% higher than when it is less than 50 μg m⁻³ (Maynard 1997). In addition, because aerosols are a major component of air pollution, aerosol load trends also have an important impact on the prediction of climate change (Rotstayn et al 2015, Westervelt et al 2015, Wang et al 2016, Yang et al 2017). Therefore, accurate and timely prediction of PM₁₀ is of great significance for both climatology and socio-economic development.

In recent years, statistical methods have been increasingly used for air pollution prediction because they are efficient, convenient and inexpensive. In addition to single numerical prediction models (Li et al 2016, Zhao et al 2016, Yang et al 2017), machine learning methods have gradually been applied to air pollution prediction (Mallet et al 2009, Papaleonidas and Iliadis 2013, Nieto et al 2018). Deep learning, one of the most sought-after classes of machine learning algorithms, shows great potential for fitting the complex nonlinear relationships between influencing factors and pollutant concentrations (Hinton et al 2006, Li et al 2018). One example is the recurrent neural network (RNN) method, which in this context usually also takes into account the current pollution concentrations of the areas geographically adjacent to the area of interest (Tong et al 2019).

Long short-term memory (LSTM) is an RNN with both long-term and short-term memory. In recent years, it has become an effective and scalable model for learning problems that involve sequential data. The predictive ability of LSTM has recently been demonstrated in atmospheric science (Asanjan et al 2018, Alhirmizy and Qader 2019, Mohapatra et al 2020). These studies show that LSTM has better statistical properties (such as correlation coefficient and mean squared error) than a traditional RNN and also performs better than a single environmental meteorological model (Dai et al 2019). The LSTM method therefore offers another opportunity to better predict the changing trend of PM₁₀ in China.

In recent years, particulate air pollution in China has shown clear regional features. For example, the sources of particulate pollution in the Jing–Jin–Ji area are similar, and the seasonal distribution shows region-specific characteristics (Dao 2015). Li et al (2017) found that the aerosol extinction coefficient in Northeast and Northwest China continued to decline. Around 2006, the trend in the Pearl River Delta and southwest China changed from increasing to decreasing, while the trend in the North China Plain (NCP) and the Yangtze River Delta (YRD) was still increasing. These results indicate the diversity of PM₁₀ in different regions and the complexity of its prediction. PM₁₀ is significantly correlated among the cities in the Yangtze River Delta region, and the PM₁₀ concentration is also correlated with other pollutants (e.g. NO₂ and SO₂) (Shi et al 2008). Similar relationships also appear in the Pearl River Delta region (Hu et al 2011) and Northwest China (Qiu 2010), which are lightly and heavily PM₁₀-polluted regions, respectively.

This paper aims to explore the ability of an LSTM-based method to predict daily PM₁₀ concentrations and to develop a general prediction model that can be applied to different regions of China. We evaluate the adaptability of the model in five representative regions of China in section 4. This study is a step forward in the application of neural network methods to air pollution forecasting. Although it focuses on forecasting PM₁₀ concentrations, the approach could be generalized to other air pollutants and to broader areas.

2. Data

2.1. Data source

We obtained the national urban pollution data from the national urban air quality real-time release platform (http://106.37.208.233:20035/, last access: 22 November 2020). The meteorological data are from the China surface climatic data daily data set V3.0, provided by the China Meteorological Data Network (https://data.cma.cn, last access: 22 November 2020). We selected the national urban pollution data and meteorological data of 359 cities for the period from 2015 to 2017. Our data set has 1095 data samples for each city, and our LSTM model is built on the time dimension. The details of the data are summarized in table S1 (available online at stacks.iop.org/ERL/16/044006/mmedia). The data preprocessing and the temporal features are described in supplemental text S1. The combined data sets for our research are saved at figshare (Zhu 2021).

2.2. Spatial patterns of different city metropolitans

Data collection sites are mainly distributed in the eastern region of China, especially the Yangtze River Delta, the Pearl River Delta and the Beijing–Tianjin–Hebei region, with relatively few sites in the western region. Therefore, in this paper, we select a few representative cities (Beijing, Taiyuan, Shanghai, Nanjing and Guangzhou) from different geographic regions of China (Jing–Jin–Ji, Northwest China, Yangtze River Delta and Pearl River Delta), together with the cities surrounding them within a certain distance (e.g. 100 km, 200 km) (figure 1(a)). Because Beijing and Taiyuan, as well as Nanjing and Shanghai, are geographically close, some of the selected cities within 200 km belong to overlapping areas (the red circle in figure 1(a)).

**Figure 1.** (a) Location map of selected *in situ* sites. The stars represent the target cities: Beijing, Taiyuan, Shanghai, Nanjing, and Guangzhou. The target cities and their surrounding cities are shown in red, blue, purple, green, and yellow, respectively. Cities within 100 km are indicated by squares, while cities within 200 km are indicated by inverted triangles. Stations in the red circle are within 200 km of both Nanjing and Shanghai. (b) Double-layer LSTM neural network structure diagram. (c) Double-layer LSTM flow chart.

3. Methods

3.1. The principle of LSTM

The long short-term memory (LSTM) neural network is a widely used RNN architecture in deep learning. It has a hierarchical structure consisting of an input layer, hidden layers and an output layer (figure 1(b)), and its nodes are connected recurrently so that the input information is processed in sequential order. Each hidden layer contains an LSTM structure and a dropout layer. The LSTM structure is the core of the entire neural network. Each LSTM neuron contains three control gates: an input gate, a forget gate and an output gate. The input gate learns when to let the activation signal into the cell, the output gate learns when to let the activation signal out of the cell, and the forget gate learns how much of the cell state from the previous time step is carried into the next one. In the forget gate, only the information that the gate has learned to treat as relevant is kept, while the rest is forgotten, which effectively addresses the problem of long-term dependence (Greff et al 2017). In the dropout layer, network units are temporarily dropped from the network with a certain probability during training, which helps to prevent overfitting and gives good fault tolerance.
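
The gate mechanism described above can be illustrated with a minimal sketch of one time step of a single-unit LSTM cell, written in plain Python. This is the standard textbook formulation, not code from this study; the function names, the parameter dictionary and all weight values are hypothetical and chosen only to show how a nearly open forget gate preserves the cell state.

```python
import math

def sigmoid(x):
    """Logistic function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell (scalar input and state).

    p is a dict of hypothetical weights: for each of "i", "f", "o", "c" it
    holds an input weight w_*, a recurrent weight u_* and a bias b_*.
    """
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])  # input gate
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])  # forget gate
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])  # output gate
    c_tilde = math.tanh(p["w_c"] * x + p["u_c"] * h_prev + p["b_c"])  # candidate state
    c = f * c_prev + i * c_tilde  # keep part of the old state, admit part of the new
    h = o * math.tanh(c)          # expose part of the cell state as the output
    return h, c

# Illustrative weights only: forget gate nearly open, input gate nearly closed.
params = {
    "w_i": 0.0, "u_i": 0.0, "b_i": -10.0,
    "w_f": 0.0, "u_f": 0.0, "b_f": 10.0,
    "w_o": 0.0, "u_o": 0.0, "b_o": 10.0,
    "w_c": 1.0, "u_c": 0.0, "b_c": 0.0,
}
h, c = 0.0, 0.8
for x in [0.5, -0.3, 0.9]:
    h, c = lstm_step(x, h, c, params)
```

With these weights the cell state stays close to its initial value over the three steps, which is the behaviour that lets an LSTM carry information across long sequences. In a trained network the gate weights are learned rather than fixed.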

3.2. Model design

In our model, the prediction object is the target pollutant concentration of a central city, and the prediction factors are the historical daily data of meteorological elements and pollution elements of the central city and its surrounding n_SITE stations in the past r days. Specifically, the target pollutant is daily PM₁₀ in this study. The input factors include d factors such as daily average temperature, daily average humidity, maximum wind speed, maximum wind direction, SO₂ concentration, NO₂ concentration, CO concentration, O₃ concentration for 8 h, PM_2.5 concentration and PM₁₀ concentration (table 1).

Table 1. Input factors.

Variable	Unit	Notes
PM₁₀	μg m⁻³	Target variable to predict with nearby cities
PM_2.5	μg m⁻³
SO₂	μg m⁻³
NO₂	μg m⁻³
CO	μg m⁻³
O₃	μg m⁻³
Mean temperature	0.1 °C	Unit is 0.1 times of 1 °C
Mean relative humidity	%
Great wind speed	0.1 m s⁻¹	Unit is 0.1 times of 1 m s⁻¹
Great wind direction		Numbers are used to represent 16 directions and then be converted to degrees (wind from North is 0°)
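
As a small illustration of the unit notes in table 1, the sketch below converts raw values into physical units. The numeric coding of the 16 wind directions is not given here, so the sketch assumes a hypothetical coding in which the integers 0 to 15 count clockwise from north in equal steps; the function name and field names are also illustrative.

```python
def decode_record(raw_temp, raw_wind_speed, direction_code):
    """Convert raw table 1 style values to physical units.

    direction_code is ASSUMED to be an integer 0..15 counted clockwise
    from north in 22.5 degree steps (hypothetical coding).
    """
    if not 0 <= direction_code <= 15:
        raise ValueError("direction_code must be in 0..15")
    return {
        "temperature_c": raw_temp * 0.1,        # stored in units of 0.1 degC
        "wind_speed_ms": raw_wind_speed * 0.1,  # stored in units of 0.1 m/s
        "wind_direction_deg": direction_code * 360.0 / 16,  # north is 0 degrees
    }

record = decode_record(raw_temp=253, raw_wind_speed=47, direction_code=4)
```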

We constructed the input factor matrix W, whose dimension is r × n_SITE × d (figure 1(c)). One or more LSTM layers constitute the middle part of the prediction model, and the number of layers is denoted as n_LAYER. Each LSTM layer contains n_NEU neurons and is followed by a dropout layer. The output layer yields the predicted target. The forecast is denoted as p_PRE, while the corresponding observed value is denoted as p_OBS. After e predictions, the prediction sequence p_PRE is obtained. The R-square and root mean square error (RMSE) of p_PRE and p_OBS are calculated to evaluate the prediction performance of different model configurations.
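
To make the shape of the input concrete, the following sketch builds samples of dimension r × n_SITE × d from a daily record and pairs each with the next-day value at the central city. It is a minimal illustration under stated assumptions: the nested-list layout `data[day][site][factor]`, the function name `build_samples` and the convention that the central city and PM₁₀ sit at index 0 are all hypothetical, and the synthetic numbers carry no physical meaning.

```python
def build_samples(data, r, target_site=0, target_factor=0):
    """Build (inputs, targets) from data[day][site][factor].

    Each input is the block of the r most recent days, shaped
    r x n_SITE x d; each target is the following day's value of
    target_factor at target_site (the central city).
    """
    inputs, targets = [], []
    for t in range(r, len(data)):
        inputs.append(data[t - r:t])
        targets.append(data[t][target_site][target_factor])
    return inputs, targets

# Synthetic example: 6 days, 3 stations, 2 factors per station.
n_days, n_site, d = 6, 3, 2
data = [[[100 * day + 10 * s + k for k in range(d)] for s in range(n_site)]
        for day in range(n_days)]
X, y = build_samples(data, r=1)
```

Recurrent layers in common deep learning libraries usually expect one feature vector per time step, so in practice the n_SITE × d block of each day would typically be flattened into a single vector of length n_SITE · d before being passed to the network.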

During the training process, the optimization function used in the LSTM model is the 'Adam' algorithm (Zhang et al 2019) and the loss function is 'Mean_Squared_Error'.

In general, the prediction performance of the model depends on the parameters r, n_SITE, n_LAYER and n_NEU. In this study, we used Python and its deep learning libraries TensorFlow and Keras to build and implement the above neural network algorithms. We set up controlled experiments with multiple parameter configurations, and then discuss the capabilities of our LSTM model in different regions of China.

We divided the data set with a time length of L days into two sets, with 70% of the available data allocated to the training set and the remaining 30% to the test set. A summary of the experiment settings is shown in table S3. We used the control variable approach for the selected representative cities (Beijing, Taiyuan, Shanghai, Nanjing, and Guangzhou), changing only one variable in each group of experiments. Finally, we used R² and RMSE to evaluate the experimental results and to explore the influence of the number of LSTM layers (supplemental text S2), the scope of the surrounding cities (supplemental text S3), the number of days of input data, and the region to which the model is applied (supplemental text S4). Based on the experimental results, we determined that the best model is a double-layer LSTM that takes the observations of the previous one day within a certain geographical range as input (the DLP1–LSTM model; DLP1 stands for double-layer with previous ONE observed data).
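
The split and the two evaluation metrics can be sketched in a few lines of plain Python. The sketch assumes a chronological split (the first 70% of the days for training) and assumes that R² is the squared Pearson correlation between predictions and observations, since the exact definition is not stated here; the function names and the sample numbers are illustrative only.

```python
import math

def chronological_split(samples, train_fraction=0.7):
    """Split a time-ordered list into an earlier and a later part."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]

def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def r_squared(pred, obs):
    """Squared Pearson correlation (ASSUMED definition of R-square)."""
    n = len(obs)
    mean_p, mean_o = sum(pred) / n, sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov * cov / (var_p * var_o)

days = list(range(10))
train, test = chronological_split(days)

obs = [40.0, 55.0, 70.0, 65.0]   # made-up observed PM10 values
pred = [42.0, 50.0, 75.0, 60.0]  # made-up predictions
print(rmse(pred, obs), r_squared(pred, obs))
```

A chronological split keeps every test day later than every training day, which avoids evaluating the model on days that precede its training data.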

3.3. Other machine learning methods

To evaluate the performance of the LSTM model in PM₁₀ prediction, we compared it with five other methods, including traditional statistical methods and machine learning models: ordinary least squares regression (OLSR), Bayesian ridge regression, support vector regression (SVR), multi-layer perceptron (MLP), and Random Forest Regression. We briefly introduce these methods in the following paragraphs.

OLSR, which seeks the best match for data by minimizing L2 norm errors, is a mathematical optimization technique (Moutinho and Hutcheson 2011). In this algorithm, OLSR is used to fit a linear model using coefficients in Linear Regression to minimize the sum of squares of differences between the actual observed data and the predicted data.

Bayesian Ridge Regression builds on OLSR by imposing a penalty on the regression coefficients. In other words, a regularization term is added to the sum of squared deviations, which is used to adjust the parameters and to suppress correlated terms (Massaoudi et al 2020).
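
The relation between the two linear baselines can be shown with a one-feature sketch. With `penalty=0` the function below is ordinary least squares; with a positive penalty it is plain ridge regression with an unpenalized intercept. This is a simplification of the method compared in this paper: the sketch fixes the penalty by hand, whereas a Bayesian treatment would infer its strength from the data. The function name and the sample data are hypothetical.

```python
def fit_line(xs, ys, penalty=0.0):
    """Fit y = slope * x + intercept by (penalized) least squares.

    Minimizes sum((y - slope * x - intercept)**2) + penalty * slope**2.
    penalty=0 gives ordinary least squares; penalty>0 gives ridge.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + penalty)  # the penalty shrinks the slope toward zero
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
ols_slope, ols_intercept = fit_line(xs, ys)
ridge_slope, ridge_intercept = fit_line(xs, ys, penalty=5.0)
```

The ridge slope is smaller in magnitude than the least squares slope, which is the shrinkage effect that the penalty term is meant to produce.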

SVR differs from traditional regression methods. As long as the prediction f(x) does not deviate too much from the target y, the prediction is regarded as correct and no error is counted. Training an SVR is a convex optimization problem, which guarantees a global solution, and the method has nonlinear prediction ability and good generalization performance (Uçak and Günel 2016).
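
The idea that small deviations are not counted as errors corresponds to the epsilon-insensitive loss commonly used in SVR. The sketch below is a generic illustration; the tolerance of 5 μg m⁻³ and the sample values are arbitrary assumptions, not settings from this study.

```python
def epsilon_insensitive_loss(prediction, target, epsilon=5.0):
    """Zero inside the tolerance band, linear in the excess outside it."""
    return max(0.0, abs(prediction - target) - epsilon)

# Three predictions for an observed value of 60: two fall inside the band.
losses = [epsilon_insensitive_loss(p, 60.0) for p in (58.0, 64.0, 72.0)]
```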

The feedforward neural network of an MLP is built up from its most basic form, the single perceptron. A perceptron has one or more inputs, a bias, an activation function and a single output. It takes the inputs, multiplies them by certain weights, and passes the weighted sum to the activation function to produce the output. The input layer of an MLP consists of one neuron for each input variable, and the output layer consists of the response variables. The input layer and the hidden layer each include a constant neuron associated with the intercept synapse or bias (that is, a synapse not directly affected by any covariate) (Egbo 2018).
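
A forward pass through a single perceptron and a one-hidden-layer MLP can be sketched as follows. The tanh activation, the linear output neuron and all weights are assumptions chosen for illustration; they are not the configuration used for the comparison in this paper.

```python
import math

def perceptron(inputs, weights, bias, activation=math.tanh):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)

def mlp_forward(inputs, hidden_layer, output_weights, output_bias):
    """Forward pass of a one-hidden-layer MLP with a single output.

    hidden_layer is a list of (weights, bias) pairs, one per hidden neuron.
    """
    hidden = [perceptron(inputs, w, b) for w, b in hidden_layer]
    # Linear output neuron, a common choice for regression targets.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

hidden_layer = [([0.5, -0.2], 0.1), ([-0.3, 0.8], 0.0)]  # hypothetical weights
y_hat = mlp_forward([1.0, 2.0], hidden_layer,
                    output_weights=[1.0, -1.0], output_bias=0.5)
```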

Random Forest Regression is an ensemble method that builds a forest of multiple decision trees during training. Each tree contains an arrangement of decision nodes, at which the tree is divided into different branches until a termination point (leaf) is reached. Each decision node depends on whether the value of an input feature exceeds a certain threshold. The final prediction is obtained by averaging the predictions of the individual trees, each of which is trained independently (Pillai et al 2020).

In this paper, the modules in the scikit-learn library are used to implement the models for comparison. To ensure the reliability of the comparison results, the data preprocessing and the input factors of the other machine learning methods are kept consistent with those of our LSTM model.

4. Results

4.1. DLP1–LSTM model performance between representative cities

To study the applicability of the double-layer LSTM model to PM₁₀ prediction in different regions of China, we applied our DLP1–LSTM model to several typical metropolitan areas that represent different pollution features in China: Beijing, Taiyuan, Shanghai, Nanjing and Guangzhou. The configuration uses observation data from cities within a 100 km radius of the target city, taken 1 d before the prediction day. The training convergence curves (figure S2) show that the DLP1–LSTM prediction model gradually converges for all five target cities. This further reflects the good applicability of the prediction model in China.

Figure 2 shows the time series of the predictions and observations for all five cities. In general, the predictions follow the trends of the observations in all five cities and match most of the peaks and valleys. Figure 2 also shows that the observed PM₁₀ in Guangzhou is lower than that in the other cities and has fewer sharp changes. Figure S3 shows that, in the representative cities, the predicted values of the DLP1–LSTM are positively correlated with the observations, and the slope of the fitting curve is close to 1. In addition, Nanjing has the highest R² (0.828), while Shanghai has the lowest R² (0.642) among these five cities. The predictions of PM₁₀ concentrations in these cities all have good correlations (R = 0.81–0.91) with observations. Guangzhou has the smallest RMSE, while Taiyuan has the largest RMSE.

In summary, our configuration of the double-layer LSTM model is able to make accurate predictions of daily PM₁₀ concentrations for various metropolitan areas across a wide range of pollution levels in China.

4.2. DLP1–LSTM model performance of next-day trend prediction at different pollution levels

To further explore the predictive power of the DLP1–LSTM model, we analyzed the sensitivity of the model to different levels of pollution and its performance in the five representative cities of different regions. We divided the pollution into three levels: mild (0–50 μg m⁻³), moderate (50–100 μg m⁻³) and polluted (>100 μg m⁻³), and then tested the accuracy of our model in predicting the next-day trend (increase or decrease from the previous day) for each pollution level. Overall, the model has the best trend prediction capability for PM₁₀ concentration in heavy pollution situations (>100 μg m⁻³) (table S7).
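
One way to compute such a trend accuracy is sketched below. Several details are assumptions made for illustration, because they are not specified here: a day is assigned to a level by its observed concentration, a value of exactly 50 or 100 μg m⁻³ falls into the lower level, the predicted change is measured against the previous day's observation, and a day with no change counts as "not increasing". The function names and the sample series are hypothetical.

```python
def pollution_level(pm10):
    """Map a PM10 concentration (ug m-3) to one of three levels."""
    if pm10 <= 50:
        return "mild"
    if pm10 <= 100:
        return "moderate"
    return "polluted"

def trend_accuracy_by_level(obs, pred):
    """Share of days whose predicted direction of change matches the observed one.

    obs[t] and pred[t] refer to the same day; both changes are taken
    relative to the previous day's observation obs[t - 1]. Days are
    grouped by the level of obs[t] (an assumption).
    """
    hits, totals = {}, {}
    for t in range(1, len(obs)):
        observed_up = obs[t] > obs[t - 1]
        predicted_up = pred[t] > obs[t - 1]
        level = pollution_level(obs[t])
        totals[level] = totals.get(level, 0) + 1
        hits[level] = hits.get(level, 0) + (observed_up == predicted_up)
    return {level: hits[level] / totals[level] for level in totals}

obs = [40.0, 60.0, 120.0, 90.0, 45.0]    # made-up observations
pred = [38.0, 55.0, 110.0, 125.0, 50.0]  # made-up predictions
acc = trend_accuracy_by_level(obs, pred)
```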

Figure S4 compares the next-day changes of the predictions with those of the observations. A positive value means that the PM₁₀ concentration increases relative to the previous day, while a negative value means that it decreases. Beijing has more days with moderate and heavy pollution, Nanjing has more days with moderate and mild pollution, and Taiyuan has more days with heavy pollution. The prediction accuracy of the next-day trend is much better for the polluted cases in these three cities. However, Guangzhou has the most days of mild pollution, and the corresponding correlation for mild pollution is much better than in the other cities (figure S3 and table 2). Shanghai has the lowest accuracy of the next-day trend forecast, at about 70%. For Shanghai, the prediction ability of the model is weak on mild (R² = 0.12) and moderate (R² = 0.17) pollution days.

Table 2. Prediction accuracy of next-day trend on increasing or decreasing for five representative cities at three pollution levels: 0–50 μg m⁻³ (mild), 50–100 μg m⁻³ (moderate), >100 μg m⁻³ (polluted).

The city	Beijing			Taiyuan			Nanjing			Guangzhou			Shanghai
The levels (μg m⁻³)	0–50	50–100	>100	0–50	50–100	>100	0–50	50–100	>100	0–50	50–100	>100	0–50	50–100	>100
Numbers	73	143	85	18	89	194	104	130	67	157	128	16	158	120	23
Percentage of total (%)	24.33	47.67	28.33	6.00	29.67	64.67	34.67	43.33	22.33	52.33	42.67	5.33	52.67	40.00	7.67
RMSE (μg m⁻³)	24.39	29.56	33.47	19.76	24.76	29.18	14.70	16.05	20.48	9.44	11.78	14.10	17.34	23.15	24.97
R²	0.13	0.17	0.86	0.14	0.23	0.74	0.34	0.41	0.70	0.51	0.58	0.32	0.12	0.17	0.64
Accuracy (all levels)	77.30%			76.00%			74.33%			79.00%			69.67%
R² (all levels)	0.79			0.81			0.85			0.86			0.56
RMSE (all levels, μg m⁻³)	29.59			27.45			16.71			10.77			20.47

Figure S3 and table 2 show that the prediction ability for the next-day trend in Shanghai is weaker than in the other cities. We attribute this to several factors. Shanghai has a large coastal area and mostly flat terrain, which is conducive to the diffusion of pollutants. In addition, there is a lot of rain in summer, especially in the Meiyu season; the relative humidity is high and is negatively correlated with the concentration of PM₁₀ (Jian et al 2019), that is, it favors the deposition of PM₁₀. Moreover, Shanghai has less pollution in general and fewer days of heavy pollution, which reduces the ability of the model to capture its characteristics. Guangzhou also faces the sea and has a relatively low degree of pollution, but its terrain is hillier, high in the northeast and low in the southwest. As a result, the terrain of Guangzhou is not conducive to the diffusion of pollutants, and the DLP1–LSTM model is able to capture the data features well.

In general, when the PM₁₀ concentration of the current day is known, the model predicts whether the concentration will increase or decrease on the next day with an accuracy of about 70%–80%, with slight variation among the representative cities. The model is more suitable for predicting heavy pollution events (PM₁₀ concentration >100 μg m⁻³), with some regional differences.

5. Comparison of LSTM with other methods

In this section, we compare the prediction performance of LSTM with several traditional statistical prediction methods (OLSR, Bayesian ridge regression, SVR) as well as two machine learning methods (MLP, Random Forest Regression). Here we show the comparison results for Guangzhou as an example. We used the same input factors for DLP1–LSTM and the other five methods, namely the meteorological and pollution factors (table 1) recorded on the previous day in Guangzhou and in some of the stations within 100 km around it (Zhongshan city, Dongguan city, Huizhou city and Foshan city). Similarly, the first 70% of the data set is the training set while the remaining 30% is the test set.

In the case of Guangzhou, the DLP1–LSTM method predicts the daily PM₁₀ concentrations much better than the other five methods, with an R² of 0.838, while the other methods only reach R² values in the range of 0.466–0.537 (figure 3). The time series plots (figure S5) also show that the DLP1–LSTM model performs much better than the other five models, which show large discrepancies with the observations and consistently miss the peaks, predicting them either too late or too early.

**Figure 3.** Comparison of predicted and observed daily PM₁₀ concentrations in Guangzhou with six different prediction methods. Regression lines are shown in blue. The histograms of predictions and observations are located on the right and top of the box, respectively.

Besides the methods we applied in this section, other studies have applied many combined methods in order to better predict air quality. Guo et al (2020) explored the potential of including the wavelet method in artificial neural networks (ANNs) and evaluated the performance of several combined algorithms of wavelet and ANNs. The R² of the best prediction case of the Air Pollution Index is 0.78 and 0.79 at Xi'an and Lanzhou in China, respectively. Cortina–Januchs et al (2015) used a combination of a Multilayer Perceptron Neural Network and a clustering algorithm to predict the daily PM₁₀ concentration at three monitoring stations of Salamanca city in Mexico, with R² between 0.49 and 0.59. That study reported that the combination of ANNs with clustering algorithms had better generalization capacities than methods based on a simple ANN.

In our study, the LSTM-based method shows clear superiority in PM₁₀ prediction and good adaptability to different regions of China. In the future, we would like to explore combinations of LSTM with data analysis algorithms, as in the studies mentioned above.

6. Conclusions

In this paper, we explored the LSTM neural network method and its application to atmospheric particulate pollution prediction. Our DLP1–LSTM model shows excellent adaptability to various regions in China with different geographical conditions and PM₁₀ characteristics. First, the predictions of PM₁₀ concentrations in cities of different regions all have good correlations (R = 0.81–0.91) with observations. Second, the DLP1–LSTM model also performs well in predicting the changing trend of the next day's PM₁₀ concentration. The accuracy of predicting whether the concentration will increase or decrease on the next day is 70%–80%. In addition, when the pollution degree is divided into three levels (mild: <50 μg m⁻³, moderate: 50–100 μg m⁻³ and polluted: >100 μg m⁻³), the model has the best trend prediction capability for PM₁₀ concentration in heavy pollution situations. This shows the great potential of our model to contribute to the prediction, protection and regulation of seasonal concentrated pollution in heavily PM₁₀-polluted areas, such as northwest China.

Among the prediction methods compared (LSTM, OLSR, Bayesian ridge, SVR, MLP and Random Forest), the DLP1–LSTM model performs better than the others, which indicates the great application prospects of the LSTM method for forecasting pollution with temporally correlated features.

The significance of this study lies not only in the application of the LSTM method to daily PM₁₀ concentration prediction, but also in the great potential of RNN methods for better forecasting of particulate matter pollution. In the future, our research will focus more on adjusting the model structure and tuning the hyperparameters in order to improve the spatio-temporal scale of model prediction (e.g. hourly prediction over a wider geographical area). In addition, the capability of the DLP1–LSTM model to predict PM_2.5 and other pollutants is worth exploring.

Acknowledgments

This project is supported by National Natural Science Foundation of China Grant 41975139 and Natural Science Foundation of Guangdong Province Grant 2020A1515011133. The national urban pollution data is from the national urban air quality real-time release platform (http://106.37.208.233:20035/). The meteorological data is from the China Meteorological Data Network (https://data.cma.cn). The processed data sets for this research are available at Figshare (https://doi.org/10.6084/m9.figshare.13627961).

Data availability statement

The data that support the findings of this study are openly available at the following DOI: https://doi.org/10.6084/m9.figshare.13627961.
