A novel dynamic ensemble air quality index forecasting system

doi:10.1016/j.apr.2020.04.010

Atmospheric Pollution Research

Volume 11, Issue 8, August 2020, Pages 1258-1270

https://doi.org/10.1016/j.apr.2020.04.010

Highlights

• A novel dynamic ensemble air quality index forecasting system is proposed.
• A two-stage data preprocessing strategy is employed in the data preprocessing module.
• Different hybrid models are proposed to handle the different characteristics of the series.
• Two kinds of statistical test are used to verify the system.
• The results demonstrate that the proposed system has outstanding performance.

Abstract

The air quality index (AQI) reflects changes in air quality in real time. The AQI series has linear, nonlinear and fuzzy characteristics, so a single model cannot adequately fit its dynamic changes. This paper therefore proposes a new dynamic ensemble forecasting system for the AQI, based on a multi-objective intelligent optimization algorithm and time-varying parameter weights. The system contains three modules: a data preprocessing module, a dynamic integration forecasting module and a system evaluation module. In the data preprocessing module, an off-line frequency domain filtering approach is applied to identify and correct outliers in the series. To better extract the series information and remove random noise, the time series is then decomposed into multiple levels using a decomposition strategy and reconstructed. In the dynamic integration forecasting module, three hybrid models based on ARIMA, an optimized extreme learning machine and a fuzzy time series model, named HCA, HCME and HCFL respectively, forecast the reconstructed series, and time-varying parameters are employed to combine the forecasting results dynamically. In the system evaluation module, the accuracy of the system is tested by a parametric test and a non-parametric test. The results demonstrate that the proposed dynamic integrated model is superior to the comparison models in forecasting accuracy and provides strong technical support for air quality forecasting and treatment.

Introduction

In recent years, frequent air pollution incidents have caused huge economic losses, serious harm to human health and many social problems (Dong and Zeng, 2018; Vong et al., 2014; Chen et al., 2018). Effectively evaluating and forecasting changes in air quality provides a reference and basis for forecasting, controlling and mitigating air pollution, and gives decision-makers constructive suggestions for taking more economical and efficient measures to improve air quality in the future (Hao and Tian, 2019). However, most existing studies select PM₂.₅, sulfur dioxide (SO₂), nitrogen dioxide (NO₂) and other gas emissions as the measurement indicators of air pollution. Selecting different pollutants can lead to contradictory conclusions, and an unreasonable selection of indicators weakens the interpretive ability of the model (Zhu et al., 2018). The air quality index comprehensively reflects changes in air quality by synthesizing various pollutant indicators. It is therefore scientific and reasonable to select the air quality index as the research object.

Because air quality changes in complex ways, outliers and random noise usually exist in the AQI time series. Testing for and correcting the outliers in the series can effectively improve forecasting accuracy and is an essential step in establishing a model. In recent years, there has been a growing body of research and applications on outlier testing (Tang and He, 2017; Yuen and Ortiz, 2017; Marczak and Proietti, 2016). Time series decomposition methods are also increasingly applied to the detection and elimination of random noise (Ramesh and Ramesh, 2017; Zhang et al., 2017). Eliminating the effects of outliers and stochastic noise in the AQI time series is the first step in predicting the air quality index. Establishing a reasonable analysis and forecasting model is then the crucial step for forecasting the AQI time series objectively. As China's air quality continues to deteriorate, there are many studies on air quality forecasting in China, which mainly focus on the following approaches: statistical models, machine learning models, and fuzzy time series forecasting models.

Statistical models are widely used in the analysis and prediction of air quality. Kannan and Rajola (2006) showed that the daily index is highly correlated with meteorological variables and applied a multiple linear regression method to forecast the AQI. Donnelly et al. (2015) proposed a real-time air quality forecasting model with high accuracy and high computational efficiency, and successfully applied it to air quality prediction for three urban sites and one rural site. Kumar and Goyal (2011) used principal component analysis to reduce the dimension, then established an ARIMA model and a combination model to forecast and analyze the AQI, and obtained reasonable forecasting results. Kim (2016) applied a multiple regression model to 3 years of daily AQI ozone data from a station in San Bernardino County, CA and successfully forecast the day-ahead AQI. Wu et al. (2017) proposed a novel model combining the grey accumulated generating technique and the Holt-Winters method to enhance AQI forecast accuracy. An (2016) introduced a number-replacement method to improve the least squares GM(2,1) model, which can improve AQI forecast accuracy over a long period. However, statistical models rely on many assumptions and strict data requirements, and are not suitable for long-term predictions (Qin et al., 2017; Li et al., 2019).

Machine learning models offer high forecasting accuracy and short running times when forecasting nonlinear time series, and have been successfully applied in the field of AQI prediction. Sharma et al. (2003) developed two mathematical models using neural networks to forecast the AQI for the following three days and obtained satisfactory results. Kang et al. (2010) performed real-time Kalman filter bias-adjusted prediction for both O₃ and PM₂.₅, which performed as well as or even better than previous studies based on archived air quality forecasts from earlier years. Chaudhuri and Chowdhury (2018) tried different neural network models to forecast the AQI of Kolkata and found the radial basis function model to be the best network model for the purpose. Machine learning models also have disadvantages: for example, they are prone to falling into local optima or over-fitting, which can lead to excessively large prediction errors (Du et al., 2019).

The fuzzy time series forecasting model has a significant ability to handle the uncertainty and ambiguity inherent in the data collection process (Carvalho and Costa, 2017). Traditional time series forecasting analysis, by contrast, assumes that the uncertainty of the data is completely described by randomness and ignores the ambiguity of the sample data itself. The fuzzy time series forecasting model can take the ambiguity of the data set into account, which is of great significance for improving forecasting accuracy. In recent years, fuzzy time series forecasting approaches have also achieved many results in air quality forecasting (Rahman et al., 2015; Domańska and Wojtylak, 2012).

Every single model has advantages and disadvantages, and no model is perfect. To overcome the shortcomings of single models and combine their advantages, many hybrid forecasting models have been proposed. Hybrid forecasting models usually integrate data preprocessing techniques, optimization techniques and forecasting techniques to improve forecasting accuracy. Zhou et al. (2019) obtained satisfactory results from a hybrid forecasting system that integrates data preprocessing, a model selection strategy and optimization technology. Zhu et al. (2017) showed experimentally that a hybrid model based on the support vector machine can accurately forecast the AQI. Wang et al. (2017) combined a two-stage decomposition technique and a differential evolution algorithm with an extreme learning machine (ELM) to construct a new AQI prediction model through parameter optimization, which provides information support for the prevention and control of air pollution. Kumar and Goyal (2013) forecast the daily AQI with a neural network based on principal component analysis. Zhang and Yuan (2015) predicted air quality with a Spark implementation of the random forest algorithm and evaluated the method on real meteorological data obtained from Beijing. To further absorb the advantages of different types of models, combined models are increasingly used in the field of prediction. Ganesh et al. (2017) proposed an ensemble model that combines artificial neural networks and regression models and reported that it achieved the highest efficiency in forecasting the air quality index.

Determining the weights is the most critical step in building a combined forecasting model (Jiang and Liu, 2019; Niu and Wang, 2019). There are usually two ways to determine the weights: fixed weights and dynamic weights. Fixed weights are assigned once, according to the characteristics of the individual forecasting models; common weighting methods include the arithmetic average, least squares and optimization algorithms. Dynamic weights change with the observations, that is, the weight coefficients themselves form a time series. Because dynamic weights adapt to changes in the data, the resulting weight coefficients are more reasonable and the combined forecasting model achieves higher forecasting accuracy.
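The difference between fixed and dynamic weights can be illustrated with a minimal sketch. It does not reproduce the time-varying parameter scheme of this paper; it assumes, purely for illustration, that each model's weight at a given step is proportional to the inverse of its mean absolute error over a short sliding window. The function names, the `window` parameter and the inverse-error rule are all hypothetical.

```python
def fixed_weight_combine(forecasts, weights):
    """Combine model forecasts with one weight vector for all time steps.

    forecasts: list of time steps, each a list with one forecast per model.
    """
    return [sum(w * f for w, f in zip(weights, step)) for step in forecasts]


def dynamic_weight_combine(forecasts, actuals, window=3, eps=1e-9):
    """Combine forecasts with weights re-estimated at every time step.

    Assumption (not from the paper): the weight of each model at time t is
    proportional to the inverse of its mean absolute error over the previous
    `window` observations; equal weights are used while no history exists.
    """
    n_models = len(forecasts[0])
    combined, weight_path = [], []
    for t, step in enumerate(forecasts):
        start = max(0, t - window)
        if t == 0:
            weights = [1.0 / n_models] * n_models
        else:
            # Mean absolute error of each model over the recent window.
            mae = [
                sum(abs(forecasts[s][m] - actuals[s]) for s in range(start, t))
                / (t - start)
                for m in range(n_models)
            ]
            inverse = [1.0 / (e + eps) for e in mae]
            total = sum(inverse)
            weights = [v / total for v in inverse]
        weight_path.append(weights)
        combined.append(sum(w * f for w, f in zip(weights, step)))
    return combined, weight_path
```

In this sketch `weight_path` is itself a time series of weight vectors, which is the property that distinguishes dynamic from fixed weighting; only observations before time t are used to set the weights at time t.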

In summary, the characteristics of the current forecasting models can be summarized as follows:

1) Each single model has its own disadvantages, such as being unsuitable for nonlinear time series prediction, easily falling into local optima or being prone to overfitting.
2) The hybrid model combines data preprocessing or optimization techniques to optimize the forecasting model during modeling and improve its performance. Because it is built on a single forecasting model, it cannot process data with multiple characteristics.
3) The combined forecasting model can integrate the forecasting results of heterogeneous forecasting models, which allows it not only to process the multi-dimensional features of the data but also to achieve smaller forecasting errors.
4) Traditional combined forecasting models are often based on fixed weight coefficients, such as arithmetic average weights, weights based on optimization algorithms and non-negative constraint weights.

Therefore, based on the above analysis, this paper proposes a novel dynamic ensemble air quality index forecasting system to predict and analyze the AQI series of three cities. The proposed system contains three modules: a data preprocessing module, a dynamic integration forecasting module and a system evaluation module. In the data preprocessing module, an off-line frequency domain filtering method and complete ensemble empirical mode decomposition with adaptive noise are employed to identify and correct outliers and noise in the original series, after which the reconstructed series is obtained. In the dynamic integration forecasting module, to process the multi-dimensional characteristics of the data, three hybrid forecasting models (HCA, HCME and HCFL) are proposed as the basic forecasting models of the combined forecasting model. Specifically, a multi-objective optimization algorithm is used to optimize the flexible parameters, and three kinds of dynamic weights are then applied to integrate the forecasting results of the three hybrid models. In the system evaluation module, the Diebold–Mariano (DM) test and the Wilcoxon rank-sum (WRS) test are applied to verify the effectiveness of the proposed system. Experimental results show that the proposed model outperforms the benchmark models with high accuracy and stability, and that the proposed system can provide accurate air quality information to environmental decision makers.
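The outlier identification and correction step can be sketched as follows. This is a stand-in only: the paper applies its filter off-line in the frequency domain, whereas the sketch below uses the ordinary time-domain Hampel identifier, which flags a point when it deviates from the local median by more than a multiple of the scaled median absolute deviation (MAD). The function name and the `half_window` and `n_sigmas` defaults are assumptions chosen for illustration.

```python
from statistics import median


def hampel_correct(series, half_window=3, n_sigmas=3.0):
    """Replace outliers by the local median (time-domain Hampel identifier).

    Stand-in for illustration: the frequency-domain variant used in the
    paper is not reproduced here.
    """
    k = 1.4826  # scales the MAD to the standard deviation for Gaussian data
    cleaned = list(series)
    outlier_idx = []
    for i in range(len(series)):
        # Window is truncated at the edges of the series.
        lo = max(0, i - half_window)
        hi = min(len(series), i + half_window + 1)
        window = series[lo:hi]
        med = median(window)
        mad = median(abs(x - med) for x in window)
        if abs(series[i] - med) > n_sigmas * k * mad:
            cleaned[i] = med  # correct the outlier with the local median
            outlier_idx.append(i)
    return cleaned, outlier_idx
```

For example, `hampel_correct([50, 52, 51, 53, 300, 52, 51, 50, 52])` flags only the spike at index 4 and replaces it with the local median. Note that when the MAD of a window is zero, any deviation from the median is flagged, so a practical implementation may want a lower bound on the threshold.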

The main findings and contributions of the paper are described as follows:

(1) A novel dynamic ensemble air quality index forecasting system is developed. The proposed forecasting system contains three modules: a data preprocessing module, a dynamic integration forecasting module and a system evaluation module. Its weight coefficients adapt to changes in the data to improve forecasting accuracy.
(2) A two-stage data preprocessing strategy is proposed to address the problems of outliers and noise. The strategy better extracts the characteristics of the dataset while excluding the influence of outliers and noise.
(3) Three different kinds of hybrid models are proposed as sub-forecasting models, which can deal with time series that have linear, nonlinear and fuzzy characteristics. The hybrid models proposed in this paper come from three kinds of forecasting models and integrate the data preprocessing strategy and multi-objective optimization technology, which greatly improves forecasting accuracy.
(4) In the system evaluation module, two kinds of statistical test are used to verify the system and show that it performs well. The DM test and the WRS test, which are a parametric test and a non-parametric test respectively, are selected to evaluate the system. According to these two statistical tests, the evaluation results of the proposed system are reliable.
(5) The results demonstrate that the proposed system, with its outstanding performance, can provide important information support for the prevention and control of air pollution. The proposed system performs better than the comparison models in three case studies, which verifies its effectiveness.

Section snippets

The development of the dynamic ensemble air quality index forecasting system

The proposed dynamic ensemble air quality index forecasting system contains three modules: a data preprocessing module, a dynamic integration forecasting module and a system evaluation module. The detailed forecasting process is introduced in this section.

The system evaluation

Statistical tests are necessary to evaluate the forecasting performance of the system. In this paper, two statistical tests, a parametric test (the Diebold–Mariano test (Diebold and Mariano, 1995)) and a non-parametric test (the Wilcoxon rank-sum test), are used to verify the forecasting performance of the system.
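Both tests can be sketched in a few lines of standard-library Python. The sketch makes several assumptions that are not stated in the text: the DM statistic uses squared-error loss and a simple truncated autocovariance estimate of the long-run variance, the rank-sum statistic uses the large-sample normal approximation without a tie correction, and p-values in both cases come from the standard normal distribution. The function names and the `horizon` parameter are hypothetical, and which samples the paper feeds to the rank-sum test is not specified here.

```python
import math


def diebold_mariano(actual, forecast_a, forecast_b, horizon=1):
    """Return (DM statistic, two-sided p-value) under squared-error loss.

    A negative statistic means forecast_a has the lower average loss.
    """
    d = [(y - a) ** 2 - (y - b) ** 2
         for y, a, b in zip(actual, forecast_a, forecast_b)]
    n = len(d)
    d_bar = sum(d) / n

    def autocov(lag):
        return sum((d[t] - d_bar) * (d[t - lag] - d_bar)
                   for t in range(lag, n)) / n

    # Long-run variance of the loss differential (lags below the horizon).
    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, horizon))
    if long_run_var <= 0:
        raise ValueError("non-positive variance estimate; statistic undefined")
    dm = d_bar / math.sqrt(long_run_var / n)
    return dm, math.erfc(abs(dm) / math.sqrt(2.0))


def wilcoxon_rank_sum(sample_x, sample_y):
    """Return (z statistic, two-sided p-value) via the normal approximation."""
    pooled = sorted(list(sample_x) + list(sample_y))
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        i = j
    n1, n2 = len(sample_x), len(sample_y)
    w = sum(rank_of[v] for v in sample_x)  # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    var_w = n1 * n2 * (n1 + n2 + 1) / 12.0
    z = (w - mean_w) / math.sqrt(var_w)
    return z, math.erfc(abs(z) / math.sqrt(2.0))
```

A small p-value from `diebold_mariano` suggests the two forecasts differ in accuracy, while `wilcoxon_rank_sum` compares the locations of two samples without assuming a particular distribution. For short series the normal approximations are rough, and a small-sample correction may be preferable.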

The DM test is mainly used to compare the results of the two forecasting models for significant differences. The null hypothesis is that there is

Empirical analysis

In this paper, the air quality index is forecasted by constructing a dynamic ensemble forecasting system based on the two-stage data preprocessing strategy. The following is the experimental process and the analysis of the forecasting results.

Conclusion

Based on the time-varying parameter weight theory, a dynamic ensemble forecasting system based on a multi-objective intelligent optimization algorithm is proposed to forecast the AQI series from Shijiazhuang, Zhengzhou, and Guangzhou. The proposed system consists of three modules: a data preprocessing module, a dynamic integration forecasting module and a system evaluation module. In the data preprocessing module, the off-line frequency domain filtering is employed to test and correct the

CRediT authorship contribution statement

Hongmin Li: Conceptualization, Software, Writing - original draft. Jianzhou Wang: Methodology, Writing - review & editing. Hufang Yang: Visualization, Software.

Declaration of competing interest

The authors declare that there is no conflict of interest with regard to the publication of this paper.

Acknowledgements

This work was supported by the Major Program of the National Social Science Foundation of China (Grant No. 17ZDA093).

References (51)

D.P. Allen. A frequency domain Hampel filter for blind rejection of sinusoidal interference from electromyograms. J. Neurosci. Methods (2009)
D.P. Allen et al. Suppression of deep brain stimulation artifacts from the electroencephalogram by frequency-domain Hampel filtering. Clin. Neurophysiol. (2010)
J.G. Carvalho et al. Identification method for fuzzy forecasting models of time series. Appl. Soft Comput. J. (2017)
S. Chen et al. Air pollution, student health, and school absences: evidence from China. J. Environ. Econ. Manag. (2018)
F. Cheng et al. The VEC-NAR model for short-term forecasting of oil prices. Energy Econ. (2019)
D. Domańska et al. Application of fuzzy time series models for forecasting pollution concentrations. Expert Syst. Appl. (2012)
K. Dong et al. Public willingness to pay for urban smog mitigation and its determinants: a case study of Beijing, China. Atmos. Environ. (2018)
A. Donnelly et al. Real time air quality forecasting using integrated parametric and non-parametric regression techniques. Atmos. Environ. (2015)
P. Du et al. A novel hybrid model for short-term wind power forecasting. Appl. Soft Comput. J. (2019)
Y. Hao et al. The study and application of a novel hybrid system for air quality early-warning. Appl. Soft Comput. J. (2019)


Peer review under responsibility of Turkish National Committee for Air Pollution Research and Control.
