A novel dynamic ensemble air quality index forecasting system
Introduction
In recent years, frequent air pollution incidents have not only caused huge economic losses but also seriously harmed human health and created many social problems (Dong and Zeng, 2018; Vong et al., 2014; Chen et al., 2018). Effectively evaluating and forecasting changes in air quality can provide a reference and basis for forecasting, controlling and mitigating air pollution, and can also give decision-makers constructive suggestions for taking more economical and efficient measures to improve air quality (Hao and Tian, 2019). However, most existing studies select PM2.5, sulfur dioxide (SO2), nitrogen dioxide (NO2) or other gas emissions as the measurement indicators of air pollution. Selecting different pollutants can lead to contradictory conclusions, and an unreasonable choice of indicators weakens the explanatory power of the model (Zhu et al., 2018). The air quality index (AQI), which synthesizes various pollutant indicators, comprehensively reflects changes in air quality. Therefore, the AQI is a scientific and reasonable choice of research object.
Because air quality changes in complex ways, outliers and random noise usually exist in the AQI time series. Detecting and correcting the outliers in the series can effectively improve forecasting accuracy and is an essential step in building a model. In recent years, research on and applications of outlier detection have steadily increased (Tang and He, 2017; Yuen and Ortiz, 2017; Marczak and Proietti, 2016). Time series decomposition methods are also increasingly applied to detect and eliminate random noise (Ramesh and Ramesh, 2017; Zhang et al., 2017). Eliminating the effects of outliers and stochastic noise in the AQI time series is the first step in AQI prediction, and establishing a reasonable analysis and forecasting model is the crucial next step for forecasting the AQI series objectively. As China's air quality continues to deteriorate, many studies have addressed air quality forecasting in China, focusing mainly on statistical models, machine learning models, and fuzzy time series forecasting models.
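Outlier detection and correction of the kind described above can be illustrated with a Hampel identifier, a common robust rule that flags a point lying more than a few scaled median absolute deviations (MAD) from its local median and replaces it with that median. This is a generic sketch, not the procedure used in this paper; the function name `hampel_correct` and the defaults for `half_window` and `n_sigmas` are illustrative assumptions.

```python
from statistics import median


def hampel_correct(series, half_window=3, n_sigmas=3.0):
    """Flag outliers with a Hampel identifier and replace them with the local median.

    Hypothetical helper for illustration; window size and threshold are assumptions.
    Returns (corrected_series, flagged_indices).
    """
    scale = 1.4826  # makes the MAD a consistent estimate of the std for Gaussian data
    n = len(series)
    corrected = list(series)
    flagged = []
    for i in range(n):
        # Window of up to 2 * half_window + 1 points, truncated at the series ends.
        window = series[max(0, i - half_window):min(n, i + half_window + 1)]
        med = median(window)
        mad = scale * median(abs(x - med) for x in window)
        if mad > 0 and abs(series[i] - med) > n_sigmas * mad:
            corrected[i] = med
            flagged.append(i)
    return corrected, flagged
```

For example, on the toy series `[80, 82, 81, 300, 83, 79, 84]` only index 3 is flagged, and it is replaced by its local median, 82. Because windows are taken from the original series, the correction is offline (non-causal); a real-time variant would use only past values.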
Statistical models are widely used in the analysis and prediction of air quality. Kannan and Rajola (2006) showed that the daily index is highly correlated with meteorological variables and applied multiple linear regression to forecast the AQI. Donnelly et al. (2015) proposed a real-time air quality forecasting model with high accuracy and high computational efficiency and successfully applied it to air quality prediction at three urban sites and one rural site. Kumar and Goyal (2011) used principal component analysis to reduce dimensionality and then built an ARIMA model and a combination model to forecast and analyze the AQI, obtaining reasonable forecasting results. Kim (2016) applied a multiple regression model to 3 years of daily ozone AQI data from a station in San Bernardino County, CA, and successfully forecast the day-ahead AQI. Wu et al. (2017) proposed a novel model combining the grey accumulated generating technique with the Holt-Winters method to enhance the accuracy of AQI forecasting. An (2016) introduced a number replacement method to improve the least-squares GM(2,1) model, improving the accuracy of long-term AQI forecasts. However, statistical models rely on many assumptions and impose strict data requirements, which makes them unsuitable for long-term prediction (Qin et al., 2017; Li et al., 2019).
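The pairing of the grey accumulated generating operation (AGO) with exponential smoothing mentioned above can be sketched as follows. This is only an assumption about how the two ideas might be chained, not Wu et al.'s formulation: the series is accumulated (1-AGO), smoothed with Holt's linear-trend method (Holt-Winters without a seasonal component), forecast, and mapped back by differencing (inverse AGO). The function names and the smoothing constants `alpha` and `beta` are hypothetical.

```python
from itertools import accumulate


def ago(series):
    """First-order accumulated generating operation (1-AGO): running sums."""
    return list(accumulate(series))


def holt_linear(series, alpha=0.5, beta=0.3, horizon=1):
    """Holt's linear-trend exponential smoothing; returns forecasts for steps 1..horizon."""
    level, trend = series[0], series[1] - series[0]  # needs at least two points
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


def ago_holt_forecast(series, alpha=0.5, beta=0.3, horizon=1):
    """Forecast on the accumulated scale, then difference back (inverse AGO)."""
    cumulative = ago(series)
    future = holt_linear(cumulative, alpha, beta, horizon)
    path = [cumulative[-1]] + future
    return [later - earlier for earlier, later in zip(path, path[1:])]
```

Accumulation turns a noisy, roughly level series into a smooth, nearly linear one, which is the situation Holt's trend model handles best; a constant series of 50, for instance, is forecast as 50 at every horizon.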
Machine learning models offer high forecasting accuracy and short running times for nonlinear time series and have been successfully applied to AQI prediction. Sharma et al. (2003) developed two neural network models to forecast the AQI for the following three days and obtained satisfactory results. Kang et al. (2010) produced real-time Kalman filter bias-adjusted forecasts for both O3 and PM2.5, which performed as well as or better than the archived air quality forecasts of earlier years. Chaudhuri and Chowdhury (2018) compared different neural network models for forecasting the AQI of Kolkata and found the radial basis function network to be the best model for the purpose. Machine learning models also have disadvantages; for example, they are prone to falling into local optima or overfitting, which can make the prediction error too large (Du et al., 2019).
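Kalman filter bias adjustment, as used by Kang et al. (2010), can be sketched with a scalar filter. The sketch assumes the forecast bias follows a random walk and that each observed error (forecast minus observation) is the bias plus noise; the noise variances `q` and `r`, the initial state, and the function name are hypothetical choices, not the settings of that study.

```python
def kalman_bias_correct(raw_forecasts, observations, q=0.01, r=1.0):
    """Bias-correct forecasts in real time with a scalar random-walk Kalman filter.

    Assumed model: bias_t = bias_{t-1} + w (var q); error_t = bias_t + v (var r).
    Each correction uses only errors observed before that time step.
    """
    bias, p = 0.0, 1.0  # initial bias estimate and its variance (assumptions)
    corrected = []
    for forecast, observed in zip(raw_forecasts, observations):
        corrected.append(forecast - bias)        # adjust using past errors only
        p += q                                   # predict: bias drifts as a random walk
        gain = p / (p + r)                       # Kalman gain
        bias += gain * ((forecast - observed) - bias)  # update with the new error
        p *= 1 - gain
    return corrected
```

With a raw model that is consistently 10 units too high, the corrected error shrinks from 10 at the first step to below 1 by the tenth step as the filter learns the bias.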
The fuzzy time series forecasting model has a strong ability to handle the uncertainty and ambiguity inherent in the data collection process (Carvalho and Costa, 2017). Traditional time series forecasting, by contrast, assumes that the uncertainty of the data is completely described by randomness and ignores the ambiguity of the sample data itself. Because the fuzzy time series forecasting model takes the ambiguity of the data set into account, it is of great significance for improving forecasting accuracy. In recent years, fuzzy time series forecasting approaches have also produced many results in air quality forecasting (Rahman et al., 2015; Domańska and Wojtylak, 2012).
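To make the fuzzy time series idea concrete, the sketch below implements a classic first-order scheme in the style of Chen (1996): partition the universe of discourse into equal intervals, fuzzify each value to the interval where its membership peaks, group fuzzy logical relationships by their left-hand state, and forecast the average midpoint of the current state's group. This is a textbook illustration, not the fuzzy component used in this paper; the function name and the `n_intervals` and `margin` parameters are assumptions.

```python
from collections import defaultdict


def chen_fts_forecast(series, n_intervals=5, margin=0.0):
    """One-step-ahead forecast with a first-order Chen-style fuzzy time series model."""
    lo, hi = min(series) - margin, max(series) + margin
    if hi == lo:  # constant series: nothing to partition
        return series[-1]
    width = (hi - lo) / n_intervals
    mids = [lo + (i + 0.5) * width for i in range(n_intervals)]

    def fuzzify(x):
        # Index of the fuzzy set A_i whose membership peaks on the interval containing x.
        return min(int((x - lo) / width), n_intervals - 1)

    states = [fuzzify(x) for x in series]
    groups = defaultdict(set)  # fuzzy logical relationship groups: A_i -> {A_j, ...}
    for current, following in zip(states, states[1:]):
        groups[current].add(following)

    targets = groups.get(states[-1])
    if not targets:  # state never seen on a left-hand side: use its own midpoint
        return mids[states[-1]]
    return sum(mids[j] for j in targets) / len(targets)
```

For the series `[10, 20, 30, 20, 10, 20, 30]` with two intervals, the last state has relationships to both intervals, so the forecast is the mean of their midpoints, 20.0.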
Every single model has advantages and disadvantages, and no model is perfect. To overcome the shortcomings of single models and combine their advantages, many hybrid forecasting models have been proposed. Hybrid forecasting models usually integrate data preprocessing techniques, optimization techniques, and forecasting techniques to improve forecasting accuracy. Zhou et al. (2019) obtained satisfactory results from a hybrid forecasting system that integrates data preprocessing, a model selection strategy and optimization technology. Zhu et al. (2017) showed experimentally that a hybrid model based on the support vector machine can accurately forecast the AQI. Wang et al. (2017) combined a two-stage decomposition technique and a differential evolution algorithm with an extreme learning machine (ELM) to construct a new AQI prediction model through parameter optimization, which provides information support for the prevention and control of air pollution. Kumar and Goyal (2013) forecast the daily AQI with a neural network based on principal component analysis. Zhang and Yuan (2015) predicted air quality with a Spark implementation of the random forest algorithm and evaluated the method on real meteorological data from Beijing. To further absorb the advantages of different types of models, combined models are increasingly used in forecasting. Ganesh et al. (2017) proposed an ensemble model that combines artificial neural networks and regression models and achieved the highest efficiency in forecasting the air quality index.
Determining the weights is the most critical step in building a combined forecasting model (Jiang and Liu, 2019; Niu and Wang, 2019). There are usually two ways to determine the weights: fixed weights and dynamic weights. Fixed weights are assigned by an appropriate method according to the characteristics of each individual forecasting model and do not change over time; common methods include the arithmetic average, least squares, and optimization algorithms. Dynamic weights change with the observed values, that is, the weight coefficients form a dynamic time series. Because dynamic weights adapt to changes in the data and can be reassigned accordingly, the resulting weight coefficients are more reasonable, and the combined forecasting model achieves higher forecasting accuracy.
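The difference between fixed and dynamic weights can be sketched in a few lines. Fixed weights here are the arithmetic average; the dynamic scheme is one simple, hypothetical choice (weights proportional to each model's inverse mean squared error over a trailing window) and is not one of the three dynamic weighting schemes used in this paper. Function names and the `window` parameter are assumptions.

```python
def combine_fixed(forecasts):
    """Equal (arithmetic-average) weights; forecasts is a list of per-model series."""
    m = len(forecasts)
    return [sum(values) / m for values in zip(*forecasts)]


def combine_dynamic(forecasts, actual, window=3):
    """Time-varying weights proportional to each model's inverse recent MSE.

    The weight at time t uses only actual[start:t], so the combination is real-time.
    """
    m = len(forecasts)
    combined = []
    for t in range(len(actual)):
        start = max(0, t - window)
        if t == start:  # no past errors yet: fall back to equal weights
            weights = [1.0 / m] * m
        else:
            inverse = []
            for f in forecasts:
                mse = sum((f[s] - actual[s]) ** 2 for s in range(start, t)) / (t - start)
                inverse.append(1.0 / (mse + 1e-12))  # small constant avoids division by zero
            total = sum(inverse)
            weights = [w / total for w in inverse]
        combined.append(sum(w * f[t] for w, f in zip(weights, forecasts)))
    return combined
```

If one model is perfect and another is always 10 units off, the fixed combination stays halfway between them, while the dynamic combination shifts almost all weight to the accurate model once a single error has been observed.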
In summary, the characteristics of the current forecasting models can be summarized as follows:
- 1)
Each single model has its own disadvantages, such as unsuitability for nonlinear time series prediction, a tendency to fall into local optima, and proneness to overfitting.
- 2)
The hybrid model combines data preprocessing or optimization techniques with a forecasting model during modeling to improve its performance. However, because it is built on a single forecasting model, it cannot process data with multiple characteristics.
- 3)
The combined forecasting model can integrate the forecasting results of heterogeneous forecasting models; it can not only process the multi-dimensional features of the data but also reduce forecasting error.
- 4)
Traditional combined forecasting models are often based on fixed weight coefficients, such as arithmetic average weights, optimization-algorithm-based weights and non-negatively constrained weights, which do not change as the data change.
Therefore, based on the above analysis, this paper proposes a novel dynamic ensemble air quality index forecasting system to predict and analyze the AQI series of three cities (Shijiazhuang, Zhengzhou and Guangzhou). The proposed system contains three modules: a data preprocessing module, a dynamic integration forecasting module and a system evaluation module. In the data preprocessing module, an offline frequency domain filtering method and complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) are employed to identify and correct outliers and noise in the original series, yielding the reconstructed series. In the dynamic integration forecasting module, three hybrid forecasting models (HCA, HCME and HCFL) are proposed as the basic forecasting models of the combined forecasting model to handle the multi-dimensional characteristics of the data. Specifically, a multi-objective optimization algorithm is used to optimize the flexible parameters, and three kinds of dynamic weights are then applied to integrate the forecasting results of the three hybrid models. In the system evaluation module, the Diebold–Mariano (DM) test and the Wilcoxon rank-sum (WRS) test are applied to verify the effectiveness of the proposed system. Experimental results show that the proposed model outperforms the benchmark models with high accuracy and stability, and that the proposed system can provide accurate air quality information to environmental decision-makers.
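The offline frequency domain filtering step is not specified in detail here, but the reference list cites frequency-domain Hampel filtering, so the sketch below assumes that technique: transform the series, flag spectral bins whose magnitude exceeds the local median by more than a few scaled MADs, shrink those magnitudes to the local median while keeping their phase, and transform back. The DC bin is left untouched so the series mean is preserved. A naive O(n²) DFT keeps the example dependency-free; function names and parameters are hypothetical, and the CEEMDAN stage is not shown.

```python
import cmath
import math
from statistics import median


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); adequate for short illustrative series."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning only the real part (input is assumed real-valued)."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def frequency_hampel(series, half_window=3, n_sigmas=3.0):
    """Shrink spectral spikes to the local median magnitude, keeping each bin's phase."""
    spectrum = dft(series)
    n = len(spectrum)
    mags = [abs(c) for c in spectrum]
    filtered = list(spectrum)
    for k in range(1, n):  # skip k = 0 so the mean of the series is preserved
        # A circular window treats bins k and n - k identically, so the filtered
        # spectrum stays (numerically) conjugate-symmetric and the output stays real.
        window = [mags[(k + d) % n] for d in range(-half_window, half_window + 1)]
        med = median(window)
        mad = 1.4826 * median(abs(m - med) for m in window)
        if mags[k] - med > n_sigmas * mad:
            filtered[k] = cmath.rect(med, cmath.phase(spectrum[k]))
    return idft(filtered)
```

A narrowband disturbance, such as a strong sinusoid on top of broadband noise, appears as an isolated spectral spike and is strongly attenuated, whereas broadband content is mostly left alone. Note that the filter also shrinks genuine narrow peaks, so the window width must be chosen with the expected spectrum in mind.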
The main findings and contributions of the paper are described as follows:
- (1)
A novel dynamic ensemble air quality index forecasting system is developed. It contains three modules, a data preprocessing module, a dynamic integration forecasting module and a system evaluation module, and its weight coefficients adapt to changes in the data to improve forecasting accuracy.
- (2)
A two-stage data preprocessing strategy is proposed to address outliers and noise. It better extracts the characteristics of the dataset while excluding the influence of outliers and noise.
- (3)
Three different kinds of hybrid models are proposed as sub-forecasting models to handle time series with linear, nonlinear and fuzzy characteristics. The hybrid models proposed in this paper are built on three kinds of forecasting models and integrate the data preprocessing strategy and multi-objective optimization, which greatly improves forecasting accuracy.
- (4)
In the system evaluation module, two statistical tests are used to evaluate the system, and both confirm its good performance. The DM test is a parametric test and the WRS test is a non-parametric test, so the evaluation results of the proposed system are supported by two different kinds of statistical testing.
- (5)
The results demonstrate that the proposed system performs outstandingly and can provide important information support for the prevention and control of air pollution. The proposed system outperforms the comparison models in three case studies, which verifies its effectiveness.
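Contribution (3) relies on multi-objective optimization of the hybrid models' parameters. This excerpt does not name the optimizer or its objectives, so the sketch below shows only the Pareto-dominance bookkeeping that multi-objective optimizers generally share, under the assumption of two minimization objectives: accuracy (mean squared error) and stability (standard deviation of the errors). All function names are hypothetical.

```python
from statistics import mean, pstdev


def objectives(errors):
    """Two minimization objectives for a candidate: accuracy (MSE) and stability (error std)."""
    errors = list(errors)
    return (mean(e * e for e in errors), pstdev(errors))


def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the non-dominated candidates; candidates maps a name to its objective tuple."""
    return {
        name: obj
        for name, obj in candidates.items()
        if not any(dominates(other, obj) for other in candidates.values())
    }
```

A candidate never dominates itself, so no self-exclusion check is needed. The optimizer then chooses among the non-dominated parameter sets, typically by a further rule such as a preferred trade-off between accuracy and stability.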
The development of the dynamic ensemble air quality index forecasting system
The proposed dynamic ensemble air quality index forecasting system contains three modules: a data preprocessing module, a dynamic integration forecasting module and a system evaluation module. The detailed forecasting process is introduced in this section.
The system evaluation
Appropriate statistical tests are necessary to evaluate the forecasting performance of the system. In this paper, two statistical tests, a parametric test (the Diebold–Mariano test; Diebold and Mariano, 1995) and a non-parametric test (the Wilcoxon rank-sum test), are used to evaluate the forecasting performance of the system.
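Both tests can be sketched with the standard library. The DM statistic below assumes squared-error loss and uses autocovariances of the loss differential up to lag h − 1 with the asymptotic normal distribution; the rank-sum test uses average ranks for ties and the normal approximation without a tie correction. The loss function, the horizon handling and the function names are assumptions, not the paper's exact settings.

```python
import math
from statistics import mean


def normal_two_sided_p(z):
    """Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


def dm_test(errors_1, errors_2, h=1):
    """Diebold-Mariano test with squared-error loss; returns (statistic, p-value).

    A positive statistic means model 1 has the larger loss, i.e. model 2 is more accurate.
    """
    d = [a * a - b * b for a, b in zip(errors_1, errors_2)]  # loss differential
    n, d_bar = len(d), mean(d)

    def autocov(lag):
        return sum((d[t] - d_bar) * (d[t - lag] - d_bar) for t in range(lag, n)) / n

    long_run_var = autocov(0) + 2 * sum(autocov(k) for k in range(1, h))
    stat = d_bar / math.sqrt(long_run_var / n)
    return stat, normal_two_sided_p(stat)


def wilcoxon_rank_sum(sample_1, sample_2):
    """Wilcoxon rank-sum test (normal approximation); returns (z, p-value)."""
    values = list(sample_1) + list(sample_2)
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        i = j + 1
    n1, n2 = len(sample_1), len(sample_2)
    w = sum(ranks[:n1])  # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    return z, normal_two_sided_p(z)
```

For instance, `wilcoxon_rank_sum([1, 2, 3], [4, 5, 6])` gives z ≈ −1.964 and p ≈ 0.0495, so the two samples differ at the 5% level. For the DM test, applying it to the error series of the proposed system and of each benchmark model, a significantly positive statistic favours the second argument.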
The DM test is mainly used to test whether the forecasts of two models differ significantly. The null hypothesis is that there is
Empirical analysis
In this paper, the air quality index is forecasted by constructing a dynamic ensemble forecasting system based on the two-stage data preprocessing strategy. The following is the experimental process and the analysis of the forecasting results.
Conclusion
Based on time-varying parameter weight theory, a dynamic ensemble forecasting system built on a multi-objective intelligent optimization algorithm is proposed to forecast the AQI series of Shijiazhuang, Zhengzhou, and Guangzhou. The proposed system consists of three modules: a data preprocessing module, a dynamic integration forecasting module and a system evaluation module. In the data preprocessing module, off-line frequency domain filtering is employed to test and correct the
CRediT authorship contribution statement
Hongmin Li: Conceptualization, Software, Writing - original draft. Jianzhou Wang: Methodology, Writing - review & editing. Hufang Yang: Visualization, Software.
Declaration of competing interest
The authors declare that there is no conflict of interest with regard to the publication of this paper.
Acknowledgements
This work was supported by the Major Program of the National Social Science Foundation of China (Grant No. 17ZDA093).
References (51)
A frequency domain Hampel filter for blind rejection of sinusoidal interference from electromyograms. J. Neurosci. Methods (2009).
Suppression of deep brain stimulation artifacts from the electroencephalogram by frequency-domain Hampel filtering. Clin. Neurophysiol. (2010).
Identification method for fuzzy forecasting models of time series. Appl. Soft Comput. J (2017).
Air pollution, student health, and school absences: evidence from China. J. Environ. Econ. Manag. (2018).
The VEC-NAR model for short-term forecasting of oil prices. Energy Econ. (2019).
Application of fuzzy time series models for forecasting pollution concentrations. Expert Syst. Appl. (2012).
Public willingness to pay for urban smog mitigation and its determinants: a case study of Beijing, China. Atmos. Environ. (2018).
Real time air quality forecasting using integrated parametric and non-parametric regression techniques. Atmos. Environ. (2015).
A novel hybrid model for short-term wind power forecasting. Appl. Soft Comput. J (2019).
The study and application of a novel hybrid system for air quality early-warning. Appl. Soft Comput. J (2019).
Extreme learning machine: theory and applications. Neurocomputing.
Variable weights combined model based on multi-objective optimization for short-term wind speed forecasting. Appl. Soft Comput. J.
Real-time bias-adjusted O3 and PM2.5 air quality index forecasts and their performance evaluations over the continental United States. Atmos. Environ.
Ordinal time series model for forecasting air quality index for ozone in southern California. Environ. Model. Assess.
Forecasting of daily air quality index in Delhi. Sci. Total Environ.
Multi-label classification using a cascade of stacked autoencoder and extreme learning machines. Neurocomputing.
Novel analysis–forecast system based on multi-objective optimization for air quality index. J. Clean. Prod.
Outlier detection in structural time series models: the indicator saturation approach. Int. J. Forecast.
A combined model based on data preprocessing strategy and multi-objective optimization algorithm for short-term wind speed forecasting. Appl. Energy.
Red tide time series forecasting by combining ARIMA and deep belief network. Knowl. Base Syst.
A local density-based approach for outlier detection. Neurocomputing.
Fuzzy time series model based on probabilistic approach and rough set rule induction for empirical research in stock markets. Data Knowl. Eng.
Predicting minority class for suspended particulate matters level by extreme learning machine. Neurocomputing.
A novel hybrid model for air quality index forecasting based on two-phase decomposition technique and modified extreme learning machine. Sci. Total Environ.
Outlier detection and robust regression for correlated data. Comput. Methods Appl. Mech. Eng.
Peer review under responsibility of Turkish National Committee for Air Pollution Research and Control.