A novel dynamic ensemble air quality index forecasting system

https://doi.org/10.1016/j.apr.2020.04.010

Highlights

  • A novel dynamic ensemble air quality index forecasting system is proposed.

  • A two-stage data preprocessing strategy is employed in the data preprocessing module.

  • Different hybrid models are proposed to deal with different characteristics in the series.

  • Two kinds of statistical tests, parametric and non-parametric, are used to evaluate the system.

  • The results demonstrate that the proposed system has outstanding performance.

Abstract

The air quality index (AQI) reflects changes in air quality in real time and exhibits linear, nonlinear and fuzzy characteristics, so a single model cannot adequately fit its dynamic changes. This paper therefore proposes a new dynamic ensemble forecasting system for the AQI, based on a multi-objective intelligent optimization algorithm. The system uses time-varying parameter weights and contains three modules: a data preprocessing module, a dynamic integration forecasting module and a system evaluation module. In the data preprocessing module, an off-line frequency domain filtering approach is applied to identify and correct the outliers in the series. To better extract the series information and remove random noise, the time series is then decomposed into multiple levels by a decomposition strategy and reconstructed. In the dynamic integration forecasting module, three hybrid models based on ARIMA, an optimized extreme learning machine and a fuzzy time series model, named HCA, HCME and HCFL respectively, are used to forecast the reconstructed series, and time-varying parameters are employed to dynamically combine the forecasting results. In the system evaluation module, the accuracy of the system is tested by a parametric test and a non-parametric test. The results demonstrate that the proposed dynamic integrated model is superior to the comparison models in forecasting accuracy and can provide strong technical support for air quality forecasting and treatment.

Introduction

In recent years, frequent air pollution incidents have caused not only huge economic losses but also serious harm to human health, as well as many social problems (Dong and Zeng, 2018; Vong et al., 2014; Chen et al., 2018). Effectively evaluating and forecasting changes in air quality can provide a reference and basis for forecasting, controlling and mitigating air pollution, and can offer constructive suggestions that help decision-makers take more economical and efficient measures to improve air quality in the future (Hao and Tian, 2019). However, most existing studies select PM2.5, sulfur dioxide (SO2), nitrogen dioxide (NO2) and other gas emissions as the measurement indicators of air pollution. Selecting different pollutants can lead to contradictory conclusions, and an unreasonable selection of indicators weakens the explanatory power of the model (Zhu et al., 2018). The air quality index comprehensively reflects changes in air quality by synthesizing various pollutant indicators. Therefore, it is scientific and reasonable to select the air quality index as the research object.

Because air quality changes in complex ways, outliers and random noise usually exist in the AQI time series. Testing for and correcting the outliers in the series can effectively improve forecasting accuracy and is an essential step in establishing a model. In recent years, there has been a growing body of research and applications on outlier testing (Tang and He, 2017; Yuen and Ortiz, 2017; Marczak and Proietti, 2016). Time series decomposition methods are also increasingly applied to the detection and elimination of random noise (Ramesh and Ramesh, 2017; Zhang et al., 2017). Eliminating the effects of outliers and stochastic noise in the AQI time series is the first step in predicting the air quality index. Establishing a reasonable analysis and forecasting model is then a crucial step in forecasting the AQI time series objectively. As China's air quality continues to deteriorate, there are many studies on air quality forecasting in China, which mainly focus on the following aspects: statistical models, machine learning models, and fuzzy time series forecasting models.

Statistical models are widely used in the analysis and prediction of air quality. Kannan and Rajola (2006) showed that the daily index is highly correlated with meteorological variables and applied a multiple linear regression method to forecast the AQI. Donnelly et al. (2015) proposed a real-time air quality forecasting model with high accuracy and high computational efficiency, and successfully applied it to air quality prediction for three urban sites and one rural site. Kumar and Goyal (2011) used principal component analysis to reduce the dimension, then established an ARIMA model and a combination model to forecast and analyze the AQI, and obtained reasonable forecasting results. Kim (2016) applied a multiple regression model to 3 years of daily AQI ozone data from a station in San Bernardino County, CA and successfully forecast the day-ahead AQI. Wu et al. (2017) proposed a novel model combining the grey accumulated generating technique and the Holt-Winters method to enhance AQI forecast accuracy. An (2016) introduced a replacement method to improve the least-squares GM(2,1) model, which can improve AQI forecast accuracy over long horizons. However, statistical models rely on many assumptions and have strict data requirements, which makes them unsuitable for long-term predictions (Qin et al., 2017; Li et al., 2019).

Machine learning models offer high forecasting accuracy and short running time when forecasting nonlinear time series, and have been successfully applied in the field of AQI prediction. Sharma et al. (2003) developed two mathematical models using neural networks to forecast the AQI for the following three days and obtained satisfactory results. Kang et al. (2010) performed real-time Kalman filter bias-adjusted prediction for both O3 and PM2.5, which performed as well as, or even better than, previous studies based on archived air quality forecasts from earlier years. Chaudhuri and Chowdhury (2018) tried different neural network models to forecast the AQI of Kolkata and found the radial basis function model to be the best network model for the purpose. Machine learning models also have disadvantages: for example, they are prone to falling into local optima or over-fitting, which can lead to excessively large prediction errors (Du et al., 2019).

The fuzzy time series forecasting model has a significant ability to handle the uncertainty and ambiguity inherent in the data collection process (Carvalho and Costa, 2017). Traditional time series forecasting analysis, by contrast, assumes that the uncertainty of the data is completely described by randomness and ignores the ambiguity of the sample data itself. The fuzzy time series forecasting model takes the ambiguity of the data set into account, which is of great significance for improving forecasting accuracy. In recent years, fuzzy time series forecasting approaches have also achieved many results in air quality forecasting (Rahman et al., 2015; Domańska and Wojtylak, 2012).

Every single model has advantages and disadvantages, and no model is perfect. To overcome the shortcomings of a single model and combine the advantages of each, many hybrid forecasting models have been proposed. Hybrid forecasting models usually integrate data preprocessing techniques, optimization techniques, and forecasting techniques to improve forecasting accuracy. Zhou et al. (2019) obtained satisfactory results from a hybrid forecasting system that integrates data preprocessing, a model selection strategy and optimization technology. Zhu et al. (2017) showed experimentally that a hybrid model based on the support vector machine can accurately forecast the AQI. Wang et al. (2017) combined a two-stage decomposition technique and a differential evolution algorithm with an extreme learning machine (ELM) to construct a new AQI prediction model through parameter optimization, which provides information support for the prevention and control of air pollution. Kumar and Goyal (2013) forecast the daily AQI through a neural network based on principal component analysis. Zhang and Yuan (2015) predicted air quality with a Spark implementation of the random forest algorithm and evaluated the method on real meteorological data obtained from Beijing. To further absorb the advantages of different types of models, combined models are increasingly used in the field of prediction. Ganesh et al. (2017) proposed an ensemble model that combines artificial neural networks and regression models and achieved the highest efficiency in forecasting the air quality index.

Determining the weights is the most critical step in building a combined forecasting model (Jiang and Liu, 2019; Niu and Wang, 2019). There are usually two ways to determine the weights: fixed weights and dynamic weights. Fixed weights are assigned once by a suitable method, chosen according to the characteristics of the individual forecasting models. Common weighting methods include the arithmetic average method, the least squares method and optimization algorithms. With dynamic weights, the weight coefficients change with the observed values; that is, the weight coefficients themselves form a dynamic time series. Because dynamic weights adapt to changes in the data, the resulting weight coefficients are more reasonable and the combined forecasting model achieves higher forecasting accuracy.
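The difference between fixed and dynamic weights can be illustrated with a minimal sketch. It is not the weighting scheme of the proposed system, whose time-varying weights are obtained with a multi-objective optimization algorithm. As a stand-in assumption, the sketch sets each weight proportional to the inverse mean squared error, computed either once over the whole sample (fixed) or over a sliding window of past errors (dynamic). The series, the two models `model_a` and `model_b`, and the `window` parameter are all hypothetical.

```python
def mse(actual, forecast):
    """Mean squared error between two equally long sequences."""
    return sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual)


def inverse_error_weights(errors_by_model, eps=1e-9):
    """Weights proportional to 1 / MSE of each model, normalized to sum to 1."""
    inv = [1.0 / (sum(e * e for e in errs) / len(errs) + eps)
           for errs in errors_by_model]
    total = sum(inv)
    return [v / total for v in inv]


def combine_fixed(forecasts, weights):
    """Combine the model forecasts with one weight vector for all time steps."""
    return [sum(w * f[t] for w, f in zip(weights, forecasts))
            for t in range(len(forecasts[0]))]


def combine_dynamic(actual, forecasts, window=3):
    """Recompute the weights at every step from the last `window` past errors."""
    m = len(forecasts)
    combined, weight_path = [], []
    for t in range(len(actual)):
        if t == 0:
            w = [1.0 / m] * m  # no error history yet: use equal weights
        else:
            lo = max(0, t - window)
            recent = [[actual[s] - f[s] for s in range(lo, t)] for f in forecasts]
            w = inverse_error_weights(recent)
        weight_path.append(w)
        combined.append(sum(wi * f[t] for wi, f in zip(w, forecasts)))
    return combined, weight_path


# Made-up AQI values: model_a is accurate early, model_b is accurate late.
actual = [100, 102, 104, 106, 108, 110, 112, 114]
model_a = [101, 103, 105, 107, 118, 120, 122, 124]
model_b = [110, 112, 114, 116, 109, 111, 113, 115]
forecasts = [model_a, model_b]

# Fixed weights from the whole-sample errors of each model.
fixed_w = inverse_error_weights(
    [[y - f for y, f in zip(actual, fc)] for fc in forecasts])
fixed = combine_fixed(forecasts, fixed_w)

# Dynamic weights from a sliding window of past errors only.
dynamic, weight_path = combine_dynamic(actual, forecasts, window=3)

print("fixed weights:", fixed_w, "MSE:", mse(actual, fixed))
print("final dynamic weights:", weight_path[-1], "MSE:", mse(actual, dynamic))
```

In this toy series the two models have the same whole-sample error, so the fixed weights cannot distinguish them, while the dynamic weights shift toward whichever model has been more accurate recently.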

In summary, the characteristics of the current forecasting models can be summarized as follows:

  • 1) Each single model has its own disadvantages, such as being unsuitable for nonlinear time series prediction, easily falling into local optima, or being prone to overfitting.

  • 2) A hybrid model incorporates data preprocessing or optimization techniques into the modeling process to improve the performance of the forecasting model. Because it is built on a single forecasting model, however, it cannot process data with multiple characteristics.

  • 3) A combined forecasting model integrates the forecasting results of heterogeneous forecasting models, so it can both process the multi-dimensional features of the data and achieve lower forecasting error.

  • 4) Traditional combined forecasting models are often based on fixed weight coefficients, such as arithmetic average weights, weights obtained by an optimization algorithm, and weights under non-negativity constraints.

Therefore, based on the above analysis, this paper proposes a novel dynamic ensemble air quality index forecasting system to predict and analyze the AQI series of three cities. The proposed system contains three modules: a data preprocessing module, a dynamic integration forecasting module and a system evaluation module. In the data preprocessing module, the off-line frequency domain filtering method and complete ensemble empirical mode decomposition with adaptive noise are employed to identify and correct outliers and noise in the original series, which yields the reconstructed series. In the dynamic integration forecasting module, to process the multi-dimensional characteristics of the data, three hybrid forecasting models (HCA, HCME and HCFL) are proposed as the basic forecasting models of the combined forecasting model. Specifically, a multi-objective optimization algorithm is used to optimize the flexible parameters. Three kinds of dynamic weights are then applied to integrate the forecasting results of the three hybrid models. In the system evaluation module, the Diebold–Mariano (DM) test and the Wilcoxon rank-sum (WRS) test are applied to verify the effectiveness of the proposed system. Experimental results show that the proposed model outperforms the benchmark models with high accuracy and stability, and that the proposed system can provide accurate air quality information to environmental decision makers.

The main findings and contributions of the paper are described as follows:

  • (1) A novel dynamic ensemble air quality index forecasting system is developed. The proposed forecasting system contains three modules: a data preprocessing module, a dynamic integration forecasting module and a system evaluation module. Its weight coefficients adapt to changes in the data to improve forecasting accuracy.

  • (2) A two-stage data preprocessing strategy is proposed to address the problems of outliers and noise. The data preprocessing strategy can better extract the characteristics of the dataset while excluding the influence of outliers and noise.

  • (3) Three different kinds of hybrid models are proposed as sub-forecasting models, which can deal with time series with linear, nonlinear and fuzzy characteristics. The hybrid models proposed in this paper come from three kinds of forecasting models and integrate the data preprocessing strategy and multi-objective optimization technology, which greatly improves forecasting accuracy.

  • (4) In the system evaluation module, two kinds of statistical tests are used to evaluate the system and show that it performs well. The DM test and the WRS test are selected, which are a parametric test and a non-parametric test respectively. According to these two statistical testing methods, the evaluation results of the proposed system are reliable.

  • (5) The results demonstrate that the proposed system performs outstandingly and can provide important information support for the prevention and control of air pollution. The proposed system performs better than the comparison models in three case studies, which verifies the effectiveness of the system.

Section snippets

The development of the dynamic ensemble air quality index forecasting system

The proposed dynamic ensemble air quality index forecasting system contains three modules: a data preprocessing module, a dynamic integration forecasting module and a system evaluation module. The detailed forecasting process is introduced in this section.

The system evaluation

Statistical tests are necessary to evaluate the forecasting performance of the system. In this paper, two statistical test methods, namely a parametric test (the Diebold–Mariano test (Diebold and Mariano, 1995)) and a non-parametric test (the Wilcoxon rank-sum test), are used to evaluate the forecasting performance of the system.

The DM test is mainly used to determine whether the forecasts of two models differ significantly. The null hypothesis is that there is
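The two tests named in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The DM statistic is computed for a one-step horizon with squared-error loss, so the long-run variance reduces to the sample variance of the loss differential. The rank-sum test uses the normal approximation without a tie correction. The function names and the data are hypothetical.

```python
import math
from statistics import mean


def normal_two_sided_p(z):
    """Two-sided p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def dm_test(actual, forecast_a, forecast_b):
    """One-step-ahead Diebold-Mariano statistic under squared-error loss.

    A positive statistic means forecast_a has the larger loss.
    Assumes the loss differential is not constant (non-zero variance).
    """
    # Loss differential d_t = e_a,t^2 - e_b,t^2
    d = [(y - a) ** 2 - (y - b) ** 2
         for y, a, b in zip(actual, forecast_a, forecast_b)]
    n = len(d)
    d_bar = mean(d)
    # For horizon h = 1 no autocovariance terms are needed.
    var_d = sum((x - d_bar) ** 2 for x in d) / n
    stat = d_bar / math.sqrt(var_d / n)
    return stat, normal_two_sided_p(stat)


def rank_sum_test(x, y):
    """Wilcoxon rank-sum z statistic (normal approximation, no tie correction)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values and give each the average rank.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    expected = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - expected) / sd
    return z, normal_two_sided_p(z)


# Made-up AQI values and forecasts from two hypothetical models.
actual = [100, 110, 105, 120, 115, 130, 125, 140]
benchmark = [108, 104, 112, 111, 120, 122, 131, 133]  # larger errors
proposed = [102, 109, 108, 118, 116, 127, 127, 139]   # smaller errors

dm_stat, dm_p = dm_test(actual, benchmark, proposed)

# Compare the absolute errors of the two models with the rank-sum test.
abs_err_benchmark = [abs(y - f) for y, f in zip(actual, benchmark)]
abs_err_proposed = [abs(y - f) for y, f in zip(actual, proposed)]
wrs_z, wrs_p = rank_sum_test(abs_err_benchmark, abs_err_proposed)

print("DM statistic:", dm_stat, "p-value:", dm_p)
print("WRS z:", wrs_z, "p-value:", wrs_p)
```

For longer forecast horizons the variance estimate would need autocovariance terms, and small samples may call for a finite-sample correction. Both are omitted here for brevity.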

Empirical analysis

In this paper, the air quality index is forecasted by constructing a dynamic ensemble forecasting system based on the two-stage data preprocessing strategy. The following is the experimental process and the analysis of the forecasting results.

Conclusion

Based on the time-varying parameter weight theory, a dynamic ensemble forecasting system based on a multi-objective intelligent optimization algorithm is proposed to forecast the AQI series from Shijiazhuang, Zhengzhou, and Guangzhou. The proposed system consists of three modules: a data preprocessing module, a dynamic integration forecasting module and a system evaluation module. In the data preprocessing module, the off-line frequency domain filtering is employed to test and correct the

CRediT authorship contribution statement

Hongmin Li: Conceptualization, Software, Writing - original draft. Jianzhou Wang: Methodology, Writing - review & editing. Hufang Yang: Visualization, Software.

Declaration of competing interest

The authors declare that there is no conflict of interest with regard to the publication of this paper.

Acknowledgements

This work was supported by Major Program of National Social Science Foundation of China (Grant No.17ZDA093).

References (51)

  • G.B. Huang et al. Extreme learning machine: theory and applications. Neurocomputing (2006)

  • P. Jiang et al. Variable weights combined model based on multi-objective optimization for short-term wind speed forecasting. Appl. Soft Comput. J. (2019)

  • D. Kang et al. Real-time bias-adjusted O3 and PM2.5 air quality index forecasts and their performance evaluations over the continental United States. Atmos. Environ. (2010)

  • S.E. Kim. Ordinal time series model for forecasting air quality index for ozone in southern California. Environ. Model. Assess. (2016)

  • A. Kumar et al. Forecasting of daily air quality index in Delhi. Sci. Total Environ. (2011)

  • A. Law et al. Multi-label classification using a cascade of stacked autoencoder and extreme learning machines. Neurocomputing (2019)

  • H. Li et al. Novel analysis–forecast system based on multi-objective optimization for air quality index. J. Clean. Prod. (2019)

  • M. Marczak et al. Outlier detection in structural time series models: the indicator saturation approach. Int. J. Forecast. (2016)

  • X. Niu et al. A combined model based on data preprocessing strategy and multi-objective optimization algorithm for short-term wind speed forecasting. Appl. Energy (2019)

  • M. Qin et al. Red tide time series forecasting by combining ARIMA and deep belief network. Knowl. Base Syst. (2017)

  • B. Tang et al. A local density-based approach for outlier detection. Neurocomputing (2017)

  • H.J. Teoh et al. Fuzzy time series model based on probabilistic approach and rough set rule induction for empirical research in stock markets. Data Knowl. Eng. (2008)

  • C. Vong et al. Predicting minority class for suspended particulate matters level by extreme learning machine. Neurocomputing (2014)

  • D. Wang et al. A novel hybrid model for air quality index forecasting based on two-phase decomposition technique and modified extreme learning machine. Sci. Total Environ. (2017)

  • K. Yuen et al. Outlier detection and robust regression for correlated data. Comput. Methods Appl. Mech. Eng. (2017)

    Peer review under responsibility of Turkish National Committee for Air Pollution Research and Control.
