Introduction

High levels of air pollutants are released by industries, vehicles, and various natural and anthropogenic sources, leading to air quality degradation. As discussed in (Linares et al. 2018), institutions such as the World Health Organization (WHO) and the European Environment Agency have reported that exposure to pollutants increases the risk of early death (García and Aznarte 2020). Air pollution causes harmful effects on human and animal health, together with damage to plants and monuments. In addition, particulate air pollution is directly related to cardiovascular (Brook et al. 2004) and respiratory (Gorai et al. 2014) disorders.

Nitrogen dioxide (NO2) is commonly derived from fossil fuel combustion and causes bronchitis, pneumonia, emphysema, and other conditions when it enters the alveoli. Sulfur dioxide (SO2) comes from the combustion of sulfur-containing fuels such as coal and petroleum. At high levels, it irritates the respiratory tract and makes breathing difficult (Chen et al. 2012). SO2 is one of the highly reactive gases called “oxides of sulfur”, emitted into the atmosphere by the burning of fossil fuels (coal and petroleum products), by power and industrial plants, by industrial processes such as steel making and mining, by transportation vehicles (including locomotives and ships) burning fuels with a high sulfur content, by pulp industries, and by natural sources such as volcanic emissions (Andersson et al. 2013).

Global energy consumption is constantly increasing (from 3728 Mtoe in 1965 to 12,928 Mtoe in 2014) due to population and economic growth (Aydin 2015a). The use of different types of energy sources has raised the share of global primary energy consumption provided by fossil sources, particularly oil, coal, and natural gas, to 87%. The increase in energy demand in the coming years (Aydin 2014) will be met not only by the growth of renewable energy sources, but also by fossil fuels, mainly oil and gas (Aydin 2015b). The countries with the highest energy consumption currently account for about 62% of the world's energy consumption; thus, modeling their energy consumption to obtain an estimate of future world energy consumption is of great importance (Gokhan Aydin et al. 2016). Cities, as high-density urban areas, consume a large amount of energy: they occupy only 2% of the land, but account for about 75% of global energy consumption and are responsible for 80% of the world’s greenhouse gas emissions (Feng and Zhang 2012). Moreover, due to the increase in fossil fuel use since the Industrial Revolution, greenhouse gas concentrations have risen significantly. Consequently, global warming and climate change have become major concerns, particularly during the past two decades. The effects of global warming on the global economy have been widely studied since the 1990s, and international organizations have aimed to decrease the adverse effects of global warming through intergovernmental and binding laws. Carbon dioxide (CO2) is known as the most important greenhouse gas in the Earth's atmosphere. The energy sector is the largest contributor through the direct burning of fuels, by which a large amount of CO2 is emitted. CO2 from energy accounts for nearly 60% of anthropogenic greenhouse gas emissions, a share that varies widely by country because of diverse national energy structures (Köne and Büke 2010). Oil plays an important role in the world economy, and natural gas (NG) has become a direct competitor of oil because of its environmental benefits and its role in electricity production. In the next two decades, there will be a rise in energy demand met by fossil fuels, in which oil will still play the main role, complemented by the increasing contribution of NG, mostly in electricity production (Aydin 2014).

In the current century, the scarcity of fossil fuels for electricity generation and the growing population make energy one of the basic needs of developing human societies. However, increased awareness of and concern about environmental issues have resulted in a greater willingness to use energy-efficient technologies. Thus, energy is one of the most important global challenges for all countries (Mohammadi et al. 2018). The largest proven oil reserves, including non-conventional oil deposits, can be found in Venezuela, Saudi Arabia, Canada, and Iran (20%, 18%, 13%, and 10.6% of global reserves, respectively) (Tofigh and Abedian 2016). Based on the Oil & Gas Journal, as of January 2013 Iran was estimated to have 155 billion barrels of oil reserves, over 10% of the global total and 13% of the Organization of the Petroleum Exporting Countries (OPEC) reserves (Tofigh and Abedian 2016), which would last approximately 94 years (Nejat et al. 2013). The largest proven NG reserves can be found in Russia, Iran, and Qatar (24%, 16.8%, and 12.5% of global reserves, respectively). Also, Iran holds the third-largest NG reserves in Asia because of the South Pars field development; nonetheless, it is projected to be only the seventh-largest NG producer by 2035 (Tofigh and Abedian 2016).

The air quality forecasting problem is addressed by time series analysis, in which model selection and parametrization are of central importance. Recently, machine learning approaches, such as neural networks supported by deep learning, have been applied to this problem (Valput et al. 2019). Gennaro et al. (2013) proposed an artificial neural network (ANN) to forecast daily PM10 concentrations at regional as well as urban background sites. Feng et al. (2015) used air mass trajectory analysis together with the wavelet transform to improve the ANN prediction accuracy of daily mean PM2.5 concentrations; trajectory analysis was applied to recognize transport corridors, whereas the wavelet transform was used to cope effectively with fluctuations in PM2.5 concentrations. In (Sun and Sun 2017), the authors proposed a new combined forecasting model based on principal component analysis (PCA) and the least squares support vector machine (LSSVM) optimized with the cuckoo search algorithm for the prediction of PM2.5 concentrations. In (Madaan 2019), the authors provided a forecasting model based on bidirectional long short-term memory networks for the prediction of air quality, estimating the concentration levels of different pollutants (NO2, and particulate matter PM2.5 and PM10) so that their threat level can be classified for the next 24 h. Prasad et al. (2016) designed an adaptive neuro-fuzzy inference system (ANFIS) to predict the daily air pollution levels of SO2, NO2, CO, ozone (O3), and PM10 in the climate of a megacity (Howrah). In addition, Li et al. (2018) provided an intelligent model for air pollutant concentration prediction based on the weighted extreme learning machine (WELM) and the adaptive neuro-fuzzy inference system (ANFIS). García and Aznarte (2020) proposed a forecasting model based on Shapley additive explanations to predict NO2 time series. Li and Jin (2018) presented a combined intelligent model based on fuzzy synthetic evaluation for the early warning monitoring of air pollutants in China. Sen et al. (2016) proposed an ARIMA forecasting model for predicting energy consumption and greenhouse gas (GHG) emissions; the model was applied to an Indian pig iron manufacturing organization. Ding et al. (2017) provided a new grey multivariable model to predict CO2 emissions from fuel combustion in China. Wang and Ye (2017) presented a forecasting method based on a nonlinear grey multivariable model for predicting Chinese carbon emissions from fossil energy use. Furthermore, a discrete grey forecasting model for energy-related CO2 emission prediction in China from 2011 to 2015 was implemented by (Ding et al. 2020). A hybrid intelligent prediction model based on a semi-experimental regression approach and ANFIS for air pollution forecasting was applied by (Zeinalnezhad et al. 2020). Maleki et al. (2019) applied an ANN-based model to forecast criteria air pollutant concentrations such as O3, NO2, SO2, PM10, PM2.5, CO, AQI, and AQHI. In (Say and Yücel 2006), Turkey’s energy sector was reviewed from 1970 to 2002. The total energy consumption (TEC) was modeled through economic growth (proxied by gross national product, GNP) and population growth as the main factors determining energy consumption in developing countries. They also reviewed the relationship between TEC and total CO2 (TCO2) emissions. Accordingly, they modeled the strong association between TEC and TCO2 (R2 = 0.998) using regression analysis.
In addition, a regression model can predict the TEC from the population and the GNP with high confidence (R2 = 0.996). Kumar et al. (2017) applied the American Meteorological Society/Environmental Protection Agency Regulatory Model (AERMOD) to predict short-term air quality considering weather forecasts based on the WRF method; a comprehensive emission inventory was provided for the sources in Chembur, Mumbai. In (Kumar et al. 2016), vehicular pollution modeling was presented with AERMOD using meteorology simulated with the Weather Research and Forecasting method; NOx and PM levels were 3.6 and 1.45 times greater at peak times than during off-peak and evening peak periods, respectively. Tao et al. (2019) proposed a new short-term deep learning forecasting model for PM2.5 concentration; the proposed model was applied to the Beijing PM2.5 dataset. Liu et al. (2020) presented a new wind-sensitive attention structure with the long short-term memory (LSTM) neural network method for forecasting the air pollution (PM2.5) level. A combined air quality early warning model that includes estimation, forecasting, and assessment was provided by (Jiang et al. 2019). Hähnel et al. (2020) developed a deep-learning framework to monitor and forecast air pollution that can be trained across various model domains. In (Ding et al. 2021), a new selection model based on the cooperative data index was developed to characterize the optimal forecasting combination from different forecasts. Agarwal et al. (2020) provided a prediction method based on artificial neural networks for forecasting PM10, PM2.5, NO2, and O3 pollutant levels for the current day and the next four days in a highly polluted area.

In the light of the abovementioned state of the art, the main contributions and novelty of this paper are presented as follows:

  (a) The air pollution from manufacturing industry production has been analyzed in different months based on various input variables.

  (b) Since the input variables play a very important role in the performance of the forecasting model, the best features have been selected as input variables using the mutual information model.

  (c) A novel forecasting application (long short-term memory network combined with the multi-verse optimization algorithm) has been proposed to predict the amounts of NO2 and SO2 produced by the Combined Cycle Power Plant. In this model, the LSTM parameters are optimized using the MVO metaheuristic algorithm.

  (d) The proposed forecasting application has been successfully verified on real datasets. In addition, the forecasting model is compared with other valid benchmark models (LSTM-PSO, ENN-MVO, and ENN-PSO).

  (e) The main advantage of the proposed prediction application over the methods presented in recent studies is the use of a combined deep learning method. In particular, deep learning models and optimization algorithms have been integrated with the mutual information method to select the best features as the model inputs.

This paper is organized as follows: Sect. 2 explains the case study and the artificial intelligence methods. Section 3 describes the proposed air pollution forecasting application. The results and discussion are provided in Sect. 4. Finally, Sect. 5 presents the research conclusions.

Materials and methods

After a brief explanation of the case study and data gathering, this section explains the artificial intelligence methods developed in the paper.

Case study

The Kerman Combined Cycle Power Plant is located in Kerman, Iran, at latitude 30.2091° and longitude 56.7904°.

The Kerman Combined Cycle Power Plant, with a rated power of 1912 MW, includes eight gas units, each with a rated power of 159 MW, along with four steam units, each with a rated power of 160 MW. Each steam unit contains two vertical heat recovery steam generators. The exhaust steam is transferred to a steam turbine from two boilers, each consisting of high- and low-pressure drums as well as a deaerator drum.

The main fuel of this power plant is natural gas, while its backup fuel is gas oil (diesel), which is stored in 20-million-liter tanks. Natural gas accounts for about 70% and diesel for about 30% of the plant's fuel. The data have been collected from the outputs of the different stacks. Moreover, the proposed forecasting model has been tested on two different datasets: Case A is the NO2 and SO2 produced by stack 1, and Case B is the NO2 and SO2 produced by stack 2. Based on these two cases, we can analyze the level of air pollution and better evaluate the proposed model. The datasets include wind speed, air temperature, NO2, and SO2 for five months (May–September 2019) with a 3-h time step. The location of the Kerman Combined Cycle Power Plant is indicated in Fig. 1. In addition, Fig. 2 presents the windrose diagram based on the wind speed and wind direction data.

Fig. 1
figure 1

Location of Kerman combined cycle power plant

Fig. 2
figure 2

The Windrose diagram

Mutual Information

Various indicators have been developed to quantify statistical dependence, including mutual information (MI). Shannon, the creator of information entropy (Shannon 1948), proposed the following formula for the Shannon entropy (also known as information content):

$$ H\left( X \right) = - \mathop \sum \limits_{i = 1}^{N} p\left( {x_{i} } \right)\log \left[ {p\left( {x_{i} } \right)} \right] $$
(1)

Yang et al. (2000) defined the MI between two random variables as

$$ {\text{MI}}\left( {X,Y} \right) = H\left( X \right) + H\left( Y \right) - H\left( {X,Y} \right) $$
(2)

Here \(H\left(X\right)\), \(H\left(Y\right)\), and \(H\left(X,Y\right)\) denote the entropy of X, the entropy of Y, and their joint entropy, respectively, where the joint entropy is given by:

$$ H\left( {X,Y} \right) = - \mathop \sum \limits_{x \in X} \mathop \sum \limits_{y \in Y} p_{XY} \left( {x,y} \right)\log \left[ {p_{XY} \left( {x,y} \right)} \right] $$
(3)

Selecting appropriate variables is a crucial step in developing models, and analyzing previously observed trends is a valuable source for identifying important factors. Babel et al. (2015) argued that mutual information is a vital tool for identifying appropriate factors. In the present research, MI was used to select the best features of the model and to evaluate the appropriateness of the lagged values used. Table 1 shows strong correlations between NO2 and SO2 and all the other indicators.

Table 1 The correlation coefficient of NO2 and SO2 with other indicators
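As a rough illustration of how Eqs. (1)–(3) can be used to rank candidate inputs, the following Python sketch estimates MI from histogram-based probability estimates; the column names and the synthetic data are hypothetical and only stand in for the measured variables.

```python
# Minimal sketch of mutual-information feature ranking (Eqs. 1-3), assuming a
# pandas DataFrame with hypothetical column names standing in for the measured
# variables. MI is estimated from histogram-based probability estimates.
import numpy as np
import pandas as pd

def entropy(p):
    """Shannon entropy H = -sum p*log(p), skipping empty bins (Eq. 1)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_information(x, y, bins=16):
    """MI(X, Y) = H(X) + H(Y) - H(X, Y), estimated from a 2-D histogram (Eqs. 2-3)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()                      # joint distribution p(x, y)
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)   # marginal distributions
    return entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())

# Example: rank candidate inputs against the NO2 target (illustrative data).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "wind_speed": rng.random(1000),
    "air_temp": 20 + 10 * rng.random(1000),
    "NO2": rng.random(1000),
})
scores = {c: mutual_information(df[c].to_numpy(), df["NO2"].to_numpy())
          for c in ["wind_speed", "air_temp"]}
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Features with higher MI scores would then be retained as inputs to the forecasting model.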

Elman neural network

The Elman neural network (ENN) is a kind of self-recurrent neural network composed of an input layer, a hidden layer with a recurrent (context) connection, and an output layer (Elman 1990). Due to this recurrent structure, the output of the hidden layer is fed back through a feedback loop to the input of the hidden layer. The most important part of building the network is determining the optimal values of its parameters, namely the weights and biases. Because of the recurrence, two Elman networks with identical weights and biases may still produce different outputs for the same input sequence, depending on their internal state. Figure 3 shows the Elman neural network and its recurrent loops. Like other neural networks, this network requires a training method to obtain the optimal parameter values.

Fig. 3
figure 3

The architecture of the Elman neural network
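For readers who prefer code, the sketch below builds an Elman-style network in PyTorch; `torch.nn.RNN` with a tanh nonlinearity realizes the hidden-layer feedback described above. The layer sizes and input shapes are illustrative, not the configuration used in this study.

```python
# A minimal Elman-style recurrent network sketch in PyTorch: torch.nn.RNN with
# a tanh nonlinearity implements the hidden-layer feedback loop. Layer sizes
# and input shapes are illustrative only.
import torch
import torch.nn as nn

class ElmanNet(nn.Module):
    def __init__(self, n_features, hidden_size=32):
        super().__init__()
        self.rnn = nn.RNN(n_features, hidden_size, nonlinearity="tanh",
                          batch_first=True)
        self.out = nn.Linear(hidden_size, 1)    # single pollutant output

    def forward(self, x):                       # x: (batch, time_steps, n_features)
        h_seq, _ = self.rnn(x)                  # hidden states fed back at every step
        return self.out(h_seq[:, -1, :])        # predict from the last hidden state

model = ElmanNet(n_features=5)
dummy = torch.randn(8, 3, 5)                    # 8 samples, 3 lags, 5 features
print(model(dummy).shape)                       # torch.Size([8, 1])
```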

Long short-term memory neural network

LSTM is a special recurrent neural network (RNN) method that retains the features of RNNs while using a series of memory cells to cope with arbitrary input data and enhance the learning of time series. It also captures the long-term dependence of the input information and prevents the vanishing gradient during data transmission, which enhances its ability to capture the dynamic changes of a time series (Yuan et al. 2019). Hochreiter and Schmidhuber (1997) developed the long short-term memory (LSTM) architecture, and it was improved by Gers et al. (1999) with an extra forget gate. It is one of the most effective RNN architectures and is widely used. The LSTM method combines gate units, input and output units, and memory cells (Yuan et al. 2019).

Gate units:

$$ {\text{Input}}\;{\text{gate:}}\;g_{i} = \sigma \left( {W_{ix} \cdot X_{t} + W_{ih} \cdot h_{t - 1} + b_{i} } \right) $$
(4)
$$ {\text{Forget}}\;{\text{gate:}}\;g_{f} = \sigma \left( {W_{fx} \cdot X_{t} + W_{fh} \cdot h_{t - 1} + b_{f} } \right) $$
(5)
$$ {\text{Output}}\;{\text{gate:}}\;g_{o} = \sigma \left( {W_{ox} \cdot X_{t} + W_{oh} \cdot h_{t - 1} + b_{o} } \right) $$
(6)

Input units:

$$ C\_{\text{in}} = \beta \left( {W_{cx} \cdot X_{t} + W_{ch} \cdot h_{t - 1} + b_{c} } \right) $$
(7)

Memory cells:

$$ C_{t} = g_{f} \cdot C_{t - 1} + g_{i} \cdot C\_{\text{in}} $$
(8)

Output unit:

$$ h_{t} = g_{o} \cdot \theta \left( {C_{t} } \right) $$
(9)
$$ y = \varphi \left( {W_{y} \cdot h_{t} + b_{y} } \right) $$
(10)

where \(\sigma \left(\bullet \right)\), \(\beta \left(\bullet \right)\), and \(\theta \left(\bullet \right)\) denote nonlinear activation functions; \(\varphi \left(\bullet \right)\) represents the output unit function; and the weight coefficients W and bias coefficients b establish the relationships among the units of the model. Figure 4 shows the main structure of the LSTM network.

Fig. 4
figure 4

The main structure of LSTM
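The following numpy sketch implements a single LSTM step directly from Eqs. (4)–(10), using a sigmoid for the gates and tanh standing in for β(·) and θ(·); the weight shapes and names are illustrative rather than the trained model's parameters.

```python
# A single LSTM step written directly from Eqs. (4)-(10) as a numpy sketch.
# Sigmoid is used for the gates, and tanh stands in for beta and theta; weight
# shapes and names are illustrative, not the trained model's parameters.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    g_i = sigmoid(W["ix"] @ x_t + W["ih"] @ h_prev + b["i"])   # input gate,  Eq. (4)
    g_f = sigmoid(W["fx"] @ x_t + W["fh"] @ h_prev + b["f"])   # forget gate, Eq. (5)
    g_o = sigmoid(W["ox"] @ x_t + W["oh"] @ h_prev + b["o"])   # output gate, Eq. (6)
    c_in = np.tanh(W["cx"] @ x_t + W["ch"] @ h_prev + b["c"])  # input unit,  Eq. (7)
    c_t = g_f * c_prev + g_i * c_in                            # memory cell, Eq. (8)
    h_t = g_o * np.tanh(c_t)                                   # output unit, Eq. (9)
    return h_t, c_t

# Tiny example: 4 input features, hidden size 3, random weights.
n_in, n_h = 4, 3
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((n_h, n_in if k.endswith("x") else n_h)) * 0.1
     for k in ["ix", "ih", "fx", "fh", "ox", "oh", "cx", "ch"]}
b = {k: np.zeros(n_h) for k in ["i", "f", "o", "c"]}
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_h), np.zeros(n_h), W, b)
W_y, b_y = rng.standard_normal((1, n_h)) * 0.1, np.zeros(1)
y = W_y @ h + b_y                               # readout, Eq. (10) with identity phi
print(h, y)
```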

Multi-verse optimization algorithm

As a nature-inspired algorithm, the multi-verse optimization algorithm (MVO) was developed by Mirjalili et al. (2016). The main inspiration of this algorithm is the multi-verse theory in astrophysics. According to the MVO, several big bangs form various universes, while white holes, black holes, and wormholes connect these universes. Mirjalili argued that in MVO, matter moves from one universe to another through white/black holes, where white holes emit matter and black holes attract it. Wormholes connect two points of a universe. The major terms of this theory are as follows: each universe is a solution, each solution is composed of a series of objects, generations or iterations represent time, and the inflation rate represents the value of each object in a particular universe. In this theory, a solution is therefore a universe with various white holes, black holes, and wormholes. White holes are assumed to be more likely in solutions with higher values, whereas black holes are more likely to form in objects with the worst values, which causes the transmission of values from variables of acceptable solutions. This increases the likelihood of improving weak solutions, which in turn translates into an improved mean value of all solutions. Equations 11 and 12 describe the core structure of the algorithm:

$$ x_{i}^{j} = \left\{ {\begin{array}{*{20}c} {x_{k}^{j} \quad r_{1} < {\text{NI}}\left( {U_{i} } \right)} \\ {x_{i}^{j} \quad r_{1} \ge {\text{NI}}\left( {U_{i} } \right)} \\ \end{array} } \right. $$
(11)

where \({X}_{i}^{j}\) represents the jth object of the ith universe, r1 is a random number in [0, 1], NI(Ui) is the normalized inflation rate of the ith universe, and \({X}_{k}^{j}\) represents the jth object of the kth universe.

$$ x_{i}^{j} = \left\{ {\begin{array}{*{20}l} {x_{j} + {\text{TDR}} \times \left( {\left( {{\text{Ub}}_{j} - {\text{Lb}}_{j} } \right) \times r_{4} + {\text{Lb}}_{j} } \right)\quad r_{3} < 0.5,\;r_{2} < {\text{WEP}}} \\ {x_{j} - {\text{TDR}} \times \left( {\left( {{\text{Ub}}_{j} - {\text{Lb}}_{j} } \right) \times r_{4} + {\text{Lb}}_{j} } \right)\quad r_{3} \ge 0.5,\;r_{2} < {\text{WEP}}} \\ {x_{i}^{j} \quad r_{2} \ge {\text{WEP}}} \\ \end{array} } \right. $$
(12)

where Xj is the jth variable of the best universe obtained so far, Ubj and Lbj represent the upper and lower bounds of the jth variable, the traveling distance rate (TDR) and the wormhole existence probability (WEP) are coefficients, and r2, r3, and r4 are random values in [0, 1].

The MVO algorithm thus treats the best solution found so far as the target around which other solutions are adjusted. In the original work, the authors argued that wormholes can appear in any universe; this, in turn, increases the likelihood of reaching better solutions and of preserving the best solution obtained during the optimization process. At the end of the optimization, the best solution is returned as the global optimum for the problem at hand. A prerequisite for applying the above equations is the exchange of variables between different solutions. It is worth noting that, applied with fixed coefficients, these equations would emphasize a single search pattern (either exploitative or exploratory). MVO therefore adapts the following coefficients to balance these patterns during the optimization process:

$$ {\text{WEP}} = {\text{Min}} + {\text{Iteration}} \times \left( {\frac{{{\text{Max}} - {\text{Min}}}}{L}} \right) $$
(13)
$$ {\text{TDR}} = 1 - \frac{{{\text{Iteration}}^{{{\raise0.7ex\hbox{$1$} \!\mathord{\left/ {\vphantom {1 p}}\right.\kern-\nulldelimiterspace} \!\lower0.7ex\hbox{$p$}}}} }}{{L^{{{\raise0.7ex\hbox{$1$} \!\mathord{\left/ {\vphantom {1 p}}\right.\kern-\nulldelimiterspace} \!\lower0.7ex\hbox{$p$}}}} }} $$
(14)

where p is the exploitation factor. Two adaptive coefficients are used in MVO: WEP and TDR. WEP increases over the iterations to strengthen exploitation, while TDR decreases over the iterations (Eq. 14) to make the exploitation/local search around the best solution more precise. MVO can also be viewed as an evolutionary algorithm that exchanges matter between universes: this exchange resembles the crossover operator, a well-known evolutionary operator, and it leads to sudden changes in the universes, enhances exploration, and maintains the diversity of the universes over the iterations. After the best universe has been identified, each universe takes on a set of variables in a random process; this resembles mutation, another evolutionary operator, which causes slight changes in good solutions and supports exploitation. Elitism is an evolutionary operator that preserves the best solution achieved during the optimization process; in MVO, elitism is achieved by keeping the best universe found so far (Fathy and Rezk 2018).
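A compact numpy sketch of these update rules (Eqs. 11–14) is given below for a toy minimization problem. The objective, bounds, and the simplified way the white-hole source universe is chosen are assumptions for illustration, not the authors' exact implementation.

```python
# Compact numpy sketch of the MVO update rules (Eqs. 11-14) on a toy
# minimization problem. The objective, bounds, and the simplified choice of the
# white-hole source universe are illustrative, not the authors' implementation.
import numpy as np

def mvo(objective, lb, ub, n_universes=30, n_iter=200, wep_min=0.2, wep_max=1.0, p=6):
    dim = len(lb)
    rng = np.random.default_rng(1)
    X = rng.uniform(lb, ub, size=(n_universes, dim))        # universes = candidate solutions
    best_x, best_f = None, np.inf
    for it in range(1, n_iter + 1):
        fit = np.array([objective(x) for x in X])           # inflation rates (lower = better here)
        order = np.argsort(fit)
        X, fit = X[order], fit[order]                        # index 0 is the best universe
        if fit[0] < best_f:
            best_f, best_x = fit[0], X[0].copy()
        ni = (fit - fit.min()) / (np.ptp(fit) + 1e-12)       # normalized inflation rate NI(U_i)
        wep = wep_min + it * (wep_max - wep_min) / n_iter    # Eq. (13)
        tdr = 1 - (it ** (1 / p)) / (n_iter ** (1 / p))      # Eq. (14)
        for i in range(1, n_universes):                      # keep the best universe (elitism)
            for j in range(dim):
                # White/black hole exchange, Eq. (11): worse universes (larger NI)
                # are more likely to receive variable j from a better-ranked universe.
                if rng.random() < ni[i]:
                    X[i, j] = X[rng.integers(0, i), j]
                # Wormhole travel around the best universe found so far, Eq. (12).
                if rng.random() < wep:
                    step = tdr * ((ub[j] - lb[j]) * rng.random() + lb[j])
                    X[i, j] = best_x[j] + step if rng.random() < 0.5 else best_x[j] - step
            X[i] = np.clip(X[i], lb, ub)
    return best_x, best_f

# Example: minimize the 2-D sphere function.
x_star, f_star = mvo(lambda x: np.sum(x ** 2), lb=np.array([-5.0, -5.0]), ub=np.array([5.0, 5.0]))
print(x_star, f_star)
```

In the proposed application, the objective passed to such a routine would be the validation error of the LSTM for a given hyperparameter vector.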

Proposed air pollution forecasting application

The proposed strategy acts as follows: the data are first evaluated using the mutual information statistical method; the parameters that show a good correlation move to the next stage; then, the input variables (best features) are classified to predict the two output variables (NO2 and SO2).

After feature selection, the future amounts of NO2 and SO2 are predicted using the combined intelligent forecasting application. The phases of the proposed air pollution forecasting application are as follows:

  • Phase 1 Determining the effect coefficients of the influential indicators on NO2 and SO2 emissions using the Pearson correlation.

  • Phase 2 Designing the main forecasting engine based on the long short-term memory neural network, using the best features obtained in phase 1.

  • Phase 3 Optimizing the parameters of the LSTM and ENN neural networks using different metaheuristic optimization algorithms, such as the multi-verse optimization algorithm and the particle swarm optimization algorithm.

  • Phase 4 Training and testing the combined models using the total data as well as the data classified in phase 1.

Figure 5 presents the overall framework of the proposed forecasting application.

Fig. 5
figure 5

The overall framework of the proposed hybrid forecasting application
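To make the four phases concrete, the following Python sketch runs the same workflow on synthetic data: mutual-information ranking (Phase 1), an LSTM forecaster (Phase 2), a hyperparameter search standing in for MVO (Phase 3), and an 80/20 train/test evaluation (Phase 4). All variable names, bounds, and the synthetic data are illustrative assumptions, not the study's actual configuration.

```python
# End-to-end sketch of the proposed MI -> LSTM -> MVO workflow (Phases 1-4)
# on synthetic data. Column names, lag count, hyperparameter bounds, and the
# small random search used as a stand-in for MVO are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)

# Synthetic 3-hourly records: wind speed, air temperature, and an NO2 target.
n = 800
exog = np.column_stack([rng.random(n), 20 + 10 * rng.random(n)]).astype(np.float32)
no2 = (0.5 * exog[:, 0] + 0.1 * rng.standard_normal(n)).astype(np.float32)

# Phase 1: rank candidate inputs against the target (MI scores; the paper also
# reports correlation coefficients in Table 1).
print(mutual_info_regression(exog, no2))

# Build type (1) inputs: exogenous features plus three lagged target values.
n_lags = 3
X = np.array([np.concatenate([exog[t], no2[t - n_lags:t]]) for t in range(n_lags, n)])
y = no2[n_lags:]
split = int(0.8 * len(X))                       # Phase 4: 80% train / 20% test
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

class LSTMForecaster(nn.Module):                # Phase 2: LSTM main forecaster
    def __init__(self, n_features, hidden_size):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)
    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :]).squeeze(-1)

def test_rmse(hidden_size, lr, epochs=40):
    """Objective for Phase 3: train with given hyperparameters, score on the test set."""
    model = LSTMForecaster(X.shape[1], hidden_size)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    xb = torch.from_numpy(X_tr)[:, None, :]     # one time step per lag window
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(xb), torch.from_numpy(y_tr))
        loss.backward()
        opt.step()
    with torch.no_grad():
        pred = model(torch.from_numpy(X_te)[:, None, :]).numpy()
    return float(np.sqrt(np.mean((pred - y_te) ** 2)))

# Phase 3: hyperparameter search. A few random trials stand in here for MVO,
# which in the proposed application minimizes the same objective.
trials = [(int(h), float(lr)) for h, lr in zip(rng.integers(8, 64, 5),
                                               10 ** rng.uniform(-4, -2, 5))]
best = min(trials, key=lambda t: test_rmse(*t))
print("selected (hidden_size, learning_rate):", best)
```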

Results and discussions

This section presents the error measurement indicators and the NO2 and SO2 forecasting results.

Error measurement indicators

After optimizing the neural network parameters, the hybrid forecasting models are trained and tested. The optimal parameter values are integrated into the neural network, and the network is trained using part of the input data (in this study, 80% of the data is used for training). Afterwards, different error indicators are used to assess the trained network. In this study, three error indicators were used to evaluate the performance of the forecasting models: RMSE (Eq. 15), MAE (Eq. 16), and MAPE (Eq. 17).

$$ {\text{RMSE}} = \sqrt {\frac{1}{m}\mathop \sum \limits_{i = 1}^{m} \left( {x_{{{\text{acti}}}} - x_{{{\text{fori}}}} } \right)^{2} } $$
(15)
$$ {\text{MAE}} = \frac{1}{m}\mathop \sum \limits_{i = 1}^{m} \left| {x_{{{\text{acti}}}} - x_{{{\text{fori}}}} } \right| $$
(16)
$$ {\text{MAPE}} = \frac{100}{m}\mathop \sum \limits_{i = 1}^{m} \left| {\frac{{x_{{{\text{acti}}}} - x_{{{\text{fori}}}} }}{{x_{{{\text{acti}}}} }}} \right| $$
(17)
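For reference, a minimal implementation of these three indicators is shown below (the array names are illustrative).

```python
# The three error indicators of Eqs. (15)-(17) as small numpy helpers
# (variable names are illustrative).
import numpy as np

def rmse(actual, forecast):
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))            # Eq. (15)

def mae(actual, forecast):
    return float(np.mean(np.abs(actual - forecast)))                    # Eq. (16)

def mape(actual, forecast):
    return float(100 * np.mean(np.abs((actual - forecast) / actual)))   # Eq. (17)

actual = np.array([1.2, 0.9, 1.5, 1.1])
forecast = np.array([1.0, 1.0, 1.4, 1.2])
print(rmse(actual, forecast), mae(actual, forecast), mape(actual, forecast))
```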

NO2 and SO2 forecasting results

Since environmental pollutants play a very important role in global warming and climate change, predicting their growth rate in the near future will help to manage and control these pollutants. Nowadays, artificial intelligence methods achieve very acceptable accuracy in prediction problems. As discussed previously, the aim of this paper is to predict the amounts of NO2 and SO2 produced by the Combined Cycle Power Plant. In this study, various intelligent models (ENN-PSO, ENN-MVO, and LSTM-PSO) are applied as predictive intelligence models. Table 2 shows the results of the proposed model and of the other models developed in this study, using the Case A data from May to September and two different types of input data. Two types of input variables are considered: type (1) includes wind speed, air temperature, and three lagged values of each output variable (\({\text{NO}}_{{2_{t - 1} }}\), \({\text{NO}}_{{2_{t - 2} }}\), and \({\text{NO}}_{{2_{t - 3} }}\) for NO2 forecasting; \({\text{SO}}_{{2_{t - 1} }}\), \({\text{SO}}_{{2_{t - 2} }}\), and \({\text{SO}}_{{2_{t - 3} }}\) for SO2 forecasting); type (2) includes only the lagged values of each output variable.

Table 2 Performance of the proposed model and other combined forecasting models to predict NO2 and SO2 based on type (1) and (2) input variables (Case A)
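A short pandas sketch of how the two input configurations can be assembled is given below; the column names are hypothetical placeholders for the measured variables.

```python
# Sketch of how the type (1) and type (2) input sets can be assembled with
# pandas; the column names are hypothetical placeholders for the measured data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "wind_speed": rng.random(10),
    "air_temp": 20 + 10 * rng.random(10),
    "NO2": rng.random(10),
})
for lag in (1, 2, 3):                                   # NO2_{t-1}, NO2_{t-2}, NO2_{t-3}
    df[f"NO2_lag{lag}"] = df["NO2"].shift(lag)
df = df.dropna()

lag_cols = ["NO2_lag1", "NO2_lag2", "NO2_lag3"]
X_type1 = df[["wind_speed", "air_temp"] + lag_cols]     # type (1): exogenous + lags
X_type2 = df[lag_cols]                                  # type (2): lagged values only
y = df["NO2"]
print(X_type1.shape, X_type2.shape)
```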

As indicated in Table 2, the performance of the proposed approach is considerably better than that of the other combined forecasting models for both output variables (NO2 and SO2). In addition, to better visualize the results in the table, the forecasted and real values for two different months have been selected and are shown in Fig. 6.

Fig. 6
figure 6

The real and forecasted values of NO2 and SO2 in two different test months

Furthermore, the two output variables (NO2 and SO2) have been predicted based on two different types of data for Case B. The results are shown in Table 3.

Table 3 Performance of the proposed model and other combined forecasting models to predict NO2 and SO2 based on type (1) and (2) input variables (Case B)

As shown in Tables 2 and 3, the proposed model (MI-LSTM-MVO) is used to predict NO2 and SO2. The results illustrate that the efficiency and stability of the proposed model are better than those of the other models for forecasting both outputs. Given the accuracy of the proposed forecasting model, the air pollution produced by a Combined Cycle Power Plant can be predicted and managed. Additionally, accurate and reliable forecasting methods can be used by policy makers as useful tools for defining strategies, plans, and rules to decrease atmospheric pollution. Figure 7 shows the MAPE error of the developed models for NO2 and SO2 forecasting.

Fig. 7
figure 7

Comparisons of MAPE error criteria for different forecasting models

Conclusion

Air pollution is a globally important phenomenon that is receiving increasing attention, mainly because predicting polluted days can both prevent negative health outcomes and provide the information needed to increase policy-makers’ awareness. Identifying the factors that contribute to air pollution and their trends over time plays a crucial role in developing effective models to reduce air pollution. In this study, a more efficient and effective forecasting method has been developed to predict air pollution from the manufacturing industry. The major contribution of the paper is the combination of the long short-term memory network with the multi-verse optimization metaheuristic algorithm, which is used to optimize the LSTM hyperparameters, to predict NO2 and SO2. In addition, the proposed method is tested on a real manufacturing-related air pollution dataset from the Combined Cycle Power Plant in Kerman, Iran, from May to September 2019. The forecasting performance of the proposed method has also been compared with some benchmark methods (LSTM-PSO, ENN-MVO, and ENN-PSO). According to the results section, the proposed method (MI-LSTM-MVO) performs better than the other methods across different months and network input variables.

Nonetheless, despite the homogeneous representation of meteorology, the forecasts match the observations well. Compared with forecasts based on uncertain emission inventories, such an ANN-oriented forecasting model is fast and less resource-intensive for generating daily forecasts. The forecasting results are applicable for scientific purposes and also for taking short-term corrective measures for air quality control in cities with high pollution. The approach can be replicated in different cities, following the needed setup, optimization, and validation, for emergency planning and short-term air quality control.