Abstract

Accurate and consistent streamflow prediction plays a significant role in several applications involving the management of hydrological resources, such as power generation, water supply, and flood mitigation. However, the nonlinear dynamics of climatic factors hinder the development of efficient prediction models. Therefore, to enhance the reliability and accuracy of streamflow prediction, this paper developed a three-stage hybrid model, namely, IVL (ICEEMDAN-VMD-LSTM), which integrates improved complete ensemble empirical mode decomposition with additive noise (ICEEMDAN), variational mode decomposition (VMD), and a long short-term memory (LSTM) neural network. Monthly data series of streamflow, temperature, and precipitation in the Swat River Watershed, Pakistan, from January 1971 to December 2015 were used as a case study. Firstly, correlation analysis and the two-stage decomposition approach were employed to select suitable inputs for the proposed model. ICEEMDAN was employed as the first decomposition stage to decompose the three data series into intrinsic mode functions (IMFs) and a residual component. In the second decomposition stage, the highest-frequency component (IMF1) was further decomposed by VMD. Afterward, all the components obtained through the correlation analysis and the two-stage decomposition approach were predicted using the LSTM network. Finally, the predicted results of all components were aggregated to form an ensemble prediction for the original monthly streamflow series. The results showed that the proposed model outperformed the other developed models on several evaluation benchmarks, demonstrating the applicability of the proposed IVL model for monthly streamflow prediction.

1. Introduction

The accuracy of streamflow prediction is crucial for the efficient management and planning of hydroresources. However, the involvement of nonlinear processes and factors, such as evaporation, topography, anthropogenic activities, and rainfall, poses a challenge for efficient streamflow prediction [1]. Streamflow prediction can be categorized into short-term prediction (e.g., daily or hourly), medium-term prediction (e.g., seasonal, monthly, and weekly), and long-term prediction (e.g., annual) [2].

Process-driven models (PDMs) and data-driven models (DDMs) represent the two general categories of streamflow prediction models. PDMs consider the physical processes of the water cycle [3], whereas DDMs are based on artificial intelligence (AI) methods and do not explicitly consider the physical mechanisms of the watershed. As a result, these AI-based models are more user-friendly than the PDMs [4]. The development of PDMs is very complex, and these models are sensitive to several factors, including the effects of the watershed’s underlying conditions on the accuracy and integrity of data, the intricacy of the rainfall-streamflow process, the spatial-temporal variation of climatological data, and the limited knowledge of streamflow patterns in the watersheds. The majority of these models require a large quantity of data for training and testing, which makes them computationally complex. Consequently, researchers have attempted to develop alternative approaches that predict streamflow with reasonable accuracy and comparative ease. DDMs can be regarded as black boxes that establish relations between the input and output variables with limited information on the underlying hydrological process [5]. DDMs have a simpler architecture than PDMs since they require less data. These models can circumvent the influence of the uncertainties that complex hydrological processes introduce into model performance, and they also offer good prediction results [6]. DDMs are becoming popular with the advent and advancement of AI. These models are more suitable for streamflow forecasting than PDMs, particularly when limited knowledge of the hydrological process is available [7]. DDMs can be viewed as a promising solution to the challenges of uncertainty and sensitivity inherent in PDMs [8, 9].

Machine learning models (MLMs) are extensively employed to study the nonlinear dynamics of hydrological variables [10–12]. Neural networks [13], support vector machines (SVM) [14], and random forests are the most popular MLMs for prediction [15]. MLMs are feasible for the prediction of streamflow, temperature, and precipitation variables on a large scale [16, 17]. Recent studies have demonstrated the superior performance of deep learning (DL) approaches for streamflow prediction [18–21]. The LSTM network can be employed to model streamflow-precipitation variables owing to its ability to learn long-term dependencies between inputs and outputs [22]. Accordingly, LSTM has been successfully applied in numerous streamflow-precipitation studies [23, 24].

MLMs are coupled with decomposition techniques to enhance the performance of standalone models and to obtain more accurate predictions [25, 26]. Decomposition techniques have been effectively applied to decompose streamflow time series and to improve the performance of MLMs [27]. ICEEMDAN is the latest version of complete ensemble empirical mode decomposition with additive noise (CEEMDAN) and decomposes the signal into subcomponents with less noise [28]. VMD is another advanced decomposition technique with outstanding frequency search performance and sampling properties [29].

The selection of input variables in machine learning (ML) based DDMs (ML-DDMs) for streamflow prediction is of vital importance. Different combinations of inputs are applied to predict the target values of the streamflow. Streamflow prediction can be performed by using the observed streamflow time series alone as the input to predict the target streamflow [30, 31]. The streamflow, precipitation, and temperature variables can also be used together as inputs to predict the target streamflow [32, 33].

This paper developed five standalone MLMs, namely a radial basis function neural network (RBF), support vector regression (SVR), random forest regression (RFR), a gated recurrent unit neural network (GRU), and LSTM, to determine the model with the best prediction performance. The monthly streamflow, temperature, and precipitation series were selected as the input variables for model development. Several statistical metrics were employed to assess the performance of the models in the training and testing periods. The LSTM network outperformed its standalone counterparts. It was therefore selected, and its prediction performance was enhanced further by developing two-stage hybrid models (ICEEMDAN-LSTM and VMD-LSTM), which yielded better results than the standalone LSTM network. Two-stage hybrid models for streamflow prediction can be extended to three-stage hybrid models to improve their performance [34]. Therefore, considering the superior decomposition properties of the ICEEMDAN and VMD techniques and the better prediction capability of LSTM compared with the other MLMs, this paper proposed a three-stage hybrid model, IVL, for streamflow prediction. Experimental results showed that the proposed model was superior to the two-stage hybrid and standalone models in terms of several performance measures. Specifically, the main objectives of this study were the following:
(1) The development of a three-stage hybrid model coupling a two-stage decomposition approach with a DL model
(2) The applicability of the proposed model for streamflow prediction by considering streamflow, temperature, and precipitation as input variables
(3) Verification of the performance of the proposed model against two-stage and standalone models by comparing results

The remainder of this paper is arranged as follows. Section 2 introduces the decomposition and DL approaches, the statistical metrics for performance evaluation, methodology, and the study area. Section 3 presents all the results along with the discussion of the results, and Section 4 summarizes the conclusions of this study.

2. Materials and Methods

2.1. Improved Complete Ensemble Empirical Mode Decomposition with Additive Noise

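ICEEMDAN was proposed to resolve the issues of spurious modes and frequency aliasing faced by the other EMD-based techniques [28]. By adding white noise, ICEEMDAN realizes frequency continuity among adjacent scales, which weakens the frequency aliasing effect [35]. The calculation methodology of ICEEMDAN, following the formulation in [28], is given as follows:

(i) Add a specific amount of white noise to the original signal $x$, as

$$x^{(i)} = x + \beta_0 E_1\left(w^{(i)}\right), \quad i = 1, \ldots, N, \tag{1}$$

where $i$ is the added noise number, $x^{(i)}$ denotes the signal to be decomposed, $w^{(i)}$ represents the white noise, $E_1\left(w^{(i)}\right)$ depicts the first EMD component of the white noise, and $\beta_0$ is the noise amplitude coefficient.

(ii) Afterward, the first residue can be obtained as

$$r_1 = \left\langle M\left(x^{(i)}\right) \right\rangle, \tag{2}$$

where $M(\cdot)$ represents the local mean of the envelope that fulfills the sifting threshold of the IMF, and $\langle \cdot \rangle$ denotes averaging over the $N$ realizations.

(iii) The first IMF can be obtained by utilizing EMD after the decomposition of the $N$ signals as

$$\mathrm{IMF}_1 = x - r_1. \tag{3}$$

(iv) The following steps can be applied to calculate the second residue and mode:

$$r_2 = \left\langle M\left(r_1 + \beta_1 E_2\left(w^{(i)}\right)\right) \right\rangle, \tag{4}$$

$$\mathrm{IMF}_2 = r_1 - r_2. \tag{5}$$

(v) Calculate the $k$th residue and mode, where $E_k(\cdot)$ is the $k$th EMD component:

$$r_k = \left\langle M\left(r_{k-1} + \beta_{k-1} E_k\left(w^{(i)}\right)\right) \right\rangle, \tag{6}$$

$$\mathrm{IMF}_k = r_{k-1} - r_k. \tag{7}$$

(vi) Repeat step (v) for the next stages.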

2.2. Variational Mode Decomposition

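This study utilized VMD to construct a two-stage hybrid model and to verify its applicability for streamflow prediction. The benefit of the VMD technique is the absence of residual noise during the decomposition process. Equations (8)–(11) describe the main steps of the VMD technique [29]. As a constrained optimization problem, the objective of minimizing the sum of the spectral bandwidths of all modes is given as

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k} u_k(t) = f(t), \tag{8}$$

where $\{u_k\}$ and $\{\omega_k\}$ denote the set of modes and their centre frequencies, respectively, and $f(t)$ is the signal to be decomposed. The Lagrangian multiplier $\lambda$ and the quadratic penalty term $\alpha$ are introduced to convert the above optimization problem into the following unconstrained problem:

$$L\left(\{u_k\},\{\omega_k\},\lambda\right) = \alpha \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k} u_k(t) \right\rangle. \tag{9}$$

The alternating direction method of multipliers can be used to solve (9). The two stages of solving (9) are as follows:

(i) $u_k$ minimization:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha\left(\omega - \omega_k\right)^2}. \tag{10}$$

(ii) $\omega_k$ minimization:

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left|\hat{u}_k(\omega)\right|^2 d\omega}{\int_0^{\infty} \left|\hat{u}_k(\omega)\right|^2 d\omega}, \tag{11}$$

where $n$ denotes the number of iterations and $\hat{f}(\omega)$, $\hat{u}_i(\omega)$, $\hat{\lambda}(\omega)$, and $\hat{u}_k^{n+1}(\omega)$ are the Fourier transforms of $f(t)$, $u_i(t)$, $\lambda(t)$, and $u_k^{n+1}(t)$, respectively. The detailed decomposition process of the VMD technique can be found in [29].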
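As a minimal numerical sketch of the two VMD update steps in their standard form from [29], the mode update acts as a narrow-band filter centred on the current centre frequency, and the centre-frequency update is the power-weighted mean frequency of the mode's spectrum. The function names, the discrete frequency grid, and the toy spectrum below are illustrative assumptions, not the authors' implementation.

```python
def update_mode(f_hat, other_modes_sum, lam_hat, freqs, omega_k, alpha):
    """Mode update on a discrete frequency grid: the residual spectrum is
    damped away from the current centre frequency omega_k."""
    return [
        (f - s + lam / 2) / (1 + 2 * alpha * (w - omega_k) ** 2)
        for f, s, lam, w in zip(f_hat, other_modes_sum, lam_hat, freqs)
    ]


def update_centre_frequency(u_hat, freqs):
    """Centre-frequency update: power-weighted mean of the mode's spectrum."""
    power = [abs(u) ** 2 for u in u_hat]
    return sum(w * p for w, p in zip(freqs, power)) / sum(power)


# Toy one-sided spectrum peaked at 0.2 (hypothetical values).
freqs = [0.0, 0.1, 0.2, 0.3, 0.4]
f_hat = [0.0, 1.0, 4.0, 1.0, 0.0]
zeros = [0.0] * len(freqs)  # no other modes, multiplier switched off

u_hat = update_mode(f_hat, zeros, zeros, freqs, omega_k=0.2, alpha=2000)
omega_new = update_centre_frequency(u_hat, freqs)
```

With a large penalty `alpha`, the off-peak bins are strongly attenuated, which is why a larger bandwidth constraint yields narrower modes.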

Like the ICEEMDAN technique, VMD is an adaptive signal decomposition technique; in addition, it avoids the presence of residual modes. These advanced features give VMD an advantage over the other decomposition techniques. The present study carried out an additional decomposition of the IMF1 component obtained from ICEEMDAN by applying VMD, to further resolve the low-frequency patterns within it. This enables the DL model to perform the streamflow prediction more accurately with fine-scale decomposition components.

2.3. Long Short-Term Memory Neural Network

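LSTM is an advanced version of the recurrent neural network (RNN), specially designed to address the vanishing and exploding gradient problems inherent in RNNs [36]. LSTM can preserve long-term dependencies through its unique architecture of gates and a cell state [23]. The LSTM network takes the input $x_t$ at time step $t$ and the previous hidden state $h_{t-1}$ and updates its hidden state as follows [37]:

$$f_t = \sigma\left(W_f \left[h_{t-1}, x_t\right] + b_f\right), \tag{12}$$

$$i_t = \sigma\left(W_i \left[h_{t-1}, x_t\right] + b_i\right), \tag{13}$$

$$\tilde{c}_t = \tanh\left(W_c \left[h_{t-1}, x_t\right] + b_c\right), \tag{14}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \tag{15}$$

$$o_t = \sigma\left(W_o \left[h_{t-1}, x_t\right] + b_o\right), \tag{16}$$

$$h_t = o_t \odot \tanh\left(c_t\right), \tag{17}$$

where $f_t$, $i_t$, and $o_t$ are the forget, input, and output gates, $c_t$ is the cell state, $\odot$ is the element-wise product, $W_f, W_i, W_c, W_o$ denote the network weights, $b_f, b_i, b_c, b_o$ are bias vectors, $\sigma$ is the sigmoidal function, and $\tanh$ is the hyperbolic tangent function [37].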
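A single LSTM update in its standard form (forget, input, and output gates acting on a cell state) can be sketched in plain Python as follows. This is only an illustration of the recurrence, not the Keras implementation used in the study; the `params` layout, the hidden size of 2, and the toy inputs are assumptions made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM update; params maps a gate name to (W, b), with W acting on
    the concatenated vector [h_prev, x_t]."""
    z = h_prev + x_t  # list concatenation
    f = [sigmoid(a) for a in affine(*params["f"], z)]          # forget gate
    i = [sigmoid(a) for a in affine(*params["i"], z)]          # input gate
    c_tilde = [math.tanh(a) for a in affine(*params["c"], z)]  # candidate state
    o = [sigmoid(a) for a in affine(*params["o"], z)]          # output gate
    c_t = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]
    h_t = [ot * math.tanh(c) for ot, c in zip(o, c_t)]
    return h_t, c_t


# Toy check with all-zero weights: every gate equals 0.5 and the candidate is 0,
# so the cell state is simply halved.
hidden, n_inputs = 2, 3
W0 = [[0.0] * (hidden + n_inputs) for _ in range(hidden)]
b0 = [0.0] * hidden
params = {gate: (W0, b0) for gate in "fico"}
h_t, c_t = lstm_step([0.3, 0.5, 0.1], [0.0, 0.0], [1.0, -1.0], params)
```

The additive cell-state update is what lets information persist across many time steps, which is the property the text attributes to the gates and the cell state.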

2.4. Statistical Metrics

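Statistical metrics were employed to evaluate the performance of the proposed and other predictive models. The commonly used statistical metrics in the field of hydrology include mean absolute error (MAE), root mean square error (RMSE), Nash-Sutcliffe coefficient of efficiency (NSCE), and mean absolute percentage error (MAPE). The following equations were used to define these metrics:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| Q_i^{o} - Q_i^{p} \right|, \tag{18}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( Q_i^{o} - Q_i^{p} \right)^2}, \tag{19}$$

$$\mathrm{NSCE} = 1 - \frac{\sum_{i=1}^{n} \left( Q_i^{o} - Q_i^{p} \right)^2}{\sum_{i=1}^{n} \left( Q_i^{o} - \bar{Q}^{o} \right)^2}, \tag{20}$$

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{Q_i^{o} - Q_i^{p}}{Q_i^{o}} \right|. \tag{21}$$

In (18)–(21), $Q_i^{o}$ and $Q_i^{p}$ depict the observed and predicted values of streamflow, respectively, $\bar{Q}^{o}$ is the mean of the observed values, and $n$ represents the number of data points.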
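The four metrics can be computed directly from paired observed and predicted values using their standard definitions. The sketch below uses made-up toy numbers, not the study's data, and the function names are illustrative.

```python
import math


def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nsce(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; 0 matches the skill of
    always predicting the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def mape(obs, pred):
    """Mean absolute percentage error in percent; undefined if any obs is 0."""
    return 100.0 / len(obs) * sum(abs((o - p) / o) for o, p in zip(obs, pred))


# Hypothetical flows in m3/s.
obs = [100.0, 200.0, 400.0, 500.0]
pred = [90.0, 220.0, 380.0, 510.0]
scores = {"MAE": mae(obs, pred), "RMSE": rmse(obs, pred),
          "NSCE": nsce(obs, pred), "MAPE": mape(obs, pred)}
```

MAE and RMSE carry the units of streamflow, whereas NSCE and MAPE are dimensionless, which is why the text reports the former in m3/s and the latter as a ratio and a percentage.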

2.5. ICEEMDAN-VMD-LSTM-Based Hybrid Modelling

This paper proposed a hybrid model, IVL, based on ICEEMDAN, VMD, and the LSTM network to predict monthly streamflow. The systematic sequence of the proposed model is explained as follows:
Step 1. To select suitable input variables for the IVL model, the correlation analysis and the ICEEMDAN approach were applied to the streamflow, temperature, and precipitation time series.
Step 2. The highest-frequency component obtained from ICEEMDAN was further decomposed by VMD into subcomponents.
Step 3. The components obtained from the ICEEMDAN-VMD technique and the correlation analysis were applied to the LSTM network to construct the prediction model.
Step 4. The predicted results of Step 3 were reconstructed to finalize the prediction.
Step 5. The performance of the proposed model was evaluated by applying several evaluation benchmarks, including the two-stage hybrid models, standalone models, and statistical metrics. The hybrid models included the VMD-LSTM and ICEEMDAN-LSTM models, whereas the RBF, SVR, RFR, GRU, and LSTM models were established as standalone models.
Figure 1 explains the flowchart of the proposed methodology.
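The decompose, predict, and aggregate pattern behind these steps can be sketched with simple stand-ins. In the sketch below, a trailing moving average plays the role of the decomposition stages and a three-value mean plays the role of the LSTM predictor; both are deliberately crude placeholders, and all names and values are hypothetical. The point is only the structure: the high-frequency part is split a second time, each component is predicted separately, and the component predictions are summed.

```python
def split_high_low(series, window=3):
    """Stand-in decomposition: trailing moving average (low-frequency part)
    plus the remainder (high-frequency part). The two parts sum to the input."""
    low = []
    for t in range(len(series)):
        chunk = series[max(0, t - window + 1): t + 1]
        low.append(sum(chunk) / len(chunk))
    high = [s - l for s, l in zip(series, low)]
    return high, low


def predict_next(component):
    """Stand-in one-step predictor (mean of the last three values)."""
    tail = component[-3:]
    return sum(tail) / len(tail)


def two_stage_components(series):
    high, low = split_high_low(series)               # stage 1 (role of ICEEMDAN)
    high_fast, high_slow = split_high_low(high, 2)   # stage 2 on the high part (role of VMD)
    return [high_fast, high_slow, low]


flow = [120.0, 95.0, 140.0, 310.0, 420.0, 380.0, 210.0, 150.0]  # toy values
components = two_stage_components(flow)
forecast = sum(predict_next(c) for c in components)  # aggregate component forecasts
reconstructed = [sum(vals) for vals in zip(*components)]
```

Because the components add back to the original series, summing their individual forecasts gives a forecast on the original scale of the streamflow.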

2.6. Dataset and Study Area

The monthly streamflow, temperature, and precipitation data were selected in this study to predict one-month ahead streamflow at Chakdara station in the Swat River Watershed. The monthly data from January 1971 to December 2015 were taken, which corresponds to a sample size of 540 values, for each of streamflow, temperature, and precipitation datasets. The datasets were divided into the training dataset (70% of the total data) and the testing dataset (30% of the total data). The detailed description of the selection of the input variables for different models is provided in Table 1. Figure 2 provides pairwise relation between streamflow, temperature, and precipitation through a pairplot.
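The sample size and the split sizes follow from simple arithmetic. The sketch below assumes a chronological split (the earliest 70% of months for training), which the text implies by referring to training and testing periods but does not state explicitly; the function name is illustrative.

```python
def chronological_split(series, train_percent=70):
    """Split a time-ordered series into leading training and trailing testing parts."""
    n_train = len(series) * train_percent // 100
    return series[:n_train], series[n_train:]


n_months = (2015 - 1971 + 1) * 12   # January 1971 to December 2015
months = list(range(n_months))      # placeholder month indices
train, test = chronological_split(months)
```

Under this assumption, the 540 monthly values divide into 378 training months and 162 testing months.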

The data were collected from the Water and Power Development Authority (WAPDA), Pakistan, and the Pakistan Meteorological Department (PMD). The Swat River Watershed is situated in the Khyber Pakhtunkhwa Province, Pakistan. Figure 3 illustrates the location of the Swat River Watershed in Pakistan. The perennial Swat River originates in the mountains of Swat Kohistan at the confluence of the Utar and Ushu tributaries. After flowing through the Kalam valley and the Swat area, the Swat River passes through the Malakand district and drains into the Kabul River. The Swat River Watershed is mostly hilly, with elevations ranging from 360 m to 4,500 m. The glaciers lie above 4,000 m, and vegetation is visible between 1,800 m and 3,400 m [38]. Precipitation occurs mostly in winter and summer. The high precipitation in the summer monsoon season sometimes results in flooding events. The Swat River is vital for the economy of the Swat valley. It irrigates the districts of Swat, Malakand, and Peshawar and replenishes springs and water wells. The Swat River provides a natural habitat for flora and fauna in the region and attracts thousands of tourists. The hydropower stations on the Swat River provide electricity to the national grid of Pakistan.

3. Results and Discussion

3.1. Decomposition Analysis

Firstly, the ICEEMDAN was applied to decompose the three (streamflow, temperature, and precipitation) data series into several components, as demonstrated in Figure 4. ICEEMDAN decomposed the streamflow and temperature signals into seven IMFs (IMF1-IMF7) and a residual (Residual) component, whereas nine IMFs (IMF1-IMF9) and a residual (Residual) component were obtained due to the decomposition of the precipitation series through ICEEMDAN. The decomposed components (IMFs and Residual) provide the information of the high to low frequency components present within the three input data series.

The first decomposed component (IMF1) of each of the three data series obtained through the ICEEMDAN preprocessing technique was further decomposed by VMD because of its highly oscillatory fluctuations. Determining the number of intrinsic modes is an important step in the VMD process, since an appropriate number of modes yields a data series suitable for an accurate approximation model [39]. Different methods have been employed for the mode determination of VMD, including the correlation analysis [40], the centre frequency method [41], and the EMD process [42]. For mode determination, this study applied the correlation analysis to the components obtained by decomposing the observed streamflow, temperature, and precipitation series with the ICEEMDAN technique, as presented in Figure 5.

Figure 5 shows that the numbers of modes for the decomposition of the IMF1 component by VMD were found as eight, eight, and ten, respectively, for streamflow, temperature, and precipitation series. The decomposition of the IMF1 component of streamflow, temperature, and precipitation, is depicted in Figure 6.

Figures 6(a), 6(b), and 6(c) depict the decomposition of the IMF1 component of streamflow, temperature, and precipitation series by VMD into QF1–QF8 (eight modes), TF1–TF8 (eight modes), and PF1–PF10 (ten modes), respectively.

3.2. Selection of Models Input Variables

This study employed both the decomposition techniques and the correlation analysis to select suitable input variables for the development of all DL models. The autocorrelation function (ACF) and cross-correlation function (CCF) values of the three time series were calculated with a 95% confidence level to extract relevant input variables for model development. The ACF and partial autocorrelation function (PACF) analyses for the streamflow time series are described in Figures 7(a) and 7(b), respectively. It is evident from Figure 7(a) that a significant correlation exists at the 1st, 11th, and 12th lags; therefore, these three lag values were selected as inputs.

Figure 8(a) illustrates that a significant CCF between the streamflow and temperature series is present at the 1st, 10th, 11th, and 12th lags. Therefore, these four values were also chosen as model inputs. The 3rd and 4th lag values of the precipitation series were chosen as inputs owing to their significant cross-correlation with streamflow, as shown in Figure 8(b).
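Lag selection of this kind can be sketched as follows: compute the sample correlation at each lag and keep the lags that exceed a 95% significance bound. The bound of 1.96 divided by the square root of n is the usual large-sample approximation and is an assumption here, since the text does not state how the confidence limits were computed. The synthetic 12-month sine series is purely illustrative.

```python
import math


def cross_correlation(x, y, lag):
    """Sample correlation between y[t] and x[t - lag] for lag >= 0.
    Passing the same series twice gives the autocorrelation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((x[t - lag] - mx) * (y[t] - my) for t in range(lag, n)) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    return cov / (sx * sy)


def significant_lags(x, y, max_lag):
    """Lags whose correlation exceeds the approximate 95% bound."""
    bound = 1.96 / math.sqrt(len(x))
    return [k for k in range(1, max_lag + 1)
            if abs(cross_correlation(x, y, k)) > bound]


# Synthetic monthly series with a 12-month cycle (not the study's data).
seasonal = [math.sin(2 * math.pi * t / 12) for t in range(540)]
bound_540 = 1.96 / math.sqrt(540)
acf_lags = significant_lags(seasonal, seasonal, max_lag=12)
```

For a series of 540 values the bound is roughly 0.084, so even modest correlations at seasonal lags such as 11 and 12 can pass the test.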

Table 1 demonstrates the selection of input variables for the development of different models to predict the target variable of one-month-ahead streamflow. For the standalone RBF, SVR, RFR, GRU, and LSTM models, the input variables were the observed time series of streamflow (Q), temperature (T), and precipitation (P) and the lagged components obtained through the correlation analysis of these three data series (lags 1, 11, and 12 of Q; lags 1, 10, 11, and 12 of T; and lags 3 and 4 of P). For the two-stage hybrid ICEEMDAN-LSTM model, the input variables were obtained by applying both the correlation analysis and the ICEEMDAN technique (Q (IMF1–IMF7, Residual), T (IMF1–IMF7, Residual), and P (IMF1–IMF9, Residual)) to the observed time series of streamflow, temperature, and precipitation. The two-stage hybrid VMD-LSTM model applied the correlation analysis and the VMD technique (Q (VF1–VF7, Residual), T (VF1–VF7, Residual), and P (VF1–VF9, Residual)) to the observed time series of streamflow, temperature, and precipitation. The three-stage hybrid IVL model applied the correlation analysis, ICEEMDAN, and VMD techniques (Q (IMF2–IMF7, Residual), T (IMF2–IMF7, Residual), P (IMF2–IMF9, Residual), QIMF1 (QF1–QF8), TIMF1 (TF1–TF8), and PIMF1 (PF1–PF10)) to the observed time series of streamflow, temperature, and precipitation.
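A lagged input matrix for the standalone models could be assembled as in the sketch below, using the lags reported in the correlation analysis (1, 11, and 12 for streamflow; 1, 10, 11, and 12 for temperature; 3 and 4 for precipitation). The dictionary layout, the function name, and the placeholder series are assumptions made for illustration; Table 1 remains the authoritative description of the inputs.

```python
# Lags taken from the correlation analysis described in the text.
LAGS = {"Q": (1, 11, 12), "T": (1, 10, 11, 12), "P": (3, 4)}


def build_samples(series, lags=LAGS, target="Q"):
    """Return (X, y): each row of X holds the lagged inputs used to
    predict the target variable at the same time index."""
    max_lag = max(max(ks) for ks in lags.values())
    n = len(series[target])
    X, y = [], []
    for t in range(max_lag, n):
        X.append([series[name][t - k] for name, ks in lags.items() for k in ks])
        y.append(series[target][t])
    return X, y


# Placeholder series whose values encode their own time index.
toy = {
    "Q": [float(t) for t in range(540)],
    "T": [1000.0 + t for t in range(540)],
    "P": [2000.0 + t for t in range(540)],
}
X, y = build_samples(toy)
```

The largest lag (12 months) determines how many leading months cannot be used as targets, so 540 monthly values yield 528 usable samples with nine lagged features each.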

3.3. Models Structure and Parameter Selection

All the analyses were performed using MATLAB R2015a on a machine with an Intel(R) Core i7-10510U CPU @ 3.70 GHz and 16 GB of RAM, running a 64-bit Windows 10 operating system. Moreover, the Python 3.6 programming language was used in the PyCharm integrated development environment, with the NumPy and Pandas packages, to implement all MLMs. The Scikit-learn and Keras modules, the latter with the Google TensorFlow backend, were also employed to develop the MLMs.

For the ICEEMDAN technique, the noise standard deviation was set to 0.2, the number of realizations to 500, and the maximum number of sifting iterations to 5000. For the VMD technique, a moderate bandwidth constraint of 2000 was used, and the Lagrangian multiplier was effectively switched off. A uniformly distributed initialization of the centre frequencies of all modes was used. Moreover, no DC part was imposed during the decomposition process, and the tolerance parameter was set to 1E-7. More details on the parameter selection of ICEEMDAN and VMD can be found in [28, 29]. The network consists of two hidden layers with 128, 64, or 32 nodes in each layer, and a dropout value of 0.2 was used to avoid overfitting. Adam was selected as the optimizer for all the models, and 1000 epochs were used for training the models.
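For reference, the settings above can be gathered into a single configuration fragment. The key names are hypothetical and do not correspond to any particular library's argument names; `tau = 0.0` is an interpretation of the switched-off Lagrangian multiplier, and the per-series mode counts are taken from the mode analysis in Section 3.1.

```python
# Hypothetical key names; values are those reported in the text.
VMD_MODES = {"streamflow": 8, "temperature": 8, "precipitation": 10}

CONFIG = {
    "iceemdan": {
        "noise_std": 0.2,
        "realizations": 500,
        "max_sifting_iterations": 5000,
    },
    "vmd": {
        "alpha": 2000,       # bandwidth constraint
        "tau": 0.0,          # assumed meaning of "switched off" multiplier
        "dc": False,         # no DC part imposed
        "init": "uniform",   # uniformly distributed centre frequencies
        "tol": 1e-7,
    },
    "lstm": {
        "hidden_layers": 2,
        "units_candidates": (128, 64, 32),
        "dropout": 0.2,
        "optimizer": "adam",
        "epochs": 1000,
    },
}
```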

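Because the streamflow, temperature, and precipitation datasets have different units and scales, normalization of the whole data is necessary to achieve the best performance of the models. The normalization was performed through the sklearn preprocessing module by employing the MinMaxScaler function to transform the data to the range between zero and one. The formula for normalization is

$$x_{\mathrm{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \tag{22}$$

where $x$ is an observed value and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the corresponding data series.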
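Min-max scaling maps each value x to (x - min) / (max - min). A plain-Python sketch of the same transformation that MinMaxScaler performs with its default range is given below; the helper names and toy values are illustrative. The sketch fits the minimum and maximum on the training part only, which is one common design choice and an assumption here, since the text does not specify which portion of the data the scaler was fitted on.

```python
def min_max_fit(values):
    """Return the (minimum, maximum) pair used for scaling."""
    return min(values), max(values)


def min_max_transform(values, lo, hi):
    return [(v - lo) / (hi - lo) for v in values]


def min_max_inverse(scaled, lo, hi):
    """Map scaled predictions back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


train_flow = [50.0, 150.0, 250.0]   # toy values in m3/s
test_flow = [100.0, 300.0]

lo, hi = min_max_fit(train_flow)
train_scaled = min_max_transform(train_flow, lo, hi)
test_scaled = min_max_transform(test_flow, lo, hi)
restored = min_max_inverse(test_scaled, lo, hi)
```

When the scaler is fitted on the training part only, testing values outside the training range fall outside the unit interval, as the second testing value does here. Predictions must be passed through the inverse transform before error metrics in m3/s are computed.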

3.4. Prediction Outcomes

To verify the performance of the IVL model, the predicted results of the IVL model were compared with VMD-LSTM, ICEEMDAN-LSTM, LSTM, GRU, RFR, SVR, and RBF models, during the training and testing periods. Tables 2 and 3 illustrate the results of statistical metrics for the performance evaluation of models in the training and testing periods. The performance of the hybrid models was far better in comparison to the standalone MLMs, where no decomposition of input variables was involved. Moreover, better results of LSTM with the lower error values of the statistical metrics than the other MLMs also established the viability of the LSTM network to predict streamflow, during the training and testing periods.

It is evident from Table 2 that the integrated IVL model yielded better accuracy and the lowest error compared to the two-stage hybrid and standalone models. Conversely, the RBF model showed the worst effectiveness and efficiency among the standalone, two-stage, and three-stage hybrid models. During the training period, the MAE of the IVL model was lower by 4.496 m3/s, 8.419 m3/s, 16.169 m3/s, 18.381 m3/s, 21.609 m3/s, 23.665 m3/s, and 32.437 m3/s than that of the VMD-LSTM, ICEEMDAN-LSTM, LSTM, GRU, RFR, SVR, and RBF models, respectively. Moreover, the IVL model reduced RMSE by 4.925 m3/s, 9.609 m3/s, 22.538 m3/s, 25.260 m3/s, 31.035 m3/s, 34.929 m3/s, and 44.951 m3/s compared to the VMD-LSTM, ICEEMDAN-LSTM, LSTM, GRU, RFR, SVR, and RBF models, respectively, during the training period. The MAPE of the IVL model during the training period was also lower by 2.049%, 4.655%, 8.665%, 10.699%, 12.426%, 13.020%, and 21.068% compared to the VMD-LSTM, ICEEMDAN-LSTM, LSTM, GRU, RFR, SVR, and RBF models, respectively. The NSCE results for the IVL model were closer to 1 than those of all other models. Furthermore, the NSCE results of the other seven models were greater than 0.8, which shows the suitability of all the developed models for streamflow prediction.

Table 3 also illustrates the superior results of the IVL model compared to the VMD-LSTM, ICEEMDAN-LSTM, LSTM, GRU, RFR, SVR, and RBF models in terms of MAE, RMSE, and MAPE during the testing period. The two-stage hybrid models also reduced the errors more effectively than the standalone models during the testing period. Furthermore, the VMD-LSTM model showed better results than the ICEEMDAN-LSTM model during the testing period.

The streamflow prediction results for all models in the training and testing periods are shown in Figures 9 and 10. It is evident from the figures that the standalone models were inferior to the hybrid models in effectively capturing the extreme values of streamflow. The three-stage hybrid IVL model was the most efficient in predicting the peak values during the training and testing periods. The standalone models were comparatively easy to develop; however, they showed a lesser accuracy in predicting the streamflow compared to the three hybrid models. The hybrid models were complex to construct; however, the hybrid models showed a better capability of predicting the intricate nonlinear relation between the input and the output parameters with more accuracy. Therefore, the hybrid models possess the ability of meeting the necessities of medium- and long-term streamflow prediction.

Figures 11 and 12 illustrate the scatter plots, whereas Figures 13 and 14 represent the boxplots of all models, to highlight the graphical comparison of models performance during the training and testing periods. The scatter plots provide the degree of dispersion and correlation between the observed and predicted values.

From Figures 11 and 12, it is evident that the scatter points of the hybrid models were nearer to the 1 : 1 gradient line compared with the standalone MLMs. This provided evidence of better accuracy delivered by the hybrid models than the individual MLMs. The IVL model showed the most concentrated scatter points around the regression line, with the lowest error and highest value of R2, while the RBF model had the most dispersed scatter points around the regression line.

Figures 13 and 14 illustrate that the median was located towards the bottom of the box for all models during the training and testing periods, indicating that all the distributions were skewed to the right. The LSTM model revealed a better distribution of predicted data than the RBF, SVR, and GRU models during the training and testing periods. However, the boxplots of the hybrid models were better than that of the standalone LSTM model. During the training period, the LSTM, ICEEMDAN-LSTM, VMD-LSTM, and IVL models showed a median value of 124.643 m3/s, 118.217 m3/s, 112.802 m3/s, 117.589 m3/s, and 117.473 m3/s, respectively. Moreover, the interquartile ranges (the third quartile minus the first quartile) of the LSTM, ICEEMDAN-LSTM, VMD-LSTM, and IVL models were 243.448 m3/s, 225.473 m3/s, 205.152 m3/s, 219.747 m3/s, and 219.211 m3/s, respectively, during the training period. During the testing period, the value of the median was 126.926 m3/s, 124.432 m3/s, 131.961 m3/s, 121.232 m3/s, and 120.625 m3/s for the LSTM, ICEEMDAN-LSTM, VMD-LSTM, and IVL models. Furthermore, the LSTM, ICEEMDAN-LSTM, VMD-LSTM, and IVL models showed interquartile ranges of 233.089 m3/s, 224.831 m3/s, 220.608 m3/s, 243.382 m3/s, and 242.812 m3/s, respectively. It is evident from the boxplot figures that the IVL model showed the best results, while the results of the RBF model were the worst.
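The boxplot quantities discussed above (median, interquartile range, and direction of skew) can be computed as in the following sketch. Quartile conventions differ between tools, so the "inclusive" method used here is an assumption and may not match the software behind Figures 13 and 14; the sample values are invented.

```python
import statistics


def box_summary(values):
    """Median, interquartile range, and a simple right-skew indicator
    (median closer to the first quartile than to the third)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "median": med,
        "iqr": q3 - q1,
        "right_skewed": (q3 - med) > (med - q1),
    }


# Hypothetical flows in m3/s with a long upper tail.
sample = [10.0, 20.0, 30.0, 40.0, 50.0, 90.0, 150.0, 260.0, 400.0]
summary = box_summary(sample)
```

A median sitting near the bottom of the box corresponds to `right_skewed` being true, which is typical of streamflow series dominated by a few high-flow months.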

According to the results discussed so far, in Tables 2 and 3 and Figures 9–14, the IVL model was clearly the superior model for streamflow prediction based on the streamflow, temperature, and precipitation variables. Moreover, the results also revealed the feasibility of the ICEEMDAN and VMD approaches for improving the performance of the ML-DDMs. The three-stage hybrid prediction model improved on the performance of the two-stage hybrid prediction models. The VMD-LSTM hybrid model presented better results than the ICEEMDAN-LSTM hybrid model, which indicates the superiority of the VMD technique over the ICEEMDAN technique. The standalone DL models (LSTM and GRU) showed better results than the standalone RFR, SVR, and RBF models, which highlights the advantages of the DL models over the other MLMs, whereas the RFR ensemble model revealed better results than the SVR and RBF models. The performance of the SVR model was also better than that of the standalone RBF model. Regardless of the different performances shown by the developed models, the results showed that all the models are feasible for streamflow prediction.

For brevity, the authors considered only the three-stage hybrid model by integrating ICEEMDAN, VMD, and LSTM network for streamflow prediction. However, practically all the developed standalone models can be extended further to the two-stage and three-stage hybrid models. It shows that the ML-DDMs allow ease of extension and integration, to form the hybrid prediction models. This fact highlights the superiority of the ML-DDMs, over the PDMs. The IVL model is also feasible to predict different factors in the field of hydrology and meteorology, which signifies another advantage of the ML-DDMs (black-box models), compared to the PDMs (white-box models). The black-box models require the input variables to predict the output variables, whereas the in-depth consideration of the physical process is necessary, for the white-box models. The accurate prediction is indispensable for the effective management of hydroresources and for timely mitigation of extreme events and natural disasters. The proposed model can be applied to develop an early warning system, for protection against the flood damages, like the flood event of 2010, which occurred in the Swat River Watershed [38]. The IVL model is also viable to predict any form of time series. The prediction of wind speed, solar radiation, pollution emissions, and climate change trends is also a feasible option by employing the proposed model.

Despite the strong performance of the IVL model in predicting monthly streamflow, this study has some limitations. This study considered streamflow prediction on a monthly basis; however, there is a need to investigate streamflow prediction also on a daily, weekly, and annual basis for efficient management of the watershed, reservoir operation and planning, and water allocation and supply. Furthermore, this study employed streamflow, temperature, and precipitation variables for the streamflow prediction and did not consider important streamflow components (groundwater flow, surface, and subsurface components), infiltration, evapotranspiration, and human-made aspects. The consideration of these components is nevertheless necessary for more accurate streamflow prediction. Therefore, our future study will investigate streamflow prediction for other watersheds in Pakistan by considering different time scales, streamflow and associated components, and efficient input variable selection techniques.

4. Conclusions

In this study, a two-stage hybrid decomposition model was developed by integrating ICEEMDAN and VMD techniques. Subsequently, the LSTM model was coupled in the hybrid scheme, ultimately forming a three-stage hybrid model IVL (ICEEMDAN-VMD-LSTM) to predict monthly streamflow in the Swat River Watershed, Pakistan. The input variables for model development were selected from monthly time series data of streamflow, temperature, and precipitation, by employing correlation functions and the decomposition techniques. The datasets were split into the training (70% of the total dataset) and testing (30% of the total dataset) periods. Statistical metrics, including MAE, RMSE, NSCE, MAPE, and R2, were employed to evaluate the performance of the established models.

The decompositions of the streamflow, temperature, and precipitation time series were performed using the ICEEMDAN technique, which resulted in the improved performance of the standalone LSTM model. Consequently, the ICEEMDAN-LSTM model showed 4.132 m3/s and 10.256 m3/s reductions in MAE and RMSE values, respectively, compared to the standalone LSTM model for streamflow prediction during the testing period. Moreover, the error reductions in the case of the VMD-LSTM model were 8.176 m3/s and 15.032 m3/s, based on the MAE and RMSE values during the testing period compared to the LSTM model. The VMD-LSTM model revealed 4.044 m3/s and 4.776 m3/s lower values of MAE and RMSE, respectively, compared to the ICEEMDAN-LSTM.

Although the two-stage hybrid models provided improved results compared to the LSTM model, the presence of high-frequency oscillatory fluctuations within the IMFs may cause a poor prediction of the time series. To avoid this drawback, a secondary decomposition of IMF1 (generated through ICEEMDAN) of the three time series was performed by employing the VMD technique. The three-stage hybrid IVL model showed 14.538 m3/s and 13.169 m3/s reductions in MAE and RMSE, respectively, compared to the ICEEMDAN-LSTM model, whereas the error reduction values of the IVL model for MAE and RMSE were 10.494 m3/s and 8.393 m3/s, respectively, compared to the VMD-LSTM model. The standalone LSTM model also showed higher MAE and RMSE values, by 18.670 m3/s and 23.425 m3/s, respectively, than the IVL model. However, the performance of the standalone LSTM model was better than that of the RBF, SVR, RFR, and GRU standalone models. Overall, the IVL model showed the best performance for streamflow prediction among the developed models, with an MAE of 7.083 m3/s and 10.075 m3/s, RMSE of 12.553 m3/s and 19.249 m3/s, NSCE of 0.992 and 0.985, MAPE of 7.465% and 9.050%, and R2 of 0.993 and 0.985 during the training and testing periods, respectively.
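The error reductions quoted in this section can be cross-checked with simple arithmetic. The sketch below assumes that the reductions relative to the IVL model refer to the testing period, as the reductions in the preceding paragraph do. The implied absolute errors are derived values, not figures read from Tables 2 and 3.

```python
# Testing-period IVL errors and the reported gaps to IVL (m3/s), from the text.
ivl = {"MAE": 10.075, "RMSE": 19.249}
gap_to_ivl = {
    "ICEEMDAN-LSTM": {"MAE": 14.538, "RMSE": 13.169},
    "VMD-LSTM": {"MAE": 10.494, "RMSE": 8.393},
    "LSTM": {"MAE": 18.670, "RMSE": 23.425},
}

# Implied absolute errors of the other models (derived, not reported).
implied = {
    model: {k: round(ivl[k] + gap[k], 3) for k in ivl}
    for model, gap in gap_to_ivl.items()
}


def reduction(worse, better, metric):
    """Error reduction of `better` relative to `worse`, rounded to 3 decimals."""
    return round(implied[worse][metric] - implied[better][metric], 3)


checks = {
    "LSTM -> ICEEMDAN-LSTM": (reduction("LSTM", "ICEEMDAN-LSTM", "MAE"),
                              reduction("LSTM", "ICEEMDAN-LSTM", "RMSE")),
    "LSTM -> VMD-LSTM": (reduction("LSTM", "VMD-LSTM", "MAE"),
                         reduction("LSTM", "VMD-LSTM", "RMSE")),
    "ICEEMDAN-LSTM -> VMD-LSTM": (reduction("ICEEMDAN-LSTM", "VMD-LSTM", "MAE"),
                                  reduction("ICEEMDAN-LSTM", "VMD-LSTM", "RMSE")),
}
```

The differences recovered this way (4.132 and 10.256, 8.176 and 15.032, 4.044 and 4.776 m3/s) agree with the pairwise reductions reported for the two-stage models, so the quoted figures are mutually consistent.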

The proposed model can be employed to support water and environmental monitoring tasks and can thereby provide stakeholders with an efficient means of responding to warnings and upcoming events. It will eventually help to support the strategic planning, operation, and sustainable management of water resources.

Data Availability

The streamflow, temperature, and precipitation data of the Swat River, Pakistan, used to support the findings of this study are included in this article. The data are also available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors are thankful to the Water and Power Development Authority, Pakistan, and the Pakistan Meteorological Department for providing data for this study. This work was supported by the National Natural Science Foundation of China (Grant number 51607105) and the Provincial Natural Science Foundation of Hubei Province (Grant number 2016CFA097).