Introduction

Modeling hydrological factors, such as runoff, has a great effect on reducing droughts and managing water resources (Nourani et al. 2011). Several models have been developed to simulate complex hydrological processes. Intelligent models such as ANFIS and neural network have shown an almost acceptable ability to model and forecast nonlinear hydrological time series (Nourani et al. 2009). Rainfall-runoff models have been used since the late nineteenth century; besides, there are currently several hydrological models to simulate the rainfall-runoff process (Wu and Chau 2011). Rainfall-runoff models include conceptual, physical, empirical, and artificial intelligence models (Jothiprakash and Magar 2009). Many studies have been conducted on the effect of the runoff on rivers and the resulting problems (Zhou et al. 2015; Liu et al. 2014). Approximately 40% of the natural hazards are related to floods, in addition 20–300 million people are affected by the problems caused by floods yearly (Dewan 2013). The flood in Pakistan caused many deaths and financial losses to the agriculture and infrastructure in 2010 (Ghumman et al. 2011).

Data driven models (DDMs) are among the modern methods to model hydrological parameters and the widely used methods by researchers (Kisi et al. 2019). The DDMs are used to model hydrological processes such as linear and multilinear regression (LMR), due to the high accuracy (Abdulelah Al-Sudani et al. 2019). The artificial neural network (ANN) and autoregressive integrated moving average (ARIMA), monthly flow modeling using ANFIS and ANN–ARIMA using snow telemetry data were performed in Elephant Butte reservoir in Mexico city (Zamani Sabzi et al.2017). Linear time series models are widely used to model hydrological time series (Mohammadi et al. 2006). Also, nonlinear time series approaches can be used for hydrological modeling (Xie et al. 2016).

The study of Altunkaynak (2007) showed that ANN can well model the relationship between water and rainfall levels. The study of Nayak et al. (2005) showed that ANFIS performed better than ANN and ARIMA in estimating the discharge of the Bantarani river in Odisha, India. ANFIS was used by Nourani and Komasi (2013) to simulate the rainfall-runoff process of Eel river in California. One of the most important factors affecting water resources planning and management of the basin, is the response of the basin to rainfall-runoff versus infiltration and evaporation (Uhlenbrook et al. 2004). The innovation of this study is using PMI algorithm to introduce effective input parameters in forecasting and modeling the monthly flow of the Maroon basin with artificial intelligence models (ANFIS–support vector machines–and artificial neural network). This algorithm has been rarely used by previous researchers for this purpose. In addition, another innovation of this research is using different Holt-Winters hybrid models and the comparison of their performances with monthly classical artificial intelligence models. Therefore, the objective of this study is to compare the performance of classical artificial intelligence models with Holt-Winters hybrid models to predict monthly inflow.

Materials and methods

Study area

Maroun basin with an area of about 3824 km2 is located in the geographical coordinates of 49°50ʹ to 51°10ʹ east longitude, 30°30ʹ to 31°20ʹ north latitude and the heights of Behbahan city of Khuzestan province of Iran. Maroun basin is surrounded by the basins of the Zohreh and Karoun rivers in Khuzestan and Kohgiluyeh and Boyer Ahmad provinces. The climate of Maroun basin is affected by the low latitude, changes in altitude in different areas (from zero to over 3600 m above sea level) and finally the proximity to the Persian Gulf in its southern parts. The mean annual rainfall in the Maroun basin varies from about 150 mm in the low coastal plains to about 900 mm in the northern highlands, and the regime of such rainfall is Mediterranean. The maximum rainfall occurs between December and March. The dry season of the basin is long, and in lower areas lasts from May to late October. Most of the catchment area of Maroun river is made up of mountainous area. The northern and eastern parts are higher than the other parts.

Support vector machine

If the data is discrete, support vector machine (SVM) trains linear machines to produce an optimum level that separates the data without error and with the maximum distance between the plane and the nearest training points (support vectors). If we define the training points as [Xi, Vi] and the input vector XiRn and the class value yi ϵ {− 1, 1}, i = 1, …, N in case of linearly separable data, the decision rules based on Eq. (1) are defined in such a way that they separate the binary decision classes by an optimal plane:

$$y = {\text{sign}} \left( {\mathop \sum \limits_{i = 1}^{N} y_{i} a_{i} \left( {X.X} \right) + b} \right)$$
(1)

where Y is the output of the equation, yi is the class value of the training sample Xi and represents the internal multiplication.

The vector x = (X1, X2,..., Xn) represents an input data and the vectors i = 1, …, N, Xi are support vectors. In Eq. (1), the parameters ai and b determine the super-plane. If the data are not linearly separable, Eq. (1) is transformed to Eq. (2).

$$y = {\text{sign}} \left( {\mathop \sum \limits_{i = 1}^{N} y_{i} a_{i} K\left( {X.X} \right) + b} \right)$$
(2)

The function K (X, Xi) is a kernel function that generates internal multiplications to create machines with different types of nonlinear decision levels in the data space (Vapnik 2000).

Adaptive network fuzzy inference system

Adaptive network fuzzy inference system (ANFIS) uses available input–output data pairs during training. Then, IF–THEN fuzzy rules are obtained to connect these parts to each other. ANFIS training means determining the parameters belonging to two parts, introduction and consequence, using the optimization algorithm (Jang 1993).

If the output of each layer of neurophasic network is shown as Q1,i which is the output of the i group in the 1st layer, then the performance of different layers can be expressed as follows:

  • Layer 1: Each node in this layer is equivalent to a fuzzy set (Eq. 3):

    $$Q_{i}^{1} = \mu_{Ai} \left( x \right)$$
    (3)
  • Layer 2: Each node of this layer is shown by II in which the input signals are multiplied by each other and produce output (Eq. 4):

    $$W_{i} = \mu_{Ai} \left( x \right) \times \mu_{Bi} \left( y \right),\quad i = 1,2$$
    (4)
  • Layer 3: Each node in this layer is represented by N, which calculates the ratio of the activity degree of rule provision to the sum of the activity degrees of all rules (Eq. 5):

    $$\overline{{W_{i} }} = \frac{{W_{i} }}{{W_{1} + W_{2} }},\quad i = 1,2$$
    (5)
  • Layer 4: The output of each node in this layer is calculated by Eq. (6):

    $$Q_{i}^{4} = \overline{{W_{i} }} f_{i} = \overline{{W_{i} }} \left( {p_{i} x + q_{i} y + r_{i} } \right)$$
    (6)

    where \(\overline{{W_{i} }}\) is the output of Layer 3 and (pi, qi, ri) is the set of adaptive parameters of this layer. These parameters are called result parameters.

  • Layer 5: Each node in this layer, which is shown as Σ, calculates the final output value by Eq. (7) (Jang 1993):

    $$Q_{i}^{5} = \mathop \sum \limits_{i} \overline{{W_{i} }} f_{i} = \frac{{\mathop \sum \nolimits_{i} W_{i} f_{i} }}{{\mathop \sum \nolimits_{i} W_{i} }}$$
    (7)

Holt-Winters model

To predict the time series possessing seasonal or cyclical changes in addition to the trend, Holt-Winters model is used, which has been used in the studies on changes and the prediction of some meteorological elements in the country. To use this model, it is necessary to estimate three components of level or average (x), trend (T) and seasonal component (S). The three factors are calculated by the following equations.

$$F_{t} = \alpha \left( {F_{t - 1} - T_{t - 1} } \right) + \left( {1 - \alpha } \right)\frac{{Y_{t - 1} }}{{S_{t - k} }}$$
(8)
$$S_{t} = YS_{t - k} + \left( {1 - Y} \right)\frac{{Y_{t} }}{{F_{t} }}$$
(9)
$$T_{t} = \beta T_{t - 1} + \left( {1 - \beta } \right)\left( {F_{t} - F_{t - 1} } \right)$$
(10)

where Ft: smoothed surface at time t,

Ft−1: smoothed surface at time t − 1,

Yt − 1: actual data value at time t − 1,

Tt: estimated trend,

St: estimated seasonal value,α: smoothing constant for the data,β: smoothing constant for trend estimation,

У: smoothing constant for estimating seasonal changes and k periods in each year. Holt-Winters exponential smoothing coefficients are always between zero and one (Ord et al. 1997; Hyndman et al. 2002).

PMI-based input selection (PMIS) algorithm

PMI-based input selection (PMIS) algorithm introduced in this article was first developed by Sharma (2000) to identify input variables affecting hydrological models. PMI algorithm performs each iteration by finding the Cs that maximize the PMI value with respect to the output variable (with respect to the inputs that are pre-selected) and an input (C) and output (Y). The statistical concept that PMI estimates for Cs is based on a confidence interval determined by the distribution formed by a self-starting loop. If the input is significant, Cs is added to S and the selection continues until no significant input remains, then subsequently the algorithm stops. Given a random output variable Y, there is some uncertainty about an observation y that is a member of Y, which can be defined according to the Shannon entropy H, (Shannon 1948). However, assuming a random input variable X on which Y depends, cross-observation (x, y) reduces such uncertainty, since possessing x allows the value of y to be inferred, and vice versa. According to the definition of mutual information I (X; Y), the reduction in uncertainty of variable Y is due to the observation of X (Cover and Thomas 1991). This problem is displayed as a common part between two circles. This common part is where the reduced uncertainty around X and Y is specified through the conditional entropy H(X|Y) and H(Y|X), respectively. The mutual information (MI) can be obtained directly from the following Eq. 11:

$$I\left( {X;Y} \right) = \iint {p\left( {x{,}y} \right)}{\text{log}}\frac{{p\left( {x{,}y} \right)}}{p\left( x \right)p\left( y \right)}dxdy$$
(11)

p(y) and p(x) are the marginal probability density functions (pdfs) of X and Y, respectively, and p (x, y) is the joint probability density function. In any case, practically, the correct form of probability density functions in Eq. (11) is unknown. Hence, probability density estimation is used instead. By replacing the probability density estimates with the numerical approximation of the integral in Eq. (12), there will be:

$$I\,\left( {X;Y} \right) \approx \frac{1}{n}\mathop \sum \limits_{{{\text{i}} = 1}}^{{\text{n}}} {\text{log}}\left[ {\frac{{f\left( {xi,yi} \right)}}{{f\left( {xi} \right)f\left( {yi} \right)}}} \right]$$
(12)

where f represents the estimated density based on a sample of n observations (x, y). It should be noted that different bases are used for the logarithm, but usually 2 or e is used. If the base of the logarithm is not mentioned, the natural logarithm will be considered. Assuming Eq. (6), the accurate and effective estimation of MI depends to some extent on the method used to estimate the marginal and joint probability density functions. In general, there are three criteria for stopping PMI algorithm, (1) tabulated critical values, (2) AIC and (3) Hempel, as explained below. Tables of critical values of the correlation coefficient (R) are available, based on an analytical formula for the error distribution of an estimate for an assumed sample size. In the case of the linear R, the distribution of the sample estimate follows a t distribution. The tables of critical values of R based on the t distribution are simply constructed (David 1966) which provide a critical value for R for a given sample size and a given confidence interval. However, unlike the linear R, an equivalent analytical definition for I cannot be deduced according to Eq. (6) (Goebel et al. 2005). Hence, researchers should use self-starting to calculate \(f(\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{I} )\) (Granger et al. 2004; Sharma 2000). A study by Granger et al. (2004) investigated the distribution of the information estimator based on p \(\hat{S}\) for a number of time series models. Finally, a practical alternative to the self-starting system was proposed. Monte Carlo simulation was used to practically determine the distribution of MI estimator, which was used in the first step to develop a stopping criterion based on approximate critical values. In each simulation, MI will be calculated for a series of data compared to Gaussian-white noise data (with sample size n, between 50 and 5000 values), which is used to obtain data with critical values that can be used for the independent test based on MI. For each sample size, first a series \(\varepsilon_{y}\) ~ N(0,1) is constructed and then the marginal probability density functions \(f\varepsilon_{y}\) are calculated. A total of one hundred thousand independent iterations of the series \(\varepsilon_{x}\) ~ N(0,1) were made, independent of \(\varepsilon_{y}\) ~ N(0,1). For each sample of \(\varepsilon_{x}\), the marginal probability density functions \(f_{{\varepsilon_{y} }}\) and \(f_{{\varepsilon_{x} \varepsilon_{y} }}\) were estimated and subsequently \(\hat{I}(\varepsilon_{x} \varepsilon_{y} )\) was evaluated. The critical values of \(\hat{I}(\varepsilon_{x} \varepsilon_{y} )\) are listed in Table 1 at different confidence intervals. Two alternative stopping rules were formulated by which, in each iteration, I'CsYs was compared with the corresponding critical values of Ib (95) and Ib (99) used instead of direct calculations of the self-starting system to determine which variable should be selected or when the algorithm should be stopped. Computational elimination of the self-starting system loop makes the selection of input variables much faster.

Table 1 Optimum values of Holt-Winters model coefficients for the monthly flow at Idenak hydrometric station

AIC (Akaike 1974), as a measure of the relationship between the accuracy of the regression filter and the size of the input set S, is adopted to formulate this stopping criterion. The criteria such as AIC are usually used as the basis for evaluation in model selection. AIC Eq. (13) is as follows:

$${\text{AIC}} = n\log_{e} \left( {\frac{1}{n}\mathop \sum \limits_{i = 1}^{n} u_{i}^{2} } \right) + 2p$$
(13)

where n is the number of observations, ui is the n residuals and p is the number of model parameters. For linear regression, the number of parameters is equal to K + 1, where K is equal to the number of variables. To select effective input variables using AIC, variables that have the minimum AIC value are selected. The outlier selection method is a powerful statistical method to determine whether a given x value significantly deviates from other values in a set of x values. This test evaluates the deviation of a single observation from the mean of all observations. An observation value with a Z-score higher than 3 based on the 3σ rule for normal distribution is usually considered as outlier (outlier has a value higher than 3 times the standard deviation plus the mean of the data set). For the formulation of a stopping rule based on the identification of outliers for PMIS algorithm, the basic assumption is that the data set has a proportion of additional and irrelevant variables and important variables. However, the potential masking of outliers is important and should be considered, assuming that the data set has more than one dependent variable. For this reason, a modified Z value, which uses Hempel's distance, was used instead of increasing the efficiency of the method (Davies and Gather 1993). Hempel's distance is based on the median of the set of inputs. The failure point of test is n/2 and it is known as one of the most powerful tests to identify outlier (Davies and Gather 1993; Pearson 2002). The test starts by calculating the absolute deviation from the PMI mean for all inputs as the following equation:

$$d_{j} = \left| {I_{{C_{j} Y.S}} - I_{{C_{j} Y.S}}^{{\left( {50} \right)}} } \right|$$
(14)

where dj represents the absolute deviation, and \(I_{{C_{j} Y.S}}^{(50)}\) represents the median PMI for the set of C inputs. Therefore, Hempel’s distance can be determined as follows (15):

$$Z_{j} = \frac{{d_{j} }}{{1.4826d_{j}^{{\left( {50} \right)}} }}$$
(15)

where Zj represents Hempel’s distance (modified Z-score) for the input set Cj and \(d_{j}^{(50)}\) represents the median absolute deviation (MAD), dj. A coefficient of 1.4826 modifies the distance so that the Z > 3 rule can be applied, as applied to the conventional Z-test (Pearson 2002). Given this stopping criterion, the PMI algorithm based on input selection no longer contains a self-starting loop, and PMI is not compared with any critical value of I. Instead, the value of Zs is determined to select Cs, and if Zs > 3, the input is selected and added to S, otherwise, the operation of the input variable selection algorithm is stopped. This study has two separate parts, which are described below. First, the input parameters for modeling and predicting the effective flow rate of Maroun river were selected based on PMI algorithm and introduced as the input to ANFIS–ANN–SVM. Second, for the construction of hybrid models, after confirming the accuracy of the fitting of the above model, the residuals from the fitting of the univariate Holt-Winters model were introduced as the input to ANFIS–ANN–SVM for modeling and predicting the monthly inflow to Maroun dam reservoir.

Discussion and results

In this study, to model and predict the monthly flow rate of Maroun river through the proposed models, the data are divided into two series of training and test data. In addition, the monthly data of 2002–2012 was used in the test phase and the monthly data of 2013–2021 was used in the training phase.

Figure 1 shows the histogram of residuals, variance of residuals and residuals on probability paper. The histogram of the residuals of the fitted Holt-Winters model shows the monthly flow rate at Idenak hydrometric station by Minitab. As it is shown, the error distribution of the residuals resulting from the fitting of the above model is normal along a straight line. The probability curve of the data confirms the normality of the residuals of the model, the variance of the residuals is stationary. Also, in the curve of the residuals versus time around the zero horizontal level, the rectangular distribution confirms no trend and the stationarity of the residuals resulting from the fitting is obvious. One of the important points in this study for the Holt-Winter models is the estimation of coefficients of the model. Given that no rule or algorithm was embedded by Minitab to estimate the coefficients of Holt-Winters model, trial and error was used to estimate these coefficients. On the other hand, it was found that by reducing the beta coefficient, the predictive power of the model increased to some extent. But the accuracy of the fit of the model is reduced. This is opposite in the case of gamma and alpha coefficients, i.e. by increasing the values of the alpha and gamma coefficients, the predictive power and accuracy of the Holt-Winters model increase to some extent (Table 1). To select the effective input parameters for modeling and predicting the monthly flow rate of Maroun through the above artificial intelligence models (ANN, ANFIS and SVM), PMI algorithm was used. According to Hempel and AIC, the monthly flow rate with a 3-month lag, and both parameters precipitation and temperature with a 1-month lag, with the lowest values of AIC and the highest values of Hempel, were introduced as the input to all artificial intelligence models (Table 2). To determine the neurons of all three layers in the neural network and the hybrid model (Holt-Winters-ANN), because no algorithm is considered to determine the optimum number of neurons in the neural networks, the trial and error was used. Among different active functions in the neural network, the hyperbolic tangent function was selected as the optimum active function for neural networks while sigmoid function was selected as the optimum function for Holt-Winters hybrid model. Therefore, the function with the error backpropagation with an optimum neural structure for the (5–2–3) ANN and the (6–2–3) Holt-Winters hybrid model was selected. The number and type of membership functions in ANFIS and Holt-Winters-ANFIS hybrid model were selected by trial and error. Since the number of membership functions has a negative impact on the efficiency of the model, the principle of using the lower membership function was used for fitting both ANFIS and Holt-Winters-ANFIS hybrid models. Also, input Gaussian and output Triangular membership function was used as the optimum function in ANFIS and in Holt-Winters-ANFIS hybrid models. The input Triangular and output Triangular membership function was determined for Holt-Winters-ANFIS hybrid models. In this study, RBF kernel function was used to model the monthly flow rate for both SVM and the Holt-Winters-SVM hybrid model. One of the principles of using the kernel function is to find the optimum values of the parameters C, E and Γ for SVM and for the Holt-Winters-Support Vector Machine (HSVM) hybrid model. For this purpose, the network search method was used. Also the optimal values of the mentioned parameters were determined, (C = 2.4, E = 0.0018, Γ = 3.8) For SVM and (C = 2.9, E = 0.0016, Γ = 4.1) for Holt-SVM hybrid models. In the testing (94 root mean square error of 94 and coefficient of explanation of 0.76) and train stage (root mean square error of 102 and coefficient of explanation of 0.71), the Holt Winters models have the weakest performance. Also among the three mentioned artificial intelligence models, the support vector machine model performs better than the other two models in both testing (with root mean square error of 78 and coefficient of explanation of 0.83) and training stages. With root mean square error of 86 and coefficient of explanation of 0.85. Table 3 shows the evaluation criteria of the performance of pre-institutional models for predicting the monthly flow rate at Idenak hydrometric station. Figure 2 shows the distribution of observed monthly flow rate and monthly flow rate predicted by the proposed models.

Fig. 1
figure 1

Histogram of the residuals of Holt-Winters fitted to the monthly flow rate at Idenak hydrometric station

Table 2 Effective input variables in modeling and forecasting the monthly flow based on the results of the PMI algorithm
Table 3 Evaluation criteria of the performance of pre-institutional models for predicting the monthly flow rate at Idenak hydrometric station
Fig. 2
figure 2

Distribution of observed and predicted monthly flow by the proposed models

Conclusion

The evaluation criteria of the performance of the models in Table 3, such as the root mean square error and R2 show the poor performance of the seasonal model of the Holt-Winters time series with root mean square error of 102 and coefficient of explanation of 0.63 in train stage and with root mean square error of 94 and coefficient of explanation of 0.58 in test stage. While all the artificial intelligence models performed much better at both stages than the Holt-Winters model, among the artificial intelligence models, ANN with optimal neural structure (3–5–2) showed some weak performance for modeling the monthly flow at Idenak station with root mean square error of 82 and coefficient of explanation of 0.81, in train stage and with root mean square error of 73 and coefficient of explanation of 0.76 in test stage In general, the performance of the hybrid models is better than all the artificial intelligence models despite the application of the PMI algorithm for improving the performance of the aforementioned models in simulating and predicting river flow. In addition, increasing or decreasing the number of neurons do not improve the performance of the neural artificial model and the hybrid Holt-Winters-neural artificial model with optimal neural structure (2–4–2) in predicting river flow. Also increasing or decreasing the number of the membership function do not improve the performance of ANFIS models and Holt-Winters ANFIS hybrid model in predicting river flow discharge. Among the various hybrid models used in this study, the Holt-ANFIS hybrid model showed a good ability to simulate peak flow rate with root mean square error of 68 and coefficient of explanation of 0.89 in train stage and with root mean square error of 54 and coefficient of explanation of 0.83 in test stage. In this research, since the three artificial intelligence models, compared to the Holt-Winters model and the aforementioned hybrid models, have many input parameters for modeling and forecasting the river flow, it seems that using hybrid models is a suitable solution to prevent wasting time by using less input parameters.