Daily Water Level Prediction of Zrebar Lake (Iran): A Comparison between M5P, Random Forest, Random Tree and Reduced Error Pruning Trees Algorithms

Nhu, Viet-Ha; Shahabi, Himan; Nohani, Ebrahim; Shirzadi, Ataollah; Al-Ansari, Nadhir; Bahrami, Sepideh; Miraki, Shaghayegh; Geertsema, Marten; Nguyen, Hoang

doi:10.3390/ijgi9080479


¹ Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City 700000, Vietnam

² Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City 700000, Vietnam

³ Department of Geomorphology, Faculty of Natural Resources, University of Kurdistan, Sanandaj 66177-15175, Iran

⁴ Board Member of Department of Zrebar Lake Environmental Research, Kurdistan Studies Institute, University of Kurdistan, Sanandaj 66177-15175, Iran

⁵ Young Researchers and Elite Club, Dezful Branch, Islamic Azad University, Dezful 64616-45169, Iran

⁶ Department of Rangeland and Watershed Management, Faculty of Natural Resources, University of Kurdistan, Sanandaj 66177-15175, Iran

⁷ Department of Civil, Environmental and Natural Resources Engineering, Lulea University of Technology, 971 87 Lulea, Sweden

⁸ Department of Hydrological Sciences, University of Nevada, 89557-02601-775-685-8040, Reno, NV 89557, USA

⁹ Department of Watershed Sciences Engineering, Faculty of Natural Resources, University of Agricultural Science and Natural Resources of Sari, Mazandaran 48181-68984, Iran

¹⁰ Natural Resource Operations and Rural Development, Ministry of Forests, Lands, Prince George, BC V2L 1R5, Canada

¹¹ Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam


* Authors to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2020, 9(8), 479; https://doi.org/10.3390/ijgi9080479

Submission received: 17 June 2020 / Revised: 23 July 2020 / Accepted: 29 July 2020 / Published: 31 July 2020

(This article belongs to the Special Issue The Use of GIS and Soft Computing Methods in Water Resource Planning)


Abstract:

Zrebar Lake is one of the largest freshwater lakes in Iran and plays an important role in the regional ecosystem; its desiccation has a negative impact on the surrounding ecosystem. The lake also provides an attractive recreational setting for ecotourism. The prediction and forecasting of the water level of the lake through simple but practical methods can provide a reliable tool for future lake water resource management. In the present study, we predict the daily water level of Zrebar Lake in Iran through well-known decision tree-based algorithms, including the M5 pruned (M5P), random forest (RF), random tree (RT) and reduced error pruning tree (REPT). We used five different input combinations of lagged water level to find the most effective one. For our modeling, we chose 70% of the dataset for training (from 2011 to 2015) and 30% for model evaluation (from 2015 to 2017). We evaluated the models’ performances using different quantitative (root mean square error (RMSE), mean absolute error (MAE), coefficient of determination (R²), percent bias (PBIAS) and ratio of the root mean square error to the standard deviation of measured data (RSR)) and visual frameworks (Taylor diagram and box plot). Our results showed that water level with a one-day lag time had the highest effect on the result and that this effect decreased as the lag time increased. All the developed models had a good prediction capability, but the M5P model outperformed the others, followed by RF and RT equally and then REPT. Our results showed that these algorithms can predict water level accurately with only the one-day-lagged water level as an input and that they are cost-effective tools for future predictions.

Keywords:

lake water level; lag time; decision tree algorithms; prediction accuracy; forecasting water level

1. Introduction

Many lakes, in addition to being important aquatic ecosystems, supply water for irrigation and for domestic, agricultural and industrial areas [1]. The prediction of the daily water level fluctuation of lakes is important for water resource planning and catchment management, hydroelectric power facilities, navigation and domestic, agricultural and industrial water extraction [2]. Water levels in lakes depend on the natural water exchange between lakes and their catchments and can be sensitive to climate change, particularly to changes in precipitation and evaporation [3,4]. Water levels are also influenced by groundwater recharge, water extraction, artificial dams, floods, etc. [5,6,7,8,9]. Recent studies on water level fluctuation have concentrated on declining lake levels, as well as diminished discharge into outlet streams [10]. Both physical measurements and modeling are employed for mapping water level fluctuations, with modeling being a less expensive alternative [11].

Machine learning (ML) models, such as logistic regression (LR), support vector machines (SVMs) and artificial neural networks (ANNs), are increasingly used in land and water management disciplines due to their high performance, accuracy and predictive capability [12,13,14]. ML algorithms have been applied in flood susceptibility mapping [15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31], rainfall–runoff modeling [32,33], reservoir inflow forecasting [34,35], stream flow prediction [36,37], suspended sediment estimation [38,39] and the estimation of daily reference evapotranspiration [40,41]. Bayes-based algorithms, such as Bayesian logistic regression (BLR), and decision tree algorithms, such as random forest (RF), alternating decision tree (ADT), logistic model trees (LMT), naïve Bayes tree (NBT), reduced error pruning tree (REPTree) and classification and regression trees (CARTs), have been applied to water resource issues, especially in flood susceptibility mapping [18,42,43,44,45]. Khosravi et al. [46] developed a hybrid bagging–decision tree algorithm for bed load transport rate prediction. Khosravi et al. [47] predicted fluoride concentration in groundwater through different lazy learners. Bui et al. [48] predicted the water quality index (WQI) through bagging (BA), CVparameter selection (CVPS) and randomizable filtered classification (RFC). Bui et al. [49], in another study, predicted heavy metal concentration in groundwater using a Gaussian process (GP) algorithm. Salih et al. [50] developed attribute selected classifier (M5P), M5Rule (M5R) and KStar (KS) models for suspended sediment load prediction at the Trenton meteorological station on the Delaware River, USA. However, the application of different decision tree-based ML algorithms, which are powerful and robust, to the prediction of daily water level fluctuation is still limited in the literature, and they have rarely been applied to predicting daily water levels in mountainous lakes.

In this study, our goal is to assess different decision tree-based ML algorithms for predicting daily water level fluctuation in Zrebar Lake, including the M5 pruned (M5P), random forest (RF), random tree (RT) and reduced-error pruning tree (REPT) algorithms. The validation and comparison of these models were carried out using different quantitative indices: root mean square error (RMSE), coefficient of determination (R²), mean absolute error (MAE), percent bias (PBIAS), ratio of the root mean square error to the standard deviation of measured data (RSR) and Nash–Sutcliffe coefficient (NSE). The modeling and data processing were designed and coded with MATLAB 2018.

2. Study Area

Zrebar Lake is located 3 km west of Marivan City in Kurdistan Province in western Iran (Figure 1). The lake is 1285 m asl, has a length of 5 km, a maximum width of 1.6 km, a maximum depth of 6 m, an area of 8.9 km² and a volume of some 30 million m³ [51]. The water of Zrebar Lake is provided by seasonal rainfall (average precipitation is 650 mm/year) and subaqueous lake-bottom springs. The lake is immediately surrounded by a wetland complex and further surrounded by mountains, forests and croplands. Irrigated farms and rain-fed croplands comprise 60% of the land cover, oak forests 23% and grasslands cover 17% [52]. Zrebar Lake is the only freshwater lake in western Iran and is important for the local economy in terms of tourism, irrigation and fish harvesting.

3. Materials and Methods

3.1. Data Assemblage and Preparation

One of the challenges in modeling nonlinear hydrological processes is choosing the most important variables from all possible input variables [54]. Input selection is critical for learning systems, particularly during the identification process, when the dataset is big and the number of variables is large [55]. The primary objective of data assemblage and preparation is to select the appropriate input variables based upon the available data. In the model, the combination of various input variables, also known as feature selection (variable subset selection), is a process of selecting the optimum subset of inputs according to established governing principles [56]. The tweaking of models through such selection is done to increase model accuracy and efficiency (to reduce calibration time).

In the current study, we applied different input variable combinations to find the most effective inputs for the modeling process. As inflow to Zrebar Lake is primarily from subaqueous springs, we chose antecedent water level as the input variable. In this context, we used various combinations of water level (WL) at different lag times (i.e., WL(t-1) to WL(t-5)). For the purpose of this study, we built the input features from observed daily water level data across 6 years, from September 2011 to September 2017. We initially defined five scenarios, starting with WL for the previous day (WL(t-1)) and then adding the WL values for the previous 2, 3, 4 and 5 days (Table 1). All these input scenarios were applied to develop a model (i.e., model training) to predict lake water level as an output. We then calculated the prediction accuracy for every scenario. Generally, the best input variable was a combination of WL(t-1), WL(t-2), WL(t-3), WL(t-4), WL(t-5) and the output variable (WL(t)) selected for model training. Next, we applied the developed models to predict WL(t) using the testing dataset.
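
The construction of the lagged input scenarios can be sketched as follows. This is a minimal illustration and not the authors' MATLAB code: the function name `make_lagged_dataset`, the dictionary `scenarios` and the sample values in `wl` are hypothetical, and scenario k is assumed to use the k most recent lags.

```python
from typing import List, Sequence, Tuple


def make_lagged_dataset(
    levels: Sequence[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Return (inputs, targets): each input row is
    [WL(t-1), ..., WL(t-n_lags)] and the matching target is WL(t)."""
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    inputs: List[List[float]] = []
    targets: List[float] = []
    # The first usable day is the one that has n_lags earlier observations.
    for t in range(n_lags, len(levels)):
        inputs.append([levels[t - k] for k in range(1, n_lags + 1)])
        targets.append(levels[t])
    return inputs, targets


# Hypothetical daily water levels (m asl); not data from the study.
wl = [1285.10, 1285.12, 1285.15, 1285.14, 1285.13, 1285.11, 1285.09]

# Assumed layout of the five scenarios: scenario k uses lags 1..k.
scenarios = {k: make_lagged_dataset(wl, k) for k in range(1, 6)}
```

Each additional lag costs one usable row at the start of the record, so the scenarios do not have identical sample counts unless the rows are aligned afterwards.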

For the purpose of modeling, we used 70% of the dataset for training (from 2011 to 2015) and 30% for testing (from 2015 to 2017) [57]. While there is no universally applied guide for data division, the training and the testing datasets have to carry similar statistical properties and 70:30 is the most commonly used ratio [27,58,59,60,61,62].
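
Because the training period precedes the testing period, the split is chronological rather than random. A minimal sketch of such a split is given below; the helper name `chronological_split` and the toy `days` list are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def chronological_split(
    records: Sequence, train_fraction: float = 0.7
) -> Tuple[List, List]:
    """Split time-ordered records without shuffling: the earliest
    train_fraction of the rows is used for training, the rest for testing."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = round(len(records) * train_fraction)
    return list(records[:cut]), list(records[cut:])


# Hypothetical record of ten consecutive days.
days = list(range(10))
train, test = chronological_split(days)
```

Keeping the order intact means that every test observation is later in time than every training observation, which mirrors how a forecast would be used in practice.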

3.2. Methodology

3.2.1. M5P

The M5 model tree, developed by Quinlan [63], is an accurate and computationally cheap decision tree learner for regression tasks, including those with very high dimensionality. Instead of assigning a constant to each leaf node, M5 fits a multivariate linear regression model at each leaf to forecast numerical values. Therefore, the performance of an M5 tree model is highly dependent on the chosen linear models. Amongst M5 models, M5P is a binary regression tree whose leaf nodes are linear regression functions that yield continuous numerical outputs. During pruning, subtrees are removed and replaced by linear function approximations, which reduces variance and yields a smaller tree.

The M5P method is capable of handling large datasets, along with missing data recovered by dividing input spaces into various smaller sub-spaces. In general, the minimum number of instances, batch size, constructed regression trees, number of decimal places and unpruned and unchecked capabilities are all advantages of M5P models. A more detailed description of the M5P modeling approach is given by Khosravi et al. [64].
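
The idea of a tree with linear models in its leaves can be illustrated with a one-split sketch. This is not the M5P implementation used in the study: it handles a single input, makes only one split (chosen by standard deviation reduction, the M5 splitting criterion) and omits the pruning and smoothing steps. All names (`fit_line`, `fit_model_stump`, `predict`) and the sample data are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple

Line = Tuple[float, float]  # (slope, intercept)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Line:
    """Ordinary least squares for y = slope * x + intercept (one input)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # constant input: fall back to the mean
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def fit_model_stump(
    xs: Sequence[float], ys: Sequence[float], min_leaf: int = 2
) -> Tuple[float, Line, Line]:
    """One-split model tree: choose the threshold with the largest
    standard deviation reduction, then fit a line in each leaf."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    all_y = [y for _, y in pairs]
    best_sdr, best_i = None, None
    for i in range(min_leaf, n - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal input values
        sdr = (
            pstdev(all_y)
            - (i / n) * pstdev(all_y[:i])
            - ((n - i) / n) * pstdev(all_y[i:])
        )
        if best_sdr is None or sdr > best_sdr:
            best_sdr, best_i = sdr, i
    if best_i is None:
        raise ValueError("not enough distinct inputs to split")
    threshold = (pairs[best_i - 1][0] + pairs[best_i][0]) / 2
    left, right = pairs[:best_i], pairs[best_i:]
    return (
        threshold,
        fit_line([x for x, _ in left], [y for _, y in left]),
        fit_line([x for x, _ in right], [y for _, y in right]),
    )


def predict(model: Tuple[float, Line, Line], x: float) -> float:
    """Route x to a leaf and evaluate that leaf's linear model."""
    threshold, left, right = model
    slope, intercept = left if x <= threshold else right
    return slope * x + intercept


# Hypothetical piecewise-linear data, not taken from the study.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0.0, 1.0, 2.0, 3.0, 18.0, 20.0, 22.0, 24.0]
model = fit_model_stump(xs, ys)
```

On this toy data a constant-leaf tree would need many splits to follow the two slopes, whereas a single split with a line in each leaf reproduces both segments.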

3.2.2. Random Forest (RF)

RF, first introduced by Breiman et al. [65], is an ensemble approach for building predictive models for both classification and regression tasks. It combines less predictive base models to yield a better predictive model. Due to their simple nature, few assumptions and high performance, RF models have been broadly used in machine learning (ML). The term “forest” refers to a series of decision trees that are by themselves “weak” learners. A single regression tree does not have the same predictive power as a regression forest [65,66]. Because a single tree splits on just one criterion at each node, it is very sensitive to the training dataset. Even small changes in the dataset and splitting criterion can produce different tree structures and yield different explanations [66]. Therefore, RF models rank the variables by their importance to attain the best RF model.

3.2.3. Random Tree (RT)

RT divides a dataset into sub-spaces and fits a constant for each sub-space. A single tree model has a tendency to be very unstable and shows a poor prediction accuracy. However, by bagging RT as a decision tree algorithm, it can yield highly accurate results [67]. RT has high flexibility along with fast training capability [68].
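
The two ideas described above, a tree that fits a constant in each sub-space and the bagging of many such trees, can be sketched together. This is a deliberately simplified illustration, not the RF or RT implementation used in the study: each base learner is a one-split tree on a randomly chosen feature, whereas practical forests grow deeper trees. The names `fit_random_stump` and `fit_bagged_stumps` and the default `n_trees` and `seed` values are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Row = Sequence[float]
Predictor = Callable[[Row], float]


def fit_random_stump(X: List[Row], y: List[float], rng: random.Random) -> Predictor:
    """One-split tree on a randomly chosen feature; each side of the
    split predicts the mean target of the training rows that fall in it."""
    feature = rng.randrange(len(X[0]))
    values = sorted({row[feature] for row in X})
    overall = mean(y)
    best = None  # (sse, threshold, left_mean, right_mean)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [t for row, t in zip(X, y) if row[feature] <= threshold]
        right = [t for row, t in zip(X, y) if row[feature] > threshold]
        lm, rm = mean(left), mean(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # the feature is constant in this sample
        return lambda row: overall
    _, threshold, lm, rm = best
    return lambda row: lm if row[feature] <= threshold else rm


def fit_bagged_stumps(
    X: List[Row], y: List[float], n_trees: int = 25, seed: int = 0
) -> Predictor:
    """Bagging: fit each stump on a bootstrap sample, then average."""
    rng = random.Random(seed)
    n = len(X)
    trees: List[Predictor] = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(fit_random_stump([X[i] for i in idx], [y[i] for i in idx], rng))
    return lambda row: mean(tree(row) for tree in trees)


# Hypothetical data: the target steps from 0 to 1 along the first feature.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [0.0] * 10 + [1.0] * 10
forest = fit_bagged_stumps(X, y, seed=1)
```

Averaging over bootstrap samples is what reduces the instability of a single tree: each stump sees a slightly different dataset, and their individual errors partly cancel in the mean.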

3.2.4. Reduced Error Pruning Tree (REPT)

When a decision tree is constructed, several branches may reflect noise or outliers in the training dataset rather than the underlying pattern. This overfitting problem is addressed by tree pruning, which uses statistical procedures to eliminate the least reliable branches and generally includes pre-pruning and post-pruning. The principal incentive of pruning is “trading accuracy for simplicity”. REPT is an integrated method of reduced error pruning (REP) and the decision tree (DT) approach, in which the pruned trees are produced using held-out data: a validation dataset is used to estimate the generalization error. This method was first employed by Quinlan [63], who applied a decision tree based on the available information and variance reduction. The advantage of REPT lies in its ability to reduce the complexity of the tree by pruning, which decreases the size of a decision tree and the overfitting during the learning process without a substantial accuracy loss [69]. Thus, a pruning process is needed to cut the tree back. REPT is capable of fast learning by decreasing variance to create decision trees [70].
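
Reduced error pruning itself can be sketched in a few lines. The version below is an illustrative assumption rather than the REPT code used in the study: a tree is represented as nested dictionaries (a hypothetical layout with keys `feature`, `threshold`, `left`, `right` and `value`, where `value` is the training mean at that node), and a subtree is collapsed into a leaf whenever the leaf is no worse on the held-out rows that reach it.

```python
from typing import List, Sequence, Union

Row = Sequence[float]
# A tree is either a leaf constant or a dict describing a split node.
Tree = Union[float, dict]


def predict(tree: Tree, row: Row) -> float:
    """Follow splits until a leaf constant is reached."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


def sse(tree: Tree, X: List[Row], y: List[float]) -> float:
    """Sum of squared errors of the tree on the given rows."""
    return sum((t - predict(tree, row)) ** 2 for row, t in zip(X, y))


def prune(tree: Tree, X_val: List[Row], y_val: List[float]) -> Tree:
    """Bottom-up reduced error pruning against held-out rows."""
    if not isinstance(tree, dict):
        return tree
    f, thr = tree["feature"], tree["threshold"]
    left = [(r, t) for r, t in zip(X_val, y_val) if r[f] <= thr]
    right = [(r, t) for r, t in zip(X_val, y_val) if r[f] > thr]
    pruned = dict(tree)
    pruned["left"] = prune(tree["left"], [r for r, _ in left], [t for _, t in left])
    pruned["right"] = prune(tree["right"], [r for r, _ in right], [t for _, t in right])
    leaf = tree["value"]
    # Collapse the node if a single constant does at least as well.
    if sse(leaf, X_val, y_val) <= sse(pruned, X_val, y_val):
        return leaf
    return pruned


# Hypothetical tree: the left subtree contains an overfitted split.
tree = {
    "feature": 0, "threshold": 5.0, "value": 1.0,
    "left": {"feature": 0, "threshold": 2.0, "value": 0.0,
             "left": 0.4, "right": -0.4},
    "right": 2.0,
}
X_val = [[1.0], [3.0], [7.0], [8.0]]
y_val = [0.0, 0.0, 2.0, 2.0]
pruned_tree = prune(tree, X_val, y_val)
```

In this toy case the inner split is removed because it only adds error on the held-out rows, while the root split is kept because collapsing it would raise the held-out error.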

3.2.5. Model Evaluation and Comparison

To validate and compare the models, six quantitative statistics were used: root mean square error (RMSE), coefficient of determination (R²), mean absolute error (MAE), percent bias (PBIAS), ratio of the root mean square error to the standard deviation of measured data (RSR) and Nash–Sutcliffe coefficient (NSE). Furthermore, to visually compare the model performance, Taylor diagrams and boxplots were used [71]. Taylor diagrams were introduced by Taylor [71] to graphically show how closely an estimated output (or a set of estimated outputs) matches observations. They are plotted based on the correlation and standard deviation of the estimated and observed datasets, and they are especially useful in evaluating multiple aspects of complex models or in gauging the relative skill of many different models (e.g., [72]). The box plot, in turn, provides more information about the data distribution, as well as maximum and minimum values, which is important in modeling.

These indices are expressed by the following equations:

R^{2} = \left[ \frac{\sum_{i = 1}^{n} (WL(i) - \bar{WL}) (\hat{WL}(i) - \tilde{WL})}{\sqrt{\sum_{i = 1}^{n} {[WL(i) - \bar{WL}]}^{2}} \sqrt{\sum_{i = 1}^{n} {[\hat{WL}(i) - \tilde{WL}]}^{2}}} \right]^{2}

(1)

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {[WL(i) - \hat{WL}(i)]}^{2}}

(2)

MAE = \frac{1}{n} \sum_{i = 1}^{n} | WL(i) - \hat{WL}(i) |

(3)

PBIAS = 100 \times \frac{\sum_{i = 1}^{n} (WL(i) - \hat{WL}(i))}{\sum_{i = 1}^{n} WL(i)}

(4)

NSE = 1 - \frac{\sum_{i = 1}^{n} {(WL(i) - \hat{WL}(i))}^{2}}{\sum_{i = 1}^{n} {(WL(i) - \bar{WL})}^{2}}

(5)

RSR = \frac{\sqrt{\sum_{i = 1}^{n} {(WL(i) - \hat{WL}(i))}^{2}}}{\sqrt{\sum_{i = 1}^{n} {(WL(i) - \bar{WL})}^{2}}}

(6)

where n is the number of samples, WL is the actual value of the output, \bar{WL} is the average of WL over the entire target set, \tilde{WL} is the average of \hat{WL} over the entire target set and \hat{WL} is the simulated output value.

R² describes the degree of collinearity between the simulated and measured data. It ranges between 0 and 1, with higher R² values designating better prediction accuracy, and values greater than 0.5 are considered acceptable [73]. The RMSE and MAE measure the error of the models. Lower values of RMSE and MAE designate better model predictive performance. The NSE is a normalized statistic that determines the relative magnitude of the residual variance compared to the measured data variance [74]. The NSE ranges between −∞ and 1, with NSE = 1 designating a flawless match between observed and predicted values. Model predictive performance is categorized as very good, good, acceptable or unacceptable with the ranges of 0.75 < NSE ≤ 1.00, 0.65 < NSE ≤ 0.75, 0.50 < NSE ≤ 0.65, 0.40 < NSE ≤ 0.50 or NSE ≤ 0.4, respectively [64,75]. The PBIAS determines the average tendency of the simulated values to be larger or smaller than the observed values [76] and, hence, it is the best metric to show over- or underestimation [64]. It varies from −∞ to 1, with negative values showing overestimation and positive values indicating underestimation [77]. The RSR is the ratio of the RMSE to the standard deviation of the measured data. The RSR ranges from the optimal value of 0 to a large positive value. A lower RSR means a lower RMSE, which shows a better model predictive performance. RSR classification ranges are represented as very good, good, acceptable and unacceptable with ranges of 0.00 ≤ RSR ≤ 0.50, 0.50 < RSR ≤ 0.60, 0.60 < RSR ≤ 0.70 and RSR > 0.70, respectively [64].
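
The six indices can be computed in a few lines. The sketch below follows the standard definitions (R² as the squared Pearson correlation, and squared residuals in NSE and RSR); the function name `evaluate` and the sample values are hypothetical and are not results from the study.

```python
from math import sqrt
from statistics import mean
from typing import Dict, Sequence


def evaluate(obs: Sequence[float], sim: Sequence[float]) -> Dict[str, float]:
    """Return R2, RMSE, MAE, PBIAS, NSE and RSR for observed and simulated series."""
    if len(obs) != len(sim) or len(obs) == 0:
        raise ValueError("obs and sim must be non-empty and of equal length")
    n = len(obs)
    obs_mean, sim_mean = mean(obs), mean(sim)
    residuals = [o - s for o, s in zip(obs, sim)]
    sse = sum(r * r for r in residuals)                 # sum of squared errors
    obs_ss = sum((o - obs_mean) ** 2 for o in obs)      # spread of the observations
    sim_ss = sum((s - sim_mean) ** 2 for s in sim)      # spread of the simulations
    cov = sum((o - obs_mean) * (s - sim_mean) for o, s in zip(obs, sim))
    return {
        "R2": (cov / sqrt(obs_ss * sim_ss)) ** 2,
        "RMSE": sqrt(sse / n),
        "MAE": sum(abs(r) for r in residuals) / n,
        "PBIAS": 100.0 * sum(residuals) / sum(obs),
        "NSE": 1.0 - sse / obs_ss,
        "RSR": sqrt(sse) / sqrt(obs_ss),
    }


# Hypothetical observed and simulated values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]
metrics = evaluate(observed, simulated)
```

With these definitions NSE and RSR carry the same information, since NSE = 1 − RSR², so a model ranked "very good" by one index cannot be ranked poorly by the other.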

In addition, two graphical evaluation tools, Taylor diagrams and boxplots, were applied to visually compare the model performance [71]. A Taylor diagram summarizes the similarity between two patterns, that is, how closely a model pattern matches the observations. It uses three related model performance statistics, the standard deviation (sigma), the correlation coefficient (R) and the RMSE, which can be plotted together on a two-dimensional graph because they are linked by the law of cosines. In general, Taylor diagrams are a useful tool for assessing the comparative skill of different models. Furthermore, boxplots have been used for the purpose of evaluation, as they present five statistics (minimum, lower quartile, median, upper quartile and maximum) in a graphic presentation. The schematic diagram of the methodology is illustrated in Figure 2.
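
The law-of-cosines relation behind the Taylor diagram can be written out explicitly. In Taylor's formulation [71], the error term linked to the two standard deviations and the correlation is the centered (bias-removed) root mean square difference, written here as E'; the symbols \sigma_{obs}, \sigma_{sim} and R are notation assumed for this sketch.

```latex
E'^{2} = \sigma_{obs}^{2} + \sigma_{sim}^{2} - 2 \sigma_{obs} \sigma_{sim} R
```

On the diagram, a model is plotted at radial distance \sigma_{sim} and at an angle whose cosine is R, so its distance to the observed point at (\sigma_{obs}, 0) equals E'. This is why a model that lies closer to the observed point has both a higher correlation and a lower centered error.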

4. Results and Analysis

We tested the performance of the four models in predicting daily water level fluctuation in both the training and testing stages using various evaluation criteria (Table 2). According to our statistical evaluation criteria, all the models had very good predictive ability (R² > 0.7). The coefficient of determination (R²) showed that all the models are acceptable, but the M5P model performed best, with the highest R² value (0.9874), followed by the RF (0.9697), RT (0.9654) and REPT (0.965) models. In terms of the RMSE, the M5P model also had the highest predictive power, with the lowest RMSE (0.05), followed by the RF and RT (0.09) and REPT (0.1) models. The M5P model yielded the lowest MAE (0.01), followed by the RF (0.02), RT (0.03) and REPT (0.033) models. In addition, the NSE ranked the models from greatest to least predictive power as M5P (0.98) > RF and RT (0.96) > REPT (0.95), which is similar to the ranking by R². Additionally, the PBIAS results reveal that all of the applied models underestimated water levels (the PBIAS values were positive). The calculated PBIAS values were between 0 and 0.2 for all models, indicating a very good performance in predicting daily water level fluctuation in the study area. Finally, the performance of all the applied models was assessed against the four RSR classes: very good, good, satisfactory and unsatisfactory, with ranges of 0.00 ≤ RSR ≤ 0.50, 0.50 < RSR ≤ 0.60, 0.60 < RSR ≤ 0.70 and RSR > 0.70, respectively. The RSR shows a very good performance across all our developed models.

We used the Pearson correlation coefficient (PCC) between the input variables (WL(t-1) to WL(t-5)) and the daily water level to determine the most important factor for the prediction of daily water level. The coefficients were WL(t-1) (0.981), followed by WL(t-2) (0.964), WL(t-3) (−0.946), WL(t-4) (0.928) and WL(t-5) (0.925), showing that water level with a one-day lag time had the highest effect on the result and that this effect decreased with greater lag times. The R values are presented in Table 3 for the given inputs and the output/target variable. The PCC results show that WL(t-1) and WL(t-5) had the highest and lowest correlations with daily water level, respectively (PCC = 0.981 and 0.925). After the completion of the correlation analysis, we evaluated the different input combinations in the models (M5P, RF, RT and REPT) at both the training and testing stages based on the R values, and we applied the best input combination to the testing dataset (Table 4). We found that WL(t-1) alone (combination 1) was the best input combination for all developed models except M5P (that is, for REPT, RT and RF), as it gave the highest R in the testing phase, with values of 0.982, 0.981 and 0.980, respectively. For M5P, the most effective combination was WL(t-1) and WL(t-2) (combination 2) with a value of R = 0.933. We found that, overall, the M5P model generally had a better fitting accuracy and the highest correlation and that the RF model had the poorest accuracy in the approximation of the training data.
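
The correlation between the daily water level and each lagged copy of itself can be computed as sketched below. The function names `pearson` and `lag_correlations` and the toy series are hypothetical; the values reported in Table 3 come from the study's own data and are not reproduced by this example.

```python
from math import sqrt
from statistics import mean
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lag_correlations(levels: Sequence[float], max_lag: int = 5) -> Dict[int, float]:
    """PCC between WL(t) and WL(t-k) for k = 1..max_lag."""
    # levels[k:] holds WL(t); levels[:-k] holds the value k days earlier.
    return {k: pearson(levels[k:], levels[:-k]) for k in range(1, max_lag + 1)}


# Toy series that alternates every day, chosen so the answer is known:
# one day back it is perfectly anti-correlated, two days back identical.
alternating = [0.0, 1.0] * 10
toy = lag_correlations(alternating, max_lag=2)
```

For a slowly varying lake level the coefficients would be expected to fall gradually as the lag grows, which is the pattern the reported values describe.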

Figure 3 shows the line graphs and scatter plots of the observed and predicted daily water levels. The result shows that all the models predicted daily water level with a high level of accuracy, while only M5P was able to perfectly predict the maximum values of the water level fluctuation of Zrebar Lake. Also, M5P was the closest to the observed water level values and to the best-fit line (45° line), with minimum scatter and the linear equation wl^{pred} = 19.576 wl^{obs} + 0.9848. Conversely, the REPT model provided the worst estimates with maximum scatter (Figure 4). This confirms that M5P outperformed the other models, including RF, RT and REPT. This result is in accordance with the evaluation criteria presented in Table 2.

We also further analyzed model efficiency using Taylor diagrams (Figure 5) and boxplots (Figure 6). The closer the point of each developed model to the observed point location, the higher the performance. Here, our results also show that the models had good predictive power, but the M5P algorithm had a higher correlation and lower RMSE. Based on the normalized standard deviation values, the SD of the M5P model was similar to the observed SD, whereas REPT had a lower standard deviation, followed by the RT and RF models.

The results of the boxplot are presented in Figure 6. The boxplot of the M5P predictions was closer to the observed values at the maximum, whereas REPT, RT and RF underestimated the maximum water levels. In terms of the quartiles, median and minimum, all the models predicted WL values similar to the observed values with a high degree of accuracy, although the M5P model outperformed the other models.

5. Discussion

Lakes can be complex ecosystems and provide numerous uses to society, ranging from drinking water supplies, recreation, navigation, irrigation, hydroelectric power and more. Tools that predict water level fluctuation are important for the management of lakes, water consumption and their surrounding catchments. In this study, we developed and applied advanced soft computing decision tree-based ML algorithms, including M5P, RF, RT and REPT, for predicting the daily water level fluctuation of Zrebar Lake, Kurdistan Province, Iran. To our knowledge, this is the first time these models have been used for predicting the daily water levels of lakes.

We computed and measured the predictive performance of the learning models for the training and validation datasets using the RMSE, MAE, NSE, PBIAS, RSR and R² criteria, as used by others [78,79,80,81,82,83,84,85]. After implementing the learning models, we created a histogram of actual and estimated values, a Taylor diagram, scatterplots and a boxplot. The results of the validation phase are of greater importance than the performance on the training dataset (modeling phase) [28,86]. We observed that, although the models were well trained and performed successfully in all scenarios, the M5P model under the second scenario (WL(t-1), WL(t-2)) outperformed the REPT, RT and RF models, which performed best under the first scenario (WL(t-1) only). The reason the M5P model succeeded over the other models is probably related to the advantages of this model. The first advantage is its more efficient learning process, which does not rely on assumptions of data type and distribution, can handle many attributes and high dimensions and is robust in dealing with missing data. The second advantage of M5P is its ability to construct a simple tree structure and applicable linear equations in multiple leaves, with which it can explicitly explain the relationship between the input variables and the output parameters [64,87].

Other models, like the RF and RT models, are also decision tree-based algorithms used for both classification and regression problems, but they are limited by the need to construct a large number of trees, which can make the algorithms slow and ineffective for real-time predictions. According to Kisi et al. [5], decision tree-based algorithms (M5P, RT and RF) have a higher prediction power than models with hidden layers in their structures (ANN, adaptive-neuro-fuzzy inference system (ANFIS) and fuzzy logic (FL)).

Other water researchers who have used the M5P model have reported mixed levels of performance [88,89,90]. For example, Balouchi et al. [86] found that the M5P model performance was inferior to the ANN-MLP and radial basis function neural network (ANN-RBF) models for the prediction of scour depth at river confluences. In contrast, Onyari and Ilunga [90] compared multilayer neural networks (ANN-MLP) with M5P tree models to predict the stream flow in Luvuvhu Catchment, South Africa, and found that the M5P model yielded better predictions.

In our study, the M5P model provided the best prediction of the daily water level fluctuation of Zrebar Lake. The difference in the results from the modeling process compared to other models with less favorable results requires further scrutiny. While M5P was optimal for predicting lake levels at Zrebar Lake, it underperformed for others, as described above, and thus requires further testing in other settings. The main limitation of the current research is the lack of a comprehensive dataset, such as rainfall, inflow discharge, evaporation and so on, which have a meaningful effect on the result. It is recommended to compare the results of the present study with ensemble-based models and optimization algorithms to develop a more robust algorithm.

6. Conclusions

The accurate prediction of lake water level fluctuation can help guide the sustainable development and management of lake water usage. In this study, we developed and tested a number of state-of-the-art soft computing benchmark ML models, including M5P, RF, RT and REPT, to predict the daily water level fluctuation of Zrebar Lake, Kurdistan Province, Iran. We used various scenarios, based on the combination of data inputs, to select the optimal parameters. We evaluated the performance of the developed models quantitatively using RMSE, MAE, NSE, PBIAS, RSR and R² measures. Our results are summarized as follows:

The M5P model outperformed the other models when WL(t-1) and WL(t-2) variables, the second scenario, were selected as inputs, which implied a combination of one- and two-day lag times of water level prediction. The best performance by other ML models was achieved with a one-day lag time of real measured water levels.

Our results showed that the M5P model had the highest predictive performance and accuracy (the lowest RMSE and the highest R²) in comparison to the other machine learning models. Additionally, the M5P model had a tighter fit to the observed data based on the scatter plots and histograms of actual and estimated values, thus showing promise for wider applications in water level prediction.

Water level with a one-day lag time has the highest effect on the results, and this effect decreases with longer lag times.

The best input scenario is the one in which all input variables are considered.

The M5P model is able to predict maximum lake water level perfectly, compared to other developed algorithms.

Author Contributions

All authors contributed equally to the work. Conceptualization, Viet-Ha Nhu, Himan Shahabi, Ebrahim Nohani, Ataollah Shirzadi, Nadhir Al-Ansari, Sepideh Bahrami, Shaghayegh Miraki, Marten Geertsema and Hoang Nguyen; data curation, Viet-Ha Nhu, Himan Shahabi, Ebrahim Nohani, Ataollah Shirzadi, Nadhir Al-Ansari, Sepideh Bahrami, Shaghayegh Miraki, Marten Geertsema and Hoang Nguyen; formal analysis, Viet-Ha Nhu, Himan Shahabi, Ebrahim Nohani, Ataollah Shirzadi, Nadhir Al-Ansari, Sepideh Bahrami, Shaghayegh Miraki, Marten Geertsema and Hoang Nguyen; investigation, Viet-Ha Nhu, Himan Shahabi, Ebrahim Nohani, Ataollah Shirzadi, Nadhir Al-Ansari, Sepideh Bahrami, Shaghayegh Miraki, Marten Geertsema and Hoang Nguyen; methodology, Viet-Ha Nhu, Himan Shahabi, Ebrahim Nohani, Ataollah Shirzadi, Nadhir Al-Ansari, Sepideh Bahrami, Shaghayegh Miraki, Marten Geertsema and Hoang Nguyen; software, Viet-Ha Nhu, Himan Shahabi, Ebrahim Nohani, Ataollah Shirzadi, Nadhir Al-Ansari, Sepideh Bahrami, Shaghayegh Miraki, Marten Geertsema and Hoang Nguyen; validation, Viet-Ha Nhu, Himan Shahabi, Ebrahim Nohani, Ataollah Shirzadi, Nadhir Al-Ansari, Sepideh Bahrami, Shaghayegh Miraki, Marten Geertsema and Hoang Nguyen; visualization, Viet-Ha Nhu, Himan Shahabi, Ebrahim Nohani, Ataollah Shirzadi, Nadhir Al-Ansari, Sepideh Bahrami, Shaghayegh Miraki, Marten Geertsema and Hoang Nguyen; writing (original draft preparation), Viet-Ha Nhu, Himan Shahabi, Ebrahim Nohani, Ataollah Shirzadi, Nadhir Al-Ansari, Sepideh Bahrami, Shaghayegh Miraki, Marten Geertsema and Hoang Nguyen; writing (review and editing), Viet-Ha Nhu, Himan Shahabi, Ebrahim Nohani, Ataollah Shirzadi, Nadhir Al-Ansari, Sepideh Bahrami, Shaghayegh Miraki, Marten Geertsema and Hoang Nguyen. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the University of Kurdistan, Iran, based on grant number GRC98-04469-1.

Conflicts of Interest

The authors declare no conflict of interest.

References

Vuglinskiy, V. Water Level in Lakes and Reservoirs, Water Storage; Global Terrestrial Observing System: Rome, Italy, 2009. Available online: http://www.fao.org/gtos/doc/ECVs/T04/T04.pdf (accessed on 21 July 2020).
Hwang, C.; Cheng, Y.-S.; Han, J.; Kao, R.; Huang, C.-Y.; Wei, S.-H.; Wang, H. Multi-decadal monitoring of lake level changes in the Qinghai-Tibet Plateau by the TOPEX/Poseidon-family altimeters: Climate implication. Remote Sens. 2016, 8, 446.
Karimi, S.; Shiri, J.; Kisi, O.; Makarynskyy, O. Forecasting water level fluctuations of Urmieh Lake using gene expression programming and adaptive neuro-fuzzy inference system. Int. J. Ocean Clim. Syst. 2012, 3, 109–125.
Altunkaynak, A. Forecasting surface water level fluctuations of Lake Van by artificial neural networks. Water Resour. Manag. 2007, 21, 399–408.
Kisi, O.; Shiri, J.; Nikoofar, B. Forecasting daily lake levels using artificial intelligence approaches. Comput. Geosci. 2012, 41, 169–180.
Rahmati, O.; Choubin, B.; Fathabadi, A.; Coulon, F.; Soltani, E.; Shahabi, H.; Mollaefar, E.; Tiefenbacher, J.; Cipullo, S.; Ahmad, B.B. Predicting uncertainty of machine learning models for modelling nitrate pollution of groundwater using quantile regression and UNEEC methods. Sci. Total Environ. 2019, 688, 855–866.
Tien Bui, D.; Shirzadi, A.; Chapi, K.; Shahabi, H.; Pradhan, B.; Pham, B.T.; Singh, V.P.; Chen, W.; Khosravi, K.; Bin Ahmad, B. A hybrid computational intelligence approach to groundwater spring potential mapping. Water 2019, 11, 2013.
Chen, W.; Pradhan, B.; Li, S.; Shahabi, H.; Rizeei, H.M.; Hou, E.; Wang, S. Novel hybrid integration approach of bagging-based Fisher’s linear discriminant function for groundwater potential analysis. Nat. Resour. Res. 2019, 28, 1239–1258.
Rahmati, O.; Naghibi, S.A.; Shahabi, H.; Bui, D.T.; Pradhan, B.; Azareh, A.; Rafiei-Sardooi, E.; Samani, A.N.; Melesse, A.M. Groundwater spring potential modelling: Comprising the capability and robustness of three different modeling approaches. J. Hydrol. 2018, 565, 248–261.
Leira, M.; Cantonati, M. Effects of Water-Level Fluctuations on Lakes: An Annotated Bibliography. In Ecological Effects of Water-Level Fluctuations in Lakes; Springer: Berlin/Heidelberg, Germany, 2008; pp. 171–184.
Dai, X.; Wan, R.; Yang, G. Non-stationary water-level fluctuation in China’s Poyang Lake and its interactions with Yangtze River. J. Geogr. Sci. 2015, 25, 274–288.
Ahmadlou, M.; Karimi, M.; Alizadeh, S.; Shirzadi, A.; Parvinnejhad, D.; Shahabi, H.; Panahi, M. Flood susceptibility assessment using integration of adaptive network-based fuzzy inference system (ANFIS) and biogeography-based optimization (BBO) and bat algorithms (BA). Geocarto Int. 2019, 34, 1252–1272.
Naghibi, S.A.; Pourghasemi, H.R. A comparative assessment between three machine learning models and their performance comparison by bivariate and multivariate statistical methods in groundwater potential mapping. Water Resour. Manag. 2015, 29, 5217–5236.
Bowden, G.J.; Maier, H.R.; Dandy, G.C. Optimal division of data for neural network models in water resources applications. Water Resour. Res. 2002, 38, 2-1–2-11.
Pradhan, B. Flood susceptible mapping and risk area delineation using logistic regression, GIS and remote sensing. J. Spat. Hydrol. 2010, 9, 1–18.
Kia, M.B.; Pirasteh, S.; Pradhan, B.; Mahmud, A.R.; Sulaiman, W.N.A.; Moradi, A. An artificial neural network model for flood simulation using GIS: Johor River Basin, Malaysia. Environ. Earth Sci. 2012, 67, 251–264.
Tehrany, M.S.; Pradhan, B.; Jebur, M.N. Flood susceptibility mapping using a novel ensemble weights-of-evidence and support vector machine models in GIS. J. Hydrol. 2014, 512, 332–343.
Choubin, B.; Moradi, E.; Golshan, M.; Adamowski, J.; Sajedi-Hosseini, F.; Mosavi, A. An ensemble prediction of flood susceptibility using multivariate discriminant analysis, classification and regression trees, and support vector machines. Sci. Total Environ. 2019, 651, 2087–2096.
Abbaszadeh, P.; Alipour, A.; Asadi, S. Development of a coupled wavelet transform and evolutionary Levenberg-Marquardt neural networks for hydrological process modeling. Comput. Intell. 2018, 34, 175–199.
Asadi, S. Evolutionary fuzzification of RIPPER for regression: Case study of stock prediction. Neurocomputing 2019, 331, 121–137.
Asadi, S.; Shahrabi, J. Complexity-based parallel rule induction for multiclass classification. Inf. Sci. 2017, 380, 53–73.
Chen, W.; Li, Y.; Xue, W.; Shahabi, H.; Li, S.; Hong, H.; Wang, X.; Bian, H.; Zhang, S.; Pradhan, B. Modeling flood susceptibility using data-driven approaches of naïve Bayes tree, alternating decision tree, and random forest methods. Sci. Total Environ. 2020, 701, 134979.
Shahabi, H.; Shirzadi, A.; Ghaderi, K.; Omidvar, E.; Al-Ansari, N.; Clague, J.J.; Geertsema, M.; Khosravi, K.; Amini, A.; Bahrami, S. Flood detection and susceptibility mapping using Sentinel-1 remote sensing data and a machine learning approach: Hybrid intelligence of bagging ensemble based on k-nearest neighbor classifier. Remote Sens. 2020, 12, 266.
Wang, Y.; Hong, H.; Chen, W.; Li, S.; Panahi, M.; Khosravi, K.; Shirzadi, A.; Shahabi, H.; Panahi, S.; Costache, R. Flood susceptibility mapping in Dingnan County (China) using adaptive neuro-fuzzy inference system with biogeography based optimization and imperialistic competitive algorithm. J. Environ. Manag. 2019, 247, 712–729.
Chen, W.; Hong, H.; Li, S.; Shahabi, H.; Wang, Y.; Wang, X.; Ahmad, B.B. Flood susceptibility modelling using novel hybrid approach of reduced-error pruning trees with bagging and random subspace ensembles. J. Hydrol. 2019, 575, 864–873.
Khosravi, K.; Melesse, A.M.; Shahabi, H.; Shirzadi, A.; Chapi, K.; Hong, H. Flood Susceptibility Mapping at Ningdu Catchment, China Using Bivariate and Data Mining Techniques. In Extreme Hydrology and Climate Variability; Elsevier: Amsterdam, The Netherlands, 2019; pp. 419–434.
Tien Bui, D.; Khosravi, K.; Shahabi, H.; Daggupati, P.; Adamowski, J.F.; Melesse, A.M.; Thai Pham, B.; Pourghasemi, H.R.; Mahmoudi, M.; Bahrami, S. Flood spatial modeling in northern Iran using remote sensing and GIS: A comparison between evidential belief functions and its ensemble with a multivariate logistic regression model. Remote Sens. 2019, 11, 1589.
Bui, D.T.; Panahi, M.; Shahabi, H.; Singh, V.P.; Shirzadi, A.; Chapi, K.; Khosravi, K.; Chen, W.; Panahi, S.; Li, S. Novel hybrid evolutionary algorithms for spatial prediction of floods. Sci. Rep. 2018, 8, 1–14.
Tien Bui, D.; Khosravi, K.; Li, S.; Shahabi, H.; Panahi, M.; Singh, V.P.; Chapi, K.; Shirzadi, A.; Panahi, S.; Chen, W. New hybrids of ANFIS with several optimization algorithms for flood susceptibility modeling. Water 2018, 10, 1210.
Shafizadeh-Moghadam, H.; Valavi, R.; Shahabi, H.; Chapi, K.; Shirzadi, A. Novel forecasting approaches using combination of machine learning and statistical models for flood susceptibility mapping. J. Environ. Manag. 2018, 217, 1–11.
Khosravi, K.; Pham, B.T.; Chapi, K.; Shirzadi, A.; Shahabi, H.; Revhaug, I.; Prakash, I.; Bui, D.T. A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Sci. Total Environ. 2018, 627, 744–755.
Nourani, V.; Komasi, M.; Mano, A. A multivariate ANN-wavelet approach for rainfall–runoff modeling. Water Resour. Manag. 2009, 23, 2877.
Wu, C.; Chau, K. Rainfall–runoff modeling using artificial neural network coupled with singular spectrum analysis. J. Hydrol. 2011, 399, 394–409.
Bae, D.-H.; Jeong, D.M.; Kim, G. Monthly dam inflow forecasts using weather forecasting information and neuro-fuzzy technique. Hydrol. Sci. J. 2007, 52, 99–113.
Bai, Y.; Chen, Z.; Xie, J.; Li, C. Daily reservoir inflow forecasting using multiscale deep feature learning with hybrid models. J. Hydrol. 2016, 532, 193–206.
Noori, R.; Karbassi, A.; Moghaddamnia, A.; Han, D.; Zokaei-Ashtiani, M.; Farokhnia, A.; Gousheh, M.G. Assessment of input variables determination on the SVM model performance using PCA, gamma test, and forward selection techniques for monthly stream flow prediction. J. Hydrol. 2011, 401, 177–189.
Yaseen, Z.M.; El-Shafie, A.; Jaafar, O.; Afan, H.A.; Sayl, K.N. Artificial intelligence based models for stream-flow forecasting: 2000–2015. J. Hydrol. 2015, 530, 829–844.
Cigizoglu, H.K.; Kisi, Ö. Methods to improve the neural network performance in suspended sediment estimation. J. Hydrol. 2006, 317, 221–238.
Kisi, O.; Shiri, J. River suspended sediment estimation by climatic variables implication: Comparative study among soft computing techniques. Comput. Geosci. 2012, 43, 73–82.
Eslamian, S.; Abedi-Koupai, J.; Amiri, M.; Gohari, S. Estimation of daily reference evapotranspiration using support vector. Res. J. Environ. Sci. 2009, 3, 439–447.
Mehdizadeh, S. Estimation of daily reference evapotranspiration (ETo) using artificial intelligence methods: Offering a new approach for lagged ETo data-based modeling. J. Hydrol. 2018, 559, 794–812.
Khosravi, K.; Shahabi, H.; Pham, B.T.; Adamowski, J.; Shirzadi, A.; Pradhan, B.; Dou, J.; Ly, H.-B.; Gróf, G.; Ho, H.L. A comparative assessment of flood susceptibility modeling using multi-criteria decision-making analysis and machine learning methods. J. Hydrol. 2019, 573, 311–323.
Hong, H.; Tsangaratos, P.; Ilia, I.; Liu, J.; Zhu, A.-X.; Chen, W. Application of fuzzy weight of evidence and data mining techniques in construction of flood susceptibility map of Poyang County, China. Sci. Total Environ. 2018, 625, 575–588.
Tahan, M.H.; Asadi, S. EMDID: Evolutionary multi-objective discretization for imbalanced datasets. Inf. Sci. 2018, 432, 442–461.
Tahan, M.H.; Asadi, S. MEMOD: A novel multivariate evolutionary multi-objective discretization. Soft Comput. 2018, 22, 301–323.
Khosravi, K.; Cooper, J.R.; Daggupati, P.; Pham, B.T.; Bui, D.T. Bedload transport rate prediction: Application of novel hybrid data mining techniques. J. Hydrol. 2020, 124774.
Khosravi, K.; Barzegar, R.; Miraki, S.; Adamowski, J.; Daggupati, P.; Alizadeh, M.R.; Pham, B.T.; Alami, M.T. Stochastic modeling of groundwater fluoride contamination: Introducing lazy learners. Groundwater 2019.
Bui, D.T.; Khosravi, K.; Tiefenbacher, J.; Nguyen, H.; Kazakis, N. Improving prediction of water quality indices using novel hybrid machine-learning algorithms. Sci. Total Environ. 2020, 721, 137612.
Bui, D.T.; Khosravi, K.; Karimi, M.; Busico, G.; Khozani, Z.S.; Nguyen, H.; Mastrocicco, M.; Tedesco, D.; Cuoco, E.; Kazakis, N. Enhancing nitrate and strontium concentration prediction in groundwater by using new data mining algorithm. Sci. Total Environ. 2020, 715, 136836.
Salih, S.Q.; Sharafati, A.; Khosravi, K.; Faris, H.; Kisi, O.; Tao, H.; Ali, M.; Yaseen, Z.M. River suspended sediment load prediction based on river discharge information: Application of newly developed data mining models. Hydrol. Sci. J. 2019, 65, 624–637.
Imani, S.; Niksokhan, M.H.; Jamshidi, S.; Abbaspour, K.C. Discharge permit market and farm management nexus: An approach for eutrophication control in small basins with low-income farmers. Environ. Monit. Assess. 2017, 189, 346.
Imani, S.; Delavar, M.; Niksokhan, M.H. Identification of nutrients critical source areas with SWAT model under limited data condition. Water Resour. 2019, 46, 128–137.
Gavili, S.; Javadi, S.; Banihabib, M.; Sanikhani, H. Comparison of intelligent models to predict water level fluctuations in Zarivar Lake considering groundwater level. Iran-Water Resour. Res. 2018, 14, 339–344.
Bahrami, S.; Wigand, E. Daily streamflow forecasting using nonlinear echo state network. Int. J. Adv. Res. Sci. Eng. Technol. 2018, 5, 3619–3625.
Hu, C.; Wan, F. Input Selection in Learning Systems: A Brief Review of Some Important Issues and Recent Developments. In Proceedings of the 2009 IEEE International Conference on Fuzzy Systems, Jeju Island, Korea, 20–24 August 2009; pp. 530–535.
Sharafati, A.; Khosravi, K.; Khosravinia, P.; Ahmed, K.; Salman, S.A.; Yaseen, Z.M.; Shahid, S. The potential of novel data mining models for global solar radiation prediction. Int. J. Environ. Sci. Technol. 2019, 16, 7147–7164.
Ayele, G.T.; Teshale, E.Z.; Yu, B.; Rutherfurd, I.D.; Jeong, J. Streamflow and sediment yield prediction for watershed prioritization in the Upper Blue Nile River Basin, Ethiopia. Water 2017, 9, 782.
Taheri, K.; Shahabi, H.; Chapi, K.; Shirzadi, A.; Gutiérrez, F.; Khosravi, K. Sinkhole susceptibility mapping: A comparison between Bayes-based machine learning algorithms. Land Degrad. Dev. 2019, 30, 730–745.
Pham, B.T.; Prakash, I.; Khosravi, K.; Chapi, K.; Trinh, P.T.; Ngo, T.Q.; Hosseini, S.V.; Bui, D.T. A comparison of support vector machines and Bayesian algorithms for landslide susceptibility modelling. Geocarto Int. 2019, 34, 1385–1407.
Chen, W.; Hong, H.; Panahi, M.; Shahabi, H.; Wang, Y.; Shirzadi, A.; Pirasteh, S.; Alesheikh, A.A.; Khosravi, K.; Panahi, S. Spatial prediction of landslide susceptibility using GIS-based data mining techniques of ANFIS with whale optimization algorithm (WOA) and grey wolf optimizer (GWO). Appl. Sci. 2019, 9, 3755.
Khosravi, K.; Daggupati, P.; Alami, M.T.; Awadh, S.M.; Ghareb, M.I.; Panahi, M.; Pham, B.T.; Rezaie, F.; Qi, C.; Yaseen, Z.M. Meteorological data mining and hybrid data-intelligence models for reference evaporation simulation: A case study in Iraq. Comput. Electron. Agric. 2019, 167, 105041.
Khozani, Z.S.; Khosravi, K.; Pham, B.T.; Kløve, B.; Mohtar, W.; Melini, W.H.; Yaseen, Z.M. Determination of compound channel apparent shear stress: Application of novel data mining models. J. Hydroinform. 2019, 21, 798–811.
Quinlan, J.R. Combining Instance-Based and Model-Based Learning. In Proceedings of the Tenth International Conference on Machine Learning, Amherst, MA, USA, 27–29 June 1993; pp. 236–243.
Khosravi, K.; Mao, L.; Kisi, O.; Yaseen, Z.M.; Shahid, S. Quantifying hourly suspended sediment load using data mining models: Case study of a glacierized Andean catchment in Chile. J. Hydrol. 2018, 567, 165–179.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
Aldous, D.; Pitman, J. Inhomogeneous continuum random trees and the entrance boundary of the additive coalescent. Probab. Theory Relat. Fields 2000, 118, 455–482.
LaValle, S.M. Rapidly-Exploring Random Trees: A New Tool for Path Planning; Citeseer: University Park, PA, USA, 1998.
Polo, J.M.; Liu, S.; Figueroa, M.E.; Kulalert, W.; Eminli, S.; Tan, K.Y.; Apostolou, E.; Stadtfeld, M.; Li, Y.; Shioda, T. Cell type of origin influences the molecular and functional properties of mouse induced pluripotent stem cells. Nat. Biotechnol. 2010, 28, 848–855.
Mohamed, W.N.H.W.; Salleh, M.N.M.; Omar, A.H. A Comparative Study of Reduced Error Pruning Method in Decision Tree Algorithms. In Proceedings of the 2012 IEEE International Conference on Control System, Computing and Engineering, Penang, Malaysia, 23–25 November 2012; pp. 392–397.
Taylor, K.E. Summarizing multiple aspects of model performance in a single diagram. J. Geophys. Res. Atmos. 2001, 106, 7183–7192.
Houghton, J.T. The Scientific Basis; Contribution of Working Group I to the Third Assessment Report of the Intergovernmental Panel on Climate Change; Cambridge University Press: Cambridge, UK, 2001.
Santhi, C.; Arnold, J.G.; Williams, J.R.; Dugas, W.A.; Srinivasan, R.; Hauck, L.M. Validation of the SWAT model on a large river basin with point and nonpoint sources. JAWRA J. Am. Water Resour. Assoc. 2001, 37, 1169–1188.
Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I: A discussion of principles. J. Hydrol. 1970, 10, 282–290.
Moriasi, D.N.; Arnold, J.G.; Van Liew, M.W.; Bingner, R.L.; Harmel, R.D.; Veith, T.L. Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Trans. ASABE 2007, 50, 885–900.
Gupta, H.V.; Sorooshian, S.; Yapo, P.O. Status of automatic calibration for hydrologic models: Comparison with multilevel expert calibration. J. Hydrol. Eng. 1999, 4, 135–143.
Legates, D.R.; McCabe, G.J., Jr. Evaluating the use of “goodness-of-fit” measures in hydrologic and hydroclimatic model validation. Water Resour. Res. 1999, 35, 233–241.
Behzad, M.; Asghari, K.; Coppola, E.A., Jr. Comparative study of SVMs and ANNs in aquifer water level prediction. J. Comput. Civ. Eng. 2010, 24, 408–413.
Yoon, H.; Jun, S.-C.; Hyun, Y.; Bae, G.-O.; Lee, K.-K. A comparative study of artificial neural networks and support vector machines for predicting groundwater levels in a coastal aquifer. J. Hydrol. 2011, 396, 128–138.
Kisi, O.; Shiri, J.; Karimi, S.; Shamshirband, S.; Motamedi, S.; Petković, D.; Hashim, R. A survey of water level fluctuation predicting in Urmia Lake using support vector machine with firefly algorithm. Appl. Math. Comput. 2015, 270, 731–743.
Shiri, J.; Shamshirband, S.; Kisi, O.; Karimi, S.; Bateni, S.M.; Nezhad, S.H.H.; Hashemi, A. Prediction of water-level in the Urmia Lake using the extreme learning machine approach. Water Resour. Manag. 2016, 30, 5217–5229.
Sahoo, S.; Russo, T.A.; Elliott, J.; Foster, I. Machine learning algorithms for modeling groundwater level changes in agricultural regions of the US. Water Resour. Res. 2017, 53, 3878–3895.
Nhu, V.-H.; Rahmati, O.; Falah, F.; Shojaei, S.; Al-Ansari, N.; Shahabi, H.; Shirzadi, A.; Górski, K.; Nguyen, H.; Ahmad, B.B. Mapping of groundwater spring potential in karst aquifer system using novel ensemble bivariate and multivariate models. Water 2020, 12, 985.
Chen, W.; Zhao, X.; Tsangaratos, P.; Shahabi, H.; Ilia, I.; Xue, W.; Wang, X.; Ahmad, B.B. Evaluating the usage of tree-based ensemble methods in groundwater spring potential mapping. J. Hydrol. 2020, 583, 124602.
Chen, W.; Li, Y.; Tsangaratos, P.; Shahabi, H.; Ilia, I.; Xue, W.; Bian, H. Groundwater spring potential mapping using artificial intelligence approach based on kernel logistic regression, random forest, and alternating decision tree models. Appl. Sci. 2020, 10, 425.
Balouchi, B.; Nikoo, M.R.; Adamowski, J. Development of expert systems for the prediction of scour depth under live-bed conditions at river confluences: Application of different types of ANNs and the M5P model tree. Appl. Soft Comput. 2015, 34, 51–59.
Almasi, S.N.; Bagherpour, R.; Mikaeil, R.; Ozcelik, Y.; Kalhori, H. Predicting the building stone cutting rate based on rock properties and device pullback amperage in quarries using M5P model tree. Geotech. Geol. Eng. 2017, 35, 1311–1326.
Sihag, P.; Karimi, S.M.; Angelaki, A. Random forest, M5P and regression analysis to estimate the field unsaturated hydraulic conductivity. Appl. Water Sci. 2019, 9, 129.
Yi, H.-S.; Lee, B.; Park, S.; Kwak, K.-C.; An, K.-G. Short-term algal bloom prediction in Juksan Weir using M5P model-tree and extreme learning machine. Environ. Eng. Res. 2018.
Onyari, E.K.; Ilunga, F. Application of MLP neural network and M5P model tree in predicting streamflow: A case study of Luvuvhu catchment, South Africa. Int. J. Innov. Manag. Technol. 2013, 4, 11.

Figure 1. Location of the study area [53].

Figure 2. The flow diagram of the methodology.

Figure 3. Line graphs of observed and predicted daily water levels of different models in the testing phase: (a) M5P; (b) RF; (c) RT; (d) REPT.

Figure 4. Scatter plots of observed vs. predicted water levels (WL) in the testing phase: (a) M5P; (b) RF; (c) RT; (d) REPT.

Figure 5. Taylor diagram of the models.

Figure 6. Boxplot of the models.

Table 1. The selected scenarios for input combinations.

No.	Input Combination	Output
1	WL(t-1)	WL(t)
2	WL(t-1), WL(t-2)	WL(t)
3	WL(t-1), WL(t-2), WL(t-3)	WL(t)
4	WL(t-1), WL(t-2), WL(t-3), WL(t-4)	WL(t)
5	WL(t-1), WL(t-2), WL(t-3), WL(t-4), WL(t-5)	WL(t)
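
The five input scenarios in Table 1 amount to pairing each daily water level WL(t) with its one to five preceding values. The following is a minimal Python sketch of that construction, given for illustration only; it is not the authors' implementation, and the function name `make_lagged_samples` and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_lagged_samples(
    series: Sequence[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Return (inputs, targets) where each input row is
    [WL(t-1), ..., WL(t-n_lags)] and the matching target is WL(t)."""
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    inputs: List[List[float]] = []
    targets: List[float] = []
    # The first usable day is t = n_lags, since earlier days lack a full lag window.
    for t in range(n_lags, len(series)):
        # Most recent lag first: WL(t-1), WL(t-2), ..., WL(t-n_lags)
        inputs.append([series[t - k] for k in range(1, n_lags + 1)])
        targets.append(series[t])
    return inputs, targets


# Hypothetical daily water levels, for illustration only (not the Zrebar Lake record).
wl = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]

# One (inputs, targets) pair per scenario number in Table 1.
scenarios = {n: make_lagged_samples(wl, n) for n in range(1, 6)}
```

Each additional lag removes one usable sample from the start of the record, so scenario 5 yields four fewer samples than scenario 1 on the same series.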

Table 2. Comparison of model prediction power.

Models	R²	RMSE	MAE	NSE	PBIAS	PSR	Order
M5P	0.9874	0.05	0.01	0.98	0	0.11	1
RF	0.9697	0.09	0.02	0.96	0.001	0.17	2
RT	0.9654	0.09	0.03	0.96	0.001	0.19	3
REPT	0.965	0.1	0.033	0.95	0.002	0.2	4
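
Statistics of the kind reported in Table 2 can be computed from paired observed and predicted series. The sketch below assumes the standard definitions from the hydrological literature: NSE after Nash and Sutcliffe, and PBIAS (in percent) and the ratio of RMSE to the standard deviation of the observations after Moriasi et al. It is an assumption that the PSR column reports this ratio, and whether the PBIAS values in Table 2 are percentages or fractions is not stated here, so the code is illustrative rather than a reproduction of the authors' calculation. The function name `evaluate` is hypothetical.

```python
import math
from typing import Dict, Sequence


def evaluate(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Goodness-of-fit statistics under the standard definitions (assumed)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    abs_err = sum(abs(o - p) for o, p in zip(observed, predicted))
    obs_ss = sum((o - mean_obs) ** 2 for o in observed)
    pred_ss = sum((p - mean_pred) ** 2 for p in predicted)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    if obs_ss == 0 or pred_ss == 0:
        raise ValueError("constant series: R2, NSE and the RMSE ratio are undefined")
    return {
        "R2": cov ** 2 / (obs_ss * pred_ss),  # squared Pearson correlation
        "RMSE": math.sqrt(sq_err / n),
        "MAE": abs_err / n,
        "NSE": 1.0 - sq_err / obs_ss,
        # Percent bias; positive means underestimation under this sign convention.
        "PBIAS": 100.0 * sum(o - p for o, p in zip(observed, predicted)) / sum(observed),
        # RMSE divided by the standard deviation of the observations.
        "RSR": math.sqrt(sq_err / obs_ss),
    }
```

Under these definitions the ratio and NSE are linked by RSR² = 1 − NSE, which gives a quick consistency check on any reported pair of values.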

Table 3. Pearson correlation coefficient (r) between the lagged input variables and the output water level, WL(t).

Input Variable	WL(t-1)	WL(t-2)	WL(t-3)	WL(t-4)	WL(t-5)
Correlation coefficient (r)	0.981	0.964	0.946	0.928	0.925
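
A lag correlation of the kind listed in Table 3 is the Pearson coefficient between a series and a copy of itself shifted by k days. The sketch below shows one way to compute it in plain Python; it is an illustration under that assumption, not the authors' procedure, and the names `pearson_r` and `lag_correlation` are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


def lag_correlation(series: Sequence[float], lag: int) -> float:
    """Correlation between WL(t-lag) and WL(t) over all days where both exist."""
    if not 1 <= lag <= len(series) - 2:
        raise ValueError("lag must leave at least two overlapping values")
    # series[:-lag] holds WL(t-lag); series[lag:] holds the matching WL(t).
    return pearson_r(series[:-lag], series[lag:])
```

Calling `lag_correlation(wl, k)` for k = 1 to 5 on a daily record `wl` would give one coefficient per column of the table.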

Table 4. Best model input combinations during the training and test phases based on the correlation coefficient.

Input Combination	M5P		RF		RT		REPT
	Train	Test	Train	Test	Train	Test	Train	Test
WL(t-1)	0.976	0.99	0.986	0.98	0.987	0.981	0.973	0.982
WL(t-1), WL(t-2)	0.976	0.993	0.991	0.978	0.994	0.961	0.973	0.977
WL(t-1), WL(t-2), WL(t-3)	0.979	0.993	0.991	0.979	0.994	0.961	0.972	0.982
WL(t-1), WL(t-2), WL(t-3), WL(t-4)	0.98	0.993	0.991	0.979	0.994	0.955	0.973	0.982
WL(t-1), WL(t-2), WL(t-3), WL(t-4), WL(t-5)	0.978	0.993	0.991	0.98	0.994	0.963	0.973	0.982

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
