Ensemble Boosting and Bagging Based Machine Learning Models for Groundwater Potential Prediction

Water Resources Management

Abstract

Groundwater is one of the principal freshwater resources, and the rapidly increasing demand for it creates an urgent need for prediction systems that estimate groundwater potential more accurately and so support informed groundwater resource management. Ensemble machine learning methods are generally reported to produce more accurate results. However, proposing novel ensemble models is not sufficient on its own; comparative studies that evaluate the performance of these models are equally essential for identifying suitable methods. The current study is therefore designed to assess the performance of four ensemble models: the boosted generalized additive model (GamBoost), adaptive boosting classification trees (AdaBoost), bagged classification and regression trees (Bagged CART), and random forest (RF). The models were built from the locations of 339 groundwater resources and a set of spatial groundwater potential conditioning factors. Thereafter, the recursive feature elimination (RFE) method was applied to identify the key features. RFE indicated that the best feature subset for groundwater potential modeling comprised 12 of the 15 variables (with a mean accuracy of about 0.84). The modeling results indicated that the bagging models (RF and Bagged CART) performed better than the boosting models (AdaBoost and GamBoost). Overall, the RF model outperformed the other models (accuracy = 0.86, Kappa = 0.67, precision = 0.85, and recall = 0.91). The predictive variables topographic position index, valley depth, drainage density, elevation, and distance from stream contributed most to the modeling process. The groundwater potential maps predicted in this study can help water resources managers and policymakers working in watershed and aquifer management to maintain optimal exploitation of this important freshwater resource.
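The four evaluation measures reported in the abstract (accuracy, Kappa, precision, and recall) can all be derived from a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' code: the class name `ConfusionMatrix` and the counts used in the example are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix (hypothetical helper, not from the paper)."""
    tp: int  # groundwater-presence locations predicted as presence
    fp: int  # absence locations predicted as presence
    fn: int  # presence locations predicted as absence
    tn: int  # absence locations predicted as absence

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def accuracy(self) -> float:
        # Share of all locations that were classified correctly.
        return (self.tp + self.tn) / self.total

    def precision(self) -> float:
        # Share of predicted-presence locations that are true presences.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Share of true presences that the model recovered.
        return self.tp / (self.tp + self.fn)

    def kappa(self) -> float:
        # Cohen's kappa: observed agreement corrected for the agreement
        # expected by chance from the row and column marginals.
        n = self.total
        observed = self.accuracy()
        p_presence = ((self.tp + self.fp) / n) * ((self.tp + self.fn) / n)
        p_absence = ((self.fn + self.tn) / n) * ((self.fp + self.tn) / n)
        expected = p_presence + p_absence
        return (observed - expected) / (1 - expected)


# Illustrative counts only; they do not reproduce the study's results.
cm = ConfusionMatrix(tp=40, fp=10, fn=5, tn=45)
print(f"accuracy={cm.accuracy():.2f} kappa={cm.kappa():.2f} "
      f"precision={cm.precision():.2f} recall={cm.recall():.2f}")
```

Kappa is lower than accuracy for the same matrix because it discounts the agreement that would occur by chance, which is why the abstract reports both. The sketch assumes non-degenerate counts and does not guard against division by zero.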

Data Availability

Not applicable.

Acknowledgements

We gratefully acknowledge the support of the Alexander von Humboldt Foundation.

Author information

Corresponding author

Correspondence to Adrienn A. Dineva.

Ethics declarations

Conflicts of interest/Competing interests

Not applicable. 

Code Availability

Not applicable.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mosavi, A., Sajedi Hosseini, F., Choubin, B. et al. Ensemble Boosting and Bagging Based Machine Learning Models for Groundwater Potential Prediction. Water Resour Manage 35, 23–37 (2021). https://doi.org/10.1007/s11269-020-02704-3

  • DOI: https://doi.org/10.1007/s11269-020-02704-3
