Prediction of droughts over Pakistan using machine learning algorithms
Introduction
Droughts are evolutionary phenomena that generally begin with a deficit in precipitation and can result in enormous socioeconomic losses (Khan et al., 2019b; Shahid and Behrawan, 2008; Xu et al., 2018). As droughts are directly related to the availability of water, changes in their characteristics under a changing climate will have a profound impact on water stress and food security (Adnan et al., 2018). The fact that droughts can occur in all geographical areas underscores their disastrous nature among natural hazards (Hao et al., 2018; Shahid, 2011). In the recent past, many droughts have occurred in different parts of the world, for example, the East African drought (2010–2011), the drought in Texas (2012), the United States Central Great Plains drought (2012) and the Californian drought in the USA (2012–2015) (Hao et al., 2018), the Millennium drought in Australia (1997–2010) (van Dijk et al., 2013), the Sahelian drought (2012), and the Balochistan drought in Pakistan (1997–2003) (Ahmed et al., 2016; Durrani et al., 2018). These major droughts resulted in immense agricultural losses and devastated water supplies and crop production (Dutra et al., 2013; Hoerling et al., 2014; Nielsen-Gammon, 2011; Peterson et al., 2013). Such impacts are more profound in Pakistan, which is one of the countries most vulnerable to droughts owing to its heavy dependence on agriculture. In Pakistan, about 43% of the labour force is employed in the agriculture sector, and agriculture contributes about 21% of the gross domestic product (GDP) (Ullah et al., 2018). Furthermore, factors such as the lack of drought preparedness (Ahmad et al., 2019) and the high risk posed by recent climate change have made Pakistan more susceptible to droughts (Adnan et al., 2018; Ahmed et al., 2018).
The increase in temperature and the changing patterns of climate are likely to continue (Khan et al., 2018a), making droughts more intense and prolonged in the future. Therefore, drought prediction is an important exercise for early warning and for preparing the most vulnerable communities for the impacts of droughts.
The prediction of droughts has remained challenging for climatologists and hydrologists due to the complexity of their origins and the spatiotemporal scales at which they occur (Hao et al., 2018). In general, statistical, dynamical, and hybrid models are used in the prediction of droughts (Mariotti et al., 2013; Mishra and Singh, 2011; Pozzi et al., 2013; Shahid, 2010). In statistical prediction models, empirical relationships between climate variables and drought indicators derived from observations are used to predict droughts (Yaseen et al., 2015). Unlike statistical models, dynamical models rely on the physical interactions between the land, ocean, and atmosphere. In dynamical models, these interactions are represented mathematically and resolved to produce simulations/predictions of droughts (Turco et al., 2017). A hybrid model, on the other hand, is a combination of statistical and dynamical models (Murakami et al., 2016; Strazzo et al., 2019). For example, the predictions of multiple dynamical models can be combined using a statistical framework which assigns weights to the predictions of the different dynamical models to derive an ensemble prediction (Madadgar et al., 2016). Owing to their simplicity and low computational requirements, statistical models are being used extensively for predicting droughts (Belayneh et al., 2016; Ganguli and Reddy, 2014; Mariotti et al., 2013; Xu et al., 2018).
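The statistical weighting of several dynamical-model predictions can be illustrated with a minimal sketch. It assumes that each model's weight is inversely proportional to its mean squared error over past predictions; this is only one simple weighting choice and is not necessarily the scheme used in the cited studies. All model names and SPEI values are hypothetical.

```python
def inverse_mse_weights(hindcasts, observations):
    """Return one weight per model, proportional to 1 / MSE on past data."""
    inv_mse = []
    for preds in hindcasts:
        mse = sum((p - o) ** 2 for p, o in zip(preds, observations)) / len(observations)
        inv_mse.append(1.0 / (mse + 1e-12))  # small constant avoids division by zero
    total = sum(inv_mse)
    return [w / total for w in inv_mse]  # normalise so the weights sum to 1


def ensemble_prediction(forecasts, weights):
    """Weighted average of the models' new forecasts."""
    return sum(w * f for w, f in zip(weights, forecasts))


# Hypothetical past SPEI predictions of three dynamical models and the observed SPEI.
observed = [-1.2, 0.3, -0.5, 1.1]
hindcasts = [
    [-1.0, 0.2, -0.6, 1.0],  # model A: small errors
    [-0.2, 1.0, 0.4, 0.1],   # model B: large errors
    [-1.5, 0.0, -0.3, 1.4],  # model C: moderate errors
]
weights = inverse_mse_weights(hindcasts, observed)

# Combine the three models' (hypothetical) forecasts for the next period.
combined = ensemble_prediction([-0.9, -0.1, -1.3], weights)
```

In this sketch the model with the smallest past error dominates the ensemble, so the combined forecast lies close to that model's forecast.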
Machine learning (ML) algorithms are sets of instructions that give systems the ability to learn automatically and improve from past data without being explicitly programmed (Lantz, 2013; Sachindra and Kanae, 2019). Various ML algorithms are being used to develop models that can mimic linear and non-linear interactions between predictors and predictands in different hydroclimatic applications such as rainfall prediction (Parmar et al., 2017), rainfall-runoff modelling (Yaseen et al., 2015), temperature and heat wave prediction (Khan et al., 2019c) and drought prediction (Tian et al., 2018). ML algorithms such as k-Nearest Neighbours (KNN), Artificial Neural Network (ANN), Extreme Learning Machine (ELM), Random Forests (RF), Support Vector Machine (SVM), Relevance Vector Machine (RVM) and Genetic Programming (GP) are widely used in modelling complex interactions between various predictors and predictands (Bourdin et al., 2012; Deo and Şahin, 2015; Fahimi et al., 2017; Nourani et al., 2014; Rhee and Im, 2017; Wang et al., 2009; Yaseen et al., 2015). In drought prediction, the complex non-linear interactions between drought indicators (e.g. the standardised precipitation index) and the predictors have been modelled using many ML techniques such as ANN (Mishra and Desai, 2006; Mishra et al., 2007; Morid et al., 2007), SVM (Ganguli and Reddy, 2014), RF, Regression Trees (Feng et al., 2019; Granata, 2019) and ELM (Mouatadid et al., 2018).
Among the ML techniques, ANN and SVM can be regarded as the most widely used techniques for developing drought prediction models (Barua et al., 2012; Ghimire et al., 2019; Mishra and Desai, 2006; Mishra et al., 2007; Nagelkerke, 1991; Radhika and Shashi, 2009; Santos et al., 2014; Tripathi et al., 2006; Xiang et al., 2019; Yang et al., 2015). It has been observed that ANN-based models suffer from limitations such as overfitting in calibration, underfitting in validation, and trapping at local minima. Like ANN-based models, SVM-based models are also vulnerable to overfitting in calibration and underfitting in validation. In SVM, inputs are mapped into a higher dimensional space where the originally non-linear relationship between the predictors and the predictand becomes a linear one. Unlike ANN, the mapping is achieved using a kernel function (Bourdin et al., 2012). SVM is able to learn from a much smaller data set and is capable of handling a large number of variables. In predicting droughts, SVM may overcome, to some extent, certain limitations of ANN such as trapping at local minima and overfitting (Ganguli and Reddy, 2014; Shawe-Taylor and Cristianini, 2004). These ML algorithms have also been applied successfully, with a good degree of accuracy, in predicting various other hydrological processes such as runoff modelling (Ghorbani et al., 2016; Liong and Sivapragasam, 2002), extreme rainfall forecasting (Hadi Pour et al., 2019) and groundwater level prediction (Salem et al., 2018).
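The role of the kernel function can be made concrete with a small sketch. It uses a degree-2 polynomial kernel purely as an illustration (the kernel used in the study is not stated in this section) and shows that evaluating the kernel on two 2-D inputs gives the same value as an inner product in an explicit 3-D feature space, without ever constructing that space.

```python
import math


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel for 2-D inputs: k(x, z) = (x . z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def explicit_map(x):
    """Explicit 3-D feature map whose inner product equals poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


# Two hypothetical predictor vectors.
x, z = (1.0, 2.0), (3.0, -1.0)

kernel_value = poly2_kernel(x, z)                      # computed in the input space
mapped_value = dot(explicit_map(x), explicit_map(z))   # computed in the mapped space
```

Both quantities agree up to floating-point rounding, which is why a kernel-based model can fit a relationship that is linear in the mapped space while working only with the original predictors.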
The selection and identification of predictors are important steps in developing models for accurate prediction of droughts. No previous studies have been conducted on the identification of predictors and the prediction of droughts over Pakistan. However, the identification of large scale atmospheric phenomena that influence the catchment scale climate (e.g. heatwaves) over Pakistan has been studied in the past (del Río et al., 2013; Khan et al., 2019c; Latif and Syed, 2015). Droughts have caused severe damage to the economy and triggered prolonged famine and water shortages across Pakistan in the past (Ahmed et al., 2018). Therefore, there is a need to predict droughts accurately for early warning, preparedness and mitigation in order to minimize their adverse impacts. Droughts are influenced by many factors, hence their prediction often requires a suite of atmospheric, hydrologic, and oceanic predictors (Mariotti et al., 2013; Xu et al., 2018). The relationships between the predictors and the drought indicators are mostly non-linear (Khan et al., 2019c; Mishra and Singh, 2011). Therefore, in developing drought prediction models, the predictors most influential on droughts should be carefully identified considering the non-linearity in the predictor-predictand relationships. In the majority of drought prediction studies, the most influential predictors were extracted using correlation analysis or composite analysis (which determines the basic characteristics of the predictors by calculating the composite means, standard deviations and statistical significances) (Hao et al., 2018).
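Correlation-based predictor screening of the kind mentioned above can be sketched as follows. The predictor names and values are hypothetical, and the Pearson coefficient is assumed as the correlation measure.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_predictors(predictors, target):
    """Sort predictors by the absolute value of their correlation with the target."""
    scores = {name: pearson_r(values, target) for name, values in predictors.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical drought index series and candidate predictors.
spei = [-1.5, -0.5, 0.2, 1.0, 1.8]
predictors = {
    "humidity": [10, 20, 27, 35, 43],
    "pressure": [1, -1, 2, -2, 0],
    "temperature": [33, 30, 29, 25, 22],
}
ranking = rank_predictors(predictors, spei)
```

Because the Pearson coefficient measures only linear association, a screening of this kind can underrate a predictor whose relationship with the drought index is strongly non-linear.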
The main objective of this study was to develop ML-based models for predicting moderate, severe and extreme droughts over Pakistan based on the Standardised Precipitation and Evapotranspiration Index (SPEI). The study uses SPEI as the drought index because it incorporates both precipitation and potential evapotranspiration and thus represents drought phenomena better than other indices (Ahmed et al., 2018; Zhao et al., 2017). The details of SPEI and its applications can be found in Beguería et al. (2014), Vicente-Serrano et al. (2010a) and Vicente-Serrano et al. (2010b). As mentioned earlier, ANNs, unlike SVM, tend to become trapped at local minima. SVM, on the other hand, is rapidly becoming a popular technique for developing drought prediction models (Chiang and Tsai, 2012; Ganguli and Reddy, 2014; Liang et al., 2011). Although KNN has been used for predicting various hydroclimatic variables, it has not yet been used in the development of drought prediction models. A performance comparison between drought prediction models developed with ANN, SVM and KNN therefore provides a unique opportunity to identify their pros and cons. Thus, in this study, the drought prediction models were developed using ANN, SVM and KNN. In addition, an ML-based predictor selection method called SVM-recursive feature elimination (SVM-RFE) was used for the first time to select sets of predictors for drought prediction models. SVM-RFE was used due to its proven ability to eliminate less important predictors effectively (Chen et al., 2018). This is also the first study to investigate the development of Pakistan-wide drought forecasting models. Prediction of droughts will assist in the formulation of better water resources management practices, which are much needed by the heavily agriculture-dependent communities in Pakistan.
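The idea behind SVM-RFE can be sketched with scikit-learn, which is an assumption here: this section does not state which software the authors used. A linear support vector regressor is fitted, the predictor with the smallest weight is dropped, and the procedure repeats until the requested number of predictors remains. The data are synthetic, with only the first two of four predictors actually driving the target.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

# Synthetic data: predictors 0 and 1 drive the target, predictors 2 and 3 are noise.
rng = np.random.default_rng(0)
n_samples = 200
X = rng.normal(size=(n_samples, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=n_samples)

# A linear kernel is needed so that each predictor has a weight (coef_) to rank by.
selector = RFE(estimator=SVR(kernel="linear"), n_features_to_select=2, step=1)
selector.fit(X, y)

selected = np.flatnonzero(selector.support_).tolist()  # indices of retained predictors
ranking = selector.ranking_.tolist()                   # 1 = retained, larger = dropped earlier
```

In practice the columns of `X` would be the candidate atmospheric, hydrologic and oceanic predictors and `y` the drought index; the number of predictors to keep is a choice that the sketch fixes arbitrarily at two.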
Section snippets
Study area and datasets
Pakistan (latitudes 23°30ˈN–37°30ˈN and longitudes 61°E–78°E), located in South Asia, covers an area of 796,095 km². Due to its location in the northern hemisphere, the country experiences four seasons defined based on temperature: cool and dry winter (Dec–Feb), hot and dry spring (Mar–May), hot and humid summer (Jun–Aug) and dry autumn (Sep–Nov) (Khan et al., 2019d, 2019e). Also, Pakistan experiences two monsoon precipitation seasons; the Indian monsoon (July–Sep) and the western disturbance
Methodology
The procedure used in developing drought prediction models is outlined as follows.
1. Using PGF temperature and precipitation datasets, 1, 2, 3, 4, 5, and 6-month SPEI values (1–6-month SPEIs) were calculated for each of the 1437 PGF grid points (531 grid points for Rabi season and 906 grid points for Kharif season) spread across Pakistan for the period 1948–2016.
2. Based on 1–6-month SPEIs, droughts were categorized into moderate, severe and extreme for Rabi and Kharif cropping seasons.
3. Different
Determination of droughts
The droughts were categorized under three different severity levels (i.e. moderate, severe and extreme), and the percentages of the area of Pakistan affected by these categories of droughts were calculated based on 6-month SPEI over the period 1948–2016 and are shown in Fig. 3 for Rabi and Kharif seasons. As seen in Fig. 3, droughts affected a larger area (around 40%) in the Kharif season than in the Rabi season during the period 1948–1964. Afterwards, the spatial extent of droughts has reduced
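The categorization of droughts and the calculation of the affected area can be sketched as below. Two assumptions are made that this section does not confirm: the commonly used SPEI thresholds (moderate for values at or below -1.0, severe at or below -1.5, extreme at or below -2.0), and equal-area grid cells, so that the share of grid points stands in for the share of area. The grid values are hypothetical.

```python
def drought_category(spei):
    """Classify one SPEI value using assumed, commonly used thresholds."""
    if spei <= -2.0:
        return "extreme"
    if spei <= -1.5:
        return "severe"
    if spei <= -1.0:
        return "moderate"
    return "none"


def percent_area_by_category(spei_values):
    """Percentage of grid points in each drought category (equal-area cells assumed)."""
    counts = {"moderate": 0, "severe": 0, "extreme": 0}
    for value in spei_values:
        category = drought_category(value)
        if category in counts:
            counts[category] += 1
    n = len(spei_values)
    return {category: 100.0 * count / n for category, count in counts.items()}


# Hypothetical 6-month SPEI values at ten grid points for one season.
grid_spei = [-2.3, -1.7, -1.2, -1.1, -0.4, 0.0, 0.6, 1.3, -1.6, 0.2]
area = percent_area_by_category(grid_spei)
```

Repeating the calculation for every year and season would give time series of affected area of the kind plotted in Fig. 3.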
Discussion and conclusions
KNN is one of the oldest ML algorithms (Cover and Hart, 1967). KNN has been used in developing various statistical prediction and forecasting models, but has yet to be used in developing drought models. This encouraged the authors to compare the performance of KNN-based drought models with the drought models developed with the latest ML algorithms. In this study, it was seen that KNN-based drought models display limited performance in comparison to that of SVM and ANN-based drought models in
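For readers unfamiliar with KNN, a minimal regression sketch follows. It assumes Euclidean distance and an unweighted mean of the k nearest training targets; the configuration used in the study is not given in this section, and the predictor vectors and SPEI values are hypothetical.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict the target at `query` as the mean target of its k nearest neighbours."""
    # Pair every training point's distance to the query with its target value.
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical predictor vectors and the SPEI values observed with them.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_y = [-1.0, -1.2, -0.8, 1.0, 1.4]

# The query lies close to the first three training points.
prediction = knn_predict(train_X, train_y, query=(0.2, 0.1), k=3)
```

Since the prediction is an average of observed targets, KNN cannot produce values outside the range seen in training, which is one plausible reason for weaker performance on rare, extreme events.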
Authors' contributions
Najeebullah Khan, D.A. Sachindra and Shamsuddin Shahid designed the manuscript. All the authors, including Kamal Ahmed and Mohammed Sanusi Shiru, wrote the manuscript. The analysis was done by Najeebullah Khan, and Shamsuddin Shahid introduced the methodology and conducted the analysis. D.A. Sachindra and Nadeem Nawaz helped in the preparation of the figures and the visualization of the manuscript. All the authors also contributed to improving the manuscript during the revision.
Declaration of Competing Interest
The authors declare no conflict of interest.
Acknowledgement
None.
References (109)
- et al. Impacts of climate variability and change on seasonal drought characteristics of Pakistan. Atmos. Res. (2018)
- et al. Fidelity assessment of general circulation model simulated precipitation and temperature over Pakistan using a feature selection method. J. Hydrol. (Amst) (2019)
- et al. Coupling machine learning methods with wavelet transforms and the bootstrap and boosting ensemble approaches for drought prediction. Atmos. Res. (2016)
- et al. Application of the extreme learning machine algorithm for the prediction of monthly effective drought index in eastern Australia. Atmos. Res. (2015)
- et al. Machine learning-based integration of remotely-sensed drought factors can improve the estimation of agricultural drought in South-Eastern Australia. Agric. Syst. (2019)
- et al. Global solar radiation prediction by ANN integrated with European Centre for medium range weather forecast fields in solar rich cities of Queensland Australia. J. Clean. Prod. (2019)
- Back-propagation neural networks for modeling complex systems. Artif. Intell. Eng. (1995)
- Evapotranspiration evaluation models based on machine learning algorithms: a comparative study. Agric. Water Manage. (2019)
- Prediction of heat waves in Pakistan using quantile regression forests. Atmos. Res. (2019)
- Selection of GCMs for the projection of spatial distribution of heat waves in Pakistan. Atmos. Res. (2020)
- Drought forecasting using feed-forward recursive neural network. Ecol. Modell.
- Drought modeling – a review. J. Hydrol. (Amst)
- Input selection and data-driven model performance optimization to predict the standardized precipitation and evaporation index in a drought-prone region. Atmos. Res.
- Meteorological drought forecasting for ungauged areas based on machine learning: using long-range climate forecast and remote sensing data. Agric. For. Meteorol.
- Statistical downscaling of precipitation using machine learning techniques. Atmos. Res.
- Agricultural drought prediction using climate indices based on support vector regression in Xiangjiang River basin. Sci. Total Environ.
- A Bayesian regularized artificial neural network for stock market forecasting. Expert Syst. Appl.
- Downscaling of precipitation for climate change scenarios: a support vector machine approach. J. Hydrol. (Amst)
- A comparison of performance of several artificial intelligence methods for forecasting monthly discharge time series. J. Hydrol. (Amst)
- Comparison of various drought indices to monitor drought status in Pakistan. Clim. Dyn.
- Drought Mitigation in Pakistan: Current Status and Options for Future Strategies
- Characterization of seasonal droughts in Balochistan Province, Pakistan. Stoch. Environ. Res. Risk Assess.
- Future predictions of precipitation and temperature in Iraq using the statistical downscaling model. Arab. J. Geosci.
- Role of predictors in downscaling surface temperature to river basin in India for IPCC SRES scenarios using support vector machine. Int. J. Climatol.
- Artificial neural network–based drought forecasting using a nonlinear aggregated drought index. J. Hydrol. Eng.
- Statistical downscaling of multi-site daily rainfall in a South Australian catchment using a generalized linear model. Int. J. Climatol.
- Standardized precipitation evapotranspiration index (SPEI) revisited: parameter fitting, evapotranspiration models, tools, datasets and drought monitoring. Int. J. Climatol.
- Streamflow modelling: a primer on applications, approaches and challenges. Atmos. Ocean.
- Bayesian regularization of neural networks. Artificial Neural Networks
- Decision variants for the automatic determination of optimal feature subset in RF-RFE. Genes (Basel)
- Reservoir drought prediction using support vector machines. Appl. Mech. Mater.
- Support-vector networks. Mach. Learn.
- Nearest neighbor pattern classification. IEEE Trans. Inf. Theory
- Recent mean temperature trends in Pakistan and links with teleconnection patterns. Int. J. Climatol.
- Analysis and prediction of a catastrophic Indian coastal heat wave of 2015. Nat. Hazards
- Historical and future climatological drought projections over Quetta Valley, Balochistan, Pakistan
- The 2010–2011 drought in the Horn of Africa in ECMWF reanalysis and seasonal forecast products. Int. J. Climatol.
- Application of soft computing based hybrid models in hydrological variables modeling: a comprehensive review. Theor. Appl. Climatol.
- Predicting chaotic time series. Phys. Rev. Lett.
- Invariant error metrics for image reconstruction. Appl. Opt.
- Gauss-Newton approximation to Bayesian learning
- Drought forecasting: a review of modelling approaches 2007–2017. J. Water Clim. Change
- Ensemble prediction of regional droughts using climate inputs and the SVM–copula approach. Hydrol. Process.
- Are peak summer sultry heat wave days over the Yangtze–Huaihe river basin predictable? J. Clim.
- Variability and predictability of Northeast China climate during 1948–2012. Clim. Dyn.
- Interpreting neural-network connection weights. AI Expert
- Reliability of reanalyses products in simulating precipitation and temperature characteristics over India. J. Earth Syst. Sci.
- Modeling river discharge time series using support vector machine and artificial neural networks. Environ. Earth Sci.
- Seasonal drought prediction: advances, challenges, and future prospects. Rev. Geophys.
Cited by (145)
Multi-criteria evaluation of CMIP6 precipitation and temperature simulations over Iran
2024, Journal of Hydrology: Regional Studies
An enhanced drought forecasting in coastal arid regions using deep learning approach with evaporation index
2024, Environmental Research
Short-term drought Index forecasting for hot and semi-humid climate Regions: A novel empirical Fourier decomposition-based ensemble Deep-Random vector functional link strategy
2024, Computers and Electronics in Agriculture
Prediction of agricultural drought index in a hot and dry climate using advanced hybrid machine learning
2024, Ain Shams Engineering Journal