Prediction of droughts over Pakistan using machine learning algorithms

https://doi.org/10.1016/j.advwatres.2020.103562

Highlights

  • For the first time, drought prediction models were developed for Pakistan.

  • Support Vector Machine better captured spatiotemporal characteristics of droughts.

  • k-Nearest Neighbour showed limited ability in predicting characteristics of droughts.

  • Relative humidity, temperature and wind speed are indicators of droughts in Pakistan.

Abstract

Climate change has increased the frequency, severity and areal extent of droughts across the world in the last few decades, magnifying their adverse impacts. Prediction of droughts is immensely helpful for early warning and for preparing the most vulnerable communities for their adverse impacts. For the first time, this study investigated the potential of developing drought prediction models over Pakistan using three state-of-the-art Machine Learning (ML) techniques: Support Vector Machine (SVM), Artificial Neural Network (ANN) and k-Nearest Neighbour (KNN). Three categories of droughts (moderate, severe and extreme) in the two major cropping seasons, Rabi and Kharif, were estimated using the Standardized Precipitation Evapotranspiration Index (SPEI) and then predicted using predictor data obtained from the National Centres for Environmental Prediction/National Centre for Atmospheric Research (NCEP/NCAR) reanalysis database. Also, for the first time in drought modelling, a novel feature selection approach called Recursive Feature Elimination (RFE) was used to identify optimum sets of predictors. In validation, SVM-based models captured the temporal and spatial characteristics of droughts over Pakistan better than ANN- and KNN-based models. KNN, which was used in developing drought models for the first time, displayed limited performance in validation compared with SVM- and ANN-based drought models. It was found that in the Rabi season SPEI is positively correlated with relative humidity over the Mediterranean Sea and the region north of the Caspian Sea. In the Kharif season, SPEI is positively correlated with humidity over the south-eastern part of the Bay of Bengal and the regions north of the Mediterranean and Caspian Seas.
In developing a drought prediction model for Pakistan, relative humidity, temperature and wind speed should be considered as predictors over a domain that encompasses the Mediterranean Sea, the region north of the Caspian Sea, the Indian Ocean and the Arabian Sea.

Introduction

Droughts are slowly evolving phenomena which generally begin with a deficit in precipitation and can result in enormous socioeconomic losses (Khan et al., 2019b; Shahid and Behrawan, 2008; Xu et al., 2018). As droughts are directly related to the availability of water, their changing characteristics under a changing climate will have a profound impact on water stress and food security (Adnan et al., 2018). Their possible occurrence in any geographical area underscores the disastrous nature of droughts among all natural hazards (Hao et al., 2018; Shahid, 2011). In the recent past, many droughts have occurred in different parts of the world, for example, the East African drought (2010–2011), the drought in Texas (2012), the United States Central Great Plains drought (2012) and the Californian drought in the USA (2012–2015) (Hao et al., 2018), the Millennium drought in Australia (1997–2010) (van Dijk et al., 2013), the Sahelian drought (2012), and the Balochistan drought in Pakistan (1997–2003) (Ahmed et al., 2016; Durrani et al., 2018). These major droughts caused immense losses to agriculture, severely impacting water supply and crop production (Dutra et al., 2013; Hoerling et al., 2014; Nielsen-Gammon, 2011; Peterson et al., 2013). These impacts are more profound in Pakistan, which is one of the countries most vulnerable to droughts due to its heavy dependence on agriculture. In Pakistan, about 43% of the country's labour force is employed in the agriculture sector, and agriculture contributes about 21% of the country's gross domestic product (GDP) (Ullah et al., 2018). Furthermore, factors such as a lack of drought preparedness (Ahmad et al., 2019) and the high risk posed by recent climate change have made Pakistan more susceptible to droughts (Adnan et al., 2018; Ahmed et al., 2018).
The increase in temperature and the changing patterns of climate are likely to continue (Khan et al., 2018a), making droughts more intense and prolonged in the future. Therefore, drought prediction is an important exercise for early warning and for preparing the most vulnerable communities for the impacts of droughts.

The prediction of droughts has remained challenging for climatologists and hydrologists due to the complexity of their origins and the spatiotemporal scales at which they occur (Hao et al., 2018). In general, statistical, dynamical and hybrid models are used to predict droughts (Mariotti et al., 2013; Mishra and Singh, 2011; Pozzi et al., 2013; Shahid, 2010). Statistical prediction models use empirical relationships between climate variables and drought indicators, derived from observations, to predict droughts (Yaseen et al., 2015). Unlike statistical models, dynamical models rely on the physical interactions between the land, ocean and atmosphere; these interactions are represented mathematically and resolved to produce simulations/predictions of droughts (Turco et al., 2017). A hybrid model combines statistical and dynamical models (Murakami et al., 2016; Strazzo et al., 2019). For example, the predictions of multiple dynamical models can be combined using a statistical framework that assigns weights to the predictions of the different dynamical models to derive an ensemble prediction (Madadgar et al., 2016). Owing to their simplicity and low computational requirements, statistical models are used extensively for predicting droughts (Belayneh et al., 2016; Ganguli and Reddy, 2014; Mariotti et al., 2013; Xu et al., 2018).
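To make the idea of a weighted ensemble concrete, the sketch below combines several model predictions with weights proportional to the inverse of each model's mean squared error over a calibration period. This is only one simple weighting scheme, chosen here for illustration; it is an assumption and not necessarily the statistical framework used by Madadgar et al. (2016). The function names are hypothetical.

```python
def inverse_mse_weights(model_predictions, observed, eps=1e-12):
    """Weight each model by the inverse of its mean squared error over a
    calibration period, normalised so that the weights sum to one."""
    inverse = []
    for preds in model_predictions:
        mse = sum((p - o) ** 2 for p, o in zip(preds, observed)) / len(observed)
        # eps avoids division by zero when a model matches the observations exactly
        inverse.append(1.0 / (mse + eps))
    total = sum(inverse)
    return [v / total for v in inverse]


def weighted_ensemble(forecasts, weights):
    """Combine one forecast per model into a single ensemble value."""
    return sum(w * f for w, f in zip(weights, forecasts))
```

For example, a model with an MSE of 1 and a model with an MSE of 4 receive weights of about 0.8 and 0.2, so the better calibrated model dominates the ensemble without the other being discarded.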

Machine learning (ML) algorithms are procedures that give systems the ability to learn automatically and improve from past data without being explicitly programmed (Lantz, 2013; Sachindra and Kanae, 2019). Various ML algorithms are being used to develop models that can mimic linear and nonlinear interactions between predictors and predictands in hydroclimatic applications such as rainfall prediction (Parmar et al., 2017), rainfall-runoff modelling (Yaseen et al., 2015), temperature and heat wave prediction (Khan et al., 2019c) and drought prediction (Tian et al., 2018). ML algorithms such as k-Nearest Neighbours (KNN), Artificial Neural Network (ANN), Extreme Learning Machine (ELM), Random Forests (RF), Support Vector Machine (SVM), Relevance Vector Machine (RVM) and Genetic Programming (GP) are widely used in modelling complex interactions between various predictors and predictands (Bourdin et al., 2012; Deo and Şahin, 2015; Fahimi et al., 2017; Nourani et al., 2014; Rhee and Im, 2017; Wang et al., 2009; Yaseen et al., 2015). The complex nonlinear interactions between drought indicators (e.g. the standardised precipitation index) and predictors have been modelled using many ML techniques, such as ANN (Mishra and Desai, 2006; Mishra et al., 2007; Morid et al., 2007), SVM (Ganguli and Reddy, 2014), RF and Regression Trees (Feng et al., 2019; Granata, 2019) and ELM (Mouatadid et al., 2018).

Among the ML techniques, ANN and SVM can be regarded as the most widely used for developing drought prediction models (Barua et al., 2012; Ghimire et al., 2019; Mishra and Desai, 2006; Mishra et al., 2007; Nagelkerke, 1991; Radhika and Shashi, 2009; Santos et al., 2014; Tripathi et al., 2006; Xiang et al., 2019; Yang et al., 2015). ANN-based models have been observed to suffer from limitations such as overfitting in calibration, which leads to poor performance in validation, and trapping in local minima. Like ANN-based models, SVM-based models are also vulnerable to overfitting in calibration. In SVM, the inputs are mapped into a higher-dimensional space in which the originally nonlinear relationship between the predictors and the predictand becomes linear. Unlike in ANN, this mapping is achieved using a kernel function (Bourdin et al., 2012). SVM can learn from relatively small data sets and is capable of handling a large number of variables. To some extent, SVM may overcome certain limitations of ANN in predicting droughts, such as trapping in local minima and overfitting (Ganguli and Reddy, 2014; Shawe-Taylor and Cristianini, 2004). These ML algorithms have also been applied successfully, with a good degree of accuracy, to various other hydrological processes such as runoff modelling (Ghorbani et al., 2016; Liong and Sivapragasam, 2002), extreme rainfall forecasting (Hadi Pour et al., 2019) and groundwater level prediction (Salem et al., 2018).
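The kernel mapping described above can be checked numerically. In the minimal sketch below, a homogeneous degree-2 polynomial kernel evaluated on 2-D inputs gives the same value as an ordinary inner product in an explicit 3-D feature space, so an SVM can work in the higher-dimensional space without ever constructing it. The Gaussian (RBF) kernel is included because it is a common SVM choice; which kernel the study itself used is not stated in this excerpt, and all function names and the `gamma` default are hypothetical.

```python
import math


def dot(a, b):
    """Ordinary inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel k(x, z) = (x . z)^2 for 2-D inputs."""
    return dot(x, z) ** 2


def poly2_feature_map(x):
    """Explicit 3-D feature map phi(x) whose inner product reproduces poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel; its implicit feature space is infinite-dimensional."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)
```

For x = (1, 2) and z = (3, -1), `poly2_kernel(x, z)` and `dot(poly2_feature_map(x), poly2_feature_map(z))` agree (both equal 1, up to floating-point rounding), which is the identity the kernel trick relies on.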

The selection and identification of predictors are important steps in developing drought prediction models that predict droughts accurately. No previous studies have been conducted on the identification of predictors for, and the prediction of, droughts over Pakistan. However, the identification of large-scale atmospheric phenomena influencing catchment-scale climate (e.g. heatwaves) over Pakistan has been studied in the past (del Río et al., 2013; Khan et al., 2019c; Latif and Syed, 2015). Droughts have caused severe damage to the economy and triggered prolonged famine and water shortages across Pakistan in the past (Ahmed et al., 2018). Therefore, droughts need to be predicted accurately for early warning, preparedness and mitigation in order to minimize their adverse impacts. Droughts are influenced by many factors; hence, their prediction often requires a suite of atmospheric, hydrologic and oceanic predictors (Mariotti et al., 2013; Xu et al., 2018). The relationships between the predictors and the drought indicators are mostly nonlinear (Khan et al., 2019c; Mishra and Singh, 2011). Therefore, in developing drought prediction models, the predictors most influential on droughts should be carefully identified, considering the nonlinearity of the predictor-predictand relationships. In most drought prediction studies, the most influential predictors were extracted using correlation analysis or composite analysis (which determines the basic characteristics of the predictors by calculating composite means, standard deviations and statistical significances) (Hao et al., 2018).
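As an illustration of the correlation-based screening that most earlier studies relied on, the sketch below ranks candidate predictor series by the absolute Pearson correlation with a drought index and keeps those above a threshold. The predictor names, the threshold and the function names are hypothetical. Pearson correlation measures only linear association, which is exactly the limitation the paragraph points out for nonlinear predictor-predictand relationships.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / (sx * sy)


def screen_predictors(predictors, target, threshold=0.3):
    """Keep predictors whose |r| with the target reaches `threshold`,
    ranked from strongest to weakest absolute correlation."""
    scores = {name: pearson_r(series, target) for name, series in predictors.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda name: -abs(scores[name]))
```

A predictor related to the drought index through, say, a U-shaped relationship can have r close to zero and would be discarded by this screen even though it is informative.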

The main objective of this study was to develop ML-based models for predicting moderate, severe and extreme droughts over Pakistan based on the Standardised Precipitation Evapotranspiration Index (SPEI). SPEI was used as the drought index because it incorporates both precipitation and potential evapotranspiration and thus represents drought phenomena better than other indices (Ahmed et al., 2018; Zhao et al., 2017). Details of SPEI and its applications can be found in Beguería et al. (2014), Vicente-Serrano et al. (2010a) and Vicente-Serrano et al. (2010b). As mentioned earlier, ANNs tend to become trapped in local minima, unlike SVM. SVM, on the other hand, is rapidly becoming a popular technique for developing drought prediction models (Chiang and Tsai, 2012; Ganguli and Reddy, 2014; Liang et al., 2011). Although KNN has been used for predicting various hydroclimatic variables, it has not yet been used in the development of drought prediction models. A performance comparison of drought prediction models developed with ANN, SVM and KNN therefore provides a unique opportunity to identify their pros and cons, and so, in this study, drought prediction models were developed using all three techniques. In addition, an ML-based predictor selection method called SVM-recursive feature elimination (SVM-RFE) was used for the first time to select sets of predictors for drought prediction models. SVM-RFE was chosen for its proven ability to eliminate less important predictors effectively (Chen et al., 2018). This is also the first study to investigate the development of Pakistan-wide drought forecasting models. Prediction of droughts will assist in formulating better water resources management practices, which are much needed by the heavily agriculture-dependent communities in Pakistan.
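The recursive elimination loop behind SVM-RFE can be sketched as follows: fit a linear model, drop the predictor with the smallest absolute weight, refit on the rest, and repeat. This is a sketch under a stated substitution, not the authors' implementation: to stay short and dependency-light, it ranks predictors by the coefficients of a closed-form ridge regression on standardized predictors, whereas SVM-RFE proper ranks them by the weights of a linear SVM. The function names and the `lam` parameter are hypothetical.

```python
import numpy as np


def ridge_weights(X, y, lam=1.0):
    """Closed-form ridge coefficients on standardized predictors.

    Stand-in ranker: SVM-RFE proper would use the weight vector of a linear SVM.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)


def recursive_feature_elimination(X, y, n_keep, lam=1.0):
    """Repeatedly refit and drop the predictor with the smallest |weight|.

    Returns the indices of the retained predictors (in original column order)
    and the eliminated indices, in the order they were removed.
    """
    active = list(range(X.shape[1]))
    eliminated = []
    while len(active) > n_keep:
        w = ridge_weights(X[:, active], y, lam)
        worst = int(np.argmin(np.abs(w)))  # least influential remaining predictor
        eliminated.append(active.pop(worst))
    return active, eliminated
```

Reversing the elimination order gives a full importance ranking. Refitting after every removal matters because the weight of a predictor can change once a correlated predictor has been dropped. With scikit-learn available, `sklearn.feature_selection.RFE` wrapped around a linear-kernel `SVR` would be the closer analogue of SVM-RFE.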

Section snippets

Study area and datasets

Pakistan (latitudes 23°30′N–37°30′N and longitudes 61°E–78°E), located in South Asia, covers an area of 796,095 km². Due to its location in the northern hemisphere, the country experiences four seasons defined based on temperature: cool and dry winter (Dec–Feb), hot and dry spring (Mar–May), hot and humid summer (Jun–Aug) and dry autumn (Sep–Nov) (Khan et al., 2019d, 2019e). Pakistan also experiences two monsoon precipitation seasons: the Indian monsoon (Jul–Sep) and the western disturbance

Methodology

The procedure used in developing drought prediction models is outlined as follows.

  • 1.

    Using the PGF temperature and precipitation datasets, 1-, 2-, 3-, 4-, 5- and 6-month SPEI values (1–6-month SPEIs) were calculated for each of the 1437 PGF grid points (531 grid points for the Rabi season and 906 grid points for the Kharif season) spread across Pakistan for the period 1948–2016.

  • 2.

    Based on the 1–6-month SPEIs, droughts were categorized as moderate, severe or extreme for the Rabi and Kharif cropping seasons.

  • 3.

    Different
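Steps 1 and 2 above depend on computing SPEI at several time scales. The excerpt does not describe how this was done, so the following is a minimal sketch modelled on the log-logistic, probability-weighted-moment procedure commonly associated with Vicente-Serrano et al. (2010a): the monthly climatic water balance D = P − PET is summed over a moving window of k months, a three-parameter log-logistic distribution is fitted, and each cumulative probability is transformed to a standard normal deviate. Assumptions: PET is taken as already computed (the PET method is not given in this excerpt), a single distribution is fitted to the whole series (SPEI implementations commonly fit each calendar month separately), and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def aggregate_water_balance(p, pet, scale):
    """Monthly climatic water balance D = P - PET summed over a moving window of
    `scale` months (scale=6 gives the input for a 6-month SPEI). The first
    scale - 1 months have no complete window and are dropped."""
    if len(p) != len(pet):
        raise ValueError("p and pet must have the same length")
    d = [pi - ei for pi, ei in zip(p, pet)]
    return [sum(d[i - scale + 1:i + 1]) for i in range(scale - 1, len(d))]


def fit_log_logistic(sample):
    """Three-parameter log-logistic fit (alpha, beta, gamma) by probability-weighted
    moments, using the plotting position F_i = (i - 0.35) / N."""
    xs = sorted(sample)
    n = len(xs)
    f = [(i + 1 - 0.35) / n for i in range(n)]
    w = [sum((1.0 - fi) ** s * xi for fi, xi in zip(f, xs)) / n for s in range(3)]
    denom = 6 * w[1] - w[0] - 6 * w[2]  # equals minus the third L-moment
    if denom >= 0:
        raise ValueError("this fit needs a positively skewed sample")
    beta = (2 * w[1] - w[0]) / denom
    if beta <= 1:
        raise ValueError("fitted shape parameter must exceed 1")
    g = math.gamma(1 + 1 / beta) * math.gamma(1 - 1 / beta)
    alpha = (w[0] - 2 * w[1]) * beta / g
    gamma_ = w[0] - alpha * g
    return alpha, beta, gamma_


def spei(p, pet, scale, eps=1e-6):
    """SPEI-style index: fit the aggregated water balance with a log-logistic
    distribution and map each cumulative probability to a standard normal deviate."""
    d = aggregate_water_balance(p, pet, scale)
    alpha, beta, gamma_ = fit_log_logistic(d)
    normal = NormalDist()
    out = []
    for x in d:
        # Log-logistic CDF, defined only above the location parameter gamma_.
        if x > gamma_:
            t = ((x - gamma_) / alpha) ** beta
            cdf = t / (1.0 + t)
        else:
            cdf = 0.0
        cdf = min(max(cdf, eps), 1.0 - eps)  # keep inv_cdf finite
        out.append(normal.inv_cdf(cdf))
    return out
```

Calling `spei(p, pet, scale)` for scale = 1, ..., 6 at each grid point would yield the 1–6-month SPEI series of step 1. Because the index is a standard normal deviate, negative values indicate drier than usual conditions.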

Determination of droughts

The droughts were categorized into three severity levels (moderate, severe and extreme), and the percentages of the area of Pakistan affected by each category were calculated from 6-month SPEI over the period 1948–2016; they are shown in Fig. 3 for the Rabi and Kharif seasons. As seen in Fig. 3, during 1948–1964 droughts affected a larger area in the Kharif season (around 40%) than in the Rabi season. Afterwards, the spatial extent of droughts reduced
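The categorization and area calculation can be sketched as follows. The SPEI thresholds are an assumption, since they are not given in this excerpt: the sketch uses the commonly applied classes moderate (−1.5 < SPEI ≤ −1.0), severe (−2.0 < SPEI ≤ −1.5) and extreme (SPEI ≤ −2.0). It also assumes each grid point represents an equal share of the area unless explicit weights (for example, cell areas) are supplied. The function names are hypothetical.

```python
def drought_category(spei_value):
    """Map an SPEI value to a drought class (assumed thresholds), or None."""
    if spei_value <= -2.0:
        return "extreme"
    if spei_value <= -1.5:
        return "severe"
    if spei_value <= -1.0:
        return "moderate"
    return None  # no drought in the moderate-or-worse sense


def percent_area_by_category(spei_grid, weights=None):
    """Percentage of the total area (sum of weights) falling into each drought
    class for one time step; equal weights per grid point by default."""
    if weights is None:
        weights = [1.0] * len(spei_grid)
    total = sum(weights)
    pct = {"moderate": 0.0, "severe": 0.0, "extreme": 0.0}
    for value, w in zip(spei_grid, weights):
        category = drought_category(value)
        if category is not None:
            pct[category] += 100.0 * w / total
    return pct
```

Applying `percent_area_by_category` to the 6-month SPEI at all grid points for each season and year would produce time series of the kind plotted in Fig. 3.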

Discussion and conclusions

KNN is one of the oldest ML algorithms (Cover and Hart, 1967). KNN has been used in developing various statistical prediction and forecasting models but had not yet been used in developing drought models. This encouraged the authors to compare the performance of KNN-based drought models with that of drought models developed with more recent ML algorithms. In this study, KNN-based drought models displayed limited performance in comparison to SVM- and ANN-based drought models in
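For readers unfamiliar with KNN, the sketch below shows nearest-neighbour regression in its simplest form: the predictand at a new predictor vector is estimated as the unweighted mean of the predictand values of the k closest training samples. The Euclidean distance, the unweighted mean and the value of k are assumptions, since the excerpt does not state the KNN configuration used in the study; the function name is hypothetical. In practice, predictors would be standardized first so that no single variable dominates the distance.

```python
import math


def knn_regress(train_X, train_y, query, k=3):
    """Predict the target at `query` as the unweighted mean of the targets of
    its k nearest training samples (Euclidean distance)."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )[:k]
    return sum(y for _, y in neighbours) / k
```

Because KNN only averages past cases, it cannot predict values outside the range seen in calibration, which is one plausible reason for weaker performance on rare, extreme droughts.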

Author's contribution

Najeebullah Khan, D.A. Sachindra and Shamsuddin Shahid designed the manuscript. All the authors, including Kamal Ahmed and Mohammed Sanusi Shiru, wrote the manuscript. Najeebullah Khan and Shamsuddin Shahid introduced the methodology and conducted the analysis. D.A. Sachindra and Nadeem Nawaz helped prepare the figures and visualizations for the manuscript. All the authors also contributed to improving the manuscript during the revision.

Declaration of Competing Interest

The authors declare no conflict of interest.

Acknowledgement

None.

Reference (109)

  • A. Mishra et al.

    Drought forecasting using feed-forward recursive neural network

    Ecol. Modell.

    (2006)
  • A.K. Mishra et al.

    Drought modeling–a review

    J. Hydrol. (Amst)

    (2011)
  • S. Mouatadid et al.

    Input selection and data-driven model performance optimization to predict the standardized precipitation and evaporation index in a drought-prone region

    Atmos. Res.

    (2018)
  • J. Rhee et al.

    Meteorological drought forecasting for ungauged areas based on machine learning: using long-range climate forecast and remote sensing data

    Agric. For. Meteorol.

    (2017)
  • D.A. Sachindra et al.

    Statistical downscaling of precipitation using machine learning techniques

    Atmos. Res.

    (2018)
  • Y. Tian et al.

    Agricultural drought prediction using climate indices based on support vector regression in Xiangjiang River basin

    Sci. Total Environ.

    (2018)
  • J.L. Ticknor

    A Bayesian regularized artificial neural network for stock market forecasting

    Expert Syst. Appl.

    (2013)
  • S. Tripathi et al.

    Downscaling of precipitation for climate change scenarios: a support vector machine approach

    J. Hydrol. (Amst)

    (2006)
  • W.-C. Wang et al.

    A comparison of performance of several artificial intelligence methods for forecasting monthly discharge time series

    J. Hydrol. (Amst)

    (2009)
  • S. Adnan et al.

    Comparison of various drought indices to monitor drought status in Pakistan

    Clim. Dyn.

    (2018)
  • S. Ahmad et al.

    Drought Mitigation in Pakistan: Current Status and Options for Future Strategies

    (2019)
  • K. Ahmed et al.

    Characterization of seasonal droughts in Balochistan Province, Pakistan

    Stoch. Environ. Res. Risk Assess.

    (2016)
  • M. Al-Mukhtar et al.

    Future predictions of precipitation and temperature in Iraq using the statistical downscaling model

    Arab. J. Geosci.

    (2019)
  • A. Anandhi et al.

    Role of predictors in downscaling surface temperature to river basin in India for IPCC SRES scenarios using support vector machine

    Int. J. Climatol.

    (2009)
  • S. Barua et al.

    Artificial neural network–based drought forecasting using a nonlinear aggregated drought index

    J. Hydrol. Eng.

    (2012)
  • S. Beecham et al.

    Statistical downscaling of multi‐site daily rainfall in a South Australian catchment using a generalized linear model

    Int. J. Climatol.

    (2014)
  • S. Beguería et al.

    Standardized precipitation evapotranspiration index (SPEI) revisited: parameter fitting, evapotranspiration models, tools, datasets and drought monitoring

    Int. J. Climatol.

    (2014)
  • D.R. Bourdin et al.

    Streamflow modelling: a primer on applications, approaches and challenges

    Atmos. Ocean.

    (2012)
  • F. Burden et al.

    Bayesian regularization of neural networks

    Artificial Neural Networks

    (2008)
  • Q. Chen et al.

    Decision variants for the automatic determination of optimal feature subset in RF-RFE

    Genes (Basel)

    (2018)
  • J.L. Chiang et al.

    Reservoir drought prediction using support vector machines

    Applied Mechanics and Materials

    (2012)
  • C. Cortes et al.

    Support-vector networks

    Mach. Learn.

    (1995)
  • T.M. Cover et al.

    Nearest neighbor pattern classification

    IEEE Trans. Inf. Theory

    (1967)
  • S. del Río et al.

    Recent mean temperature trends in Pakistan and links with teleconnection patterns

    Int. J. Climatol.

    (2013)
  • V.B. Dodla et al.

    Analysis and prediction of a catastrophic Indian coastal heat wave of 2015

    Nat. Hazards

    (2017)
  • I.H. Durrani et al.

    Historical and future climatological drought projections over Quetta Valley, Balochistan, Pakistan

  • E. Dutra et al.

    The 2010–2011 drought in the Horn of Africa in ECMWF reanalysis and seasonal forecast products

    Int. J. Climatol.

    (2013)
  • F. Fahimi et al.

    Application of soft computing based hybrid models in hydrological variables modeling: a comprehensive review

    Theor. Appl. Climatol.

    (2017)
  • J.D. Farmer et al.

    Predicting chaotic time series

    Phys. Rev. Lett.

    (1987)
  • J.R. Fienup

    Invariant error metrics for image reconstruction

    Appl. Opt.

    (1997)
  • F.D. Foresee et al.

    Gauss-Newton approximation to Bayesian learning

  • K.F. Fung et al.

    Drought forecasting: a review of modelling approaches 2007–2017

    J. Water Clim. Change

    (2019)
  • P. Ganguli et al.

    Ensemble prediction of regional droughts using climate inputs and the SVM–copula approach

    Hydrol. Process.

    (2014)
  • M. Gao et al.

    Are peak summer sultry heat wave days over the Yangtze–Huaihe river basin predictable?

    J. Clim.

    (2018)
  • Z. Gao

    Variability and predictability of Northeast China climate during 1948–2012

    Clim. Dyn.

    (2014)
  • G.D. Garson

    Interpreting neural-network connection weights

    AI Expert

    (1991)
  • N. Ghodichore et al.

    Reliability of reanalyses products in simulating precipitation and temperature characteristics over India

    J. Earth Syst. Sci.

    (2018)
  • M.A. Ghorbani et al.

    Modeling river discharge time series using support vector machine and artificial neural networks

    Environ. Earth Sci.

    (2016)
  • S. Hadi Pour et al.

    Spatial pattern of the unidirectional trends in thermal...

    (2019)
  • Z. Hao et al.

    Seasonal drought prediction: advances, challenges, and future prospects

    Rev. Geophys.

    (2018)