Introduction

Glaciers are considered an icon of climate change, and they can clearly represent the emergence of climate globally (IPCC, 2018). Several studies have reported that mountain glaciers will significantly contribute to sea-level rise in the coming years, and it may change the hydrology of basins covered by permanent snow and glacier (Beniston et al. 2018; Hock et al. 2019). Snowmelt is the primary source of fresh water in many regions worldwide, and it is extremely important for the community living in Hindukush-Karakorum-Himalayas (HKH). HKH is also named as the “third pole” and “roof of the world” due to substantial glacial coverage in the high elevated basins (Yao et al. 2012; You et al. 2016). Glacial and snow cover in the HKH region constitute from 70 to 80% of the mean annual available freshwater from the Upper Indus Basin (UIB) (Immerzeel et al. 2009).

UIB is the main source of freshwater resources and plays a pivotal role in the sustainable development of Pakistan (Yaseen et al. 2020). The Upper Indus River system supplies sustainable water to the large population downstream of UIB for agriculture, industrial, and domestic purposes (Immerzeel et al. 2010). The seasonal water from UIB accounts for approximately half of the mean annual surface water available in Pakistan, which is essential for producing 3500 MW hydropower potential at Tarbela Dam (Hasson et al. 2017). Further, UIB also contributes to Pakistan's agrarian economy by satisfying the extensive irrigation requirements to meet rising food demand. Most of the south and southeast Asian basins are dependent on the summer monsoon; however, UIB is dependent on melted water from its ample glacial and snow coverage (Hasson et al. 2014).

Forecasting the streamflow and hazard management in such glacial basins plays a critical role in the region's sustainable development. Population in the Indus Basin depend on the river flows from UIB; therefore, streamflow forecasting is crucial for people living downstream of the HKH. Water resources of the Indus River basin should be managed by using real-time early warning systems (Krajewski et al. 2017). However, these systems usually require huge investments, which is difficult for communities living in UIB and the Indus River basin. Therefore, inexpensive, accurate, and innovative forecasting and simulation techniques are strongly recommended across the entire Indus Basin.

Estimating snowmelt and glacier-melt streamflow is vital for effective planning and management of surface water in the UIB. The changes in glaciers under the influence of climate change will strongly impact river flow and hydrological regimes in the UIB (Huss and Hock, 2018). However, the precise estimation of streamflow in a basin characterized by mountains covered with permanent snow and glacier is considered an unsolved problem, which deserves the attention of hydrology community (Lettenmaier et al. 2015).

Hydrological models neglect the trivial information related to the structure of the model, comprehend hydrological processes and powerful tools for effective decision-making related to the sustainable management of water resources (Nguyen et al. 2019; Rahman et al. 2020a; Tuo et al. 2016). Hydrological models play a vital role in allowing users to explain, estimate, and predict hydrological processes in basins characterized with limited, non-accessible, cost-efficient, and time-consuming in-situ observations (Baffaut et al. 2015). The structure of these hydrological models varies from a complex process-based distributed model to simple lumped models. Soil and Water Assessment Tool (SWAT) model belongs to the complex process-based distributed model (Arnold et al. 1998; Gassman et al. 2007; Nguyen et al. 2019).

SWAT model has been extensively used in modeling several types of hydrological processes in river basins across the globe (Abbaspour et al. 2015; Duan et al. 2019; Francesconi et al. 2016; Golmohammadi et al. 2017; Liu et al. 2016; Malagò et al. 2016; Nguyen et al. 2019; Rahman et al. 2020a; Tuo et al. 2016). SWAT is also extensively employed to assess the impact of snow on the water cycle across the mountainous basins (Debele et al. 2010; Rostamian et al. 2008; Grusson et al. 2015; Troin and Caya, 2014; Shahid et al. 2021). The temperature-index method has been widely used to model the snow processes in different basins using SWAT (Hock, 2003; Walter et al. 2005; Zhang et al. 2008), which is proved remarkably accurate in several studies (Debele et al. 2010; Luo et al. 2013).

The data-driven models, on the other hand, have been effectively used in several hydrological applications and such models provided high accuracy even without prior knowledge of underlying processes (LV et al. 2020; Senent-Aparicio et al. 2019; Yang et al. 2020). The approaches such as artificial intelligence (AI), soft computing (SC), data mining (DM), computational intelligence (CI), and machine learning analyze the system-related data and provide linkage between input and output variables without considering the explicit physical behavior of the objective system (Solomatine et al. 2009).

Recent studies have used several machine learning models to address different aspects of water resources management and hydrological modeling. For example, artificial neural network (ANN) is utilized to simulate rainfall-runoff, predict runoff, model river sediment process, predict storage inflow, and evaluate the water-powered energy (Choong et al. 2020; Pradhan et al. 2020). Fan et al. (2020) compared the short-term long memory (LSTM) model with SWAT and ANN models to simulate streamflow across Poyang Lake Basin. Results demonstrated the superior performance of LSTM model compared with SWAT and ANN models in simulating streamflow at a daily scale. Pradhan et al. (2020) investigated the performance of three ANN models and the SWAT model in predicting the streamflow and illustrated that ANN models have more accurate estimates as compared with SWAT. Similarly, Kumar et al. (2019) compared Emotional Neural Network (ENN) and ANN to simulate streamflow, where ENN was reported to have better performance. Koycegiz and Buyukyildiz (2019) compared SWAT with support vector machine (SVM) and ANN in the headwater of Carsamba River, situated in Konya Closed Basin, Turkey. Results demonstrated that data-driven models (ANN and SVM) have better performance in streamflow simulation as compared with SWAT. Several other models, including Adaptive Neuro-Fuzzy Inference System (ANFIS) and SVM, are also used for hydrological modeling (He et al. 2014; Moradkhani et al. 2004).

The comparative studies among data-driven and physically-based models ensured the successful application, selection of robust models, interpretation of the model outputs and reliable results. The evaluation and comparison of data driven model across catchments like UIB is extremely critical. UIB, being the source of freshwater for the entire Indus Basin, has several problems such as extreme climate variability owing to the complex topography, streamflow is extremely seasonal, subject to severe climate and land use changes, and most importantly a data scarce region. Minimal studies are available that developed and evaluated data-driven models and performed a comparison with physical-based models in glacial regions like UIB. Therefore, the current study adds to the available literature; (i) compare the performance of SWAT and MLP models in streamflow simulation for the first time across UIB, (ii) simulate streamflow that is less influenced by precipitation and more dependent on topography, air temperature, relative humidity, and solar radiation, and (iii) propose model that is capable to capture the seasonality in streamflow. The findings from current study will educate researchers and policy makers about robust alternatives to data-intensive hydrological models for streamflow simulation in data scarce regions characterized with complex topography and diverse climate.

Study area

UIB is situated in the extreme north of Pakistan, mostly covered by permanent snow and glacier cover, and located between 33.67° − 37.20° N and 70.50° – 77.50° E. The elevation of UIB ranges from 8500 above mean sea level (a.m.s.l) to 200 m a.m.s.l with the mean elevation of 3750 m a.m.s.l. UIB shares its boundary with China (north), Afghanistan (west), and India (east), as shown in Fig. 1. UIB depicts significant topographic and climatic variations, which include a complex terrain of HKH mountain ranges. HKH has discrete topographical landscapes with conflicting climate change signals (Archer, 2003; Cheema and Bastiaanssen, 2012). HKH and Tibetan Plateau are the greatest glacial regions of the world with approximately 22,000 km2 of glacial surface area and jointly host 11,000 glaciers (Ul Hasson et al. 2016). UIB is the originating source of freshwater for the Indus River, contributes to approximately half of the available surface water of Pakistan (Yaseen et al. 2020), and plays an important role in Pakistan's sustainable economic development.

Fig. 1
figure 1

Detailed information about the study area, a geographical location and elevation of Pakistan, b elevation of UIB, c selected basins and the distribution of rain gauges, climate and streamflow stations, d land use of UIB

Figure 1 represents detailed information about the study area, including elevation, selected basins for hydrological modeling, the spatial distribution of rain gauges (RGs), climate and streamflow stations, and land use map. The study region covers important basins of the UIB, including Shyok, Shigar, Hunza, Astore, Gilgit, Indus River basin at Bisham Qila, and Indus River basin at Shatial bridge. Shyok and Shigar basins are situated in the eastern and central parts of Karakoram. Approximately 24% of Shyok and 33.33% of Shigar basins are covered by snow (Bhambri et al. 2013; Yaseen et al. 2020). Westerly disturbances and monsoon are the main seasons/sources for precipitation in Shyok and Shigar basins (Hasson et al. 2017; Hasson et al. 2014; Latif et al. 2020). Hunza basin is situated in the western Karakoram ranges, with 28% of the area covered with glaciers, which is 21% of the total UIB glacial coverage (Bhambri et al. 2013; Hasson et al. 2014). Three high-altitude climate stations, i.e., Khunjrab, Naltar, and Ziarat, are situated in the Hunza basin. Discharge of Hunza basin is measured at Danyior bridge of Hunza River. Astore basin is located in the western Himalayan ranges with 14% of glacial and snow coverage, which is 3% of the total UIB glacial coverage (Hasson et al. 2014). There is only one climate station (Astore) measuring climate and precipitation in the Astore basin, where the discharge of the basin is measured at Doyian. Gilgit basin is situated in the eastern part of Hindukush ranges, which drains south-east and joins the Indus River. The climate and precipitation data in the Gilgit basin is recorded at Gilgit, Yasin, Gupis, and Ushkore stations, where the basin's discharge is measured at the Gilgit station and Alam bridge (the confluence of Gilgit and Hunza rivers). Bisham Qila station is the final station used in the current study, located at the Indus River, and is considered an exit point.

Datasets and methodology

In-situ data from rain gauges, climate stations, and streamflow gauges

The daily in-situ precipitation and climate (both maximum and minimum temperature, minimum and maximum relative humidity, wind speed, and solar radiation) data are collected from Pakistan Meteorology Department (PMD), and Water and Power Development Authority (WAPDA). It is worthy of mentioning that the streamflow gauges’ data is collected solely from WAPDA. Table 1 represents the names of RGs, climate stations, and streamflow gauges used in the current research. After the rigorous screening of collected data, a temporal span of 1995–2013 was selected to warm-up, calibrate, and validate the SWAT model across selected basins of UIB. It is ensured that all the gauges/stations have daily data without any significant missing information. PMD and WAPDA perform the manual collection of obtained data, which might have several types of inevitable errors, including instrumental and human errors, splashing effects, snow impact, and wind errors. These factors might deteriorate the quality of in-situ data (Rahman et al. 2020b). Therefore, PMD and WAPDA monitor and improve the data quality using the Guide to Hydrological Practices suggested by the world meteorology organization (WMO, 1994).

Table 1 Input data (daily time scale) from RGs, climate stations, and streamflow gauges during 1995–2013

SWAT model can automatically fill the missing meteorological input data by employing the weather generator. In order to fill out the data, more input observations and further efforts are needed (Rahman et al. 2020a). Furthermore, the accuracy of SWAT model (output) is dependent on the accuracy of input data. Therefore, the zero-order method was employed to fill the missing data (if any) in precipitation and climate data before its integration into SWAT and MLP models. Moreover, Kurtosis and Skewness methods are employed to check the quality of input data (Rahman et al. 2018).

UIB has only 2% of the cultivable land and the irrigation system is a traditional one (ICIMOD, 2017). Around 15% of the cultivable land is not cultivated because of unavailability or limited access to irrigation water and irrigation system complexity (ICIMOD, 2017). Further, the agricultural farmlands are sparsely distributed and most of them are above the level of flowing water in rivers. Irrigation in UIB is mainly rainfed irrigation due to its poor irrigation system and complex terrain (Malik and Azam, 2009; Parveen et al. 2015). Agriculture lands are irrigated from snow and glacier melt water through dug earthen irrigation channels, which are lengthy, rough, and crude, resulting in inadequate water supply for irrigation (Parveen et al. 2015; Khan et al. 2021). Therefore, we did not consider the irrigation and agricultural water use data, and cropping pattern in the current study due to its non-availability.

SWAT model

The SWAT model is a process-based semi-distributed, Hydrological Response Unit (HRU)-based, spatially explicit, and time-continuous hydrological model developed by the agricultural research service of the United States Department of Agriculture (Arnold et al. 1998). SWAT model divide the large basins into smaller sub-basins to provide more accurate spatial details, which make the model more reliable and accurate (Jha, 2004). SWAT model was designed to simulate as well as forecast the impacts of agricultural and land management decision/practices on water resources in terms of quantity and water quality across a range of basins sizes (Gassman et al. 2007). Further, the hydrological responses to land use and climate changes are mostly investigated using the scenario-based simulations through SWAT model (Yin et al. 2017). The computational efficiency of SWAT model makes the simulation across large basins or different types of management strategies easy (Coutu and Vega, 2007). In this research, ArcSWAT version 2012, revision 664 was used to simulate streamflow in UIB. Generally, the SWAT model is extensively used in water quality assessment, simulation of rainfall-runoff processes, evapotranspiration, and soil erosion. SWAT has the potential to assess climate change impact on water resources, transport of nutrients and sediments under the various circumstances of land use land cover (LULC), meteorological, and soil data (Ali et al. 2020; Khan et al. 2018; Marahatta et al. 2021; Song et al. 2011; Tuo et al. 2016).

The operating mechanism of SWAT model includes the division of entire basins into several sub-basins and finally into HRUs (which is a distinctive combination of slope, LULC, and soil type) based on digital elevation model (DEM). SWAT model generate HRUs using two methods, i.e., generate HRUs for individual sub-basin using the soil and LULC information and multiple HRUs based on threshold values (Arnold et al. 1998). As recommended by Setegn et al. (2009), the current study used 10%, 20%, and 10% threshold for land, soil and slope, respectively. After the successful overlaying of soil, slope, and LULC datasets, the number of sub-basins (HRUs) generated by SWAT model for Gilgit, Hunza, Shatial bridge, Yugo, Doyian, and Bisham Qila are 12 (22), 10 (18), 24 (45), 6 (10), 5 (8), and 9 (17), respectively.

SWAT model has two distinct phases, which are named as land and routing phases. The daily precipitation is used by SWAT model to simulate surface runoff for each HRU using the Soil Conservation Service (SCS) technique during the land phase hydrological component (USDA, 1972). Green and Ampt infiltration method (Green and Ampt, 1911) is the alternative method to SCS in the SWAT model to simulate surface runoff. Green and Ampt infiltration method require precipitation inputs on a sub-daily scale. Simulated streamflow is routed during the routing phase through streams/river network to the basin outlet using Muskingum or Variable storage techniques.

Input data for SWAT model

SWAT model requires soil properties, LULC and soil maps, and elevation data as input before simulating streamflow. For the current research, DEM with a resolution of 30 m retrieved from shuttle radar topographic mission (SRTM) was downloaded from USGS earth explorer (https://earthexplorer.usgs.gov/). The basin delineation and retrieval of required topographic parameters for the SWAT model were acquired from DEM. The LULC map (shown in Fig. 1d) was developed for 2005 using the supervised classification method. Landsat-7 satellite imageries were used for the preparation of LULC map. Landsat 7 ETM + images have eight spectral bands with 30 m spatial resolution for bands 1–7, while band 8 is a panchromatic band with 15 m spatial resolution. LULC map was developed using the SVM method. For a detailed description of the SVM method, readers are referred to Balkhair and Ur Rahman (2019). The soil data for the study region was extracted from a soil map prepared by the Food and Agriculture Organization (FAO) (http://www.fao.org/soils-portal/soil-survey/soil-maps-and-databases/faounesco-soil-map of-the-world/en/), which has a resolution of 1:5,000,000 following Rahman et al. (2020a). Harmonized World Soil Database v1.2, combined with the extracted soil map from FAO, was used to acquire the required soil properties for SWAT. Further, the default setting of the SWAT model, i.e., SCS method, Penman–Monteith equation, and variable storage method, is used in the current study to simulate streamflow. Streamflow using SWAT model is simulated by employing water balance approach, which depends temperature and precipitation inputs and can be presented as follows:

$${\text{SW}}_{t} = {\text{SW}}_{0} + \sum\limits_{i = 1}^{t} {\left( {P_{{{\text{day}}}} - R_{{{\text{Surf}}}} - ET_{{{\text{day}}}} - W_{{{\text{seep}}}} - R_{{{\text{gw}}}} } \right)}$$
(1)

where \({\text{SW}}_{t}\) and \({\text{SW}}_{0}\) represents the final and initial soil moisture conditions, \(P_{{{\text{day}}}}\) depicts the daily precipitation,\(R_{{{\text{Surf}}}}\) is surface runoff, \({\text{ET}}_{{{\text{day}}}}\) is the daily evapotranspiration (ET), \(W_{{{\text{seep}}}}\) is water seeped into the ground and \(R_{{{\text{gw}}}}\) is the groundwater return flow. The units of all the above-listed variables are in “mm”. Surface runoff is calculated using SCS-CN method, while ET is calculated using Penman–Monteith (PM) equation.

Calibration and validation of SWAT model

The calibration and validation (parameter optimization) of SWAT model is performed using Sequential Uncertainty Fitting version 2 (SUFI-2) in SWAT Calibration and Uncertainty Program (SWAT-CUP) (Abbaspour et al. 2015). In order to alleviate the impact of initial conditions and allow a stable SWAT performance, the first three years (1995–1997) were considered as a warm-up period (Tuo et al. 2016). The model is calibrated and validated at multiple sites shown in Table 1 during 1998–2005 and 2006–2013. Besides the data quality, accuracy of SWAT output depends on the careful selection of sensitive parameters. In the current study, sensitive parameters (listed in Table 2) for calibrating and validating the SWAT model were selected from extensive literature review (Abbaspour et al. 2015; Ali et al. 2020; Arnold et al. 2012; Arnold et al. 1998; Bhatta et al. 2019; Duan et al. 2019; Garee et al. 2017; Rahman et al. 2020a; Shah et al. 2020; Shahid et al. 2021; Shrestha et al. 2016). The multi-site calibration of SWAT model is preferred, which produces high accuracy compared with single-site calibration (Rahman et al. 2020a; Shrestha et al. 2016). Therefore, the SWAT model in the current study is calibrated and validated at multiple sites (five interior stations and one outlet station) by following the recommendations of Lerat et al. (2012), Duan et al. (2019), and Rahman et al. (2020a).

Table 2 List of sensitive parameters, their description, lower and upper bounds and units. “a,” “v,” and “r” represent an absolute increase to default values of selected parameters, replacing the actual values by new selected values, and relative change to the initial values of selected parameters, respectively

The most sensitive parameters are selected by employing Global sensitivity analyses in SWAT-CUP. The initial values for selected parameters were based on physically practical intervals for each parameter suggested in the official documents of SWAT (Arnold et al. 2012) and various studies (Grusson et al. 2015; Tuo et al. 2016). Four iterations with 1000 simulations (total number of 4000 simulations during calibration phase) were performed to calibrate the model with Nash–Sutcliffe Efficiency (NSE) (Nash and Sutcliffe, 1970) as an objective function. The parameters range (values) are narrowed down further after every iteration, based on the suggestions from SWAT-Cup (Abbaspour et al. 2004; Abbaspour et al. 2007) and their pre-defined physical ranges. Readers are referred to Abbaspour et al. (2015) for the detailed description of model calibration procedures. The performance of SWAT model was evaluated using several statistical indicators, including NSE, percent BIAS (PBIAS), coefficient of determination (R2), and mean absolute percentage error (MAPE), as calculated below.

$${\text{NSE}} = 1 - \frac{{\sum {\left( {Q_{m,i} - Q_{s,i} } \right)^{2} } }}{{\sum {\left( {Q_{m,i} - \overline{Q}_{m} } \right)^{2} } }}$$
(2)
$${\text{PBIAS}} = 100 \times \frac{{\sum\limits_{i} {\left( {Q_{m,i} - Q_{s,i} } \right)} }}{{\sum\limits_{i} {Q_{m,i} } }}$$
(3)
$$R^{2} = \frac{{\left| {\sum\limits_{i} {\left( {Q_{m,i} - \overline{Q}_{m} } \right)\left( {Q_{s,i} - \overline{Q}_{s} } \right)} } \right|^{2} }}{{\sum\limits_{i} {\left( {Q_{m,i} - \overline{Q}_{m} } \right)^{2} \sum\limits_{i} {\left( {Q_{s,i} - \overline{Q}_{s} } \right)^{2} } } }}$$
(4)
$${\text{MAPE}} = \frac{{\left| {\frac{{\left( {Q_{m,t} - Q_{s,t} } \right)}}{{Q_{m,t} }}} \right|}}{N} \times 100$$
(5)

where \(Q_{s,i}\)and \(Q_{m,i}\) are the simulated and observed streamflow, while the average of observed and simulated streamflow is represented by \(\overline{{Q_{m} }}\) and, \(\overline{{Q_{s} }}\) respectively.

NSE demonstrates the quantitative difference among observed and simulated streamflow, with the optimal value of 1. PBIAS depicts overestimation or underestimation of simulated streamflow, where the perfect value for PBIAS is 0, the positive and negative values illustrate overestimation and underestimation, respectively. The performance of SWAT model is categorized into four classes according to criteria defined by Moriasi et al. (2007): unsatisfactory (\({\text{NS}} \le 0.50\), \(\left| {{\text{PBIAS}}} \right| \ge 25\%\)), satisfactory (0.50 < NS ≤ 0.65, 15% ≤|PBIAS|< 25%), good (0.65 < NS ≤ 0.75; 10% ≤|PBIAS|< 15%), and very good (NS > 0.75, |PBIAS|< 10%). According to Lewis (1982), MAPE < 10 shows high accurate streamflow simulation, between 10 and 20 shows good simulation, between 20 and 50 shows reasonable simulation, while greater than 50 shows inaccurate simulation.

Artificial Neural Network (ANN)

Artificial Neural Networks have mainly two architectures, i.e., recurrent and non-recurrent. It was demonstrated that for hydrological modeling, non-recurrent type architecture is well suited. Non-recurrent architecture has a single-layer perceptron and multi-layer perceptron (MLP) (Mohammadi et al. 2020). Time series based non-linear hydrological problems such as non-linear precipitation prediction, streamflow and sediment modeling, rainfall-runoff modeling, river stage-discharge modeling, etc., can be easily solved using the MLP. A more detailed description of the MLP method can be found in existing studies, e.g. (ASCE, 2000a, b; Lohani et al. 2012; Nourani, 2014, Kushwaha and Kumar, 2017).

Structural description of the multi-layer perceptron (MLP)

The basic structure of MLP can be broadly divided into three layers, i.e., the input layer followed by a hidden layer and then the output layer (Fig. 2). The MLP layers are mainly represented by n-nh1-no, in which n, nh1, and no are the number of neurons in the input layer, first hidden layer, and output layer, respectively. The number of hidden layers can be increased or decreased depending on the requirement of suitable architecture, which has numerous processing elements and connections. Neurons in a typical MLP are designed to handle complex processes more efficiently and satisfactorily using several algorithms. The input layer does not have computational nodes, whereas hidden and output layers have computation nodes. The number of hidden layers and neurons in each hidden layer can be increased or decreased for a suitable network architecture requirement.

Fig. 2
figure 2

Example of the MLP model by three layers, viz. input layer, hidden layer, and output layer (Left). A typical processing element with the flow of signals (Right)

The crucial role of MLP is to convert a number of inputs into single or multiple outputs. If we consider, xi (i = 1,2, …, m) are inputs for a pre-organized model for which wi (i = 1,2, …, m) are the respective weights, which would be changed by error algorithms later on. Under such considerations, the net input to a single node can be expressed by Eq. 6. The activation function “f” converts the net input into an output (Eq. 7). The output of any node will behave like an input for the next computational node.

$${\text{net}} = \sum\limits_{i = 1}^{m} {x_{i} } w_{i}$$
(6)
$$y \, = \, f \, \left( {{\text{net}}} \right)$$
(7)

To determine the weights of the network, two learning mechanisms including supervised and unsupervised have been involved. In the present study, supervised learning has been used in which the user completely knows the input as well as output pattern of the typical architecture of the network. In MLP, the back propagation algorithm (BPA) is used to adjust the weights in each iteration to minimize the mean square error between known and unknown output. The known output is the observed values of the dataset, and the unknown output is the network computed values of the same dataset. Adjusting weights in each iteration or epoch is performed by two calculations: feed-forward calculations and the back-propagation of errors (also known as the mean square error (MSE)).

The feed-forward calculation occurs in each layer, which is represented by Eq. 8 through Eq. 11. The net input to jth node of the hidden layer is given by:

$${\text{neth}}_{j} = \sum\limits_{i = 1}^{n} {wh_{ji} } x_{i}$$
(8)

Connection weight always lies between two nodes. For establishing connection weights, this is not necessary that each node should be a computational node. In Eq. 8, connection weight whji is a weight between ith node of the input layer and jth node of the hidden layer, although there is no computation on nodes of the input layer. Now, the output of this particular node in the hidden layer is given by Eq. 9.

$$h_{j} = f\left( {{\text{neth}}_{j} } \right)$$
(9)

Furthermore, net input to kth node of the output layer is given by Eq. 10.

$${\text{nety}}_{k} = \sum\limits_{j = 1}^{{n_{h} }} {{\text{wo}}_{kj} h_{j} }$$
(10)

where wokj is the connection weight between the computational jth node of the hidden layer and the computational kth node of the output layer. The final output (Eq. 11) from kth node of the output layer is;

$$y_{k} = f\left( {nety_{k} } \right)$$
(11)

After calculating this output, the error between known and unknown output is back propagated using BPA. For a single pair of input and output of a dataset, the sum square error E will be given by;

$$E = \frac{1}{2}\sum\limits_{k = 1}^{no} {\left( {y_{k} - t_{k} } \right)}^{2}$$
(12)

where tk is the known or observed output at the kth node and yk is the unknown or calculated output at the same node. Levenberg–Marquardt (LM) learning algorithm is one of the most popular algorithms for minimizing the MSE of a typical MLP. This learning algorithm has been used in this study to train the network. The objective function in this algorithm for minimizing the error function is;

$$\Delta W = \left( {J^{T} J + {\mu I}} \right)^{ - 1 } J^{T} e$$
(13)

where J is the Jacobian matrix, e is the error vector, \(\Delta W\) is the increment of weights, T shows the target output, and μ is the parameter changed or learning rate which is to be updated using the β depending on the outcome. This study uses the Sigmoid activation function to convert the input signal to output signal at all computational nodes of hidden layers and the output layer.

In the current study, MLP has been applied individually for each sub-basin after selecting appropriate meteorological inputs from the gauging stations located inside a sub-basin. In this context, five model inputs viz. maximum and minimum temperature, maximum and minimum RH, solar radiation, and precipitation have been selected on a lumped basis and used for each station of the sub-basin. If any sub-basin has four gauge stations, then its model inputs have been selected as five multiplied by four, i.e., twenty, to prepare a streamflow model for that particular sub-basin. So, there are 20 inputs and a single output, which is streamflow for the MLP model. Besides these selections, the number of neurons in each hidden layer, transfer function, and learning algorithms is extremely important to obtain a suitable MLP architecture (selection of the number of hidden layers). In this study, several trials have been performed to obtain an appropriate MLP architecture as a schematic diagram shown in Fig. 3.

Fig. 3
figure 3

Schematic diagram for the MLP model used in the current study

Results

Streamflow simulation using SWAT model

Sensitivity analyses

The rank, t-test, and p-value for selected parameters achieved using the Global sensitivity analyses across each sub-basin is shown in Table 3. The t-test, which is calculated by dividing the coefficient by standard error, demonstrates the precision for measuring the regression coefficient. In another way, t-test illustrates the importance of selected parameter (the high magnitude of the absolute t-test depicts the most sensitive parameter). On the other hand, p-values depict variations in the mean of sample (streamflow observations). The minimum p-value and maximum t-test value show that the parameter is very sensitive (Abbaspour et al. 2015). The most sensitive parameters (as listed in Table 3) for most sub-basins were CN2, SOL_AWC, SOL_K, TIMP, SMFMX, and CH_K2. The major source of water in UIB is the melting of snow-cover and glaciers; therefore, SWAT model was found sensitive to SMFMX (maximum rate of snowmelt in the year) and TIMP (snowpack temperature lag factor). On the other hand, soil characteristics of plays an important role in the infiltration and surface-runoff. Therefore, the SWAT model in UIB is sensitive to SOL_AWC (available water capacity of soil layer) and SOL_K (saturated hydraulic conductivity). Further, curve number (CN2, across the basins) and hydraulic conductivity (CH_K2, along the streams) are the two important parameters in streamflow simulation across the basin. For the sensitivity analysis, it can be observed that SWAT model is sensitivity to two parameters of each snow-cover, channel flow paths and soil characteristics.

Table 3 The rank, t-test, and p-values for selected sensitive parameters across each of the six sub-basins (Gilgit, Hunza, Shatial bride, Yugo, Doyian, Bisham Qila)

Calibration and validation of SWAT model

SWAT model was calibrated at the Gilgit sub-basin of UIB by fitting the initial values for each parameter. Several iterations with 1000 simulations each are performed to get the final values for each parameter (which were narrowed down after each iteration). After calibrating the SWAT model at Gilgit, the model is then calibrated at Hunza, Shatial bridge, Yugo, Doyian, and Bisham Qila sub-basins using the initial values for the same set of parameters. Four parameters, i.e., CN2, SOL_AWC, SOL_K, and CH_K2 were initially selected for the calibration process. These parameters were found very sensitive to streamflow simulation in a number of studies (Immerzeel and Droogers, 2008; Rahman et al. 2020a; Shen et al. 2008; Shrestha et al. 2016). After every simulation, the remaining parameters (shown in Table 2) are supplemented in groups comprised of five parameters each. The SWAT model was calibrated at each station in the six sub-basins using the final selected list of parameters. The final calibrated fitted parameter values are presented in Table 4.

Table 4 List of selected sensitive parameters with the final calibrated values in each sub-basin

Evaluation of SWAT model performance

The accuracy (performance) of SWAT model was evaluated using NSE, PBIAS, and R2 based on the criteria suggested by Moriasi et al. (2007) and MAPE (Lewis, 1982). Table 5 presents the performance of SWAT model during calibration and validation periods. Results showed that the performance of SWAT model during the calibration period in selected sub-basins of UIB ranged from good to very good evaluated in terms of NSE and PBIAS. MAPE showed accurate streamflow simulation across all the sub-basins except for the Indus River at Shatial bridge and Bisham Qila sub-basins during the calibration period. However, the performance of SWAT during the validation period assessed using NSE and PBIAS ranged from good to very good. MAPE depicted good streamflow simulation for all the sub-basin while reasonable streamflow simulation at Indus River at Shatial bridge. PBIAS shows that the simulated streamflow is overestimated at most sub-basins during the calibration and validation periods except for Gilgit and Bisham Qila (only during the validation period) sub-basins.

Table 5 Performance evaluation of SWAT model during calibration and validation period using NSE, PBIAS, R2, and MAPE statistical indices

Figure 4 shows the daily scale calibration and validation of the SWAT model across Shatial bridge, Yugo, Doyian, and Bisham Qila sub-basins. Results from Gilgit and Hunza sub-basins (the remaining two sub-basins) are presented in the next section in comparison with MLP-based simulated streamflow. Overall, the performance of SWAT model ranged from good to very good during calibration and validation periods. PBIAS shows that SWAT model underestimated the streamflow across Bisham Qila sub-basin during the validation period. The simulated streamflow is mostly underestimated at the peaks during the monsoon season (May to September). Further, the analyses also demonstrated high overestimation during the validation period as compared with the calibration period.

Fig. 4
figure 4

Calibration (left) and validation (right) of SWAT model across a Shatial bridge, b Yugo, c Doyian, d Bisham Qila sub-basins

Streamflow simulation using MLP and its comparison with the SWAT model

Recent studies show that the application of data-driven approaches in hydrological modeling increases, and these models are proved more robust and accurate. Streamflow in the current research was also simulated using MLP across UIB, and the results are compared with the SWAT model (shown in Table 6 and Fig. 5). Table 6 shows that the MLP simulated streamflow with high accuracy as compared with the SWAT model. The performance of MLP is very good during the calibration and validation periods across all sub-basins evaluated based on criteria specified by Moriasi et al. (2007). In the MLP model, a training dataset was used to tune the parameters of MLP, and the remaining dataset has been applied to check model performance with the unknown testing dataset.

Table 6 Evaluation of MLP model performance in the calibration and validation periods using NSE, PBIAS, R2, and MAPE statistical indices
Fig. 5
figure 5

Comparison of streamflow simulated by SWAT (left panel) across a Gilgit and c Hunza sub-basins with MLP (right panel) across b Gilgit and d Hunza sub-basins

Figure 5 shows the comparison of MLP and SWAT simulated streamflow across Gilgit and Hunza sub-basins. Figure 5a shows that SWAT underestimated the streamflow across the Gilgit sub-basin while MLP has accurately captured the streamflow. A contrasting trend is observed for the Hunza sub-basin, where the SWAT model overestimated while MLP underestimated the streamflow. Maximum over/underestimation is observed during the monsoon season having peak flows. The network architecture (shown in Table 6), for example, 20–5–1 in Gilgit sub-basin, represents that there are 20 neurons in the input layer (one neuron per input), five neurons in one hidden layer, and one neuron in the output layer (one neuron per output).

The comparison of SWAT with MLP against the observed flow (considered as a reference) in the validation period is presented with the Tylor diagram (shown in Fig. 6). The results illustrate high performance of the MLP model (minimum standard deviation and maximum correlation coefficient values) across all sub-basins in UIB. The performance of both the models also varies with the area of sub-basins and magnitude of streamflow, i.e., relatively poor performance across Indus River at Shatial bridge and Bisham Qila sub-basins while significantly higher across the remaining sub-basins. Overall, the results demonstrate relatively accurate streamflow estimates across the UIB with a correlation coefficient greater than 0.90 (while 0.85 to 0.90 for Indus River at Shatial bridge and Bisham Qila sub-basins). Moreover, the standard deviation ranges from 169 to 3132 for SWAT and 138 to 2908 for the MLP model.

Fig. 6
figure 6

Performance comparison of SWAT and MLP models with Tylor diagram across all sub-basins

Discussion

It is an extremely arduous task to perform hydrological modeling (e.g., streamflow simulation) in poorly gauged basins characterized by complex topography and permanent glacier cover. Accurate hydrological modeling often requires a dense network of in-situ gauges or stations that can provide in-situ observations with high quality as input to hydrological models. However, the in-situ gauges/stations are sparsely distributed, especially in developing countries like Pakistan (Rahman et al. 2019), particularly across the complex topographic and diverse climatic regions of UIB. There are several factors associated with the relatively poor performance of different hydrological models in the glacial regions; including the unavailability of enough in-situ observations for calibration and validation of hydrological models (Rahman et al. 2020a), complex topography, the seasonal impact of snow, and glaciers on streamflow and river discharge (Tuo et al. 2018), and climate change (Huss and Hock, 2018; Lettenmaier et al. 2015).

Streamflow simulation in the glacial basin is extremely difficult and important from the perspective of effective water management, and it is considered an unsolved problem, which deserves the attention of hydrologists. Numerous studies have appraised the performance of different hydrological models to simulate streamflow in glacial basins (Chen et al. 2019; Shrestha et al. 2016; Sleziak et al. 2020; Wang et al. 2019; Wortmann et al. 2019). Some studies have employed the SWAT model in basins characterized by glacier and snow cover (Bhatta et al. 2019; Debele et al. 2010; Garee et al. 2017; Grusson et al. 2015; Khan et al. 2018; Luo et al. 2013; Rahman et al. 2013; Shah et al. 2020; Shahid et al. 2021; Troin and Caya, 2014; Tuo et al. 2018). However, these studies reported that the performance of hydrological models varies from satisfactory to good, which is subjected to several factors.

The relatively poor performance of the physical or distributed hydrological model across different basins has shifted the paradigm towards data-driven approaches. In the recent few decades, the application of machine learning approaches, e.g., ANN, ANFIS, and SVM, etc., have been significantly increased due to their high accuracy and robustness. Based on the results obtained, it was found that machine learning models (such as the MLP structure of ANN used in the current study) are time and computationally efficient, which do not require extensive investigations and have no restrictions for the type and number of input selection. To the best of our knowledge, very few studies extensively evaluated the performance of machine learning-based models in glacial regions like UIB. In the current study, the performance SWAT model is comprehensively compared with MLP (ANN-based data-driven model) to simulate streamflow in UIB.

Table 5 and Fig. 4 illustrated that set of parameters obtained through the calibrating and validating SWAT model has a strong influence on streamflow simulation. Sensitivity analyses in hydrological modeling demonstrate the share of each individual parameter in the propagation of uncertainties in model output. Hence, highly sensitive parameters will result in high shares of model uncertainty compared with less sensitive parameters. Therefore, sensitivity analysis is the first step that must be performed in model calibration. The regionalization procedure (one basin at a time) adopted in the current research helped in selecting sub-basin specific SWAT parameters, which resulted in good to very good (good to reasonable) performance in streamflow simulation evaluated using NSE, PBIAS, and R2 (MAPE). On the other hand, the MLP model depicted better performance during both calibration and validation periods as compared to SWAT by matching the observed streamflow, peak flows and presented better statistical indices (NSE > 0.90, PBIAS < 1, R2 > 0.90, and MAPE < 10%) across all sub-basin except for Shatial bridge and Bisham Qila (shown in Table 6).

The relatively poor performance of the SWAT model as compared with the MLP model might be attributed to several factors: (a) SWAT model may suffer from the identification of the most sensitive parameters (Cibin et al. 2010; Shen et al. 2008), (b) selected set of snow specific parameters for each sub-basin could not fit the snow conditions of each sub-basin, (c) the identified sensitive snow specific parameters might also be influenced by different sources of uncertainties, (d) the potential limitation of SCS-CN method used in SWAT model to simulate streamflow, which produces relatively poor results when there is significant proportion of impermeable land surface as in the case of UIB (Tasdighi et al. 2018), (e) discharge is accumulating from each sub-basin to the Indus River (Shatial bridge and Bisham Qila) and hydropeaking of discharge cannot be accurately reproduced by SWAT model. Therefore, the performance of the SWAT model in simulating streamflow is relatively poor compared with the MLP model.

During the analyses, few limitations associated with MLP were observed that cannot be ignored. These limitations include the transformation of MLP from input to output can be affected by the techniques having no physical bases. Since MLP is a lumped approach, there may be errors in averaging the sub-catchment parameters. Moreover, empirical models like MLP cannot make spatial predictions within the watershed for processes (e.g., runoff generation, soil moisture, or nutrient export). On the other hand, process-based models like the SWAT model requires many kinds of input data (e.g., elevation, land use/land cover, soil type, drainage, geology, climate data, etc.). However, some of these data are not fully available in many regions because of inevitable problems such as poor distributions observed stations, social and political issues, and restricted data sharing among countries in transboundary basins. In this case, machine learning models like MLP is useful as it is not a data-intensive approach. This study shows that with significantly less modeling effort and resources, still the performance of MLP is better than that of the SWAT model.

Conclusions

Streamflow simulation is extremely important in snow and glacier-dominated Upper Indus Basin (UIB), Pakistan, which serves as a water tower for domestic, agriculture, and industrial use downstream of the Indus River. In this study, streamflow across UIB was simulated using the SWAT model, and its performance was compared with the machine learning-based MLP (Multi-Layered Perceptron) model. The main findings of the current study are listed below:

  1. 1.

    Evaluation with multiple statistical indicators showed SWAT model performed reasonably well in simulating daily streamflow across different sub-basins of UIB, with model performance ranging from “good” to “very good”. However, the performance of SWAT is relatively poor as compared with the MLP model.

  2. 2.

    MLP model captured the streamflow dynamics and peak flows with extremely high accuracy. Evaluation with multiple statistical indicators showed that MLP performed better than SWAT and yielded very good and high accurate streamflow simulation with NSE > 0.90, PBIAS < 1%, R2 > 0.90, and MAPE < 10% for all the six sub-basins of UIB.

  3. 3.

    The comparatively poor performance of the SWAT model might be associated with several factors, including issues in the identification of sensitive parameters, selected snow parameters that might not fit the snow conditions in sub-basins, and the potential limitation of the SCS-CN method employed to simulate streamflow.

  4. 4.

    The poor performance of the SWAT model in Shatial bridge and Bisham Qila is due to the large size of the sub-basin and accumulation of sub-basin discharges to the Indus River resulting in hydropeaking, which cannot be accurately captured by hydrological models.

  5. 5.

    The results demonstrated that the development of a local hydrological model, e.g., MLP, might suit better in simulating streamflow, which considers the sub-basin specific characteristics.

Keeping in view the high performance and robustness of machine learning-based models, this study recommends the development and evaluation of further machine learning models across UIB. Moreover, in view of the advantages and disadvantages of SWAT and machine learning models, the hybrid models are expected to improve streamflow simulation and our understanding of the hydrological processes in snow-glacier-dominated regions.