Research papers
Long-term probabilistic streamflow forecast model with “inputs–structure–parameters” hierarchical optimization framework

https://doi.org/10.1016/j.jhydrol.2023.129736

Highlights

  • Identified driving predictors for streamflow forecasts using information entropy.

  • Introduced LSTM and GARCH models to refine the forecast equation and characterize forecast errors precisely.

  • Forecast model optimized hierarchically for accuracy, reliability, and stability.

Abstract

Accurate, reliable, and stable streamflow forecasts are essential for risk assessment and decision making in water resources management. Because deterministic forecasts are of limited value, probabilistic forecasts that provide probabilities and confidence intervals are produced by optimizing the predictors and the forecast equation while identifying the features of forecast errors. Combining deterministic forecasts with error identification can facilitate water resources management. This study improved the input selection, structure design, and error-parameter calibration of the traditional forecast model, proposing a long-term probabilistic streamflow forecast model with hierarchical optimization of the predictors, forecast equation, and error characteristics. The model uses information entropy theory to screen the driving predictor set, the long short-term memory (LSTM) model to produce deterministic forecasts, and the generalized autoregressive conditional heteroskedasticity (GARCH) model to identify time-varying errors. The proposed model was applied to forecast the monthly streamflow of two lakes in China, Hongze Lake and Luoma Lake, and was evaluated using a multidimensional index method that considers “accuracy–reliability–stability” performance. The results revealed the following: (1) Predictor screening based on information entropy investigates the statistical characteristics of the predictors, thereby improving the reliability of streamflow forecasts. (2) The LSTM model exploits the response between the driving predictors and streamflow, whereas the GARCH model effectively identifies the time-varying features of forecast errors; together they reduce the probability of forecast failure and increase the accuracy and stability of probabilistic forecasts. (3) Under current meteorological observation conditions and forecasting capability, the proposed model extends the maximum forecast lead time from 1 month to 3 months. (4) Compared with the benchmark in the two case studies, the proposed model improves accuracy (root mean square error: 6.7%–34.8%), reliability (Brier score: 15.3%–27.9%), and stability (mistaken distance: 36.4%–52.6%), indicating that “inputs–structure–parameters” hierarchical optimization provides effective forecast information for water resources management.

Introduction

A long-term streamflow forecast, which usually refers to a streamflow forecast with a lead time exceeding 1 month, can offer practical information for water resource planning and management (Yeh, 1985). Existing methods struggle to simulate the streamflow process effectively under complex runoff formation, and deterministic streamflow forecasts cannot provide accurate and reliable information (Xu et al., 2019). Because reservoir regulation has a certain buffering effect on forecast error, probabilistic streamflow forecasts, which provide quantitative probabilities and confidence intervals by integrating a deterministic forecast with the stochastic features of its errors, have been proposed to facilitate reservoir regulation (Hashimoto et al., 1982, Krzysztofowicz, 1999).

Three basic metrics are often used to evaluate a streamflow forecast: accuracy, reliability, and stability (Duan et al., 2007). Accuracy quantifies the error of deterministic forecasts, reliability describes the probability that probabilistic forecasts cover the observed value, and stability evaluates the deviation when the forecast fails to cover the observed value under extreme conditions. A valuable streamflow forecast requires an accurate, reliable, and stable method (Ajami et al., 2008). These indexes are typically affected by the predictors, the forecast equation, and the forecast errors. The predictors and forecast equation represent the physical process of runoff generation (Fatichi et al., 2016). The former extract the major hydrological elements that affect natural streamflow via runoff generation or sequence decomposition, whereas the latter describes the relationship between these elements and streamflow through a mathematical expression. The forecast error, which generally represents the magnitude of deviation between the forecast and observed values, is a comprehensive measure of the uncertainty in a streamflow forecast. It is also an important metric for evaluating the quality of a forecast model (Benke et al., 2008, Sivakumar, 2000). The choice of predictors and forecast equation shapes the deterministic forecast model and determines the statistical distribution of the errors, and therefore affects the quality of probabilistic forecasts. Consequently, probabilistic streamflow forecasts can be enhanced by selecting appropriate predictors, modifying the model structure, and optimizing the error parameters (Shukla and Lettenmaier, 2011).
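To make the three criteria concrete, the following Python sketch computes one illustrative measure for each. RMSE for accuracy and the Brier score for reliability are standard; because this excerpt does not define the study's stability index (“mistaken distance”), `miss_distance` is a hypothetical stand-in that averages how far observations fall outside the forecast interval when the interval misses them. The data, the exceedance event, and all function names are assumptions for illustration.

```python
import math


def rmse(observed, forecast):
    """Accuracy: root mean square error of deterministic forecasts."""
    n = len(observed)
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, forecast)) / n)


def brier_score(event_prob, event_occurred):
    """Reliability: mean squared gap between forecast probability and the 0/1 outcome."""
    n = len(event_prob)
    return sum((p - float(o)) ** 2 for p, o in zip(event_prob, event_occurred)) / n


def miss_distance(observed, lower, upper):
    """Hypothetical stability measure: mean distance from each missed
    observation to the nearest interval bound (0.0 if nothing is missed)."""
    gaps = []
    for o, lo, hi in zip(observed, lower, upper):
        if o < lo:
            gaps.append(lo - o)
        elif o > hi:
            gaps.append(o - hi)
    return sum(gaps) / len(gaps) if gaps else 0.0


# Illustrative monthly values (units unspecified).
observed = [120.0, 95.0, 210.0, 60.0]
forecast = [110.0, 100.0, 180.0, 70.0]
lower = [100.0, 85.0, 170.0, 55.0]
upper = [125.0, 105.0, 200.0, 75.0]
# Hypothetical event "monthly streamflow exceeds 100" with forecast probabilities.
event_prob = [0.7, 0.2, 0.9, 0.1]
event_occurred = [o > 100.0 for o in observed]

print(rmse(observed, forecast))                 # ≈ 16.77
print(brier_score(event_prob, event_occurred))  # ≈ 0.0375
print(miss_distance(observed, lower, upper))    # 10.0
```

Lower is better for all three; in this toy case only the third month falls outside its interval, so the stability stand-in reflects that single 10-unit miss.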

Selecting appropriate predictors is fundamental to ensuring forecast stability and enhancing model generalization (Liu and Gupta, 2007). Previous research selected predictors using correlation coefficients and principal component analysis (Fathian et al., 2016, Markonis et al., 2018). However, such methods capture only the linear correlation between predictors, which makes them unsuitable for the complex nonlinear features of hydrological factors in long-term streamflow forecasts (Xie et al., 2022). Additionally, interactions between these factors make it difficult to identify combinations of driving predictors. Ignoring such interactions can introduce redundancy, which amplifies the uncertainty caused by random perturbations in model inputs (Darbandsari and Coulibaly, 2020). Copula entropy (CE), an information-theoretic quantity that links the copula function with information entropy, is analogous to formulating the joint distribution of multivariate random variables by combining the marginal distributions and the conditional probability distribution (Chen et al., 2013). It is therefore suitable for assessing the interactions among numerous predictors and their combined effect on streamflow.
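As a rough illustration of why an entropy-based measure can detect dependence that a correlation coefficient misses, the sketch below estimates copula entropy for two variables from their ranks (the empirical copula) with a simple k × k histogram, and reports mutual information as its negative, following the identity that mutual information is the negative of copula entropy (J. Ma et al., 2011, in the reference list). The histogram estimator, the bin count `k`, and the synthetic data are assumptions; the study's own estimator and its multivariate predictor screening are not reproduced here.

```python
import math
import random


def ranks(values):
    """1-based ranks (ties broken by position): the empirical copula margins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def copula_entropy(x, y, k=8):
    """Histogram estimate of copula entropy H_c = -integral of c*log(c)
    on a k x k grid over the unit square (rank-based, equal-frequency bins)."""
    n = len(x)
    counts = {}
    for a, b in zip(ranks(x), ranks(y)):
        cell = ((a - 1) * k // n, (b - 1) * k // n)
        counts[cell] = counts.get(cell, 0) + 1
    h = 0.0
    for c in counts.values():
        p = c / n              # probability mass in this cell
        density = p * k * k    # copula density estimate (cell area = 1/k^2)
        h -= p * math.log(density)
    return h


def mutual_information(x, y, k=8):
    """Mutual information equals the negative copula entropy."""
    return -copula_entropy(x, y, k)


rng = random.Random(0)
n = 800
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_nonlinear = [xi ** 2 + 0.1 * rng.gauss(0.0, 1.0) for xi in x]  # near-zero linear correlation
y_noise = [rng.gauss(0.0, 1.0) for _ in range(n)]                   # independent of x

print(mutual_information(x, y_nonlinear))  # clearly positive
print(mutual_information(x, y_noise))      # close to 0 (small positive bias)
```

Here `y_nonlinear` depends on `x` almost deterministically yet has near-zero linear correlation, so correlation-based screening would tend to discard it while the copula-entropy estimate does not. Histogram estimates carry a small positive bias (roughly (k − 1)²/(2n) nats under independence), so some threshold or significance test would likely be needed before ranking real predictors.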

Determining both the forecast structure and the forecast equation helps improve forecast accuracy and reliability. Data-driven models, which extract data features from historical streamflow series and establish a mapping from appropriate predictors to streamflow, remain the principal method for long-term streamflow forecasting (Gu et al., 2021, Zhao et al., 2021). Generally, these models can be divided into five categories in roughly chronological order: (1) models based on mathematical and statistical methods, e.g., regression analysis (Carlson et al., 1970), time series models (Bender and Simonovic, 1994), and wavelet analysis (Lafrenière and Sharp, 2003); (2) models based on signal decomposition, e.g., empirical mode decomposition (Huang et al., 1998), complete ensemble empirical mode decomposition (Yeh et al., 2011), and variational mode decomposition (Wang et al., 2021); (3) models based on machine learning, e.g., support vector machines (Asefa et al., 2006), artificial neural networks (Hu et al., 2005), and chaos theory (Dhanya and Nagesh Kumar, 2011); (4) models based on deep learning, e.g., recursive neural networks (Kratzert et al., 2018), recurrent neural networks (Wang et al., 2021), and generative adversarial networks (Ma et al., 2022); and (5) integrated models based on optimization algorithms coupled with multiple models (Niu et al., 2021, Zhang et al., 2020). In recent years, with rapid developments in high-performance computing and artificial intelligence, the application of deep learning models has made remarkable progress. In contrast to conventional machine learning and other techniques, deep learning emphasizes the depth of the network and can learn complex stochastic features from sufficient input data by stacking multiple nonlinear processing units (LeCun et al., 2015). Such methods can refine the system state with a small number of model parameters while maintaining precise fitting (Wang et al., 2020, Xiang et al., 2020).
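The recurrent processing units mentioned above can be illustrated with a single long short-term memory (LSTM) cell, the deep learning structure this study adopts for deterministic forecasting. The pure-Python sketch below implements the standard LSTM gate equations for a scalar input and scalar hidden state; the weights are hypothetical and untrained, and a practical model would be a multi-unit network trained with a framework such as PyTorch (`torch.nn.LSTM`).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a scalar LSTM cell.

    w maps each gate name to (weight on input, weight on previous hidden state, bias).
    """
    def pre(gate):
        wx, wh, b = w[gate]
        return wx * x + wh * h_prev + b

    f = sigmoid(pre("f"))      # forget gate: share of the old cell state to keep
    i = sigmoid(pre("i"))      # input gate: share of the candidate to write
    g = math.tanh(pre("g"))    # candidate update
    o = sigmoid(pre("o"))      # output gate
    c = f * c_prev + i * g     # new cell state (long-term memory)
    h = o * math.tanh(c)       # new hidden state passed to the next step
    return h, c


def run_lstm(sequence, w):
    """Feed a predictor sequence through the cell and return the final hidden state."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, w)
    return h


# Hypothetical, untrained weights chosen only to make the example run.
weights = {
    "f": (0.5, 0.1, 1.0),
    "i": (0.8, 0.2, 0.0),
    "g": (0.6, -0.3, 0.0),
    "o": (0.7, 0.1, 0.0),
}
standardized_predictor = [0.2, -0.5, 1.1, 0.4]  # illustrative monthly inputs
print(run_lstm(standardized_predictor, weights))
```

The forget gate `f` controls how much of the cell state `c` survives each step, which is the mechanism that lets an LSTM retain information over many time steps; in a forecasting network the final hidden state would feed an output layer that maps it to streamflow.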

Identifying the stochastic characteristics of forecast errors enhances the reliability and stability of probabilistic forecasts. Numerous studies have demonstrated that forecast errors do not satisfy the assumptions of independence and temporal stationarity; instead, they have time-varying characteristics driven by the perturbation and propagation of deviations in hydrometeorological processes during real-time forecasting (Liang et al., 2021, Xu, 2021). For example, climatological and hydrological variables influence runoff generation and concentration, resulting in spatio-temporal correlation and a fat-tailed distribution of streamflow forecast uncertainties (Mo et al., 2021, Xu et al., 2022). As the lead time extends, multistep forecasts tend to amplify such uncertainties, thereby reducing the value of the forecast. The generalized autoregressive conditional heteroskedasticity (GARCH) model is a stochastic model developed in finance to describe the volatility of asset returns (Engle and Bollerslev, 1986). It is suitable for simulating time series with volatility clustering, such as the systematically large errors of a streamflow forecast over consecutive periods caused by failures in precipitation forecasts. It can also be used to simulate nonstationary series with time-varying parameters and asymmetric volatility, and thus it has considerable application potential in hydrology and water resources management (Modarres and Ouarda, 2013, Nazeri-Tahroudi et al., 2022).
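The volatility clustering described above follows from the GARCH(1,1) conditional variance recursion, σ²_t = ω + α e²_{t−1} + β σ²_{t−1}: a large forecast error in one month raises the expected error variance in the next. The sketch below simulates such errors and filters a given error series to recover its conditional variances. The parameter values are hypothetical, the errors are assumed to be zero-mean with Gaussian innovations, and the (1,1) order is an assumption rather than the study's calibrated specification.

```python
import math
import random


def garch11_variance(errors, omega, alpha, beta):
    """Filter a forecast-error series through GARCH(1,1):
    sigma2[t] = omega + alpha * e[t-1]**2 + beta * sigma2[t-1].
    Returns sigma2[t] for each t, using only errors before t."""
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    var = omega / (1.0 - alpha - beta)  # start from the unconditional variance
    out = []
    for e in errors:
        out.append(var)
        var = omega + alpha * e * e + beta * var
    return out


def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate e[t] = sigma[t] * z[t] with standard normal z[t]."""
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)
    errors = []
    for _ in range(n):
        e = math.sqrt(var) * rng.gauss(0.0, 1.0)
        errors.append(e)
        var = omega + alpha * e * e + beta * var
    return errors


# Hypothetical parameters; alpha + beta close to 1 gives persistent volatility clusters.
errors = simulate_garch11(240, omega=0.05, alpha=0.15, beta=0.80, seed=1)
sigma2 = garch11_variance(errors, omega=0.05, alpha=0.15, beta=0.80)
print(max(sigma2), min(sigma2))
```

Parameter estimation (typically by maximum likelihood) is omitted here; a library such as the `arch` Python package, whose `arch_model` function accepts `vol="GARCH"` with orders `p` and `q`, could be used to fit ω, α, and β to actual forecast residuals.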

The structure of a streamflow forecast model can be optimized according to the hydrometeorological characteristics of the study area through input preprocessing (predictor selection), structure adjustment (forecast equation modification), and output postprocessing (forecast error analysis), which together constitute hierarchical optimization. However, previous studies typically concentrated on improving just one part of the model rather than optimizing it as a whole. This paper proposes a long-term probabilistic streamflow forecast model with “inputs–structure–parameters” hierarchical optimization. The driving predictors are screened based on physical cause analysis and information entropy, and the long short-term memory (LSTM) model is used to produce deterministic streamflow forecasts. The GARCH model is then used to identify the parameters of the forecast errors to obtain probabilistic streamflow forecasts. Applied to case studies of the Hongze Lake basin and Luoma Lake (China), the proposed model improved the accuracy, reliability, and stability of streamflow forecasts.
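The hierarchical chain described above ends by combining a deterministic forecast with an error model. A minimal sketch of that last step, assuming zero-mean Gaussian errors whose variance comes from an error model such as GARCH, turns each point forecast into a central confidence interval and an exceedance probability. The Gaussian assumption, the function names, and the numbers are illustrative; given the fat-tailed errors noted earlier, a Student-t quantile could be substituted.

```python
from statistics import NormalDist


def central_intervals(point_forecasts, error_variances, confidence=0.9):
    """Turn deterministic forecasts plus conditional error variances into
    (lower, median, upper) intervals, assuming zero-mean Gaussian errors."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    out = []
    for y_hat, var in zip(point_forecasts, error_variances):
        half_width = z * var ** 0.5
        out.append((y_hat - half_width, y_hat, y_hat + half_width))
    return out


def exceedance_probability(y_hat, error_variance, threshold):
    """P(streamflow > threshold) under the same Gaussian error assumption."""
    return 1.0 - NormalDist(mu=y_hat, sigma=error_variance ** 0.5).cdf(threshold)


# Illustrative point forecasts (e.g. from an LSTM) and error variances (e.g. from GARCH).
point = [100.0, 140.0, 90.0]
variance = [25.0, 100.0, 16.0]
for lower, mid, upper in central_intervals(point, variance, confidence=0.9):
    print(f"{lower:.1f} .. {mid:.1f} .. {upper:.1f}")
print(exceedance_probability(140.0, 100.0, 150.0))  # ≈ 0.159
```

Because the interval half-width scales with the conditional standard deviation, months that follow large forecast errors receive wider intervals, which is how a time-varying error model feeds into reliability and stability.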

Distinct from previous related work, the proposed model: (1) identified driving predictors for streamflow forecasts using information entropy, introducing copula entropy to capture the interactions between these predictors and thereby improving the reliability of forecasts; (2) refined the structure of streamflow forecasts via a deep learning model that characterizes runoff generation and concentration on a long-term timescale, which enhanced forecast accuracy; (3) addressed forecast errors with volatility clustering and temporal dependence using the GARCH model, which described the stochastic characteristics of forecast errors precisely and maintained the stability of probabilistic streamflow forecasts; and (4) optimized the design of the streamflow forecast model in its inputs (predictors), structure (forecast equation), and outputs (forecast errors), thus improving forecast performance and extending the lead time.

Section snippets

Methodology

The long-term streamflow forecast model can be expressed as follows:

$$y_t = f(X_t) + e_t = f(x_{t-1}, \ldots, x_{t-\tau}) + e_t$$

where $y_t$ is the streamflow at the current time $t$, $X_t$ is the set of predictors, in which $x_{t-\tau}$ represents the predictors lagged $\tau$ periods from the current time, $f$ represents the forecast equation, and $e_t$ represents the forecast errors. The scenario of a probabilistic streamflow forecast can be proposed if the predictors, forecast equation, and errors with a particular statistical distribution
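The lag structure of the model equation can be made concrete by pairing each target y_t with the predictor vectors from the τ preceding periods. The following sketch builds such a supervised dataset from aligned monthly series; the function name, the list-of-lists layout, the example predictors, and the ordering of lags (most recent first) are assumptions for illustration. The resulting rows are what a forecast equation f would take as input (an LSTM would typically receive the same lags arranged as a sequence).

```python
def build_lagged_dataset(predictors, streamflow, tau):
    """Pair each target y_t with X_t = (x_{t-1}, ..., x_{t-tau}).

    predictors: list of per-period predictor vectors x_t (lists of floats)
    streamflow: list of observed y_t, aligned with predictors
    tau: number of lagged periods fed to the forecast equation
    """
    if tau < 1:
        raise ValueError("tau must be at least 1")
    if len(predictors) != len(streamflow):
        raise ValueError("predictors and streamflow must have equal length")
    X, y = [], []
    for t in range(tau, len(streamflow)):
        row = []
        for lag in range(1, tau + 1):          # x_{t-1} first, x_{t-tau} last
            row.extend(predictors[t - lag])
        X.append(row)
        y.append(streamflow[t])
    return X, y


# Illustrative monthly data: two hypothetical predictors per month (e.g. precipitation, temperature).
predictors = [[50.0, 12.0], [80.0, 15.0], [120.0, 20.0], [90.0, 24.0], [60.0, 22.0]]
streamflow = [30.0, 45.0, 70.0, 85.0, 60.0]
X, y = build_lagged_dataset(predictors, streamflow, tau=2)
print(X[0], y[0])  # [80.0, 15.0, 50.0, 12.0] 70.0
```

The first τ months have no complete lag history and are dropped, so a series of length N yields N − τ training pairs.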

Case study

Hongze Lake and Luoma Lake (China) are selected as the study areas for this research (Fig. 5). Hongze Lake, located in the lower reaches of the Huai River, has a drainage area of 2596 km², and it is the fourth-largest freshwater lake in China and the largest storage lake in the Huai River basin. Hongze Lake lies in the transition zone between the subtropical and warm temperate climatic regions, and it has four distinct seasons and a pronounced monsoon. The annual distributions of rainfall and

Conclusions

An accurate, reliable, and stable streamflow forecast is the basis and core of water resources planning and management, and it can provide effective decision-making information for water resources utilization. This study used “inputs–structure–parameters” hierarchical optimization to improve the predictors, forecast equation, and errors, respectively. First, the contribution of the predictors is evaluated based on information entropy theory, which directs the screening of the driving predictor

CRediT authorship contribution statement

Ran Mo: Methodology, Software, Funding acquisition. Bin Xu: Supervision, Methodology, Funding acquisition. Ping-an Zhong: Conceptualization. Yuanheng Dong: Formal analysis, Data curation. Han Wang: Investigation. Hao Yue: Investigation. Jian Zhu: Methodology. Huili Wang: Validation. Guoqing Wang: Conceptualization. Jianyun Zhang: Conceptualization.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This study is supported by the National Key Technologies R&D Program of China (Grant No. 2021YFC3000104), the National Natural Science Foundation of China (Grant Nos. 52121006, 41961124007), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant No. KYCX23_0710). Data will be made available on request to the first author.

References (54)

  • J. Ma et al.

    Mutual information is copula entropy

    Tsinghua Sci. Technol.

    (2011)
  • Y. Ma et al.

    Stochastic generation of runoff series for multiple reservoirs based on generative adversarial networks

    J. Hydrol.

    (2022)
  • Y. Markonis et al.

    Global estimation of long-term persistence in annual river runoff

    Adv. Water Resour.

    (2018)
  • R. Mo et al.

    Dynamic long-term streamflow probabilistic forecasting model for a multisite system considering real-time forecast updating through spatio-temporal dependent error correction

    J. Hydrol.

    (2021)
  • R. Modarres et al.

    Modeling rainfall–runoff relationship using multivariate GARCH model

    J. Hydrol.

    (2013)
  • W.-J. Niu et al.

    Parallel computing and swarm intelligence based artificial intelligence model for multi-step-ahead hydrological time series prediction

    Sustain. Cities Soc.

    (2021)
  • B. Sivakumar

    Chaos theory in hydrology: important issues and interpretations

    J. Hydrol.

    (2000)
  • Q. Wang et al.

    Sequence-based statistical downscaling and its application to hydrologic simulations based on machine learning and big data

    J. Hydrol.

    (2020)
  • J. Wu et al.

    Hydrological response to climate change and human activities: A case study of Taihu Basin, China

    Water Sci. Eng.

    (2020)
  • C.-Y. Xu

    Issues influencing accuracy of hydrological modeling in a changing environment

    Water Sci. Eng.

    (2021)
  • B. Xu et al.

    Identifying long-term effects of using hydropower to complement wind power uncertainty through stochastic programming

    Appl. Energy

    (2019)
  • J.-X. Zhao et al.

    Prediction of sediment resuspension in Lake Taihu using support vector regression considering cumulative effect of wind speed

    Water Sci. Eng.

    (2021)
  • N.K. Ajami et al.

    Sustainable water resource management under hydrological uncertainty

    Water Resour. Res.

    (2008)
  • M. Bender et al.

    Time-Series Modeling for Long-Range Stream-Flow Forecasting

    J. Water Resour. Plan. Manag.

    (1994)
  • R.F. Carlson et al.

    Application of Linear Random Models to Four Annual Streamflow Series

    Water Resour. Res.

    (1970)
  • L. Chen et al.

    Measure of Correlation between River Flows Using the Copula-Entropy Method

    J. Hydrol. Eng.

    (2013)
  • C.T. Dhanya et al.

    Predictive uncertainty of chaotic daily streamflow using ensemble wavelet networks approach

    Water Resour. Res.

    (2011)