How certain are our uncertainty bounds? Accounting for sample variability in Monte Carlo-based uncertainty estimates
Introduction
It is common for simulations or predictions made using dynamical environmental system models (DESMs) to be presented in the form of “best” estimates, accompanied by 95% (or 90% or 99%, etc.) prediction intervals (PIs; see Fig. 1a). The PIs are typically indicated by reporting the positions of the 2.5% and 97.5% quantiles of the probability density function (PDF) characterizing the lack of precision (uncertainty) associated with the model-simulated value (Cho et al., 2016; Hassan et al., 2009; Hirsch et al., 2015; Inam et al., 2017; Roy et al., 2017, 2018; Yang, 2011). Throughout this paper, we use the term “prediction interval” to refer to prediction uncertainty, and the term “confidence interval” to represent uncertainty in some estimated quantity.
In many cases, the reported PIs are estimated via the use of Monte-Carlo sampling to approximate the form of the relevant PDF. Random samples are typically drawn from either the input space (Nikishova et al., 2017) or parameter space (Wagener and Kollat, 2007), or both, and used to generate an ensemble of model simulations from which the PIs are then calculated. The advantage of Monte-Carlo-based methods is that they help us to better understand model behaviors, sensitivities, and uncertainties (Wagener and Kollat, 2007). More detailed methods, also premised on Monte-Carlo sampling, have been proposed (Ajami et al., 2007; Stedinger et al., 2008; Yang et al., 2018).
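The ensemble-based procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the one-parameter linear-reservoir model, the rainfall series, and the sampling range for its rate constant `k` are hypothetical stand-ins for a real DESM, and the function names are ours. At each time step, the 2.5% and 97.5% values are taken as empirical quantiles of the simulated ensemble, and the median serves as the "best" estimate.

```python
import random
import statistics


def linear_reservoir(k, rain, s0=10.0):
    """Hypothetical toy model: storage S drains as Q = k * S at each step."""
    storage, flows = s0, []
    for p in rain:
        q = k * storage                  # outflow proportional to current storage
        storage = storage + p - q        # water-balance update
        flows.append(q)
    return flows


def monte_carlo_pi(rain, n_samples, k_range=(0.1, 0.5), seed=0):
    """Sample k uniformly, run the ensemble, and return per-step
    (2.5% quantile, median, 97.5% quantile) of the simulated flows."""
    rng = random.Random(seed)
    ensemble = [linear_reservoir(rng.uniform(*k_range), rain) for _ in range(n_samples)]
    bounds = []
    for t in range(len(rain)):
        values = [member[t] for member in ensemble]
        # n=40 cut points sit at 1/40, 2/40, ..., 39/40, so the first and last
        # are the 2.5% and 97.5% quantiles (linear interpolation, "inclusive").
        cuts = statistics.quantiles(values, n=40, method="inclusive")
        bounds.append((cuts[0], statistics.median(values), cuts[-1]))
    return bounds


rain = [0.0, 5.0, 12.0, 3.0, 0.0, 0.0, 8.0, 1.0]   # hypothetical forcing
for t, (lo, mid, hi) in enumerate(monte_carlo_pi(rain, n_samples=500)):
    print(f"t={t}: median={mid:.2f}, 95% PI=[{lo:.2f}, {hi:.2f}]")
```

Sampling model inputs instead of (or in addition to) parameters would only change what is drawn inside the ensemble loop; the quantile step stays the same.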
Monte Carlo-based methods for estimating quantiles have been examined in the non-hydrology literature (e.g. Linnet, 2000; Sun and Lahiri, 2006; Bulter et al., 2017; and references therein), where it has been shown that if the cumulative distribution function (CDF) $F$ is differentiable and has a positive derivative at a population quantile, the sample quantile will be asymptotically normal. Further, for this case, the centered and scaled sample quantile, $\sqrt{n}\,(\hat{x}_z - x_z)$, will converge to a normal distribution with zero mean and a variance of $z(1-z)/f(x_z)^2$ in the limiting case ($n \to \infty$), where $f = F'$ is the PDF (Sun and Lahiri, 2006); here $\hat{x}_z$ is the sample quantile estimator, $x_z$ is the theoretical quantile, $z$ is the non-exceedance probability, and $n$ is the sample size.
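To make the asymptotic result concrete, the sketch below evaluates the limiting standard deviation of the sample quantile, $\sqrt{z(1-z)/n}\,/\,f(x_z)$, for an assumed standard normal parent PDF. The helper name `asymptotic_quantile_sd` is hypothetical, and the formula is only a large-$n$ approximation.

```python
import math
from statistics import NormalDist


def asymptotic_quantile_sd(dist, z, n):
    """Large-n standard deviation of the sample z-quantile:
    sqrt(z * (1 - z) / n) / f(x_z), from the asymptotic normality result."""
    x_z = dist.inv_cdf(z)      # theoretical quantile x_z = F^{-1}(z)
    density = dist.pdf(x_z)    # f(x_z); the result requires f(x_z) > 0
    return math.sqrt(z * (1.0 - z) / n) / density


parent = NormalDist(mu=0.0, sigma=1.0)   # assumed parent PDF for illustration
for n in (100, 1000, 10000):
    sd_lo = asymptotic_quantile_sd(parent, 0.025, n)
    sd_hi = asymptotic_quantile_sd(parent, 0.975, n)
    print(f"n={n:>5}: sd(2.5% quantile)={sd_lo:.3f}, sd(97.5% quantile)={sd_hi:.3f}")
```

For this parent the 97.5% quantile is about 1.96, and its estimate has a standard deviation of roughly 0.27 at n = 100 and 0.084 at n = 1000. Because the spread only shrinks as $1/\sqrt{n}$, a 100-fold larger ensemble buys just a 10-fold gain in precision.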
One of the crucial considerations in Monte-Carlo sampling is the size of the sample. In Section 3, we show that for smaller sample sizes, these methods will typically underestimate the width of the PI because of unavoidable sampling variability. Through theoretical arguments supported by numerical experiments, we investigate and demonstrate the nature and severity of this problem and its relationship to sample size. We also demonstrate how a better (more representative) estimate of the PI can be achieved by adjusting its width to account for the size of the sample used (Section 4). In Section 5, we briefly illustrate the application of this approach to streamflow uncertainty estimation via hydrological modeling of a catchment.
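The narrowing effect can be checked with a small experiment. Assuming a standard normal parent (so the true 95% PI is known), the sketch below estimates a nominal 95% PI from n random draws, measures the true probability mass it actually encloses, and averages over many realizations. The parent distribution, sample sizes, and function names are illustrative choices, not the paper's experimental setup.

```python
import math
import random
from statistics import NormalDist, mean


def empirical_quantile(sample, z):
    """Type-7 empirical quantile: linear interpolation between order statistics."""
    xs = sorted(sample)
    h = z * (len(xs) - 1)
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def mean_true_coverage(n, realizations, seed=7):
    """Average true probability mass enclosed by a nominal 95% PI that is
    estimated from n draws of a standard normal parent."""
    parent = NormalDist()
    rng = random.Random(seed)
    coverages = []
    for _ in range(realizations):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        lo = empirical_quantile(sample, 0.025)
        hi = empirical_quantile(sample, 0.975)
        coverages.append(parent.cdf(hi) - parent.cdf(lo))   # mass inside estimated PI
    return mean(coverages)


for n, reps in ((20, 2000), (100, 1000), (1000, 200)):
    print(f"n={n:>4}: mean true coverage of nominal 95% PI = {mean_true_coverage(n, reps):.3f}")
```

In this setup, the average enclosed mass falls well short of 0.95 for small n and approaches 0.95 as n grows, i.e., the estimated interval is too narrow on average when the sample is small.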
Computing the quantiles of a probability distribution function
Quantiles of a density function are points along the variable axis that divide the range of the PDF into contiguous intervals having equal probability mass (Fig. 1b). The quantile $x_z$ is defined as the value such that the cumulative distribution function (CDF) satisfies $F(x_z) = z$, meaning that a fraction $z$ (i.e., $100z$%) of the total probability mass of $f(x)$ lies to the left of $x_z$ on the variable axis (i.e., in the region $-\infty < x \le x_z$ for an unbounded distribution), where $z$ is the non-exceedance probability (NEP). So, $F(x_z) = \int_{-\infty}^{x_z} f(x)\,dx = z$, and $x_z = F^{-1}(z)$.
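In practice the CDF is unknown, so $x_z = F^{-1}(z)$ has to be estimated from a finite sample. One common estimator, shown here as an assumption rather than as the rule used in the paper, interpolates linearly between order statistics (the "type 7" definition, which is also the default `linear` method of `numpy.quantile`). The function name is hypothetical.

```python
import math


def empirical_quantile(sample, z):
    """Estimate x_z = F^{-1}(z) from a sample by linear interpolation
    between order statistics (type-7 rule)."""
    if not sample:
        raise ValueError("sample must not be empty")
    if not 0.0 <= z <= 1.0:
        raise ValueError("z must lie in [0, 1]")
    xs = sorted(sample)
    h = z * (len(xs) - 1)          # fractional 0-based position in the sorted sample
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)  # stay inside the sample when z == 1
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


print(empirical_quantile([1, 2, 3, 4], 0.25))   # 1.75
```

Other interpolation rules give slightly different values for small samples, but all converge to $x_z$ as the sample grows.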
Uncertainty associated with sample-based estimates of the quantiles
It is important to note that since many equally likely sample sets $X_i$ can be randomly generated via Monte-Carlo sampling (where $i$ indexes realizations), there can be many possible estimates of the CDF, $\hat{F}_i(x)$, and hence many possible corresponding estimates of the quantiles, $\hat{x}_{z,i}$. In other words, any estimate $\hat{x}_z$ of quantile $x_z$ is a statistic having its own sampling distribution $g(\hat{x}_z)$, where the statistical properties of the distribution depend on both the form of the
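The sampling distribution of a quantile estimate can be explored numerically by repeating the Monte-Carlo experiment many times. The sketch below assumes a standard normal parent and the type-7 empirical quantile, draws 500 independent realizations for each sample size n, and compares the spread of the estimated 97.5% quantile with the large-n standard deviation $\sqrt{z(1-z)/n}\,/\,f(x_z)$. All names and settings are illustrative.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def empirical_quantile(sample, z):
    """Type-7 empirical quantile: linear interpolation between order statistics."""
    xs = sorted(sample)
    h = z * (len(xs) - 1)
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def quantile_sampling_distribution(n, z=0.975, realizations=500, seed=1):
    """Draw many independent size-n samples from a standard normal parent
    and return the estimated z-quantile from each realization."""
    rng = random.Random(seed)
    return [empirical_quantile([rng.gauss(0.0, 1.0) for _ in range(n)], z)
            for _ in range(realizations)]


parent = NormalDist()
true_q = parent.inv_cdf(0.975)
for n in (20, 100, 1000):
    estimates = quantile_sampling_distribution(n)
    asym_sd = math.sqrt(0.975 * 0.025 / n) / parent.pdf(true_q)
    print(f"n={n:>4}: mean={mean(estimates):.3f} (true {true_q:.3f}), "
          f"sd={stdev(estimates):.3f} (asymptotic {asym_sd:.3f})")
```

In this setup, the spread shrinks roughly as $1/\sqrt{n}$, and for small n the mean estimate of the 97.5% quantile sits noticeably below the true value, which is consistent with an interval that is too narrow.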
Accounting for uncertainty in the estimated quantiles
Given that PIs estimated using a single Monte-Carlo realization are subject to sampling variability, it makes sense to take this factor into account when reporting the precision associated with a model-simulated variable (e.g., simulated streamflow). One way to acknowledge the imprecision in the estimate $\hat{x}_z$ (a ‘hat’ is used to denote an estimate) of quantile $x_z$ is to compute a quantile of the sampling distribution associated with the quantile of interest (i.e., to take into consideration the sampling
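As one illustrative way to account for the sampling variability of the bounds, the sketch below uses a nonparametric bootstrap to approximate the sampling distributions of the estimated 2.5% and 97.5% quantiles, then takes the lower bound from the lower tail of the first distribution and the upper bound from the upper tail of the second. The bootstrap is our substitution for illustration and is not necessarily the adjustment developed in the paper (Section 4); the function name `bootstrap_adjusted_pi`, the 95% `level` applied to the bound distributions, and the Gaussian stand-in ensemble are all assumptions.

```python
import math
import random


def empirical_quantile(sample, z):
    """Type-7 empirical quantile: linear interpolation between order statistics."""
    xs = sorted(sample)
    h = z * (len(xs) - 1)
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def bootstrap_adjusted_pi(ensemble, z_lo=0.025, z_hi=0.975, level=0.95,
                          n_boot=1000, seed=0):
    """Hypothetical helper: widen a Monte Carlo PI using bootstrap sampling
    distributions of its two bounds.

    Lower bound -> (1 - level)/2 quantile of the bootstrapped z_lo-quantiles.
    Upper bound -> (1 + level)/2 quantile of the bootstrapped z_hi-quantiles.
    """
    rng = random.Random(seed)
    n = len(ensemble)
    boot_lo, boot_hi = [], []
    for _ in range(n_boot):
        resample = rng.choices(ensemble, k=n)        # draw n values with replacement
        boot_lo.append(empirical_quantile(resample, z_lo))
        boot_hi.append(empirical_quantile(resample, z_hi))
    raw = (empirical_quantile(ensemble, z_lo), empirical_quantile(ensemble, z_hi))
    adjusted = (empirical_quantile(boot_lo, (1.0 - level) / 2.0),
                empirical_quantile(boot_hi, (1.0 + level) / 2.0))
    return raw, adjusted


rng = random.Random(42)
ensemble = [rng.gauss(0.0, 1.0) for _ in range(200)]   # stand-in for simulated values
raw, adjusted = bootstrap_adjusted_pi(ensemble)
print(f"raw 95% PI:      [{raw[0]:.3f}, {raw[1]:.3f}]")
print(f"adjusted 95% PI: [{adjusted[0]:.3f}, {adjusted[1]:.3f}]")
```

The adjusted interval is wider than the raw one, and the extra width should shrink as the ensemble grows, in line with the motivation for tying the PI width to the sample size.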
Illustration of application to streamflow estimation via hydrological modeling
We illustrate the use of adjusted prediction intervals for the case of streamflow estimation using the conceptual catchment model HyMod (Boyle et al., 2000) and the Leaf River (Mississippi) data, which has been extensively used in several previous studies (e.g. Brazil and Hudlow, 1981; Gong et al., 2013; Moradkhani et al., 2005; Sorooshian et al., 1983). The model has five adjustable parameters, which can vary within ranges specified by the user. Given the hydrologic model generating the
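Drawing values for the five parameters within user-specified ranges is the Monte-Carlo step that generates the streamflow ensemble. The sketch below uses Latin hypercube sampling, which is our illustrative choice (simple uniform random sampling would also work). The parameter names follow a common HyMod convention, but the bounds are hypothetical and are not those used in the study. Each parameter set would then be passed to a HyMod implementation, which is not shown.

```python
import random

# Illustrative bounds only (assumed, not taken from the study).
PARAM_BOUNDS = {
    "Cmax": (1.0, 500.0),   # maximum storage capacity [mm]
    "bexp": (0.1, 2.0),     # shape of the storage-capacity distribution [-]
    "alpha": (0.1, 0.99),   # fraction of effective rainfall routed to quick flow [-]
    "Rs": (0.001, 0.1),     # slow-flow reservoir rate [per time step]
    "Rq": (0.1, 0.99),      # quick-flow reservoir rate [per time step]
}


def latin_hypercube(bounds, n_samples, seed=0):
    """Draw n_samples parameter sets by Latin hypercube sampling: each
    parameter's range is split into n_samples equal strata, one value is
    drawn per stratum, and strata are shuffled independently per parameter."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in bounds.items():
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(strata)                      # decouple pairings across parameters
        columns[name] = [low + u * (high - low) for u in strata]
    return [{name: columns[name][j] for name in bounds} for j in range(n_samples)]


parameter_sets = latin_hypercube(PARAM_BOUNDS, n_samples=1000)
print(parameter_sets[0])
```

Stratifying each parameter range spreads a fixed ensemble more evenly across parameter space than independent uniform draws, although it does not remove the sampling variability of the resulting streamflow quantiles.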
Discussion
We have examined the effects of sampling variability on the estimates of prediction intervals computed for an uncertain quantity (e.g., streamflow) when the underlying theoretical PDF is not known and is instead approximated via Monte-Carlo sampling. In particular, we have investigated the implications of using sample sizes that, although commonly used by modelers, are not large enough to adequately represent (be properly informative about) the underlying form of the parent PDF.
Our analysis
Conclusions
Sampling variability can significantly affect the estimation of prediction intervals, with important implications for hydrologic applications, especially when small Monte-Carlo sample sizes are used. In this study, we propose and demonstrate a method for adjusting the width of prediction intervals to compensate for small sample sizes. The method is easy to implement and effectively accounts for the unavoidable effects of sampling variability. By proper adjustment of the
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
The second author acknowledges partial support by the Australian Centre of Excellence for Climate System Science (CE110001028). Data and codes used in this study are available upon request from the authors.
References
- Modeling metal-sediment interaction processes: parameter sensitivity assessment and uncertainty analysis. Environ. Model. Software (2016).
- Using Markov chain Monte Carlo to quantify parameter uncertainty and its effect on predictions of a groundwater flow model. Environ. Model. Software (2009).
- A bootstrap method for estimating uncertainty of water quality trends. Environ. Model. Software (2015).
- Parameter estimation and uncertainty analysis of the spatial agro hydro salinity model (SAHYSMOD) in the semi-arid climate of rechna doab, Pakistan. Environ. Model. Software (2017).
- Dual state–parameter estimation of hydrological models using ensemble Kalman filter. Adv. Water Resour. (2005).
- Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN. Environ. Model. Software (2017).
- Assessing hydrological impacts of short-term climate change in the Mara River basin of East Africa. J. Hydrol. (2018).
- Numerical and visual evaluation of hydrological and environmental models using the Monte Carlo analysis toolbox. Environ. Model. Software (2007).
- Convergence and uncertainty analyses in Monte-Carlo based sensitivity analysis. Environ. Model. Software (2011).
- Uncertainty analysis of a semi-distributed hydrologic model based on a Gaussian Process emulator. Environ. Model. Software (2018).