Abstract
Numerical weather predictions (NWPs) are systematically subject to errors due to the deterministic solutions used by numerical models to simulate the atmosphere. Statistical postprocessing techniques are widely used nowadays for NWP calibration. However, time-varying bias is usually not accommodated by such models. The calibration performance is also sensitive to the temporal window used for training. This paper proposes space–time models that extend the main statistical postprocessing approaches to calibrate NWP model outputs. Trans-Gaussian random fields are considered to account for meteorological variables with asymmetric behavior. Data augmentation is used to account for the censoring of the response variable. The benefits of the proposed extensions are illustrated through the calibration of hourly 10-m height wind speed forecasts in Southeastern Brazil coming from the Eta model.
References
Ailliot P, Monbet V, Prevosto M (2006) An autoregressive model with time-varying coefficients for wind fields. Environmetrics 17(2):107–117
Ailliot P, Thompson C, Thomson P (2009) Space-time modelling of precipitation by using a hidden Markov model and censored Gaussian distributions. J R Stat Soc Ser C 58(3):405–426
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6):716–723
Allard D, Bourotte M (2015) Disaggregating daily precipitations into hourly values with a transformed censored latent Gaussian process. Stochast Environ Res Risk Assess 29(2):453–462
Allcroft DJ, Glasbey CA (2003) A latent Gaussian Markov random-field model for spatiotemporal rainfall disaggregation. J R Stat Soc Ser C 52(4):487–498
Amarante OD, Silva F, Andrade P (2010) Atlas Eólico: Minas Gerais. CEMIG, Belo Horizonte
Anderson JL (1996) A method for producing and evaluating probabilistic forecasts from ensemble model integrations. J Clim 9(7):1518–1530
Bardossy A, Plate EJ (1992) Space-time model for daily rainfall using atmospheric circulation patterns. Water Resour Res 28(5):1247–1259
Berrocal VJ, Raftery AE, Gneiting T (2007) Combining spatial statistical and ensemble information in probabilistic weather forecasts. Month Weather Rev 135(4):1386–1402
Black TL (1994) The new NMC mesoscale Eta model: description and forecast examples. Weather Forecast 9(2):265–278
Box GE, Cox DR (1964) An analysis of transformations. J R Stat Soc Ser B 26(2):211–252
Carter CK, Kohn R (1994) On Gibbs sampling for state space models. Biometrika 81(3):541–553
Chou S, Souza Cd, Gomes JL, Evangelista E, Osório C, Cataldi M (2007) Refinamento estatístico das previsões horárias de temperatura a 2 m do modelo Eta em estações do Nordeste do Brasil. Rev Bras Meteorol 22(3):287–296
Cressie N (1993) Statistics for spatial data. Wiley, New York
De Oliveira V, Kedem B, Short DA (1997) Bayesian prediction of transformed Gaussian random fields. J Am Stat Assoc 92(440):1422–1433
Diggle P, Ribeiro P (2007) Model-based geostatistics. Springer series in statistics. Springer, New York
Eddelbuettel D, François R, Allaire J, Ushey K, Kou Q, Russel N, Chambers J, Bates D (2011) Rcpp: seamless R and C++ integration. J Stat Softw 40(8):1–18
Epstein ES (1969) Stochastic dynamic prediction. Tellus 21(6):739–759
Feldmann K, Scheuerer M, Thorarinsdottir TL (2015) Spatial postprocessing of ensemble forecasts for temperature using nonhomogeneous Gaussian regression. Month Weather Rev 143(3):955–971
Frühwirth-Schnatter S (1994) Data augmentation and dynamic linear models. J Time Series Anal 15(2):183–202
Fuentes M, Chen L, Davis JM (2008) A class of nonseparable and nonstationary spatial temporal covariance functions. Environmetrics 19:487–507
Gel Y, Raftery AE, Gneiting T (2004) Calibrated probabilistic mesoscale weather field forecasting: the geostatistical output perturbation method. J Am Stat Assoc 99(467):575–583
Gelfand AE (1996) Model determination using sampling-based methods. Markov chain Monte Carlo in practice 145–161
Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7(4):457–472
Genton MG, Zhang H (2012) Identifiability problems in some non-Gaussian spatial random fields. Chilean J Stat 3(2):171–179
Glahn HR, Lowry DA (1972) The use of model output statistics (MOS) in objective weather forecasting. J Appl Meteorol 11(8):1203–1211
Gneiting T (2014) Calibration of medium-range weather forecasts. European Centre for Medium-Range Weather Forecasts, Reading
Gneiting T, Raftery AE (2007) Strictly proper scoring rules, prediction, and estimation. J Am Stat Assoc 102(477):359–378
Gneiting T, Raftery AE, Westveld AH III, Goldman T (2005) Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Month Weather Rev 133(5):1098–1118
Gneiting T, Schlather M (2004) Stochastic models that separate fractal dimension and the Hurst effect. SIAM Rev 46(2):269–282
Grimit EP, Mass CF (2007) Measuring the ensemble spread-error relationship with a probabilistic approach: stochastic ensemble results. Month Weather Rev 135(1):203–221
Guerrero VM (1993) Time-series analysis supported by power transformations. J Forecast 12(1):37–48
Hamill TM (2001) Interpretation of rank histograms for verifying ensemble forecasts. Month Weather Rev 129(3):550–560
Hansen FV (1993) Surface roughness lengths. Technical Report ARL-TR-61, U.S. Army Research Laboratory, p 45
Kalman RE (1960) A new approach to linear filtering and prediction problems. J Basic Eng 82(1):35–45
Krishnamurti T (1995) Numerical weather prediction. Ann Rev Fluid Mech 27(1):195–225
Lange M (2005) On the uncertainty of wind power predictions-analysis of the forecast accuracy and statistical distribution of errors. J Solar Energy Eng 127(2):177–184
Li W, Duan Q, Miao C, Ye A, Gong W, Di Z (2017) A review on statistical postprocessing methods for hydrometeorological ensemble forecasting. WIREs Water 4(e1246):1–24
Mao Q, McNider RT, Mueller SF, Juang H-MH (1999) An optimal model output calibration algorithm suitable for objective temperature forecasting. Weather Forecast 14(2):190–202
Matérn B (1986) Spatial variation, 2nd edn. Springer, New York
Matheson JE, Winkler RL (1976) Scoring rules for continuous probability distributions. Manage Sci 22(10):1087–1096
Mesinger F, Janjić ZI, Ničković S, Gavrilov D, Deaven DG (1988) The step-mountain coordinate: model description and performance for cases of Alpine lee cyclogenesis and for a case of an Appalachian redevelopment. Month Weather Rev 116(7):1493–1518
Monin AS, Obukhov AM (1954) Basic laws of turbulent mixing in the surface layer of the atmosphere. Contrib Geophys Inst Acad Sci USSR 24(151):163–187
Piani C, Haerter J, Coppola E (2010) Statistical bias correction for daily precipitation in regional climate models over Europe. Theor Appl Climatol 99(1–2):187–192
R Core Team (2017) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org
Raftery A, Lewis S (1995) The number of iterations, convergence diagnostics and generic Metropolis algorithms. Practical Markov Chain Monte Carlo, 115–130
Raftery AE, Gneiting T, Balabdaoui F, Polakowski M (2005) Using Bayesian model averaging to calibrate forecast ensembles. Month Weather Rev 133(5):1155–1174
Saha S, Moorthi S, Wu X, Wang J, Nadiga S, Tripp P, Behringer D, Hou Y-T, Chuang H-Y, Iredell M et al (2014) The NCEP climate forecast system version 2. J Clim 27(6):2185–2208
Sanderson C, Curtin R (2016) Armadillo: a template-based C++ library for linear algebra. J Open Source Softw 1(2):26
Sansó B, Guenni L (1999) Venezuelan rainfall data analysed by using a Bayesian space-time model. J R Stat Soc Ser C 48(3):345–362
Sansó B, Guenni L (2004) A Bayesian approach to compare observed rainfall data to deterministic simulations. Environmetrics 15(6):597–612
Scheuerer M, Möller D (2015) Probabilistic wind speed forecasting on a grid based on ensemble model output statistics. Ann Appl Stat 9(3):1328–1349
Schuhen N, Thorarinsdottir TL, Gneiting T (2012) Ensemble model output statistics for wind vectors. Month Weather Rev 140(10):3204–3219
Sigrist F, Künsch HR, Stahel WA (2015) Stochastic partial differential equation based modelling of large space-time data sets. J R Stat Soc Ser B 77(1):3–33
Sigrist F, Künsch HR, Stahel WA (2012) A dynamic nonstationary spatio-temporal model for short term prediction of precipitation. Ann Appl Stat 6(4):1452–1477
Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A (2002) Bayesian measures of model complexity and fit. J R Stat Soc Ser B 64(4):583–639
Tanner MA, Wong WH (1987) The calculation of posterior distributions by data augmentation. J Am Stat Assoc 82(398):528–540
Thorarinsdottir TL, Gneiting T (2010) Probabilistic forecasts of wind speed: ensemble model output statistics by using heteroscedastic censored regression. J R Stat Soc Ser A 173(2):371–388
Thorarinsdottir TL, Johnson MS (2012) Probabilistic wind gust forecasting using nonhomogeneous Gaussian regression. Month Weather Rev 140(3):889–897
Vannitsem S, Wilks D, Messner J (2018) Statistical postprocessing of ensemble forecasts. Elsevier, Amsterdam
Vihola M (2012) Robust adaptive metropolis algorithm with coerced acceptance rate. Stat Comput 22(5):997–1008
West M, Harrison P (1997) Bayesian forecasting and dynamic models, 2nd edn. Springer Verlag, New York
Whitaker JS, Loughe AF (1998) The relationship between ensemble spread and ensemble mean skill. Month Weather Rev 126(12):3292–3302
Willmott CJ (1981) On the validation of models. Phys Geogr 2(2):184–194
Wilson LJ, Vallée M (2002) The Canadian updateable model output statistics (UMOS) system: design and development tests. Weather Forecast 17(2):206–222
Zhang H (2004) Inconsistent estimation and asymptotically equal interpolations in model-based geostatistics. J Am Stat Assoc 99(465):250–261
Zhang H, El-Shaarawi A (2010) On spatial skew-Gaussian processes and applications. Environmetrics 21(1):33–47
Zhu X, Genton MG (2012) Short-term wind speed forecasting for power system operations. Int Stat Rev 80(1):2–23
Acknowledgements
We thank Dr. Chou Sin Chan (CPTEC/INPE) for providing the Eta model outputs used in this study and for her advice during the research. The first author also thanks the financial support from the partnership between Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG) and Companhia Energética de Minas Gerais S.A. (CEMIG) under Project APQ-03813-12.
Handling Editor: Bryan F. J. Manly.
Appendices
Appendix A: Calculation of the roughness length
According to Hansen (1993), the vegetation present on a surface influences the aerodynamic roughness characteristics encountered by the mean wind flow over that surface, affecting both the mean wind speed and direction predicted by numerical models and various other atmospheric parameters. Therefore, the surface roughness length, \(z_0\), defined as the height at which the wind speed equals zero, has an important role in the modeling of atmospheric processes.
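The aerodynamic roughness length parameter, \(z_0\), used as a covariate in the application, was estimated for each calibration site from key atmospheric variables, following some principles of the Monin-Obukhov similarity theory (Monin and Obukhov 1954). In particular, assuming a logarithmic wind profile, the averaged wind speed \(\overline{u_i}\) at height \(z_i\) (10 m), given by:
$$\overline{u_i} = \frac{u_{*}}{k}\left[ \ln \left( \frac{z_i}{z_0}\right) - \Psi (\zeta _{i})\right] , \qquad (11)$$
was used to derive the roughness length \(z_0\) as
$$z_0 = z_i \exp \left\{ -\left[ \frac{k\, \overline{u_i}}{u_{*}} + \Psi (\zeta _{i})\right] \right\} , \qquad (12)$$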
where k is the von Karman constant, \(u_{*}\) is the friction velocity, \(\Psi (\zeta _{i})\) is the stability correction function of the wind profile and \(\zeta _i=\frac{z_i}{L}\) is the dimensionless stability parameter given by the height above ground, \(z_i\), normalized by the Obukhov length, L.
Measurements of air temperature, air pressure, sensible heat flux, momentum flux, and wind stress can be used to derive the parameters \(u_{*}\), L and \(\Psi (\zeta _{i})\). In this application, hourly air temperature and air pressure data were obtained from the available meteorological stations, while hourly reanalysis data from the CFSv2 model (Saha et al. 2014) were used as the source of heat and momentum fluxes, after interpolation from the CFSv2 regular grid to the calibration sites.
Hourly values of the roughness length, \(z_0\), were first obtained from (12) for the 2 years of available data. Then, median values of \(z_0\) by month and by hour within each month (288 values in total for each calibration site) were calculated, considering only those \(z_0\) values estimated under neutral atmospheric stability (a condition taken to hold when \(|L|>500\)).
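The inversion of the wind profile and the median over neutral hours can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the common form \(\overline{u_i} = (u_{*}/k)[\ln (z_i/z_0) - \Psi (\zeta _i)]\) for the logarithmic profile and a von Karman constant of 0.4, and the function names and record layout are hypothetical. Grouping by month and hour is omitted for brevity.

```python
import math
from statistics import median

VON_KARMAN = 0.4  # assumed value of the von Karman constant k


def roughness_length(u_mean, u_star, psi, z=10.0, k=VON_KARMAN):
    """Invert u_mean = (u_star / k) * (ln(z / z0) - psi) for z0."""
    return z * math.exp(-(k * u_mean / u_star + psi))


def neutral_median_z0(records, l_threshold=500.0):
    """Median z0 over hours with |L| > l_threshold (near-neutral stability).

    Each record is a tuple (u_mean, u_star, psi, obukhov_length).
    Returns None when no record passes the stability filter.
    """
    values = [
        roughness_length(u_mean, u_star, psi)
        for u_mean, u_star, psi, obukhov_length in records
        if abs(obukhov_length) > l_threshold
    ]
    return median(values) if values else None
```

In practice the same filter would be applied separately within each of the 288 month-by-hour groups per site.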
Appendix B: Robustness of prior distributions
In this section, we verify the robustness of prior distributions assigned to the parameters of the proposed postprocessing models applied in Sect. 4. Based on these prior distributions in simulated experiments, the posterior and prior distributions associated with the parameter vectors from DGOP and STEMOS are presented in Fig. 10. For clarity of exposition, only the results of a subset of common parameter \(\varvec{\theta }_0\) (vector with dimension r) are exhibited. Specifically, the intercept parameter at moment \(t=0\) represented by \(\theta _{0,0}\) is shown.
Even when this specific set of prior distributions is used in an application distinct from the one previously reported, Fig. 10 shows that the posterior distributions differ markedly from the prior distributions for all parameters. This suggests that the gain of information is mostly provided by the data, so that the non-informative character of the priors is preserved in general applications.
Appendix C: Sensitivity analysis of discount factors
This sensitivity analysis was performed to choose the best combination of discount factors (\(\tilde{\delta }_T\), \(\tilde{\delta }_S\), \(\tilde{\delta }_V)'\) for (\(\delta _{T}\), \(\delta _{S}\), \(\delta _{V})'\). In particular, we also consider a subset of 59 weather stations and a random selection of 5 days per season from the temporal range of the available dataset. Following the same model comparison criteria described in Sect. 4, Table 4 reports the sensitivity analysis of discount factors through average MAE, RMSE, d, IS, DIC, and LPML values for 24-h wind speed forecasts at 10-m height. The setups of the dynamic models differ only in the discount factor values. The results indicate greater sensitivity for \(\delta _T\), which significantly reduces the predictive performance when assuming lower values. In contrast, the variation of values for \(\delta _{S}\) and \(\delta _{V}\) only slightly alters the predictive performance. Thus, we determine (\(\tilde{\delta }_T\), \(\tilde{\delta }_S\), \(\tilde{\delta }_V)' = (0.99, 0.95, 0.99)'\).
Appendix D: Model comparison criteria
In this section, we briefly describe the model comparison criteria used to compare the prediction of the fitted models in Sect. 4. The first three criteria (RMSE, MAE and index of agreement) are appropriate to compare numerical predictions from the Eta model, which provides only deterministic estimates, with the proposed postprocessing models. The probabilistic forecasts are evaluated through IS, which takes into account the amplitude and coverage of the prediction intervals in a parsimonious way.
D.1 Mean absolute error and root-mean-square error
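Standard measures of goodness of fit were also considered in this study for comparison purposes. The root-mean-square error (RMSE) and the mean absolute error (MAE) are given by:
$$\text{RMSE} = \sqrt{\frac{1}{nT}\sum _{i=1}^{n}\sum _{t=1}^{T}\left( y_t(s_i) - \hat{y}_t(s_i)\right) ^2}, \qquad \text{MAE} = \frac{1}{nT}\sum _{i=1}^{n}\sum _{t=1}^{T}\left| y_t(s_i) - \hat{y}_t(s_i)\right| ,$$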
where \(\hat{y}_{t}(s_i)\) is obtained through a Monte Carlo estimate of the posterior mean of the predictive distribution, that is, \(E\left[ y_{t}(\mathbf{s}_i) \mid \mathbf{y}\right] \), across N draws. Smaller values of RMSE and MAE indicate better model fit.
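A minimal sketch of these two measures, assuming the usual definitions (average squared or absolute error over all site-time pairs) and flattened lists of observations and point forecasts. The helper `posterior_mean` and all names are illustrative, not taken from the paper.

```python
import math


def posterior_mean(draws):
    """Pointwise Monte Carlo mean over N predictive draws.

    draws is a list of N lists, each holding one draw for every site-time pair.
    """
    n_draws = len(draws)
    return [sum(column) / n_draws for column in zip(*draws)]


def rmse(obs, pred):
    """Root-mean-square error between observations and point forecasts."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error between observations and point forecasts."""
    return sum(abs(y - p) for y, p in zip(obs, pred)) / len(obs)
```

The point forecast passed to `rmse` and `mae` would be the output of `posterior_mean` applied to the predictive draws.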
D.2 Index of agreement
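Willmott (1981) introduced a standard measure for assessing the quality of forecasts. The index of agreement (d) ranges between 0 (absence of agreement) and 1 (perfect agreement), and is given by:
$$d = 1 - \frac{\sum _{i=1}^{n}\sum _{t=1}^{T}\left( y_t(s_i) - \hat{y}_t(s_i)\right) ^2}{\sum _{i=1}^{n}\sum _{t=1}^{T}\left( \left| \hat{y}_t(s_i) - \bar{y}\right| + \left| y_t(s_i) - \bar{y}\right| \right) ^2},$$
where \(\bar{y} = \frac{1}{nT} \sum _{i=1}^n\sum _{t=1}^{T} y_t(s_i)\).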
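The index can be computed directly from flattened observations and forecasts. The sketch below assumes Willmott's standard form, one minus the ratio of the squared-error sum to the "potential error" sum, with \(\bar{y}\) taken as the mean of all observations. The function name is illustrative.

```python
def index_of_agreement(obs, pred):
    """Willmott's index of agreement d for paired observations and forecasts.

    The denominator is zero only when every observation and every forecast
    equals the observed mean; that degenerate case is not handled here.
    """
    y_bar = sum(obs) / len(obs)
    squared_error = sum((y - p) ** 2 for y, p in zip(obs, pred))
    potential_error = sum(
        (abs(p - y_bar) + abs(y - y_bar)) ** 2 for y, p in zip(obs, pred)
    )
    return 1.0 - squared_error / potential_error
```

A forecast that always returns the observed mean gives d = 0, while a perfect forecast gives d = 1.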
D.3 Interval score
The interval score (IS, Gneiting and Raftery 2007) is a scoring rule for interval predictions based on the symmetric prediction interval with level \((1-\alpha )\times \)100%. The score rewards narrow intervals and penalizes intervals that fail to cover the observed value. If the observed value lies inside the prediction interval, the score reduces to the interval width. The average IS is given by:
$$\text{IS} = \frac{1}{nT}\sum _{i=1}^{n}\sum _{t=1}^{T}\left\{ \left( \hat{u}_t(s_i) - \hat{l}_t(s_i)\right) + \frac{2}{\alpha }\left( \hat{l}_t(s_i) - y_t(s_i)\right) \mathbb {1}\left\{ y_t(s_i) < \hat{l}_t(s_i)\right\} + \frac{2}{\alpha }\left( y_t(s_i) - \hat{u}_t(s_i)\right) \mathbb {1}\left\{ y_t(s_i) > \hat{u}_t(s_i)\right\} \right\} ,$$
where \(\hat{l}_t(s_i)\) and \(\hat{u}_t(s_i)\) are, respectively, the lower bound, given by the \(\frac{\alpha }{2}\) quantile, and the upper bound, given by the \(1-\frac{\alpha }{2}\) quantile of the predictive distribution. The indicator function is represented by \(\mathbb {1}\).
Smaller IS values indicate more efficient probabilistic forecasts.
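As an illustration, the score for a single interval and its average can be sketched as below. The sketch assumes the Gneiting and Raftery (2007) form: interval width plus a penalty of \(2/\alpha \) times the distance by which the observation misses the interval. Function names and the default \(\alpha = 0.1\) are illustrative choices.

```python
def interval_score(lower, upper, y, alpha=0.1):
    """Interval score of one (1 - alpha) prediction interval for observation y."""
    width = upper - lower
    below = (2.0 / alpha) * (lower - y) if y < lower else 0.0
    above = (2.0 / alpha) * (y - upper) if y > upper else 0.0
    return width + below + above


def mean_interval_score(lowers, uppers, obs, alpha=0.1):
    """Average interval score over all site-time pairs (flattened lists)."""
    scores = [
        interval_score(lower, upper, y, alpha)
        for lower, upper, y in zip(lowers, uppers, obs)
    ]
    return sum(scores) / len(scores)
```

With \(\alpha = 0.1\), an observation one unit outside a width-2 interval scores 2 + 20 = 22, which shows how strongly a miss is penalized relative to width.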
D.4 DIC
Particularly useful in Bayesian model selection problems, the deviance information criterion (DIC, Spiegelhalter et al. 2002) is a hierarchical modeling generalization of the Akaike information criterion (AIC, Akaike 1974). This criterion is negatively oriented, implying that models with smaller DIC should be preferred to models with larger DIC. The DIC is given by:
$$\text{DIC} = -2\int \log p(\mathbf{y} |\Theta )\, p(\Theta | \mathbf{y} )\, d\Theta + \text {P}_\text {D},$$
where \(\mathbf{y} \) is a vector of observed values and \(\Theta \) is the parameter vector. Thus, \(p(\mathbf{y} |\Theta )\) represents the likelihood function and \(p(\Theta | \mathbf{y} )\), the posterior distribution. Defined as a Bayesian measure of model complexity, \(\text {P}_\text {D}\) is given by:
$$\text {P}_\text {D} = -2\int \log p(\mathbf{y} |\Theta )\, p(\Theta | \mathbf{y} )\, d\Theta + 2\log p(\mathbf{y} |\tilde{\Theta }),$$
with \(\tilde{\Theta }\) denoting the Bayes estimator.
A Monte Carlo approximation of DIC is given by:
$$\widehat{\text{DIC}} = 2\hat{D} + 2\log p(\mathbf{y} |\tilde{\Theta }),$$
where
$$\hat{D} = -\frac{2}{M}\sum _{m=1}^{M}\log p(\mathbf{y} |\Theta ^{(m)}),$$
with \({\Theta ^{(m)}}\) denoting the m-th MCMC sample of \(\Theta \) from posterior distribution \(p(\Theta |\mathbf{y} )\), \(m = 1, \dots , M\).
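A minimal sketch of the Monte Carlo computation, assuming the standard Spiegelhalter et al. (2002) decomposition: DIC equals the posterior mean deviance plus \(\text {P}_\text {D}\), and \(\text {P}_\text {D}\) equals the posterior mean deviance minus the deviance at the point estimate \(\tilde{\Theta }\). The function name and input layout are hypothetical. The log-likelihood values are assumed to have been evaluated already for each retained draw.

```python
def dic_from_draws(loglik_draws, loglik_at_estimate):
    """Return (DIC, p_D) from log-likelihood evaluations.

    loglik_draws: log p(y | Theta^(m)) for each retained MCMC draw m = 1..M.
    loglik_at_estimate: log p(y | Theta_tilde) at the chosen point estimate.
    """
    mean_deviance = -2.0 * sum(loglik_draws) / len(loglik_draws)
    deviance_at_estimate = -2.0 * loglik_at_estimate
    p_d = mean_deviance - deviance_at_estimate  # effective number of parameters
    return mean_deviance + p_d, p_d
```

For example, draws with log-likelihoods -10 and -12 and a log-likelihood of -9 at the estimate give a mean deviance of 22, \(\text {P}_\text {D} = 4\) and DIC = 26.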
D.5 LPML
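A component of the Bayes factor, the logarithm of the pseudo marginal likelihood (LPML, Gelfand 1996) is given by:
$$\text{LPML} = \sum _{i=1}^{n}\log \text {CPO}_i,$$
where \(\text {CPO}_i\) represents the conditional predictive ordinate (CPO) for location i and is given by:
$$\text {CPO}_i = p\big (\mathbf{y} (s_{i}) \mid \mathbf{y} (s_{-i})\big ) = \int p\big (\mathbf{y} (s_{i}) \mid \mathbf{y} (s_{-i}), \Theta \big )\, p\big (\Theta \mid \mathbf{y} (s_{-i})\big )\, d\Theta ,$$
with \(\mathbf{y} (s_{i})=\big ({y}_{1}(s_{i}), \ldots , {y}_{T}(s_{i})\big )'\) and \(\mathbf{y} (s_{-i})=\big (\mathbf{y} (s_{1}),\ldots ,\mathbf{y} (s_{i-1}), \mathbf{y} (s_{i+1}),\ldots , \mathbf{y} (s_{n})\big )'\), \(i = 1,\ldots ,n.\) Note that the CPO\(_i\) is based on leave-one-out cross-validation and estimates the predictive density of \(\mathbf{y} (s_{i})\) given the observation of \(\mathbf{y} (s_{-i})\). In particular, a Monte Carlo approximation of CPO\(_i\) is given by:
$$\widehat{\text {CPO}}_i = \left[ \frac{1}{M}\sum _{m=1}^{M}\frac{1}{p\big (\mathbf{y} (s_{i}) \mid \mathbf{y} (s_{-i}), \Theta ^{(m)}\big )}\right] ^{-1},$$
with \({\Theta ^{(m)}}\) denoting the m-th MCMC sample of \(\Theta \) from posterior distribution \(p(\Theta |\mathbf{y} )\), \(m = 1, \ldots , M\).
Finally, the Monte Carlo approximation of LPML is given by:
$$\widehat{\text{LPML}} = \sum _{i=1}^{n}\log \widehat{\text {CPO}}_i.$$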
The preferred model maximizes this criterion.
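The harmonic-mean estimator of CPO\(_i\) is numerically delicate, since it averages reciprocals of densities that may be very small. The sketch below therefore works on the log scale. It assumes the usual estimator, the inverse of the average reciprocal conditional density over MCMC draws, and takes as input precomputed values of \(\log p(\mathbf{y} (s_{i}) \mid \mathbf{y} (s_{-i}), \Theta ^{(m)})\). The function names and input layout are illustrative.

```python
import math


def log_cpo(loglik_draws):
    """Log of the harmonic-mean CPO estimate for one site.

    loglik_draws holds the conditional log-density of the site's observations
    for each MCMC draw. Computes log(M) - logsumexp(-loglik) in a stable way.
    """
    negated = [-value for value in loglik_draws]
    peak = max(negated)
    log_sum = peak + math.log(sum(math.exp(value - peak) for value in negated))
    return math.log(len(loglik_draws)) - log_sum


def lpml(loglik_by_site):
    """Sum of log CPO estimates over sites; larger values are preferred."""
    return sum(log_cpo(draws) for draws in loglik_by_site)
```

As a check on the arithmetic, two draws with densities 0.5 and 0.25 have reciprocals 2 and 4, an average of 3, and hence a CPO estimate of 1/3.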
Appendix E: Supplementary results
Gomes, L.E.S., Fonseca, T.C.O., Gonçalves, K.C.M. et al. Space–time calibration of wind speed forecasts from regional climate models. Environ Ecol Stat 28, 631–665 (2021). https://doi.org/10.1007/s10651-021-00509-0
DOI: https://doi.org/10.1007/s10651-021-00509-0