Abstract
Integrating full-process, high-resolution hydrodynamic simulations within early warning systems (EWS) for marine flooding is hindered by the large computation time cost of such numerical models. This problem can be alleviated by statistically analysing pre-calculated simulation results to build a fast (low computation time cost) statistical predictive model (named metamodel). Despite the success of this approach, applying such techniques directly to EWS is not straightforward in all settings, particularly in environments where the stochastic character of waves has a significant effect on the induced flood, i.e., where overtopping lasts less than 500 times the offshore wave period. In such environments, the numerical simulator is not deterministic and provides statistical quantities of the flooding indicators. Focusing on quantile estimates, the present study explores the applicability of random forest (RF) models to marine flooding prediction by providing two levels of information: (1) the quantile of interest via a quantile random forest regression model (qRF); (2) the flooding probability via a classification random forest (cRF). We use the macrotidal site of Gâvres (French Atlantic coast) as an application case, for which ~2000 numerical simulations were performed (i.e. stochastic simulations for each of 100 different extreme-but-realistic offshore meteo-oceanic input conditions were repeated 20 times) to compute local and global flooding indicators (respectively, the maximum water depth at the coast and the total volume of water entering the territory). Through an extensive repeated cross-validation procedure, we tune the qRF parameters, achieving a high coefficient of determination (~90%) for the 25%, 50% and 75% quantiles, and we show that the qRF models outperform the commonly used Tobit regression model.
The comparison with the numerical results for historical events shows very satisfactory predictions both for events leading to major flooding and for events with no impact. For low quantile levels and minor-to-moderate flooding events, the second level of information, the cRF-derived flooding probability, shows its added value: it enables the EWS user to nuance the qRF prediction and to flag situations where the prediction remains uncertain.
References
Ardhuin F, Rogers WE, Babanin AV, Filipot J, Magne R, Roland A, Van der Westhuysen A, Queffeulou P, Lefevre J, Aouf L, Collard F (2010) Semiempirical dissipation source functions for ocean waves. Part I: definition, calibration, and validation. J Phys Oceanogr 40(1):917–941
Athey S, Tibshirani J, Wager S (2019) Generalized random forests. Ann Stat 47(2):1148–1178
Bertin X (2016) Storm surges and coastal flooding: status and challenges. La Houille Blanche - Revue internationale de l’eau, EDP Sciences 1(2):64–70
Bertin X, Prouteau E, Letetrel C (2013) A significant increase in wave height in the North Atlantic Ocean over the 20th century. Global Planet Change 106:77–83
Binois M, Huang J, Gramacy RB, Ludkovski M (2019) Replication or exploration? Sequential design for stochastic simulation experiments. Technometrics 61(1):7–23
Boudiere E, Maisondieu C, Ardhuin F, Accensi M, Pineau-Guillou L, Lepesqueur J (2013) A suitable metocean hindcast database for the design of Marine energy converters. Int J Mar Energy 3–4:e40–e52. https://doi.org/10.1016/j.ijome.2013.11.010
Bradley AP (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit 30(7):1145–1159
Breiman L (2001) Random forests. Mach Learn 45:5–32
Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. Chapman and Hall/CRC, New York
Camus P, Mendez FJ, Medina R, Cofiño AS (2011) Analysis of clustering and selection algorithms for the study of multivariate wave climate. Coast Eng 58(6):453–462
Carrere L, Lyard F, Cancet M, Guillot A, Picot N (2016) FES 2014, a new tidal model—validation results and perspectives for improvements, presentation to ESA Living Planet Conference, Prague
Charles E, Idier D, Thiébot J, Le Cozannet G, Pedreros R, Ardhuin F, Planton S (2012) Present wave climate in the Bay of Biscay: spatiotemporal variability and trends from 1958 to 2001. J Clim 25(6):2020–2039
Cheung KF, Phadke AC, Wei Y, Rojas R, Douyere YJM, Martino CD, Houston SH, Liu PLF, Lynett PJ, Dodd N, Liao S, Nakazaki E (2003) Modeling of storm-induced coastal flooding for emergency management. Ocean Eng 30(11):1353–1386
Coles SG, Tawn JA (1991) Modelling extreme multivariate events. J R Stat Soc Ser B Methodol 53(2):377–392
Coles S, Bawa J, Trenner L, Dorazio P (2001) An introduction to statistical modeling of extreme values. Springer, London
Compo GP, Whitaker JS, Sardeshmukh PD, Allan RJ, McColl C, Yin X, Giese BS, et al (2015) NOAA/CIRES Twentieth century global reanalysis version 2c. Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory. https://doi.org/10.5065/D6N877TW
Conover WJ (1980) Practical nonparametric statistics. Wiley, New York
Dee DP, Balmaseda M, Balsamo G, Engelen R, Simmons AJ, Thepaut J-N (2014) Toward a consistent reanalysis of the climate system. Bull Am Meteor Soc 95:1235–1248
Demeritt D, Stephens EM, Créton-Cazanave L, Lutoff C, Ruin I, Nobert S (2019) Communicating and using ensemble flood forecasts in flood incident management: lessons from social science. In: Handbook of hydrometeorological ensemble forecasting, pp 1131–1160
Dodet G, Mélet A, Ardhuin F, Bertin X, Idier D, Almar R (2019) The contribution of wind-generated waves to coastal sea-level changes. Surv Geophys 40(6):1563–1601
Freeman EA, Moisen GG (2008) A comparison of the performance of threshold criteria for binary classification in terms of predicted prevalence and kappa. Ecol Model 217(1–2):48–58
Giloy N, Hamdi Y, Bardet L, Garnier E, Duluc CM (2018) Quantifying historic skew surges: an example for the Dunkirk Area, France. Nat Hazards 98(3):1–25
Gouldby B, Méndez FJ, Guanche Y, Rueda A, Mínguez R (2014) A methodology for deriving extreme nearshore sea conditions for structural design and flood risk analysis. Coast Eng 88:15–26
Hashemi MR, Spaulding ML, Shaw A, Farhadi H, Lewis M (2016) An efficient artificial intelligence model for prediction of tropical storm surge. Nat Hazards 82(1):471–491
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction. Springer, New York
Heffernan JE, Tawn JA (2004) A conditional approach for multivariate extreme values (with discussion). J R Stat Soc Ser B Stat Methodol 66(3):497–546
Herman GR, Schumacher RS (2018) Money doesn’t grow on trees, but forecasts do: forecasting extreme precipitation with random forests. Mon Weather Rev 146:1571–1600
Hothorn T (2018) Top-down transformation choice. Stat Modell 18(3–4):274–298
Idier D, Rohmer J, Pedreros R, Le Roy S, Lambert J, Louisor J, Le Cozannet G, Le Cornec E (2020) Coastal flood: a composite method for past events characterisation providing insights in past, present and future hazards. Joining historical, statistical and modeling approaches. Nat Hazards (in press)
Janon A, Nodet M, Prieur C (2014) Uncertainties assessment in global sensitivity indices estimation from metamodels. Int J Uncertain Quantif 4(1):21–36
Khan N, Shahid S, Juneng L, Ahmed K, Ismail T, Nawaz N (2019) Prediction of heat waves in Pakistan using quantile regression forests. Atmos Res 221:1–11
Labopin-Richard T, Gamboa F, Garivier A, Stenger J (2019) Conditional quantile sequential estimation for stochastic codes. J Stat Theory Pract 13(4):55
Le Cornec E, Le Bris E, Van Lierde M (2012) Atlas des risques littoraux sur le departement du Morbihan. Phase 1 : Recensement et consequences des tempetes et coups de vent majeurs. GEOS-AEL and DHI report (in French), p 476
Le Roy S, Pedreros R, Andre C, Paris F, Lecacheux S, Marche F, Vinchon C (2015) Coastal flooding of urban areas by overtopping: dynamic modelling application to the Johanna storm (2008) in Gâvres (France). Nat Hazards Earth Syst Sci 15:2497–2510
Lumbroso DM, Suckall NR, Nicholls RJ, White KD (2017) Enhancing resilience to coastal flooding from severe storms in the USA: international lessons. Nat Hazards Earth Syst Sci 17:1357–1373
Malley JD, Kruppa J, Dasgupta A, Malley KG, Ziegler A (2012) Probability machines. Methods Inf Med 51(01):74–81
Marrel A, Iooss B, Laurent B, Roustant O (2009) Calculations of Sobol indices for the Gaussian process metamodel. Reliab Eng Syst Saf 94(3):742–751
Marrel A, Iooss B, Da Veiga S, Ribatet M (2012) Global sensitivity analysis of stochastic computer models with joint metamodels. Stat Comput 22(3):833–847
Meinshausen N (2006) Quantile regression forests. J Mach Learn Res 7:983–999
Minasny B, McBratney AB (2006) A conditioned Latin hypercube method for sampling in the presence of ancillary information. Comput Geosci 32(9):1378–1388
Muller H, Pineau-Guillou L, Idier D, Ardhuin F (2014) Atmospheric storm surge modeling along the French (Atlantic and English Channel). Ocean Dyn 64(11):1671–1692
Nutt WT, Wallis GB (2004) Evaluation of nuclear safety from the outputs of computer codes in the presence of uncertainties. Reliab Eng Syst Saf 83(1):57–77
Park S, Kwon J, Kim J, Oh HS (2018) Prediction of extremal precipitation by quantile regression forests: from SNU Multiscale Team. Extremes 21(3):463–476
Perrin G, Durantin C (2019) Taking into account input uncertainties in the Bayesian calibration of time-consuming simulators. Journal de la Société Française de Statistique 160(2):24–46
Poelhekke L, Jäger WS, van Dongeren A, Plomaritis TA, McCall R, Ferreira Ó (2016) Predicting coastal hazards for sandy coasts with a Bayesian Network. Coast Eng 118:21–34
Probst P, Boulesteix AL (2017) To tune or not to tune the number of trees in random forest. J Mach Learn Res 18:181–191
Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, Müller M (2011) pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform 12(77):1–8
Rohmer J, Rousseau M, Lemoine A, Pedreros R, Lambert J, Benki A (2018a) Source characterisation by mixing long-running tsunami wave numerical simulations and historical observations within a metamodel-aided ABC setting. Stoch Env Res Risk Assess 32(4):967–984
Rohmer J, Idier D, Paris F, Pedreros R, Louisor J (2018b) Casting light on forcing and breaching scenarios that lead to marine inundation: combining numerical simulations with a random-forest classification approach. Environ Modell Softw 104:64–80
Romano A, Bellotti G, Briganti R, Franco L (2015) Uncertainties in the physical modelling of the wave overtopping over a rubble mound breakwater: The role of the seeding number and of the test duration. Coast Eng 103:15–21
Roy PT, El Moçayd N, Ricci S, Jouhaud JC, Goutal N, De Lozzo M, Rochoux MC (2018) Comparison of polynomial chaos and Gaussian process surrogates for uncertainty quantification and correlation estimation of spatially distributed open-channel steady flows. Stoch Env Res Risk Assess 32(6):1723–1741
Stansby P, Chini N, Apsley D, Borthwick A, Bricheno L, Horrillo-Caraballo J, McCabe M, Reeve D, Rogers BD, Saulter A, Scott A, Wilson C, Wolf J, Yan K (2013) An integrated model system for coastal flood prediction with a case history for Walcott, UK, on 9 November 2007. J Flood Risk Manag 6(3):229–252
Stauffer R, Mayr GJ, Messner JW, Umlauf N, Zeileis A (2017) Spatio-temporal precipitation climatology over complex terrain using a censored additive regression model. Int J Climatol 37(7):3264–3275
Taillardat M, Fougères AL, Naveau P, Mestre O (2019) Forest-based and semiparametric methods for the postprocessing of rainfall ensemble forecasting. Weather Forecast 34(3):617–634
Tobin J (1958) Estimation of relationships for limited dependent variables. Econometrica 26(1):24–36
Torossian L, Picheny V, Faivre R, Garivier A (2020) A review on quantile regression for stochastic computer experiments. Reliab Eng Syst Saf 106858 (in press). https://doi.org/10.1016/j.ress.2020.106858
Tran VN, Kim J (2019) Quantification of predictive uncertainty with a metamodel: toward more efficient hydrologic simulations. Stoch Env Res Risk Assess 33(7):1453–1476
Van der Meer JW, Allsop NWH, Bruce T, De Rouck J, Kortenhaus A, Pullen T, Schüttrumpf H, Troch P, Zanuttigh B, EurOtop (2016) Manual on wave overtopping of sea defences and related structures: an overtopping manual largely based on European research, but for worldwide application. www.overtopping-manual.com. Accessed Dec 2019
Verhaeghe H, De Rouck J, van der Meer J (2008) Combined classifier–quantifier model: a 2-phases neural model for prediction of wave overtopping at coastal structures. Coast Eng 55(5):357–374
Wager S, Hastie T, Efron B (2014) Confidence intervals for random forests: the jackknife and the infinitesimal jackknife. J Mach Learn Res 15(1):1625–1651
Willett P (1999) Dissimilarity-based algorithms for selecting structurally diverse sets of compounds. J Comput Biol 6(3–4):447–457
Wright MN, Ziegler A (2017) Ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 77(1):1–17
Zamo M, Mestre O, Arbogast P, Pannekoucke O (2014) A benchmark of statistical regression methods for short-term forecasting of photovoltaic electricity production. Part I: deterministic forecast of hourly production. Sol Energy 105:792–803
Zanuttigh B, Formentin SM, van der Meer JW (2016) Prediction of extreme and tolerable wave overtopping discharges through an advanced neural network. Ocean Eng 127:7–22
Zhou P, Li Z, Snowling S, Baetz BW, Na D, Boyd G (2019) A random forest model for inflow prediction at wastewater treatment plants. Stoch Env Res Risk Assess 33(10):1781–1792
Zijlema M, Stelling GS (2008) Efficient computation of surf zone waves using the nonlinear shallow water equations with non-hydrostatic pressure. Coast Eng 55:780–790
Acknowledgements
The authors thank the ANR for its financial support of the RISCOPE project (ANR-16-CE04-0011). The following data providers are acknowledged: LEGOS, NOAA, LOPS-IFREMER. We are also grateful to the RISCOPE project consortium for fruitful discussions (A. Aurouet-Geohyd, F. Bachoc-IMT, F. Gamboa-IMT, T. Klein-IMT/Enac, J. Betancourt-IMT, S. Le Roy-BRGM). Random forest models were fitted with the R package ranger (Wright and Ziegler 2017), and the ROC curves were analysed with the R package pROC (Robin et al. 2011). We thank both anonymous reviewers for their constructive comments, which led to the improvement of the present work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Sources of data for the coastal flooding application case
See Table 4.
Appendix B: Extreme value analysis
The procedure is as follows. First, two classes of offshore conditions are considered. The ‘amplitude’ variables X = (tswl, hs, u) can take very large values. The covariates Xc = (tp, dp, du) have values that depend on those of the ‘amplitude’ variables. For the ‘amplitude’ variables, a multivariate extreme value analysis is conducted to extrapolate their joint probability density to extreme values while accounting for their dependence structure.
This step proceeds as follows:
- The marginal distribution of each ‘amplitude’ variable is first analysed and modelled. The model combines the empirical distribution of the \(X_{i}\) values below a suitable high threshold with the Generalised Pareto distribution (GPD) above the threshold (Coles and Tawn 1991). The threshold is selected by combining several methods: visual inspection of quantile–quantile plots, “mean residual life plots” and “modified scale and shape parameter plots” (Coles et al. 2001);
- The dependence structure of the ‘amplitude’ variables (transformed to common standard Gumbel margins) is modelled following the approach of Heffernan and Tawn (2004), as described below;
- Once the model is fitted, a Monte Carlo simulation procedure is used to randomly generate realizations of the ‘amplitude’ variables;
- Based on the generated dataset, the covariates are generated using their empirical distribution conditional on the values of the ‘amplitude’ variables.
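The first step above (empirical distribution below a high threshold, GPD above it) and the subsequent transformation to standard Gumbel margins can be sketched in Python as follows. This is a minimal illustration assuming NumPy and SciPy, not the authors' code. The function names are hypothetical. The threshold is placed at an arbitrary empirical quantile (0.95) rather than selected with the diagnostic plots. The GPD is fitted by maximum likelihood to the threshold excesses with `scipy.stats.genpareto.fit`.

```python
import numpy as np
from scipy import stats


def fit_semiparametric_margin(x, threshold_prob=0.95):
    """Empirical CDF below a high threshold, GPD above it (hypothetical helper)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # Threshold at an illustrative empirical quantile (chosen from diagnostics in practice)
    u = np.quantile(x, threshold_prob)
    # Non-exceedance probability at u, with an n + 1 denominator (matches margin_cdf)
    p_u = np.sum(x <= u) / (n + 1.0)
    excess = x[x > u] - u
    # GPD fitted to the excesses, location fixed at 0; returns (shape, loc, scale)
    shape, _, scale = stats.genpareto.fit(excess, floc=0.0)
    return {"data": x, "u": u, "p_u": p_u, "shape": shape, "scale": scale}


def margin_cdf(model, values):
    """Non-exceedance probability of `values` under the semi-parametric margin."""
    values = np.atleast_1d(np.asarray(values, dtype=float))
    data, u, p_u = model["data"], model["u"], model["p_u"]
    # Empirical part: rank / (n + 1) keeps probabilities strictly below 1
    emp = np.searchsorted(data, values, side="right") / (data.size + 1.0)
    # GPD tail: F(x) = p_u + (1 - p_u) * G(x - u) for x > u
    tail = p_u + (1.0 - p_u) * stats.genpareto.cdf(
        values - u, model["shape"], loc=0.0, scale=model["scale"])
    return np.where(values <= u, emp, tail)


def to_standard_gumbel(p):
    """Probability-integral transform to standard Gumbel margins: y = -log(-log p)."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -np.log(-np.log(p))
```

The empirical part and the tail meet exactly at the threshold, so the fitted marginal CDF is continuous and non-decreasing before being mapped to the Gumbel scale.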
Let \(X_{i}\), i = 1, 2, 3, denote the ‘amplitude’ variables (tswl, hs, u). The variables \(X_{i}\) are first transformed to common standard Gumbel margins. Let us denote by \(\mathbf{Y}_{-i}\) the vector of all transformed variables except the ith variable \(Y_{i}\). When \(Y_{i}\) is large, the following non-linear regression model is set up:
\[ \mathbf{Y}_{-i} = \mathbf{a}\,Y_{i} + Y_{i}^{\mathbf{b}}\,\mathbf{W} \qquad (6) \]
with \(Y_{i} > \nu\) (i.e. \(Y_{i}\) takes large values). In this model, \(\mathbf{a}\) and \(\mathbf{b}\) are parameter vectors (one value per parameter for each pair of variables), ν is a threshold to be defined and \(\mathbf{W}\) is a vector of residuals. The model is fitted by maximum likelihood, assuming that the residuals \(\mathbf{W}\) are Gaussian with mean and variance to be estimated. For our case study, the threshold ν in Eq. (6) was set to 0.85 (expressed as a probability of non-exceedance) using the diagnostic tools described in Heffernan and Tawn (2004).
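For a single pair \((Y_{i}, Y_{j})\) on Gumbel margins, the conditional model and its Gaussian maximum-likelihood fit can be sketched as below, together with an illustrative Monte Carlo step. Only the 0.85 non-exceedance threshold comes from the text. The remaining choices are assumptions and not the study's code:
- the function names are hypothetical;
- the optimiser is L-BFGS-B with ad hoc bounds (0 ≤ a ≤ 1, b < 1);
- new values of W are drawn by resampling the fitted residuals, so the Gaussian assumption is used only for fitting, in the spirit of Heffernan and Tawn (2004).

```python
import numpy as np
from scipy import optimize, stats


def ht_negloglik(params, yi, yj):
    """-log L of Y_j = a*Y_i + Y_i**b * W, with W ~ N(mu, sigma^2), for Y_i > nu."""
    a, b, mu, log_sigma = params
    loc = a * yi + mu * yi ** b
    scale = np.exp(log_sigma) * yi ** b
    return -np.sum(stats.norm.logpdf(yj, loc=loc, scale=scale))


def fit_ht_pair(yi, yj, nu_prob=0.85):
    """Fit the conditional model for one pair above the threshold (hypothetical helper)."""
    yi, yj = np.asarray(yi, float), np.asarray(yj, float)
    nu = -np.log(-np.log(nu_prob))  # threshold on the standard Gumbel scale
    keep = yi > nu
    x0 = np.array([0.5, 0.2, 0.0, 0.0])
    # Illustrative bounds for (a, b, mu, log_sigma)
    bounds = [(0.0, 1.0), (-1.0, 0.99), (-10.0, 10.0), (-10.0, 5.0)]
    res = optimize.minimize(ht_negloglik, x0, args=(yi[keep], yj[keep]),
                            method="L-BFGS-B", bounds=bounds)
    a, b, mu, log_sigma = res.x
    # Fitted residuals W = (Y_j - a*Y_i) / Y_i**b, kept for simulation
    residuals = (yj[keep] - a * yi[keep]) / yi[keep] ** b
    return {"a": a, "b": b, "mu": mu, "sigma": np.exp(log_sigma),
            "nu": nu, "residuals": residuals}


def simulate_ht_pair(fit, n, rng):
    """Draw (Y_i, Y_j) with Y_i above nu by resampling the fitted residuals."""
    # Inverse-CDF sampling of a standard Gumbel variable conditioned on exceeding nu
    p_nu = np.exp(-np.exp(-fit["nu"]))
    yi = -np.log(-np.log(rng.uniform(p_nu, 1.0, size=n)))
    w = rng.choice(fit["residuals"], size=n, replace=True)
    yj = fit["a"] * yi + yi ** fit["b"] * w
    return yi, yj
```

In a full application, the same fit would be repeated for each conditioning variable \(Y_{i}\) and each component of \(\mathbf{Y}_{-i}\).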
Appendix C: Historical events
Table 5 presents the offshore meteo-oceanic conditions associated with 21 historical events. These events were selected on the basis of past damage, using historical data, past hydro-meteorological data and numerical hydrodynamic simulations (Idier et al. 2020).
Appendix D: Censored regression (Tobit)
The Tobit model (Tobin 1958) is based on the concept of censoring: the response y itself is limited to a certain threshold (or range) and cannot be observed outside these limits. We assume that a latent, unobservable process drives the response and can be described by suitable covariates. In the present study, we focus on the normal distribution left-censored at zero, defined as follows:
\[ y^{*} \sim \mathcal{N}(\mu, \sigma^{2}), \qquad y = \max(0, y^{*}) \]
where y* is the unobservable ‘latent’ response that is assumed to follow a normal distribution (denoted N), given the parameters location μ and scale σ. The ‘observable’ response y is simply the maximum of the latent response and the censoring point.
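The probability density function \(f_{\text{cens}}\) of y is:
\[ f_{\text{cens}}(y \mid \mu, \sigma, 0) = \begin{cases} F\left(\dfrac{0 - \mu}{\sigma}\right) & \text{if } y = 0, \\[4pt] \dfrac{1}{\sigma}\, f\left(\dfrac{y - \mu}{\sigma}\right) & \text{if } y > 0, \end{cases} \]
where f and F are respectively the probability density function and the cumulative distribution function of the standard Normal distribution.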
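The censored density can be evaluated on the log scale as in the following minimal Python sketch, assuming SciPy (the function name is hypothetical). Observations at or below the censoring point receive the point mass \(F(-\mu/\sigma)\). All other observations receive the rescaled Normal density.

```python
import numpy as np
from scipy import stats


def censored_normal_logpdf(y, mu, sigma, left=0.0):
    """log f_cens(y | mu, sigma, left) for a Normal response left-censored at `left`."""
    y = np.asarray(y, dtype=float)
    # Point mass at the censoring point: P(y* <= left) = F((left - mu) / sigma)
    log_mass = stats.norm.logcdf((left - mu) / sigma)
    # Continuous part above the censoring point: f((y - mu) / sigma) / sigma
    log_dens = stats.norm.logpdf((y - mu) / sigma) - np.log(sigma)
    return np.where(y <= left, log_mass, log_dens)
```

Working with `logcdf` rather than `log(cdf)` avoids underflow when μ is large compared with σ, i.e. when censoring is very unlikely.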
The location μ and scale σ are assumed to follow a regression model:
\[ \mu = \mathbf{x}^{\top}\boldsymbol{\beta}, \qquad \log(\sigma) = \mathbf{x}^{\top}\boldsymbol{\gamma} \]
where x are the covariates (the offshore meteo-oceanic conditions in our study), and \(\beta, \gamma\) are the regression coefficients. These coefficients can be estimated by maximizing the log-likelihood, i.e. the sum over the dataset of \(\log f_{\text{cens}}(y \mid \mu, \sigma, 0)\).
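Maximum-likelihood estimation of β and γ can be sketched as follows. This is an illustrative Python version, not the authors' implementation. It assumes a log link for σ, a design matrix X that includes an intercept column, and SciPy's BFGS optimiser; the function names are hypothetical. Because y = max(0, y*) is a non-decreasing transformation of y*, the τ-quantile of y is the latent quantile \(\mu + \sigma F^{-1}(\tau)\) clipped at zero. This gives Tobit quantile predictions comparable to those of a qRF.

```python
import numpy as np
from scipy import optimize, stats


def tobit_negloglik(theta, X, y):
    """Negative log-likelihood of the zero left-censored Normal regression."""
    k = X.shape[1]
    beta, gamma = theta[:k], theta[k:]
    mu = X @ beta               # location: mu = x' beta
    sigma = np.exp(X @ gamma)   # scale: log(sigma) = x' gamma (assumed log link)
    ll = np.where(
        y <= 0.0,
        stats.norm.logcdf(-mu / sigma),                        # censored: F(-mu / sigma)
        stats.norm.logpdf((y - mu) / sigma) - np.log(sigma),  # uncensored density
    )
    return -np.sum(ll)


def fit_tobit(X, y):
    """Estimate (beta, gamma) by maximum likelihood (hypothetical helper)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    k = X.shape[1]
    res = optimize.minimize(tobit_negloglik, np.zeros(2 * k), args=(X, y),
                            method="BFGS")
    return res.x[:k], res.x[k:], res


def tobit_quantile(X, beta, gamma, tau):
    """tau-quantile of y = max(0, y*): the latent Normal quantile clipped at zero."""
    mu = X @ beta
    sigma = np.exp(X @ gamma)
    return np.maximum(0.0, mu + sigma * stats.norm.ppf(tau))
```

For example, with β = (1, 0), γ = (0, 0) and x = (1, 0), the median prediction is max(0, 1 + 1 × 0) = 1.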
About this article
Cite this article
Rohmer, J., Idier, D. & Pedreros, R. A nuanced quantile random forest approach for fast prediction of a stochastic marine flooding simulator applied to a macrotidal coastal site. Stoch Environ Res Risk Assess 34, 867–890 (2020). https://doi.org/10.1007/s00477-020-01803-2
DOI: https://doi.org/10.1007/s00477-020-01803-2