
A nuanced quantile random forest approach for fast prediction of a stochastic marine flooding simulator applied to a macrotidal coastal site

  • Original Paper
Stochastic Environmental Research and Risk Assessment

Abstract

Integrating full-process, high-resolution hydrodynamic simulations within early warning systems (EWS) for marine flooding is hindered by the high computational cost of such numerical models. This problem can be alleviated by statistically analysing pre-calculated simulation results to build a fast (low computational cost) statistical predictive model, named a metamodel. Despite the success of this approach, applying such techniques directly to EWS is not straightforward in all settings, particularly in environments where the stochastic character of waves has a significant effect on the induced flood, i.e. where overtopping lasts less than 500 times the offshore wave period. In such environments, the numerical simulator is not deterministic and provides statistical quantities of the flooding indicators. Focusing on quantile estimates, the present study explores the applicability of random forest (RF) models for marine flooding prediction by providing two levels of information: (1) the quantile of interest, via a quantile random forest regression model (qRF); (2) the flooding probability, via a classification random forest (cRF). We use the macrotidal site of Gâvres (French Atlantic coast) as an application case, for which ~ 2000 numerical simulations were performed (i.e. stochastic simulations for 100 different extreme-but-realistic offshore meteo-oceanic input conditions, each repeated 20 times) to compute local and global flooding indicators (respectively, the maximum water depth at the coast and the total volume of water entering the territory). Through an extensive repeated cross-validation procedure, we tune the qRF parameters, achieving high coefficients of determination (~ 90%) for the 25, 50 and 75% quantiles, and we show that the qRF models outperform the commonly used Tobit regression model.
Comparison with the numerical results for historical events shows very satisfactory predictions, both for events leading to major flooding and for events with no impact. For low quantile levels and minor-to-moderate flooding events, the second level of information (the cRF-derived flooding probability) shows its added value by enabling the EWS user to nuance the qRF prediction and to flag situations where the prediction remains uncertain.
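The two levels of information described above can be sketched as follows. This is a minimal illustration, not the authors' implementation (they used the R package ranger): it assumes scikit-learn's `RandomForestRegressor`/`RandomForestClassifier`, recovers qRF quantiles with the leaf-weighting idea of Meinshausen (2006), and uses synthetic data in place of the simulation database. The helper `qrf_quantiles` and all data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


def qrf_quantiles(forest, X_train, y_train, X_new, levels):
    """Conditional quantiles from a fitted forest, following the weighting
    idea of quantile regression forests (Meinshausen 2006)."""
    train_leaves = forest.apply(X_train)  # (n_train, n_trees): leaf index per tree
    new_leaves = forest.apply(X_new)      # (n_new, n_trees)
    order = np.argsort(y_train)
    y_sorted = y_train[order]
    out = np.empty((X_new.shape[0], len(levels)))
    for j, leaves in enumerate(new_leaves):
        # same[i, t] is True when training point i falls in the same leaf as x_new in tree t
        same = train_leaves == leaves
        # weight of point i: average over trees of 1 / (leaf size) when co-located, else 0
        weights = (same / same.sum(axis=0)).mean(axis=1)
        cdf = np.cumsum(weights[order])  # weighted empirical CDF of y_train
        for k, level in enumerate(levels):
            idx = np.searchsorted(cdf, level)  # first index where cdf >= level
            out[j, k] = y_sorted[min(idx, len(y_sorted) - 1)]
    return out


# Hypothetical stand-in for the simulation database: 3 scaled offshore conditions
# and a flooding indicator that is exactly zero when no flooding occurs.
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))
y = np.maximum(0.0, 2.0 * X[:, 0] + X[:, 1] - 1.0 + 0.3 * rng.standard_normal(500))

reg = RandomForestRegressor(n_estimators=200, min_samples_leaf=5, random_state=0).fit(X, y)
clf = RandomForestClassifier(n_estimators=200, min_samples_leaf=5, random_state=0).fit(X, y > 0)

X_new = rng.uniform(size=(5, 3))
q = qrf_quantiles(reg, X, y, X_new, [0.25, 0.50, 0.75])  # level 1: qRF quantiles
p_flood = clf.predict_proba(X_new)[:, 1]                   # level 2: cRF probability that y > 0
print(np.round(q, 3))
print(np.round(p_flood, 3))
```

Because every quantile is read from the same weighted CDF, the 25, 50 and 75% predictions for a given input are ordered by construction; the cRF probability can then be read alongside them to judge how trustworthy a near-zero quantile is.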


Figs. 1–15 (images not reproduced). Fig. 3 caption (fragment): “… taken from Google Earth ©). The observation point P1, where the water depth is analysed, is outlined by a grey-coloured dot.”


Notes

  1. https://en.wikipedia.org/wiki/Olympic-size_swimming_pool.

References

  • Ardhuin F, Rogers WE, Babanin AV, Filipot J, Magne R, Roland A, Van der Westhuysen A, Queffeulou P, Lefevre J, Aouf L, Collard F (2010) Semiempirical dissipation source functions for ocean waves. Part I: definition, calibration, and validation. J Phys Oceanogr 40(9):1917–1941

  • Athey S, Tibshirani J, Wager S (2019) Generalized random forests. Ann Stat 47(2):1148–1178

  • Bertin X (2016) Storm surges and coastal flooding: status and challenges. La Houille Blanche - Revue internationale de l’eau, EDP Sciences 1(2):64–70

  • Bertin X, Prouteau E, Letetrel C (2013) A significant increase in wave height in the North Atlantic Ocean over the 20th century. Global Planet Change 106:77–83

  • Binois M, Huang J, Gramacy RB, Ludkovski M (2019) Replication or exploration? Sequential design for stochastic simulation experiments. Technometrics 61(1):7–23

  • Boudiere E, Maisondieu C, Ardhuin F, Accensi M, Pineau-Guillou L, Lepesqueur J (2013) A suitable metocean hindcast database for the design of Marine energy converters. Int J Mar Energy 3–4:e40–e52. https://doi.org/10.1016/j.ijome.2013.11.010

  • Bradley AP (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit 30(7):1145–1159

  • Breiman L (2001) Random forests. Mach Learn 45:5–32

  • Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. Chapman and Hall/CRC, New York

  • Camus P, Mendez FJ, Medina R, Cofiño AS (2011) Analysis of clustering and selection algorithms for the study of multivariate wave climate. Coast Eng 58(6):453–462

  • Carrere L, Lyard F, Cancet M, Guillot A, Picot N (2016) FES 2014, a new tidal model: validation results and perspectives for improvements. Presentation to ESA Living Planet Conference, Prague

  • Charles E, Idier D, Thiébot J, Le Cozannet G, Pedreros R, Ardhuin F, Planton S (2012) Present wave climate in the Bay of Biscay: spatiotemporal variability and trends from 1958 to 2001. J Clim 25(6):2020–2039

  • Cheung KF, Phadke AC, Wei Y, Rojas R, Douyere YJM, Martino CD, Houston SH, Liu PLF, Lynett PJ, Dodd N, Liao S, Nakazaki E (2003) Modeling of storm-induced coastal flooding for emergency management. Ocean Eng 30(11):1353–1386

  • Coles SG, Tawn JA (1991) Modelling extreme multivariate events. J R Stat Soc Ser B Methodol 53(2):377–392

  • Coles S, Bawa J, Trenner L, Dorazio P (2001) An introduction to statistical modeling of extreme values. Springer, London

  • Compo GP, Whitaker JS, Sardeshmukh PD, Allan RJ, McColl C, Yin X, Giese BS, et al (2015) NOAA/CIRES Twentieth century global reanalysis version 2c. Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory. https://doi.org/10.5065/D6N877TW

  • Conover WJ (1980) Practical nonparametric statistics. Wiley, New York

  • Dee DP, Balmaseda M, Balsamo G, Engelen R, Simmons AJ, Thepaut J-N (2014) Toward a consistent reanalysis of the climate system. Bull Am Meteor Soc 95:1235–1248

  • Demeritt D, Stephens EM, Créton-Cazanave L, Lutoff C, Ruin I, Nobert S (2019) Communicating and using ensemble flood forecasts in flood incident management: lessons from social science. In: Handbook of hydrometeorological ensemble forecasting, pp 1131–1160

  • Dodet G, Mélet A, Ardhuin F, Bertin X, Idier D, Almar R (2019) The contribution of wind-generated waves to coastal sea-level changes. Surv Geophys 40(6):1563–1601

  • Freeman EA, Moisen GG (2008) A comparison of the performance of threshold criteria for binary classification in terms of predicted prevalence and kappa. Ecol Model 217(1–2):48–58

  • Giloy N, Hamdi Y, Bardet L, Garnier E, Duluc CM (2018) Quantifying historic skew surges: an example for the Dunkirk Area, France. Nat Hazards 98(3):1–25

  • Gouldby B, Méndez FJ, Guanche Y, Rueda A, Mínguez R (2014) A methodology for deriving extreme nearshore sea conditions for structural design and flood risk analysis. Coast Eng 88:15–26

  • Hashemi MR, Spaulding ML, Shaw A, Farhadi H, Lewis M (2016) An efficient artificial intelligence model for prediction of tropical storm surge. Nat Hazards 82(1):471–491

  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction. Springer, New York

  • Heffernan JE, Tawn JA (2004) A conditional approach for multivariate extreme values (with discussion). J R Stat Soc Ser B Stat Methodol 66(3):497–546

  • Herman GR, Schumacher RS (2018) Money doesn’t grow on trees, but forecasts do: forecasting extreme precipitation with random forests. Mon Weather Rev 146:1571–1600

  • Hothorn T (2018) Top-down transformation choice. Stat Modell 18(3–4):274–298

  • Idier D, Rohmer J, Pedreros R, Le Roy S, Lambert J, Louisor J, Le Cozannet G, Le Cornec E (2020) Coastal flood: a composite method for past events characterisation providing insights in past, present and future hazards. Joining historical, statistical and modeling approaches. Nat Hazards (in press)

  • Janon A, Nodet M, Prieur C (2014) Uncertainties assessment in global sensitivity indices estimation from metamodels. Int J Uncertain Quantif 4(1):21–36

  • Khan N, Shahid S, Juneng L, Ahmed K, Ismail T, Nawaz N (2019) Prediction of heat waves in Pakistan using quantile regression forests. Atmos Res 221:1–11

  • Labopin-Richard T, Gamboa F, Garivier A, Stenger J (2019) Conditional quantile sequential estimation for stochastic codes. J Stat Theory Pract 13(4):55

  • Le Cornec E, Le Bris E, Van Lierde M (2012) Atlas des risques littoraux sur le departement du Morbihan. Phase 1 : Recensement et consequences des tempetes et coups de vent majeurs. GEOS-AEL and DHI report (in French), p 476

  • Le Roy S, Pedreros R, Andre C, Paris F, Lecacheux S, Marche F, Vinchon C (2015) Coastal flooding of urban areas by overtopping: dynamic modelling application to the Johanna storm (2008) in Gâvres (France). Nat Hazards Earth Syst Sci 15:2497–2510

  • Lumbroso DM, Suckall NR, Nicholls RJ, White KD (2017) Enhancing resilience to coastal flooding from severe storms in the USA: international lessons. Nat Hazards Earth Syst Sci 17:1357–1373

  • Malley JD, Kruppa J, Dasgupta A, Malley KG, Ziegler A (2012) Probability machines. Methods Inf Med 51(01):74–81

  • Marrel A, Iooss B, Laurent B, Roustant O (2009) Calculations of Sobol indices for the Gaussian process metamodel. Reliab Eng Syst Saf 94(3):742–751

  • Marrel A, Iooss B, Da Veiga S, Ribatet M (2012) Global sensitivity analysis of stochastic computer models with joint metamodels. Stat Comput 22(3):833–847

  • Meinshausen N (2006) Quantile regression forests. J Mach Learn Res 7:983–999

  • Minasny B, McBratney AB (2006) A conditioned Latin hypercube method for sampling in the presence of ancillary information. Comput Geosci 32(9):1378–1388

  • Muller H, Pineau-Guillou L, Idier D, Ardhuin F (2014) Atmospheric storm surge modeling along the French (Atlantic and English Channel). Ocean Dyn 64(11):1671–1692

  • Nutt WT, Wallis GB (2004) Evaluation of nuclear safety from the outputs of computer codes in the presence of uncertainties. Reliab Eng Syst Saf 83(1):57–77

  • Park S, Kwon J, Kim J, Oh HS (2018) Prediction of extremal precipitation by quantile regression forests: from SNU Multiscale Team. Extremes 21(3):463–476

  • Perrin G, Durantin C (2019) Taking into account input uncertainties in the Bayesian calibration of time-consuming simulators. Journal de la Société Française de Statistique 160(2):24–46

  • Poelhekke L, Jäger WS, van Dongeren A, Plomaritis TA, McCall R, Ferreira Ó (2016) Predicting coastal hazards for sandy coasts with a Bayesian Network. Coast Eng 118:21–34

  • Probst P, Boulesteix AL (2017) To tune or not to tune the number of trees in random forest. J Mach Learn Res 18:181–191

  • Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, Müller M (2011) pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform 12(77):1–8

  • Rohmer J, Rousseau M, Lemoine A, Pedreros R, Lambert J, Benki A (2018a) Source characterisation by mixing long-running tsunami wave numerical simulations and historical observations within a metamodel-aided ABC setting. Stoch Env Res Risk Assess 32(4):967–984

  • Rohmer J, Idier D, Paris F, Pedreros R, Louisor J (2018b) Casting light on forcing and breaching scenarios that lead to marine inundation: combining numerical simulations with a random-forest classification approach. Environ Modell Softw 104:64–80

  • Romano A, Bellotti G, Briganti R, Franco L (2015) Uncertainties in the physical modelling of the wave overtopping over a rubble mound breakwater: the role of the seeding number and of the test duration. Coast Eng 103:15–21

  • Roy PT, El Moçayd N, Ricci S, Jouhaud JC, Goutal N, De Lozzo M, Rochoux MC (2018) Comparison of polynomial chaos and Gaussian process surrogates for uncertainty quantification and correlation estimation of spatially distributed open-channel steady flows. Stoch Env Res Risk Assess 32(6):1723–1741

  • Stansby P, Chini N, Apsley D, Borthwick A, Bricheno L, Horrillo-Caraballo J, McCabe M, Reeve D, Rogers BD, Saulter A, Scott A, Wilson C, Wolf J, Yan K (2013) An integrated model system for coastal flood prediction with a case history for Walcott, UK, on 9 November 2007. J Flood Risk Manag 6(3):229–252

  • Stauffer R, Mayr GJ, Messner JW, Umlauf N, Zeileis A (2017) Spatio-temporal precipitation climatology over complex terrain using a censored additive regression model. Int J Climatol 37(7):3264–3275

  • Taillardat M, Fougères AL, Naveau P, Mestre O (2019) Forest-based and semiparametric methods for the postprocessing of rainfall ensemble forecasting. Weather Forecast 34(3):617–634

  • Tobin J (1958) Estimation of relationships for limited dependent variables. Econometrica 26(1):24–36

  • Torossian L, Picheny V, Faivre R, Garivier A (2020) A review on quantile regression for stochastic computer experiments. Reliab Eng Syst Saf 106858 (in press). https://doi.org/10.1016/j.ress.2020.106858

  • Tran VN, Kim J (2019) Quantification of predictive uncertainty with a metamodel: toward more efficient hydrologic simulations. Stoch Env Res Risk Assess 33(7):1453–1476

  • Van der Meer JW, Allsop NWH, Bruce T, De Rouck J, Kortenhaus A, Pullen T, Schüttrumpf H, Troch P, Zanuttigh B, EurOtop (2016) Manual on wave overtopping of sea defences and related structures: an overtopping manual largely based on European research, but for worldwide application. www.overtopping-manual.com. Accessed Dec 2019

  • Verhaeghe H, De Rouck J, van der Meer J (2008) Combined classifier–quantifier model: a 2-phases neural model for prediction of wave overtopping at coastal structures. Coast Eng 55(5):357–374

  • Wager S, Hastie T, Efron B (2014) Confidence intervals for random forests: the jackknife and the infinitesimal jackknife. J Mach Learn Res 15(1):1625–1651

  • Willett P (1999) Dissimilarity-based algorithms for selecting structurally diverse sets of compounds. J Comput Biol 6(3–4):447–457

  • Wright MN, Ziegler A (2017) Ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 77(1):1–17

  • Zamo M, Mestre O, Arbogast P, Pannekoucke O (2014) A benchmark of statistical regression methods for short-term forecasting of photovoltaic electricity production. Part I: deterministic forecast of hourly production. Sol Energy 105:792–803

  • Zanuttigh B, Formentin SM, van der Meer JW (2016) Prediction of extreme and tolerable wave overtopping discharges through an advanced neural network. Ocean Eng 127:7–22

  • Zhou P, Li Z, Snowling S, Baetz BW, Na D, Boyd G (2019) A random forest model for inflow prediction at wastewater treatment plants. Stoch Env Res Risk Assess 33(10):1781–1792

  • Zijlema M, Stelling GS (2008) Efficient computation of surf zone waves using the nonlinear shallow water equations with non-hydrostatic pressure. Coast Eng 55:780–790


Acknowledgements

The authors thank the ANR for its financial support to the RISCOPE project (ANR-16-CE04-0011). The following data providers are acknowledged: LEGOS, NOAA, LOPS-IFREMER. We are also grateful to the RISCOPE project consortium for fruitful discussions (A. Aurouet-Geohyd, F. Bachoc-IMT, F. Gamboa-IMT, T. Klein-IMT/Enac, J. Betancourt-IMT, S. Le Roy-BRGM). Random forest models were fitted with the R package ranger (Wright and Ziegler 2017), and the ROC curves were analysed with the R package pROC (Robin et al. 2011). We thank both anonymous reviewers for their constructive comments, which helped improve the present work.

Author information

Corresponding author

Correspondence to Jeremy Rohmer.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Sources of data for the coastal flooding application case

See Table 4.

Table 4 Original datasets for the database of offshore meteo-oceanic conditions (after Idier et al. 2020)

Appendix B: Extreme value analysis

The procedure is as follows. First, two classes of offshore conditions are considered: the ‘amplitude’ variables X = (tswl, hs, u), which can take very large values, and the covariates Xc = (tp, dp, du), whose values depend on those of the ‘amplitude’ variables. For the ‘amplitude’ variables, a multivariate extreme value analysis is conducted to extrapolate their joint probability density to extreme values while accounting for their dependence structure.

This step proceeds as follows:

  • The marginal distribution of each ‘amplitude’ variable is first modelled by combining the empirical distribution of the \(X_i\) values below a suitable high threshold with a Generalised Pareto distribution (GPD) above the threshold (Coles and Tawn 1991). The threshold is selected by combining several methods, namely visual inspection of quantile–quantile plots, “mean residual life plots” and “modified scale and shape parameters plots” (Coles et al. 2001);

  • The dependence structure of the ‘amplitude’ variables (transformed into common standard Gumbel margins) is modelled following the approach of Heffernan and Tawn (2004), as described below.

  • Once fitted, a Monte Carlo simulation procedure is used to randomly generate realizations of the ‘amplitude’ variables.

  • Based on the generated dataset, the covariates are generated by using the empirical distribution conditional on the values of the ‘amplitude’ variables.
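The first bullet above (empirical body plus GPD tail) and the transformation to standard Gumbel margins used in the next step can be sketched as below. This is a minimal sketch under stated assumptions: scipy's `genpareto` is used for the tail, the threshold is fixed at the 95% empirical quantile (the study instead selects it with diagnostic plots), and the sample `hs` is hypothetical synthetic data. The helper names `fit_margin`, `to_uniform` and `to_gumbel` are illustrative.

```python
import numpy as np
from scipy import stats


def fit_margin(x, threshold_prob=0.95):
    """Fit a GPD to excesses over the threshold u; returns (u, shape, scale)."""
    u = np.quantile(x, threshold_prob)
    shape, _, scale = stats.genpareto.fit(x[x > u] - u, floc=0.0)
    return u, shape, scale


def to_uniform(x_eval, x_data, u, shape, scale, threshold_prob=0.95):
    """Semi-parametric CDF: empirical distribution below u, GPD tail above u."""
    x_eval = np.asarray(x_eval, dtype=float)
    body = np.searchsorted(np.sort(x_data), x_eval, side="right") / (len(x_data) + 1.0)
    tail = threshold_prob + (1.0 - threshold_prob) * stats.genpareto.cdf(
        x_eval - u, shape, loc=0.0, scale=scale)
    return np.where(x_eval > u, tail, body)


def to_gumbel(p):
    """Standard Gumbel margins: invert exp(-exp(-z)) = p, keeping p inside (0, 1)."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -np.log(-np.log(p))


rng = np.random.default_rng(2)
hs = rng.lognormal(mean=0.5, sigma=0.4, size=2000)  # hypothetical wave-height-like sample
u, shape, scale = fit_margin(hs)
z = to_gumbel(to_uniform(hs, hs, u, shape, scale))
print(round(u, 3), round(shape, 3), round(scale, 3))
```

Dividing the empirical ranks by n + 1 keeps the body of the distribution strictly inside (0, 1), so the Gumbel transform stays finite; the transform is monotone, so the ranking of the original values is preserved.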

Let \(X_i\), \(i = 1, 2, 3\), denote the ‘amplitude’ variables (tswl, hs, u). These variables are first transformed into common standard Gumbel margins, giving \(Y_i\). Let \(\boldsymbol{Y}_{-i}\) denote the vector of all transformed variables except the ith variable \(Y_i\). When \(Y_i\) is large, a non-linear regression model is set up as follows:

$$\boldsymbol{Y}_{-i} \mid Y_i = \mathbf{a}\, Y_i + Y_i^{\mathbf{b}}\, \boldsymbol{W}$$
(6)

with \(Y_i > \nu\) (i.e. \(Y_i\) taking large values), where \(\mathbf{a}\) and \(\mathbf{b}\) are parameter vectors (one value per parameter for each pair of variables), ν is a threshold to be defined and \(\boldsymbol{W}\) is a vector of residuals. The model is fitted by maximum likelihood, assuming that the residuals \(\boldsymbol{W}\) are Gaussian with a mean and variance to be estimated. For our case study, the threshold ν in Eq. (6) was set to 0.85 (expressed as a probability of non-exceedance) using the diagnostic tools described in Heffernan and Tawn (2004).
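A minimal sketch of fitting Eq. (6) for one pair of variables is given below. It assumes both variables are already on standard Gumbel margins, applies the 0.85 non-exceedance threshold on the Gumbel scale, and maximizes the Gaussian likelihood of the residuals with scipy's L-BFGS-B, bounding a to [0, 1] and b below 1 as is usual for this model. The function `fit_ht_pair` and the synthetic pair are hypothetical illustrations, not the study's code; the synthetic second variable is not exactly Gumbel-distributed.

```python
import numpy as np
from scipy import optimize, stats


def fit_ht_pair(y_cond, y_other, nu_prob=0.85):
    """Fit y_other | y_cond = a*y_cond + y_cond**b * W for y_cond > nu, where W is
    Gaussian with mean m and standard deviation s (both estimated)."""
    nu = stats.gumbel_r.ppf(nu_prob)  # threshold on the standard Gumbel scale
    keep = y_cond > nu
    x, z = y_cond[keep], y_other[keep]

    def neg_loglik(theta):
        a, b, m, log_s = theta
        scale_x = x ** b  # x > nu > 0, so the power is well defined
        return -np.sum(stats.norm.logpdf(z, loc=a * x + m * scale_x,
                                         scale=np.exp(log_s) * scale_x))

    res = optimize.minimize(neg_loglik, x0=[0.5, 0.2, 0.0, 0.0], method="L-BFGS-B",
                            bounds=[(0.0, 1.0), (None, 0.99), (None, None), (None, None)])
    return res.x  # (a, b, m, log_s)


rng = np.random.default_rng(3)
y1 = rng.gumbel(size=5000)                          # hypothetical Gumbel-scale variable
y2 = 0.7 * y1 + 0.5 * rng.standard_normal(5000)     # hypothetical dependent variable
a, b, m, log_s = fit_ht_pair(y1, y2)
print(np.round([a, b, m, np.exp(log_s)], 2))
```

Once fitted, Monte Carlo realizations of \(\boldsymbol{Y}_{-i}\) can be drawn by simulating large values of \(Y_i\) above ν and adding scaled residuals; Heffernan and Tawn (2004) resample the empirical residuals for that step rather than drawing from the Gaussian used for fitting.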

Appendix C: Historical events

Table 5 presents the offshore meteo-oceanic conditions associated with 21 historical events, selected with respect to past damage identified through historical records, past hydro-meteorological data and numerical hydrodynamic simulations (Idier et al. 2020).

Table 5 Description of the historical events used for validation; the three quantiles of interest for the local (Hmax) and global (Volume) indicators are provided in the following order: at 25, 50 and 75%

Appendix D: Censored regression (Tobit)

The Tobit model (Tobin 1958) is based on the concept of censoring: the response y is limited to a certain threshold (or range) and cannot be observed outside these limits. We assume that a latent, unobservable process drives the response and can be described by suitable covariates. In the present study, we focus on a normal distribution left-censored at zero, defined as follows:

$$y = \max \left( {0,y^{*} } \right), y^{*} \sim N\left( {\mu ,\sigma } \right)$$
(7)

where y* is the unobservable ‘latent’ response, assumed to follow a normal distribution (denoted N) with location μ and scale σ. The ‘observable’ response y is simply the maximum of the latent response and the censoring point (here zero).

The probability density function \(f_{\text{cens}}\) of y is:

$$f_{\text{cens}}(y \mid \mu, \sigma, 0) = \begin{cases} 0 & \text{if } y < 0 \\ F(y \mid \mu, \sigma) & \text{if } y = 0 \\ f(y \mid \mu, \sigma) & \text{otherwise} \end{cases}$$
(8)

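where f and F are, respectively, the probability density function and the cumulative distribution function of the normal distribution. At y = 0, \(f_{\text{cens}}\) is a probability mass, \(F(0 \mid \mu, \sigma)\), corresponding to all latent values below the censoring point.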
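A quick numerical check of Eqs. (7) and (8) can make the point mass at zero concrete: censoring normal draws at zero should produce a fraction of exact zeros close to \(F(0 \mid \mu, \sigma)\). The values of μ and σ below are arbitrary illustrative choices, and the sketch uses numpy and scipy.

```python
import numpy as np
from scipy import stats

mu, sigma = 0.3, 1.0  # hypothetical latent location and scale
rng = np.random.default_rng(1)

y_latent = rng.normal(mu, sigma, size=200_000)  # y* ~ N(mu, sigma)
y = np.maximum(0.0, y_latent)                   # Eq. (7): censoring at zero

p_zero_mc = np.mean(y == 0.0)                               # share of censored values
p_zero_theory = stats.norm.cdf(0.0, loc=mu, scale=sigma)    # F(0 | mu, sigma) in Eq. (8)
print(round(p_zero_mc, 4), round(p_zero_theory, 4))
```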

The location μ and scale σ are assumed to follow a regression model:

$$\mu = \mathbf{x}^{\text{T}} \boldsymbol{\beta}, \quad \log(\sigma) = \mathbf{x}^{\text{T}} \boldsymbol{\gamma}$$
(9)

where \(\mathbf{x}\) denotes the covariates (the offshore meteo-oceanic conditions in our study), and \(\boldsymbol{\beta}\) and \(\boldsymbol{\gamma}\) are regression coefficients, estimated by maximizing the log-likelihood, i.e. the sum over the dataset of \(\log f_{\text{cens}}(y \mid \mu, \sigma, 0)\).
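The estimation step of Eqs. (7)–(9) can be sketched as follows: censored observations contribute \(\log F(0 \mid \mu, \sigma)\), uncensored ones contribute \(\log f(y \mid \mu, \sigma)\), and the negative sum is minimized numerically. This is a minimal sketch, assuming scipy's BFGS optimizer and a synthetic dataset with an intercept and one hypothetical covariate; it is not the implementation used in the study, and `tobit_negloglik`/`fit_tobit` are illustrative names.

```python
import numpy as np
from scipy import optimize, stats


def tobit_negloglik(params, X, y):
    """Negative log-likelihood of a zero left-censored normal response with
    mu = X @ beta and log(sigma) = X @ gamma."""
    p = X.shape[1]
    beta, gamma = params[:p], params[p:]
    mu = X @ beta
    sigma = np.exp(X @ gamma)
    ll = np.where(y <= 0.0,
                  stats.norm.logcdf(0.0, loc=mu, scale=sigma),  # point mass F(0 | mu, sigma)
                  stats.norm.logpdf(y, loc=mu, scale=sigma))    # density f(y | mu, sigma)
    return -np.sum(ll)


def fit_tobit(X, y):
    p = X.shape[1]
    res = optimize.minimize(tobit_negloglik, np.zeros(2 * p), args=(X, y), method="BFGS")
    return res.x[:p], res.x[p:]


rng = np.random.default_rng(4)
n = 5000
x1 = rng.uniform(-1.0, 1.0, size=n)    # hypothetical covariate
X = np.column_stack([np.ones(n), x1])  # intercept + covariate
beta_true, gamma_true = np.array([0.2, 1.0]), np.array([-0.5, 0.3])
y = np.maximum(0.0, X @ beta_true + np.exp(X @ gamma_true) * rng.standard_normal(n))

beta_hat, gamma_hat = fit_tobit(X, y)
print(np.round(beta_hat, 2), np.round(gamma_hat, 2))
```

Modelling log(σ) rather than σ keeps the scale positive without constrained optimization, which is why the sketch can use an unconstrained optimizer.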


About this article


Cite this article

Rohmer, J., Idier, D. & Pedreros, R. A nuanced quantile random forest approach for fast prediction of a stochastic marine flooding simulator applied to a macrotidal coastal site. Stoch Environ Res Risk Assess 34, 867–890 (2020). https://doi.org/10.1007/s00477-020-01803-2



  • DOI: https://doi.org/10.1007/s00477-020-01803-2
