1 Introduction

There is widespread interest in quantifying the impacts of climate and other anthropogenic changes on the likelihood of very extreme natural hazards (IPCC 2012). When assessing the risk of a natural hazard, for example to design critical infrastructures, an estimate is needed of the magnitude of extremes expected to occur with a certain rarity, typically derived as quantiles of a statistical distribution. This estimation is carried out under the assumption that the past recordings of the variable under study would still be representative of the stochastic behaviour of the variable at present and in the future. This can be formalised as an assumption that the process under study is stationary (Coles 2001; François et al. 2019). With the increasing evidence of the impacts of climate and other anthropogenic changes, the validity of this assumption is increasingly being challenged. As a result, a large of number of studies have investigated the evidence for change in risk levels of several natural hazards in observational data (e.g. Coates et al. 2014; Guerreiro et al. 2018; François et al. 2019, and references therein), typically using statistical approaches based on Extreme Value Analysis (Cooley 2009; Katz et al. 2002). Often, a default distribution is assumed and one or more distribution parameters are modelled as a function of covariates, e.g. time. The hypothesis that the behaviour of extremes might have been changing is formulated as a functional relationship between one or more of the distribution parameters and the covariate(s). This type of change-permitting analysis is also referred to as non-stationary analysis, while the models with constant values for all parameters throughout time are often referred to as stationary analysis.

Table 1 shows an extremely non-exhaustive list of published articles based on such change-permitting analysis: the list is provided to showcase the variety of modelling choices and applications of change-permitting models. From the Table it can be seen that different combinations of distributions, model structures and covariates have been employed in the literature. All the differences in the model components eventually have an impact in the final understanding about changes in extremes. Specifically, most of the studies listed in Table 1 focus on the changes to the higher quantiles of the distribution.

In this study we discuss ways to describe the changes in frequencies and magnitude focussing on change in the quantile function, showing that different model structures necessarily result in different functional forms of changes in the quantile function. Further, we show that commonly adopted model structures can result in non-intuitive behaviour of the quantile function. We argue that a reflection on the type of change in higher quantiles which can be derived for each model structure should be performed at the initial stages of any analysis to ensure that the analysis results can provide the most appropriate communication of how extremes might be changing.

Table 1 Examples of studies which use change-permitting models to assess changes in annual maxima records

In particular, we formalise the investigation of how the different model structures impact the description of change on effective quantile estimates across different exceedance probabilities using two sets of criteria. The first set of criteria measures the impact of the changes which depend on the covariates used in the change-permitting models. The second set measures the effect of switching from a no-change model to a change-permitting model. Some of these measures have already been partially proposed in the literature: Vogel et al. (2011) introduced the magnification factor as a way to describe relative changes in quantiles in time. Further, the R package extRemes Gilleland and Katz (2016) provides functions to compute the difference in effective quantile estimates under different values of explanatory variables. Moreover, several studies in the literature, including those listed in Table 1, carry out some investigation of the implied impact on the magnitude of estimated quantile of using change-permitting models against results obtained when assuming constant parameters.

The analytical study of the change criteria shows that a minority of model structures results in simple definitions of change in the quantile function. One of these is a model structure which maintains a constant coefficient of variation, thus enforcing a change in the scale of the distribution when the location is allowed to change. Such models are not commonly used in the investigation of changes in extremes, even though a constant coefficient of variation is a common characteristic of estimates based on environmental extremes records (Overeem et al. 2009; Menabde et al. 1999; Serago and Vogel 2018; Blanchet et al. 2009).

The manuscript is organised as follow: a brief introduction of extreme values analysis and the models used to assess changes in extremes is given in Sect. 2. The quantile-based measures of change are presented in Sect. 3 and are applied to a case study of peak flow records in Massachusetts, USA, in Sect. 4. Section 5 closes the paper offering a perspective on the importance of model specification when investigating changes in extremes. The maximum likelihood estimation framework is adopted throughout the manuscript, although some of the concepts discussed would easily apply to models estimated within a Bayesian framework.

2 Statistical models for changing extremes

2.1 Extreme value models and design events; the fixed-parameters case

Assessing the risk of a natural hazard involves an assessment about the frequency at which events of given magnitudes can be expected to be exceeded. Typically, this is done by estimating the design event, which is expected to be exceeded with a annual exceedance probability (AEP) of \(p\) in any given year. If the distribution of event magnitudes is constant in time and events from year to year can be deemed independent from one another, the AEP has a direct relationship with the return period T, the average period of time over which one event at-least as big as the design event would be recorded, where \(T = 1/p\) (see Volpi 2019). Since the interest of the estimation focusses on the most extreme magnitudes which might be recorded, rather than the typical magnitudes, Extreme Value Analysis is used to model the distribution of the extreme values. In particular, extremes can be defined as the maximum value in a sample recorded over fixed periods of time (for example a year). It can be shown that the Generalised Extreme Value (GEV) distribution is the limiting distribution for maxima of stationary series (Coles 2001), although other distributions are sometimes used when estimating design events for engineering purposes (Castellarin et al. 2012).

In general, it is assumed that the variable of interest, denoted by \(Y\), follows a certain distribution parametrised by \(\varvec{\theta }\) \(= (\theta _1, \ldots , \theta _d)\) with probability density function \(f(y; \varvec{\theta })\). Typically a two (\(d=2\)) or three-parameter (\(d=3\)) distribution is used (Castellarin et al. 2012) and, in a first instance, all parameters are assumed to be constant. In the remainder we will focus on the GEV distribution due to its central role in the analysis of extremes. The GEV is a three parameter distribution characterised by a location parameter \(\mu\) (with \(\mu \in (-\infty , \infty )\)), a scale parameter \(\sigma\) (with \(\sigma \in (0, \infty )\)) and a shape parameter \(\xi\) (with \(\xi \in (-\infty , \infty )\)). When \(\xi = 0\) the distribution reduces to a two parameter distribution: the Gumbel distribution. The distribution’s domain depends on the sign of the shape parameter taking the following values: \(-\infty< y \le \mu - \sigma /\xi {\text { for }} \xi < 0\), \(\mu - \sigma /\xi \le y < \infty {\text { for }} \xi > 0\) and \(-\infty< y < \infty {\text { for }} \xi = 0\). For a given sample of extremes \({\varvec{y}} = (y_i, \ldots , y_n)\) it is possible and typically straightforward to find an estimate of \(\varvec{\theta }\) denoted as \(\hat{\varvec{\theta }}\). Estimates for the design event \(Q_T\) (where T indicates the return period) can be derived from the quantile function corresponding to the fitted distribution \(q(1-p; \hat{ \varvec{\theta }})\) (where \(p = 1/T\)) so that \(P(Y > Q_T) = p\). For the GEV distribution the quantile function takes the form:

$$\begin{aligned} q(1-p; \mu , \sigma , \xi ) = \left\{ \begin{array}{lr} \mu + \frac{\sigma }{\xi }\left[ (-\log (1-p))^{-\xi } -1 \right] &{} {\text {for }} \xi \ne 0\\ \mu + \sigma \{-\log ({-\log (1-p)})\} &{} {\text {for }} \xi = 0 \end{array} \right. \end{aligned}$$
(1)

where \(p\) indicates the exceedance probability. The quantile function for the GEV shown in Eq. (1) can be rewritten as:

$$\begin{aligned} q(1-p; \mu , \sigma , \xi ) = \left\{ \begin{array}{lr} \mu + \frac{\sigma }{\xi }\left[ y_p^{-\xi }-1 \right] &{} {\text {for }} \xi \ne 0\\ \mu + \sigma (-\log (y_p)) &{} {\text {for }} \xi = 0 \end{array} \right. \end{aligned}$$
(2)

taking \(y_p = (-\log (1-p))\). Notice that for \(\xi =0\) the quantile function of the Gumbel distribution is a linear transformation of \(-\log (y_p)\) with \(\mu\) being the intercept and \(\sigma\) being the slope of the line.

2.2 Incorporating change in extreme value models; the change-permitting case

The hypothesis that the behaviour of extremes might have been changing is typically formulated by assuming a functional relationship between one or more of the distribution parameters, and one or more covariates. The validity of the hypothesis is assessed by comparing the goodness-of-fit of the models with varying parameters against models with fixed parameters. This approach is established for extreme value analysis (Coles 2001) and has some similarity to Generalized Linear Models and their extensions such as Vector Generalised Additive Models (Yee 2015) or Generalised Additive Models for Location Scale and Shape (GAMLSS, Rigby and Stasinopoulos 2005; Wood 2017), which are also referred to as distributional regression models (Umlauf and Kneib 2018). In this work, to keep the presentation and computations as simple as possible, only linear models (with appropriate link functions) with one covariate are explored. The concepts can be extended to the case in which more than one covariate would enter the model and in which the relationship between the covariate and the distribution parameter (or their link transformation) would be better described by some non-linear function. Indeed, different covariates might enter the model in either a linear or non-linear fashion affecting one or more of the distribution parameters: see Yee and Stephenson (2007) for a thorough discussion on this within the extreme values framework.

As an example of the change-permitting approach the annual maxima series of the Aberjona River at Winchester, Massachusetts (USGS Gage 01102500) is used. Serago and Vogel (2018) reported that the watershed upstream of the station has undergone significant urban development which would typically increase flood magnitudes. The \(n=78\) annual maximum peak flow records are assumed to be independent and drawn from a GEV distribution, possibly with one or more parameters changing throughout the recording period. The peak flow records are shown in Fig. 1, and it seems clear that through time the observations have been increasing in magnitude and possibly also become more variable.

Fig. 1
figure 1

Peak flow data for the Aberjona River together with the effective 30-year design event according to different change-permitting models presented in the main text

Four models with different change-permitting structures are fitted to the observations to investigate possible changes in the peak flow distribution. At first a model \(M_{0}\) with all parameters kept constant is considered. It is assumed that each element in the sample is iid and GEV-distributed:

$$\begin{aligned} M_0: Y_i \sim GEV(\mu , \sigma , \xi ) {\text { for }} i = 1, \ldots , n. \end{aligned}$$

Next, two models in which the location of the distribution is allowed to change as a function of time with different link functions are used. As in Serago and Vogel (2018) time is used as a covariate in the model, as a surrogate for the increased urbanisation levels. Assuming that the variable describing river peak flow at time i follows a GEV distribution with a different location for each year, \(Y_i \sim GEV(\mu _i, \sigma , \xi )\), the following two models are fitted: a model with a linear link between the location and time (\(M_{L}\)) and a model using an exponential function to link the location parameter and time (\(M_{E}\)):

$$\begin{aligned} \left. \begin{array}{lll} M_L & : Y_i \sim GEV(\mu _i, \sigma _{C}, \xi _{C}), {\text {with}} \,\mu _i = \mu _{C0} + \mu _{C1} {\text {x}}_i &{} {\text { for }} i = 1, \ldots , n \\ M_E & : Y_i \sim GEV(\mu _i, \sigma _{C}, \xi _{C}), {\text {with}}\, \mu _i = \exp \{\eta _{C0} + \eta _{C1} \text {x}_i \} &{} \text { for } i = 1, \ldots , n \end{array} \right. \end{aligned}$$

In the last fitted model (\(M^{CV}_{E}\)) the location is modelled as a function of time via an exponential link function and the scale is allowed to change proportionally to the location:

$$\begin{aligned} M^{CV}_{E}: Y_i \sim GEV(\mu _i, \tau \mu _i, \xi _{C})\text {, with } \mu _i = \exp \{\eta _{C0} + \eta _{C1} \text {x}_i \} \text { for } i = 1, \ldots , n . \end{aligned}$$
(3)

where \(\tau\) is the ratio between the scale and the location of the distribution. The model structure is discussed further in Sect. 2.3. The C subscript is used to emphasise the parameters of change-permitting models.

In Fig. 1 the estimated 30 year event (i.e the design event with AEP = 1/30) under the four different models fitted to the data are shown. These are obtained by plugging the estimated parameter values in the quantile function in Eq. (1) and correspond to the effective quantiles discussed in Katz et al. (2002). Notice that for all four models shown in the Figure, different values of the scale and shape parameters will be estimated because different structures are used for the location function.

The maximum likelihood estimates of the model parameters in all four models discussed above are given in Table 2, together with their standard errors. In all models the coefficients describing the change in time for the location parameter are estimated to be positive and significantly different from 0 at the 5% significance level. The estimated value for the location parameter in 1940 and 2019 are also displayed in Table 2: the final estimation of the location at the beginning and at the end of the record is fairly similar for the three models in which the location is allowed to change: the choice of the link function has a minor effect on the estimated value of the location function within the recording period.

In Table 2 the estimated parameter values for an additional model (M\(_S\)) in which the scale is allowed to change with time using an exponential link function are also provided. The model structure is the following:

$$\begin{aligned} M_S : Y_i \sim GEV(\mu _{C}, \sigma _i, \xi _{C})\text {, with } \sigma _i = \exp \{ \gamma _{C0} + \gamma _{C1} \text {x}_i \} \text { for } i = 1, \ldots , n . \end{aligned}$$

This model should be used rarely and is presented here for comparative purposes: the scale parameter describes the variation of the distribution around its centre and typically one would first need to correctly model changes in the location parameter as a function of some predictors to correctly characterise changes in the scale. For this station there is little evidence that scale alone is changing when keeping the location fixed, although the variability of the record shown in Fig. 1 would appear to be higher in the later years. Indeed for the data in Fig. 1 there appear to be an increase in both the overall magnitude and the variability of flow extremes: this feature can be well described by the model \(\hbox {M}^{CV}_E\), in which the location is allowed to change as a function of time and the scale changes proportionally to the location.

Table 2 Maximum likelihood parameter estimates for the models presented in main text for Aberjona River peak flow data

To assess the goodness of fit of the change-allowing models to the data, beside assessing whether the parameter describing the change is significantly different from zero, it is often appropriate to check that the distribution estimated under each model gives a good fit to the observed data. As discussed, among others, in Coles (2001) this can be done using graphical tools such as probability-probability plots (pp-plot) or quantile-quantile plots (qq-plot) which can provide a visual check for the validity of the distributional assumption for the response variable. Software such as ismev and extRemes makes the testing of model significance and the assessment of the goodness of fit fairly straightforward. For the Aberjona river case the goodness of fit plots (not shown) indicate that the GEV assumption seem to hold. Nevertheless, when assessing the significance of a change-permitting model or selecting the most suitable model by means of information criteria (e.g. AIC or BIC), it is also important to assess and take into account the goodness of fit of different models to the data.

2.3 A model preserving a constant coefficient of variation

The model in Eq. (3) has rarely been employed in extreme value analysis: it is mentioned in passing by Smith (1999) that such a model gave a better fit to the rainfall data under study, but it has not been employed in the investigation of changes in environmental extremes, to the best of the authors’ knowledge.

The model is constructed so that when the location changes so does the scale, while the ratio between the scale and the location (\(\tau\)) remains constant. Although \(\tau\) is not exactly the coefficient of variation for a GEV distribution it is tightly related to its value. Most of the studies investigating changes in extremes in the literature (see for example Table 1), use GEV models with a varying location or scale (or location and scale which are allowed to vary independently from each other). These models have a varying coefficient of variation: a change in, say, the location parameter would involve also a change in the coefficient of variation. The model suggested in Eq. (3) keeps a constant coefficient of variation providing a parsimonious representation of changes in both location and scale. As mentioned in Serago and Vogel (2018), the estimates of the coefficient of variation are often found to be approximately constant across series measuring the same variable, while location and scale are well-known to vary greatly with some external variable. For example, in river flow measurements both the location and the scale typically vary as a function of watershed size and mean rainfall, while the estimates of the coefficient of variation are relatively constant across different watersheds. Similarly, rainfall accumulation across longer durations will tend to be larger and also exhibit more variability. These well-known properties of extremes are the basis for the widely adopted index-flood method and regional flood frequency analysis (Hosking and Wallis 1997) and for the standard methods used to derive Intensity Duration Frequency (IDF) curves (Menabde et al. 1999). These properties are sometimes referred to as the scaling properties of environmental extremes (Gupta and Waymire 1990). Beside the case of rainfall and river flow extremes, simple and multi-scale properties have been found to hold for, for example, snow accumulation, which scales with altitude (Blanchet et al. 2009), and wind speed, which scales with duration (Diebold and Heller 2019). The model in Eq. (3) is inspired by these so-called scaling properties of extremes: by keeping a constant coefficient of variation changes in the location result in proportional changes of the scale. As discussed in the next Section this also provide a straightforward description of changes in the distribution quantiles.

3 Measuring the impact of change

Once a relevant change in the distribution parameters is identified, this implies that the overall distribution of the variable of interest is deemed to have changed. The left panels in Fig. 2 gives an illustration of how potential changes in the parameters correspond to changes in the distribution by displaying the probability density functions (pdf) of a GEV distribution under different parameter values. The right panels of the Figure show instead the return plots, which depict the quantile curve as a function of a transform of the AEP. The curves shown in the Figure could represent the past and present day pdf and quantile functions under different change-permitting models: the black solid line defines a baseline, for example the estimate obtained for the beginning of the record when using time as covariate. The other three lines represent: (1) a change only in the location parameter (red dashed line), (2) a change only in the scale parameter (blue point and dash line), and (3) a change in both the location and scale parameter while maintaining the ratio between the scale and the location parameter constant (brown short and long-dashed line). The difference in the parameter values in the distributions results in quantile functions which all differ from the quantile function of the baseline model in a different way. The changes in the quantiles, which are related to the design events and are therefore the quantity of interest of extreme value analysis for environmental extremes, are the focus of the measures of change suggested in the next section. A number of metrics and approaches to redefine the concept of design event in the presence of change have been being proposed in the literature (see, among others, Rootzén and Katz 2013; Salas et al. 2018; Hu et al. 2018), but in this study we focus on metrics based on changes in quantiles.

Fig. 2
figure 2

Probability distribution functions and quantile functions for a Gumbel distribution and a GEV distribution with parameters as those indicated in the legend

3.1 Measuring change: some quantile-based metrics

Two quantile-based change metrics are introduced here: the difference, D, and the ratio, M, between two quantile functions. These two metrics can be computed to compare design events under different values of the covariate in the change-permitting models (indicated by superscript \(\delta\)) or to compare design events under the fixed-parameters and change-permitting models. In the first case, one could compare the estimated quantiles for time point x and time point \(x + \Delta x\), with \(\Delta x\) indicating the difference in time between the two time points under study (or the difference in any other covariate values). Denoting with \(\varvec{\theta }_C(x^*)\) the values of the parameters in the change-permitting model which are obtained evaluating the parameter functions at values \(x^*\) of the covariate, the following change metrics are defined:

$$\begin{aligned} D^{\delta }(p, \Delta x)= & {} q(p, \varvec{\theta }_C(x + \Delta x)) - q(p, \varvec{\theta }_C(x)) \end{aligned}$$
(4)
$$\begin{aligned} M^{\delta }(p, \Delta x)= & {} \frac{ q(p, \varvec{\theta }_C(x + \Delta x)))}{q(p, \varvec{\theta }_C(x))} \end{aligned}$$
(5)
$$\begin{aligned} D(p, x^*)= & {} q(p, \varvec{\theta }_C(x^*)) - q(p, \varvec{\theta }) \end{aligned}$$
(6)
$$\begin{aligned} M(p, x^*)= & {} \frac{q(p, \varvec{\theta }_C(x^*))}{q(p, \varvec{\theta })} \end{aligned}$$
(7)

The metrics of change introduced above can be used to assess the impact of the changing behaviour of extremes as described by different model structures on the estimated quantiles of the distribution, i.e. the design events. Note that the quantity in Eq. (5) is a generalisation of the magnification factor introduced by Vogel et al. (2011), while values of the quantity in Eq. (4) can be derived within the R package extRemes (Gilleland and Katz 2016).

3.2 Measuring change within the change-permitting models

The types of changes in the parameter values in Fig. 2 reflect some of the changes observed in parameter values obtained from change-permitting models often used in environmental studies. The model defined by a linear change in the location parameter of a GEV distribution with a constant scale, in which it assumed that \(Y_i \sim GEV(\mu _{C0} + \mu _{C1} x_i, \sigma , \xi )\), would corresponds to the red dashed line. This modelling assumption leads to a vertical shift of the quantile function. In this case the metric \(D^\delta (p, \Delta x)\) remains constant across all values of \(p\) and takes value \(D^\delta (p, \Delta x) = \mu _{C1} \Delta x\). In contrast, the ratio \(M^\delta (p, \Delta x)\) depends on \(p\), \(\Delta x\) and all the four parameter in the model: i.e. the ratio between quantiles would be different across exceedance probabilities (return periods) since

$$\begin{aligned} M^\delta (p, \Delta x) = \frac{\mu _{C1} \Delta x}{\mu _0 + \mu _{C1} x + \sigma (y_p^{-\xi }-1)/\xi } +1 . \end{aligned}$$

The values of \(D^\delta (p, \Delta x)\) and \(M^\delta (p, \Delta x)\) for a number of models commonly used when assessing the presence of change in environmental extremes are provided in the “Appendix”. There are only a few change-permitting GEV models for which one of the metric of change does not depend on the exceedance probability. One of them is the case in which the location is allowed to change, either linearly, as in the case discussed above, or as an exponential function of the covariate: in this case the value of \(D^\delta (p, \Delta x)\) is constant across all exceedance probabilities. When using the model proposed in Eq. (3), it is the ratio of quantiles that is a constant which only depends on \(\Delta x\) and \(\mu _{C1}\):

$$\begin{aligned} M^\delta (p, \Delta x) = \exp \{\mu _{C1} \Delta x\} \end{aligned}$$

Vogel et al. (2011) used time as a covariate and defined \(M^{\delta }\) as a “decadal” magnification factor based on \(\Delta x = 10\) years for the case of the two-parameter log-normal distribution, which is characterised by a constant coefficient of variation. This expression of change is independent of return period and has a simple interpretation. The model introduced here in Eq. (3) allows this concept to be applied with the more widely used GEV distribution and to provide a convenient way to communicate the outcome of change-permitting analysis to end-users.

3.3 Measuring change against the fixed-parameters model

For any model used to assess the existence of change in extremes, a natural question is also how estimates from a change-permitting model compare to the corresponding estimates obtained using fixed parameters, which are likely to be the basis of the current design event estimates. If the distribution is changing, then the probability of exceeding a pre-specified value would be different at the present time than it was at any time in the record and, more importantly, the estimation based on the assumption of a unique distribution for all data, would be biased.

The two metrics proposed in Eqs. (6) and (7), \(D(p, x^*)\) and \(M(p, x^*)\), provide a natural way to assess these impacts, but for almost no model can these two quantities reduce to simple metrics such as those available for \(D^\delta (p, \Delta x)\) or \(M^\delta (p, \Delta x)\). Nevertheless, it is possible for a fixed value of the covariate \(x^*\), to investigate how \(M(p, x^*)\) or \(D(p, x^*)\) change as a function of \(p\). Often, up to a certain value of \({\tilde{p}}\), the quantiles of the change-permitting model would be larger (respectively smaller) than the fixed-parameter model, while after \({\tilde{p}}\) the quantiles of the change-permitting model would be smaller (respectively larger). In practical applications, this can be a source of doubt for decision making, since events up to probability \({\tilde{p}}\) might be, say, underestimated when using fixed-parameters models, while events with probability of exceedance greater than \({\tilde{p}}\) would appear to be overestimated. This can happen, for example, when a linear trend is assumed for the location parameter and found to be, say, positive and some variability in the data can be explained by such trend, so that the estimate for the scale parameter diminishes in value: this is the case for the Aberjona river as shown in Table 2. When this happens one of the possible consequences is a shifting and “flattening” the return curve so that there is an increase in the quantiles which are exceeded relatively frequently, while the very rare events are found to be generally smaller. Similarly, when including a linear trend in the location parameter, the estimate for the shape parameter might vary so that the return curve under the change-permitting model might increase at a faster or slower rate.

Figure 3 shows a comparison of the scale and shape parameter for a fixed-parameters model and for a change-permitting model with a linear trend in the location as a function of time applied to 40 annual maximum series of instantaneous peak river flow in Massachusetts each containing more than 65 years of data (see Sect. 4 for a complete description of the dataset). For 31 out of 40 stations, the scale parameter is smaller for the change-permitting model, reflecting that allowing the location parameter to change explains a portion of the variability in the records. The shape parameter estimates also change (middle figure), with both increases and decreases observed. The impacts on the quantile estimation when migrating from a fixed-parameters to a change-permitting model can be assessed using \(M(p,x^*)\), i.e. the ratio between quantiles computed under the change-permitting and the fixed parameters models, for different value of \(p\). In the right panel of Fig. 3 the scatterplot of \(M(p,x^*)\) for the median (i.e. \(p=0.5\)) and for the 100-year event (i.e. \(p=0.01\)) is shown. For all stations in the datasets the change-permitting models are evaluated at the end of the record of the stations (\(x^* = max(x)\)). The sign of the change for the frequent (\(p=0.5\)) and rare (\(p=0.01\)) events is different at times (these are the red stars in the bottom right quadrant and top left quadrant). These instances represent cases where the introduction of a change-permitting model will result in changes in design flood estimates at high return periods which are different in direction than those for low return periods; a counter intuitive outcome for most practical applications. Moreover, the magnitude of the change can be different for the two AEP values, due to the difference in the scale and the shape parameter between the two models. For this dataset changes in the high quantile (AEP of 0.01) tend to be smaller than changes in the median, reflecting that the scale parameter estimates tend to be smaller for change-permitting model than for the fixed-parameters model.

Fig. 3
figure 3

The impact of fitting a linear trend in the location for the Massachusetts peak flow dataset. Left panel: ratio of the estimated scale parameter under the change-permitting model and the fixed-parameter model. Central panel: ratio of the estimated scale parameter under the change-permitting and the fixed-parameter model. Right panel: scatterplot of M(0.5, \(x^*\)) and M(0.01, \(x^*\)); red stars indicate stations for which the direction of change is different for \(p=0.5\) and \(p=0.01\)

3.3.1 Enforcing change structures

Although having different direction of changes for events of different rarity can be practically challenging, this property might be something that a modeller wishes to exploit when analysing extremes. For example, there might be some physical reasoning by which the magnitude of frequent events under some pre-specified value of the covariates is expected to be reduced while the rare events would become larger. For example, Sharma et al. (2018) discuss how different types of change of the distribution of high flows might be expected for different catchment types, while Guerreiro et al. (2018) show that changes in the magnitude of extreme rainfall are different for the relatively common and the most extreme events.

When fixing the shape parameter to be the same in the fixed-parameters and the change-permitting models, the point at which the two return curves cross can be found analytically for some of the commonly used models, namely the model with a linear change in the location and a model with an exponential change in the scale (see “Appendix”). Under the fixed parameters model the assumed distribution is \(Y \sim GEV(\mu , \sigma , \xi )\). Under the linear change in the location model, the assumed distribution is \(Y_i \sim GEV(\mu _{C0} + \mu _{C1} x_i, \sigma _{C}, \xi )\). Noticeably, the \(\xi\) parameter is kept to be the same under both models. The \(D(p, x^*)\) function for this change-permitting model is found to be:

$$\begin{aligned} D(p, x^*) = (\mu _{C0}+\mu _{C1} x^* - \mu ) + \frac{\sigma _C - \sigma }{\xi } (y_p^{-\xi }-1) \end{aligned}$$

The two quantile functions have the same value (i.e. they cross) at the point in which \(D(p, x^*) = 0\), which occurs at:

$$\begin{aligned} y_p = \left( 1 + \xi \frac{\mu -\mu _{C0}-\mu _{C1} x^*}{\sigma _C-\sigma }\right) ^{-1/\xi } \end{aligned}$$

provided that \(\xi (\mu -\mu _{C0}-\mu _{C1} x^*)/(\sigma _C-\sigma ) > -1\). Manipulating the formula for \(D(p, x^*)\) further, one can define the value that \(\sigma _C\) should take to ensure that the two quantile curves cross at a fixed \(y_p\) value for a given value of \((\mu _{C0}, \mu _{C1})\) as:

$$\begin{aligned} \sigma _C = \sigma + \frac{\mu -\mu _{C0}-\mu _{C1} x^*}{y_p^{-\xi }-1} \xi . \end{aligned}$$

This could be enforced in the model estimation to ensure that the change in the sign of \(D(p, x^*)\) happens exactly at a desired AEP \({{\tilde{p}}}\). Further, one can study the sign of \(D(p, x^*)\) as a function of \(p\). It can be found that quantiles from the change-permitting model exceed the quantiles from the fixed-parameters models (\(D(p, x^*) > 0\)) as long as:

$$\begin{aligned} \sigma _C > \sigma + \frac{\mu -\mu _{C0}-\mu _{C1} x^*}{y_p^{-\xi }-1} \xi . \end{aligned}$$
(8)

From this result it is possible to further investigate the properties a change-permitting model should have to ensure that effective return periods of a certain rarity are larger (\(D(p, x^*) > 0\)) or smaller (\(D(p, x^*) < 0\)) than the return periods under the fixed-parameters model. First, notice that

$$\begin{aligned} \left\{ \begin{array}{lr} y_p^{-\xi }-1> 0 &{} \text {for } \xi< 0 \ \text {and } p> 1-e^{-1}\\ y_p^{-\xi }-1> 0 &{} \text {for } \xi > 0 \ \text {and } p < 1-e^{-1} \end{array} \right. \end{aligned}$$

where \(p = 1-e^{-1}\) corresponds approximately to an AEP of 0.63 and a return period of T = 1.58. Assuming that one would want the two curves to cross at a value of \(p > 1-e^{-1}\), we see that for \(D(p, x^*)\) to be positive (i.e. to ensure that the dis-equality in Eq. (8) holds) one needs to find that the location parameter evaluated at \(x^*\) (i.e. the value \(\mu _{C0}+\mu _{C1} x^*\)) is smaller that the location parameter in the fixed parameters model (\(\mu\)). This would mean, for example, that if time is the covariate in the model and \(x^*\) is taken to be the end of the record, to ensure that the design events under the change-permitting model are larger than the ones obtained under the fixed parameter model for event with AEP lower than a certain value \({{\tilde{p}}}\), one would need to find a negative trend in the location and at the same time an increase in the scale parameter compared to the fixed-parameters model. Combining these findings it would be possible to define change-permitting models such that the effective return curve derived at a certain value of the covariate crosses the fixed-parameters return curve exactly at a pre-specified AEP \({{\tilde{p}}}\) and that has higher (or lower) estimated return levels than the fixed-parameters model for events with AEP smaller than \({{\tilde{p}}}\). This could be done by including constrains or using convenient re-parametrisation within the likelihood functions used to estimate both the fixed and the change-permitting models.

The constrains needed to ensure a positive \(D(p, x^*)\) under different change-permitting models are more cumbersome than the ones presented above for the case of the linear change in location. Nevertheless, it is generally true that if one wishes to ensure that the estimated magnitudes of rare events (i.e. events with a small AEP) evaluated at a given value \(x^*\) in the change-permitting model are larger than the ones under the fixed-parameters model, the scale parameter under the change permitting needs to be larger, while the location parameter needs to be smaller (provided the shape is kept to be the same in both models).

4 Case study: the Massachusetts peak flow data

In this Section records of annual maxima of instantaneous peak river flow recorded from rivers in the state of Massachusetts, USA, are used to explore some of the practical consequences for the estimation of quantiles when imposing different model structures to allow for change in the distribution of peak flows. The data consists of the annual maximum series of instantaneous peak flow for 40 stations in the state with at least 65 years of valid records. The longest record is 115 year long and all records end after 2015. Some of these stations record flow at locations for which the upstream watershed has undergone changes in urbanisation similar to those experienced by the Aberjona watershed. In this study we do not wish to assess whether peak river flow has changed across the state, nor to identify the drivers of such change: we instead focus on the consequences of different parametrisation of change on the estimated design events. We also do not attempt to make an assessment of which of the model structures employed to characterise possible changes in the records is the most suitable one for the different river flow records in Massachusetts: more complete checks on the goodness of fit of each model fit for every record would be needed. Overall, finding the most suitable statistical model to characterise possible changes in the distribution of river flow peaks would require a more thorough statistical investigation of the data and a deeper understanding of the possible causes driving the observed changes (e.g. changes in the land use in the watershed).

A number of different fixed-parameters and change-permitting models are fitted via standard maximum likelihood estimation to the data series of each station in the dataset. For all models time is used as a covariate and the following transformed logit function is used for the shape parameter:

$$\begin{aligned} \xi = \text {logit}(\zeta ) - 0.5 \end{aligned}$$

to ensure that \(\xi \in [-0.5,0.5]\). This constraint (adapted from the gevlss function in the mgcv R package, Wood 2017) ensures that the mean and variance of the GEV are finite and the maximum likelihood estimates are consistent (see Smith 1985). At first, the standard model with fixed parameters, \(M_{0}\), is estimated for each record, together with the different change-permitting models (\(M_{L}\), \(M_{E}\), \(M_{S}\) and \(M^{CV}_{E}\)) which were applied to the Aberjona river data as seen in Table 2.

4.1 From fixed- to change-permitting models

At first we compare the support in favour of a change in the behaviour of extremes across the different models. For each of the model different assumptions are made regarding which property of the distribution is changing and how this change is related to the covariate. Therefore, rather than comparing the raw estimated values for the parameters describing the estimated changes across different stations and different models, the parameter estimates are standardised by their estimated standard deviation and compared in Fig. 4. In the left panel the trend parameters for the model \(M_L\) and \(M_E\) are compared. For the model with a linear link function, \(M_L\), the parameter of interest is \(\mu _{C1}\), while for \(M_E\), in which an exponential link function is used, the parameter of interest corresponds to \(\eta _{C1}\). For these two models, the detected changes in the location parameter are similar in sign and strength: most changes are positive, indicating an increase in the location parameter over time for the majority of stations. As in the Aberjona case the estimated location function within the recording period is similar when using a linear or an exponential link function.

Fig. 4
figure 4

Scatterplot of scaled trend coefficients under different modelling assumptions

In contrast, the sign and strength of the trend when modelling location and scale are very different. The scatterplot in central panel of Fig. 4 indicates that for sites with strong changes in the location parameter there is no corresponding strong evidence of change in the scale parameter when this is modelled as a changing function of time while keeping the location fixed (\(M_S\)). However, when the scale is allowed to change proportionally to the location (\(M_{CV}^{E}\)), the strength and direction of the trend is similar to those found when allowing only the location to change (right panel). At each site, using the model in Eq. (3) would results in identifying a changing behaviour of extremes similar in sign and strength to the one identified when only allowing the location parameter to change.

Importantly, the consequence of the identified change on the quantiles would be different for different models due to differences in model structures. These differences are shown for the Aberjona river in Fig. 5. The top panel and the bottom left panel in Fig. 5 show in detail the comparison of the return curves for the fixed-parameter model (\(M_0\)) and for different change-permitting models evaluated in year 2019 (the final year in the record). These curves are derived from the estimated parameters shown in Table 2: changes in the parameters in the change-permitting models imply different relative changes across the AEP values. For models allowing changes in the location parameter it was found that the parameter value in year 2019 is generally larger than the one of the fixed parameter model and this implies an overall increase in the return curve. In contrast, the minimal difference between the parameter estimates for the fixed-parameter model \(M_0\) and the model in which the scale is allowed to change in time \(M_S\) is reflected in almost identical return curves (top-right panel) and consequently an almost flat M(p, 2019) line (lower right panel). The relative changes of the return curves for the different change-permitting models in all stations are shown in Fig. 6. For all stations in all models \(M(p, x^*)\) is evaluated in the last year available in the record, thus Fig. 6 shows the ratio of the effective return curve of the different change-permitting model structures evaluated at the end of the record period against the estimated return curve for the fixed parameter model. The left panel shows the relative change in quantiles across AEP induced by assuming an exponential trend in the location parameter. In this case the relative changes implied in the quantile functions tend to be minimal for the events with AEP of approximately 0.1 and the relative changes in the frequent events (Gumbel transform smaller than .4) tend to be larger than the relative changes for rare events (Gumbel transform larger than 4.6) . The contrary is true when the scale alone is allowed to change (central panel): overall there are much larger relative changes for the rarer events. When the scale is allowed to change proportionally to the location (right panel) the relative changes for the different AEP are fairly comparable for most stations. As in Fig. 3, for a handful of stations the direction of change for the median and the 100-year event are different when only the location is allowed to change; this is also true when the scale only is allowed to change. When using the model in Eq. (3) the changes have instead always the same direction in the range of AEP considered in Fig. 6, with most of them being fairly constant. In conclusion, the assumption made in terms of what parts of the distribution are allowed to change has a strong impact on the implied changes in higher quantiles which are typically of interest for engineering design.

Fig. 5
figure 5

Comparison of return curves under the fixed-parameters model (dashed black line) and different change-permitting models for the Aberjona river. The bottom right panel shows the evolution of \(M(p, 2019)\) for the change-permitting models across different AEP p

Fig. 6
figure 6

\(M(p, x_{end})\) across different AEP p for the different change-permitting models. The purple and green vertical lines indicate, respectively, the AEP = 0.5 and AEP = 0.01 values

4.2 Changes over the record-period

Figure 7 shows the return curves for the change-permitting models fitted to the Aberjona river series and evaluated for year 1965 and 2015. This allows the comparison of the possible changes in the quantiles of the peak flow distribution in 50 years. The bottom right panel shows the \(M^{\delta }(p, 50)\) values: as expected when the parametrisation presented in Eq. (3) is used the relative change in the two quantiles is the same and is equal to \(\exp \{\mu _{C1} \Delta t\}\).

Fig. 7
figure 7

Comparison of return curves for the change-permitting models in 1965 (dashed lines) and 2015 (solid lines) for the Aberjona river. The bottom right panel shows the evolution of \(M^{\delta }(p, 50)\) across different AEP p

Figure 8 compares the relative changes of the quantiles in 2015 and 1965 as a function of AEP (\(\hbox {M}^{\delta }(p,50)\)) for the change-permitting models \(M_{L}\), \(M_{E}\) and \(M^{CV}_{E}\) for all stations in the study. Thus, the left panel shows the relative changes over 50 years in the effective return levels when the model structure allow changes in the distribution as an exponential change in the location parameter. In this case, the changes in the frequent events are much larger than those in the rarer events: an increase in the location parameter would entail much larger estimates in the later years for the median, but only mildly larger estimates in the later years for the 100-year event. The behaviour is reversed when the scale only is allowed to change: frequent events are estimated to have relatively similar magnitude in 2015 and 1965, while the estimated magnitude of rare events is very different with effective design events for low AEP being either much larger or much smaller in 2015 than in 1965. The consequence of using the parametrisation of change presented in Eq. (3) is evident in the right hand panel: changes are constant across AEP giving an easy way to assess how the quantiles of the distribution might have changed over the 50 year period.

Fig. 8
figure 8

\(M^{\delta }(p, 50)\) across different AEP p for the change-permitting models. The purple and green vertical lines indicate, respectively, the AEP = 0.5 and AEP = 0.01 values

No single model of change can be appropriate to study changes in all extremes series, and an evaluation of the goodness of fit for the fitted models can be useful to assess whether any model can provide a better representation than others for the series under study. This might include both comparative measures such as AIC or BIC and graphical tools such as qq-plots which can indicate whether the distributional assumption made for the data is suitable. Furthermore, models fitted to the data should be reasonable and coherent with the physics principles underlying the process under study. Nevertheless, some decisions are typically made at an initial modelling stage of the parametric forms of change to consider as possible models. This decision could also be informed by considerations of whether any of the assumed forms can provide more usable metrics to describe changes in design events (when these are of interest) or any other quantity of interest derived from the fitted model.

4.3 Assessing uncertainty of quantile estimates

The assessment of the uncertainty in the quantities of interest is a key element when performing any statistical analysis. Different approaches have been employed in the literature to assess the variability of the estimated quantiles, the most common ones being the profile likelihood, the delta method, and the non-parametric and parametric bootstrap (Cooley 2013; Kyselỳ 2008). In particular, Kyselỳ (2008) shows that the latter method appears to provide good uncertainty estimates under several cases. The parametric bootstrap is therefore employed here to compare the variability of the estimated quantiles under different model structure for all stations in the study. A brief outline of the parametric bootstrap procedure is provided below.

For any model fitted to a river flow series of size n, an estimate for the vector of d parameters \(\hat{\varvec{\theta }}\) is obtained. Using the estimated parameters, B bootstrap samples of size n are randomly generated from the appropriate distribution parametrised by \(\hat{\varvec{\theta }}\). For each bootstrap sample the vector \(\hat{\varvec{\theta }}^*\) is estimated, together with the quantiles of interest possibly evaluated at a specific covariate value \({\tilde{x}}\), \(q^*= q(1-p; \hat{ \varvec{\theta }}^*({\tilde{x}}))\). This results in a bootstrap sample of quantiles \((q^*_1, \ldots , q^*_B)\). Percentile based \((1-\alpha )*100\)% confidence intervals for the quantile of interest are derived as the \(\alpha /2\) and \(1-\alpha /2\) percentiles of the bootstrap sample.

The scatterplots in Fig. 9 compare the width of the 95% confidence intervals for the design event of AEP equal to 0.5 derived at all stations for the different change-permitting model structures against the width of the interval for the fixed-parameters model. For all station the quantiles for the change-permitting models are evaluated at the last available year, i.e the end of the record. Noticeably, for all models in which the location is allowed to change (\(M_L\) and \(M_E\)) confidence intervals are found to be wider. In contrast, the width of the confidence intervals for the frequent events under a model with a varying scale \(M_S\) (bottom right panel) appears to be comparable to the one derived when assuming fixed parameters in time (\(M_0\)). This is not the case though when the intervals under comparison refer to rarer events. Figure 10 is structured as Fig. 9, but showing the width of confidence intervals for rare design events (AEP = 0.01). The scatterplot in bottom right panel in this case shows that the intervals for the model in which scale is allowed to change (\(M_S\)) are generally wider than those obtained when no change is allowed (\(M_0\)). This is not the case for models in which the mean only is allowed to change (\(M_L\) and \(M_E\)): in the top two panels the interval widths for the \(M_L\) and \(M_E\) model are similar to those for the no-change (\(M_0\)) model. The width of the intervals obtained using the model which allows the scale to change proportionally to the location (\(M_{CV}^{E}\)) is also typically wider: the variability of estimates of the magnitude of rare events is highly influenced by changes in the scale parameter.

Overall the variability of quantile estimates can be quite different under the various model structures investigated in this study. A better understanding of how the variability of the quantiles of interest might be affected by different parametrisation could also contribute to the decision of which model is adopted to describe change.

Fig. 9
figure 9

Bootstrap 95% confidence interval width (log-scale) for the effective design event of AEP = 0.5 under different change-permitting model structures against the width of the interval for the fixed parameter model (log-scale)

Fig. 10
figure 10

Bootstrap 95% confidence interval width (log-scale) for the effective design event of AEP = 0.01 under different change-permitting model structures against the width of the interval for the fixed parameter model (log-scale)

5 Discussion

In this paper the effect of different model structures used to describe changes in extremes is discussed by focusing on the impacts on quantiles as estimated under different change-permitting parametrisations. Although throughout the paper only univariate parametric models have been discussed, the results can be easily extended to the case of multivariate and non-parametric models. Further, although all analysis and calculations have been carried out assuming a GEV distribution, the findings could be useful for other distributions whose quantile function has the same structure as the one in Eq. (2), such as the Generalised Pareto distribution which is typically employed when analysing peaks-over-threshold records and has also been used to detect changes in environmental extremes (Silva et al. 2016, 2017). In that case, changes in the location parameter would correspond to changes in the threshold above which value are considered to be extremes. This has already been already proposed in the literature (see for example Roth et al. 2012; Eastoe and Tawn 2009), although with no specific consideration of the impacts of the modelling assumption on the quantile estimation.

The results from this study show that extreme value models routinely used in the applied literature can lead to very different assessments of change, but the consequences of the different modelling choices on the estimated design events are rarely discussed and explored. Moreover, many of the models typically employed entail a varying coefficient of variation, leading to changes in quantile estimates which are difficult to study under different values of the covariates and cumbersome to reconcile with the estimates derived under the fixed-parameters models. A wider discussion on what types of changes can be expected under different conditions (see for example Sharma et al. 2018), might help in clarifying what model structures are more apt to capture the expected changes. Possibly, the modelling options which are currently explored are not the ones which best describe the changes expected from a warming climate or an increase in population and urban areas. Using simple metrics which describe changes such as those proposed in Eqs. (4) to (7) can be useful to explore the consequences of different models structures on the estimation of quantities which are routinely extracted from any estimation procedure. By comparing the expected changes to the types of changes which can be described under different models, it would be possible to ensure that models of change employed the most suitable parametrisation to describe what is perceived to be the most likely direction and shape of change. As shown, sentences like “We expect all quantiles to have changed in time by a constant factor” or “We would expect events of AEP smaller than 0.02 to be larger in the future and events with AEP larger than 0.02 to be smaller” can be translated into models with specific parametrisation and possibly constrained values for certain parameters. It is important that the description of change of the quantities of interest derived from change-permitting models can be interpretable and therefore considered credible by the end users. We believe the model based on \(\tau\) presented in Eq. (3), in which the scale is allowed to change proportionally to the location, is a first step towards such models, since it allows for the ratio between quantiles to be the same for all exceedance probabilities, and therefore provides a direct communication of how changes in the distributions affect the final main output of interest.