1 Introduction

The wide-reaching observed and predicted ecological and socioeconomic consequences of contemporary climate change (IPCC AR5 2014) create a demand for regional-scale estimates of twenty-first century climate evolution to allow careful planning in both public and private sectors to mitigate impacts (IPCC AR5 2014). One tool to achieve this is General Circulation Models (GCMs), which are based on our understanding of atmospheric physics and allow us to model climates in dynamic equilibrium with prescribed forcings. The Representative Concentration Pathways (RCPs) describe different forcing scenarios that correspond to possible trajectories of demographic and technological developments (IPCC AR5 2014), and drive GCMs to simulate possible near future (twenty-first century) climates. While GCMs generally offer accurate estimates of many elements of the climate system, meso-scale processes are often poorly represented, leading to inaccurate estimates of aspects of climate change on a more regional scale (e.g. Meehl et al. 2007). However, these estimates are needed to assess regional-scale impacts, such as changes in the hydrological cycle (Diaz-Nieto and Wilby 2005), glacial retreat (Mutz et al. 2015), air quality (Hogrefe et al. 2004), local industry and human health (Patz et al. 2005), and allow for effective planning and mitigation strategies. The GCMs’ shortcomings on such scales are typically circumvented by dynamical downscaling with Regional Climate Models (RCMs) (e.g. Maussion et al. 2011; Mannig et al. 2013), or statistical downscaling (e.g. Eden et al. 2012).

Empirical statistical downscaling (ESD) is an umbrella term for methods that statistically relate local expressions of the climate system, such as temperature or precipitation recorded at weather stations, to relevant synoptic- to meso-scale atmospheric variability (e.g. Hewitson et al. 2014). For example, coarse climate model output with location-dependent systematic errors in rainfall may be fitted to local-scale observational data of rainfall (e.g. Eden et al. 2012). The resulting statistical model may then be coupled to climate or weather model output and use these large-scale conditions as predictors for local-scale rainfall, i.e. the target variable or predictand. Such statistical modelling is well established in the climatological and meteorological community (e.g. Glahn and Lowry 1972; Hansen and Emanuel 2003; Shongwe et al. 2006; Paeth 2011; Mutz et al. 2015). It is computationally less expensive than dynamical downscaling and implicitly takes in-situ geographical realities, such as local topography and micro-climate controls, into consideration without the need to explicitly parameterise these as is the case for RCMs. Furthermore, statistical downscaling avoids propagation of systematic errors by GCMs and errors pertaining to parameterisation schemes (e.g. Errico et al. 2001; Hewitson et al. 2014). While dynamical downscaling over South America has significantly advanced in the past two decades (Solman 2013; Giorgi and Gutowski 2013), the ESD community in South America is incipient. The potential for ESD in the region has not been explored as thoroughly as in other parts of the world (CORDEX 2020). The high potential for ESD is demonstrated by the dense network of automated weather station (AWS) maintained by the Meteorological Directorate of Chile (Fig. 1), and researched links between regional climate and synoptic-scale atmospheric variability, such the Antarctic Oscillation (Garreaud et al. 2013).

Fig. 1
figure 1

Topography and location of the 24 numbered weather stations in Chile. The stations are numbered by their latitudinal position with 1 being the northernmost station and 24 being the southernmost station. See supplemental Table 1 for station information

Chile’s large latitudinal extent and prominent orography lead to a variety of synoptic- and meso-scale climate controls. The El Niño Southern Oscillation (ENSO), South Pacific Anticyclone (SPA) and mid-latitude westerlies are important controls on climate and weather in Chile (Fig. 2). This results in large temperature and precipitation gradients in Chile (Fig. 3). ENSO affects northern Chile directly, leading to anomalously high temperature and precipitation during El Niño phases relative to non-El Niño phases (e.g. Rutllant and Fuenzalida 1991; Garreaud et al. 2009), and impacts southern Chile indirectly via atmospheric teleconnections (e.g. Garreaud et al. 2009). The Andes mountains cool sea surface temperatures (SSTs) off the west coast of South America by evaporative cooling, thereby creating an enhanced North-to-South cooling gradient in the east Pacific that deflects the intertropical convergence zone (ITCZ) and related convective precipitation north (Takahashi and Battisti 2007). The extreme aridity in northern Chile can be explained by a cold marine boundary layer (MBL) and subsidence associated with the quasi-permanent SPA (Fig. 2), which is stabilised by the cold SSTs associated with the Humboldt current. When combined, these processes create a thermal inversion that drastically inhibits convection and rainfall. A stratocumulus layer forms in the upper MBL, thereby reducing solar insolation below and ensuring cold (near-)surface temperatures (Rutllant et al. 2003). Furthermore, the Andes create a topographic barrier that prevents advection of moist air from the East, thus contributing to northern Chile aridity. Variations in aridity can, at least in parts, be explained by the Pacific Decadal Oscillation (e.g. Schulz et al. 2011), a multidecadal mode of climate variability in the Pacific (Mantua and Hare 2002). While the Pacific Decadal Oscillation (PDO) shows links to low-frequency fluctuations in precipitation in northern Chile, connections between low-frequency precipitation fluctuations and the Atlantic Multi-Decadal Oscillation (AMO) are significant in north and south Chile (Valdés-Pineda et al. 2018). The Antarctic Oscillation (AAO) (Rogers and Loon 1982; Thompson and Wallace 2000) is related to a low pressure field above Antarctica and the belt of westerly winds around it. The AAO is partially coupled to ENSO (Fogt and Bromwich 2006). A positive AAO phase is associated with anomalously low pressure over Antarctica and relatively high pressure in Southern Hemisphere mid-latitudes, and results in the contraction of the belt of westerlies; this directs westerlies southward and thus impacts temperature and rainfall variation in south Chile (e.g. Gillett et al. 2006). More specifically, a positive mode of AAO is associated with increased surface pressure, warmer temperature, and lower precipitation south of 40° S (e.g. Garreaud et al. 2009). ENSO and AAO are related to higher frequency changes in precipitation, esp. in northern and southern Chile respectively, whereas the PDO and AMO affect low-frequency variability (Valdés-Pineda et al. 2018). Finally, the uplift of low-level westerly winds over the western side of the Andes produces strong orographic precipitation in southern Chile (esp. the Patagonia region).

Fig. 2
figure 2

The major large-scale controls on Chilean climate on the 24 weather stations (grey circles) include a more direct influence of the El Niño Southern Oscillation (ENSO) and the South Pacific Anticyclone (SPA) and subsidence in the north, and mid-latitude westerlies in more southern parts of Chile

Fig. 3
figure 3

ERA-Interim mean annual near surface (2 m) temperatures (left) and mean annual precipitation (right) highlight the pronounced climate gradients in Chile due to orographic and latitude-specific forcings

This study complements previous regional studies of Chilean climate and estimates of future climate change in South America by investigating large-scale climatic controls on regional climate, and downscaling GCM-based temperature estimates for the twenty-first century. The specific aims of the study are:

  1. 1.

    To find large-scale predictors for local temperatures based on existing knowledge of Chilean climatology, and to quantify their predictive skill.

  2. 2.

    To train empirical-statistical downscaling models for a set of 24 suitable meteorological stations, and to test their performance.

  3. 3.

    To apply the trained models to GCM simulations of twenty-first century climates to estimate local temperature responses to different forcing scenarios.

We address these aims by (a) training the statistical models for a set of suitable meteorological stations across Chile with appropriate synoptic and meso-scale predictors, thus building on previous studies of Chilean climate, and (b) applying ESD directly to GCM output, thus circumventing the (computationally) expensive intermediate step of RCM nesting.

2 Data and methods

2.1 Weather Stations

Weather station data of the Dirección Meteorológica de Chile (DMC) was digitised to serve as predictand time series for the ESD models developed for this study. These models are empirical in nature and therefore require sufficiently long records of observation for training and validation. Weather stations with short records were thus excluded from the study. There is no universally valid threshold for the number of years of observational data required for training, since the goodness of the model is partially determined by whether or not the predominant predictor-predictand relationships are observed in the recorded time window (e.g. Hewitson et al. 2014). To ensure the inclusion of potentially useful short weather station records, the minimum record length is set to 10a. The models are calibrated using monthly means of 2 m air temperature with previously filtered annual cycle. Therefore, only data with temporal resolution of at least 1 month was included. The 24 weather stations that fulfill these criteria cover the latitudinal span of Chile, as well as different altitudinal ranges and proximity to the Pacific Ocean and the Andes. A summary of station ID’s, names, coordinates and altitude of their location is provided in Table 1.

Table 1 Weather station IDs, names, locations (latitude, longitude and altitude above sea level)

2.2 ERA-interim reanalysis

ERA-Interim reanalyses provided by the ECMWF (European Centre for Medium Range Weather Forecasting) are based on records from various observation systems that are dynamically interpolated with numerical forecasting in a data assimilation scheme to produce a spatially homogenous global dataset covering the time period from 1979 to present-day (Dee et al. 2011). Its spatial resolution corresponds approximately to a horizontal 80 × 80 km grid with 60 vertical levels from the surface to 0.1 hPa, and has a temporal resolution of 6 h. For the purposes of this study, monthly values of 2 m air temperature (t2m), mean sea level pressure (msl), zonal wind at 850 hPa (u850), meridional wind at 850 hPa (v850) and geopotential height at 700 hPa (h700) are derived from the dataset.

2.3 GCM simulations

General Circulation Models (GCMs) simulate global climate on the basis of our understanding of atmospheric physics. They are a well established tool in atmospheric sciences and used to investigate atmospheric processes, contemporary climate change (IPCC 2014 and references therein) and reconstruction of past climates (e.g. Haywood et al. 2010, Bracannot et al. 2012, Mutz et al. 2018). GCMs simulate climates in dynamic equilibrium with the prescribed boundary conditions. When those boundary conditions correspond to various RCP (Representative Concentration Pathways) scenarios, reflecting climate forcings due to possible demographic and technological developments in the near future (Moss et al. 2010), GCMs are able to provide estimates of possible future climates (IPCC 2014 and references therein). Simulations of the Max Planck Institute (MPI) Earth System Model (ESM) MPI-ESM, as provided by the World Climate Research Programme’s (WCRP) Coupled Model Intercomparison Project phase 5 (CMIP5) (Taylor et al. 2012), provide the basis for the downscaling efforts of this study. More specifically, the trained ESD models (Sect. 2.5) are driven with MPI-ESM simulations based on RCP scenarios of RCP2.6, RCP4.5 and RCP8.5 (Moss et al. 2010), so that the downscaling results represent local-scale responses to a wide range of possible future scenarios. In addition to these simulations, Atmospheric Model Intercomparison Project (AMIP) style simulations for present-day climate are coupled to the ESD models and used for calibration.

2.4 Predictors

2.4.1 Selection criteria

Prior to automatic selection and elimination of predictors in the model training procedure (Sect. 2.5), a set of reasonable predictors must be identified. Two major criteria have to be met by predictors in this pre-selection step: (1) Physical relevance to the predictand and (2) adequate representation of predictors in GCMs. The reasons for these restrictions are as follows:

  1. 1.

    In ESD models, predictors represent the large-scale atmospheric variability that is physically related and empirically relevant to the prediction of the local-scale atmospheric variability of predictands at each of the weather stations. Since the training of these models is based solely on empirical relationships, a careful selection of physically meaningful predictors ensures that physically irrelevant variables showing spurious correlations with predictands in the observation time period are not included in the models. Therefore, the predictors in Table 2 were chosen. Details about their construction are given below.

  2. 2.

    The ESD models in this study are based on training with ERA-interim reanalyses are part of the family of models based on the Perfect Prognosis (PP) rather than Model Output Statistics (MOS) approach (Lowry and Glahn 1969). While MOS fits the ESD models with simulated predictor data, such as output produced by GCMs, the PP approach has the advantage of training based on observational records (weather station and reanalysis data). This approach is chosen to allow better physical interpretation of model transfer functions. As a result of choosing the PP approach, further restrictions have to be placed upon the selection of predictors. Since the PP approach does not take model biases into account implicitly, as is the case for MOS-style model training, and the ESD models are coupled to GCM simulations, the chosen predictors have to be adequately represented in GCMs to ensure more robust estimates.

Table 2 Acronym and full name of predictors used in the training of empirical-statistical downscaling models

2.5 Predictor construction

For downscaling of temperature at least one atmospheric circulation describing predictor (e.g. sea level pressure) and one temperature predictor (e.g. 2 m air temperature) should be included in the training procedure (Huth 2002). For each station, regional (500 km scale) means of 2 m air temperature (t2m), mean sea level pressure (msl), geopotential height at 700 hPa (h700), and meridional and zonal winds at 850 hPa (v850 and u850 respectively) were used. Due to the known influence of ENSO (e.g. Rutllant and Fuenzalida 1991) and the AAO (e.g. Garreaud et al. 2009) on climate in northern and southern Chile respectively, indices for these atmospheric phenomena serve as additional predictors. For the former, the Multivariate ENSO Index (MEI) (Wolter and Timlin 1993) was computed by means of a principal component analysis (PCA) without the clustering step, following the approach of Wolter and Timlin (2011). The loading patterns associated with this particular index are shown in supplemental figure S01. Furthermore, the inclusion of cloudiness was omitted following the ECMWF’s recommendations due to inconsistencies in its calculations (Hennermann 2016). An index for the atmospheric variability associated with the AAO (AAOI) was computed in another PCA by projecting the 700 hPa anomaly field south of 20° S onto the loading patterns from the Climate Prediction Center (CPC) at the National Centers for Environmental Prediction (NCEP) (NOAA-CPC 2020). Finally, following (Garreaud et al. 2009), the first principal component of Pacific SST anomalies north of 20°N was used as an index for the PDO (PDOI). For consistency, the same ENSO, AAO and PDO loading patterns were used for calculating the MEI AAOI and PDOI from GCM output for the investigated RCPs. The full set of predictors described above is summarised in Table 1.

2.6 Model training and validation

The climatic and geographic setting for the 24 weather stations (Sect. 2.1) are highly varied. One of the strengths of ESD is the implicit consideration of highly localised conditions. We exploit this strength in this study by training models for the 24 stations separately using linear parametric models. Such models are commonly used in downscaling (e.g. Huth 2002; Zorita and von Storch 1999) due to the ease of interpretation of results, a possible increase predictive ability due to restriction of function complexity (Hastie et al. 2001) and better predictive ability outside the training data value range when compared to equivalents such as Random Forests regressions (e.g. Pollinger et al. 2017). At its core, model training consists of a cross-validated stepwise multiple linear regression procedure with random bootstrapping and forward selection (e.g. von Storch and Zwiers 1999). The models themselves comprise a set of constants and transfer functions that linearly relate a set of predictors, denoted as the matrix X, to the predictand y. More specifically, an additive model was assumed, such that:

$$y = {\beta_0} + \beta \cdot X + \varepsilon ,$$

where ε is the variance of y that cannot be explained by the set of predictors X, vector β comprises the regression coefficients for each of the large-scale predictors and represents their influence on a local-scale predictand, and vector β0 is the y intercept of the linear function that includes constant local effects and will no longer be mentioned from here on for the sake of brevity. For m number of predictors and n observations, y, β and X are:

$$\left(\begin{array}{c}{y}_{1}\\ {y}_{2}\\ \vdots \\ {y}_{n}\end{array}\right) , \left(\begin{array}{c}{\beta }_{1}\\ {\beta }_{2}\\ \vdots \\ {\beta }_{m}\end{array}\right) \text{and} \left(\begin{array}{c}{x}_{11}{x}_{12}\dots {x}_{1m}\\ {x}_{21}{x}_{22}\dots {x}_{2m}\\ \vdots \\ {x}_{n1}{x}_{n2}\dots {x}_{nm}\end{array}\right).$$

The model is fitted with observation-based (ERA-interim) data and thus belongs to the Perfect Prognosis rather than the Model Output Statistics model families (Lowry and Glahn 1969). The total record length n for each station is subdivided into training (1979–2000) and validation (2001–2015) periods. The focus lies on models trained from 1979 to 2000 and validated in 2001–2015, following established conventions (e.g. CORDEX 2020). Additional models are trained from 2001 to 2015 and the full observational period 1979–2015 to evaluate the sensitivity of results to ESD model training periods. For data within the training period, the model is trained through a successive set of iterations. For each iteration, a random set of observations is retained for in-training cross-validation (Michaelsen 1987). For this, contiguous blocks of data are excluded following recommendations for potentially auto-correlated data (Roberts et al. 2017) to avoid overfitting. This results in the exclusion of a third of the data. For the remaining data, the above problem is solved using a multivariate ordinary least-squares (OSL) regression on a subset of predictors, yielding parameter estimates \(\widehat{\beta }\) that allow prediction \(\widehat{y}\) for the retained data of the validation set. In the forward selection approach (e.g. von Storch and Zwiers 1999; Mutz et al. 2015), the initial models start with an empty set of predictors and predictors are added to the set successively if they contribute to the significant improvement of the model. The measure for significant improvement is an increase in the variance of y that can be explained by the inclusion of the predictor into the model. This was assessed by computation of the coefficients of multiple determination (R2):

$${R}^{2}=1-\frac{{\sum }_{i=1}^{n}{\left({y}_{i}-\widehat{{\beta }_{0}}-{\sum }_{j=1}^{m}\widehat{{\beta }_{j}}\cdot {x}_{ij}\right)}^{2}}{{\sum }_{i=1}^{n}{\left({y}_{i}-\overline{y}\right)}^{2}}$$

following a full multiple regressions for the current active set of predictors and every predictor not yet included (von Storch and Zwiers 1999). The distance of predicted from observed predictand values, i.e. the general model accuracy, is given by the root mean square error:

$$RMSE=\sqrt{\frac{{\sum }_{i=1}^{n}{\left(\widehat{{y}_{i}}-{y}_{i}\right)}^{2}}{n}}$$

Once the RMSE decreases following the inclusion of a predictor, the predictor in question is excluded and the predictor set for the iteration is established. After completion of all iterations, the model for each station is finalised by averaging \(\widehat{\beta }\) over all iterations and imposing a robustness filter that permits inclusion of a predictor into the final model only if it significantly increased the explained variance of y in at least 50% of all iterations (Mutz et al. 2015). Once the models are constructed, they are applied to predict \(\widehat{y}\) for the independent validation period (2000–2015). While the synoptic RSME and R2 from cross-validation are useful model performance measures for accuracy and explanatory power respectively, the RSME and prediction score were calculated additionally based on validation with completely independent data. The latter is a measure for how much better the prediction is than predicting only the annual cycle. Finally, the entire procedure was repeated with a predictor set that excludes regional 2 m air temperatures, since this allows better interpretation of results in the context of underlying mechanisms.

2.7 GCM coupling and prediction of local-scale twenty-first century response

Each of the MPI-ESM simulations were forced with boundary conditions for different twenty-first century climate forcing scenarios RCPs 2.6, 4.5 and 8.5. For these simulations, the same predictors described above (Sect. 2.5) were extracted from the model output. The index-based predictors (MEI, AAOI and PDOI) were reconstructed by projection of anomaly fields, created by subtracting the means, onto the loading patterns of the predictors used for model training to ensure a consistency in the meaning of index values (e.g. Mutz et al. 2015). The 24 ESD models were then directly applied to the predictor sets for the RCP2.6, RCP4.5 and RCP8.5 scenarios to generate a 21st time series of local temperature responses for the weather stations’ coordinates. Finally, seasonal (DJF, MAM, JJA, and SON) means, annual means and 20a climatologies (2040–2060 and 2080–2100) were calculated to provide summaries of 2 m air temperature response predictions for each station. Figure 4 outlines the general method applied in this study, including predictor construction, model training, predictor-reconstruction from twenty-first century GCM simulations and prediction of local temperature responses based on the coupling of ESD models to GCM-based estimates.

Fig. 4
figure 4

Predictors for local temperature variability in Chile are created from reanalysis data. The constructed predictors and weather station temperature data are used in the model training and validation procedure to generate ESD (Empirical Statistical Downscaling) models and accompanying performance metrics for each of the weather stations. Predictors are reconstructed from GCM (General Circulation Model) output and drive the same ESD models to generate estimates of the local temperature response in Chile to twenty-first century climate change under different RCP forcing scenarios

3 Results

3.1 Model summaries

The training and validation procedure (Sect. 2.5) for the ESD models was carried out for all 24 stations, 3 training time periods (1979–2000, 2001–2015 and 1979–2015) and 2 subsets of predictors (the full set and the set omitting t2m). This combination of parameters results in a total of 144 models. The focus of the results section lies on the models trained in 1979–2000, because they (a) can be validated in a completely separate period, (b) consider more data for training than models based on 2001–2015 observations, and (c) adhere to established conventions for the training period (e.g. CORDEX 2020). Following validation, these models were coupled to GCM outputs (Sect. 3.2). The exact model parameters for the 24 models trained with the full predictor set in 1979–2000 are listed in supplemental Table 1.

Three different metrics are used to assess the strength of the models. The RMSE can be interpreted as a measure of the overall model accuracy, while the score describes now much better the prediction is in comparison to predicting the annual cycle (supplemental table 2). The total explained variance of each model (Figs. 5, 6) represents the amount of variability of the predictand time series, namely temperature anomalies at each weather station, that can be explained by the trained models. Since additive models are used, the total explained variance can be split into the fractions of the predictands’ variance that can be explained by individual predictors. Since this the most intuitive and versatile measure allowing more in-depth interpretation and discussion, the model descriptions focus primarily on this, while the other performance metrics (RMSE and scores) are included in the supplemental material (supplemental table 2).

Fig. 5
figure 5

Predictor-specific partial explained variances of weather station temperatures. Colours correspond to specific predictors and the stations are ordered from left to right by their latitudinal position (north to south). The complete predictor set consists of the Multivariate ENSO Index (MEI, green), Antarctic Oscillation Index (AAOI, purple), Pacific Decadal Oscillation Index (PDOI, brown), regional means of 2 m air temperature (t2m, red), regional means of mean sea level pressure (msl, blue), regional means of zonal wind speeds at 850 hPa (u850, orange), regional means of meridional wind speeds at 850 hPa (v850, beige) and regional means of geopotential height at 700 hPa (h700, pink)

Fig. 6
figure 6

3a moving average of twenty-first century mean annual temperature anomalies for each station as observed (grey) and estimated based on empirical-statistical downscaling models and GCM simulations forced with scenarios RCP2.6 (blue), RCP4.5 (purple) and RCP8.5 (red). The reference for anomalies are observation based long term means for 1979–2000

The ESD models trained on 1979–2000 data with the full set of predictors (Table 2; MEI, AAOI, PDOI, t2m, msl, u850, v850, h700) are associated with explained variances of ca. 40–90% (Fig. 5). The models’ ability to explain predictand variance generally increases with (higher) latitude. The dominant predictors are 2 m air temperature and MEI, followed by regional pressure fields and meridional wind speed components. While the contribution of the MEI to the total explained variance is the greatest for most of the northern weather stations, it decreases with higher latitudes and creases to contribute significantly for stations south of Quintero (32.8° S). The RMSE and scores range from 0.3 to 1.0 and 0.3 to 0.9 respectively (supplemental table 2). The scores increase with higher latitudes as is the case with explained variances. The exclusion of t2m from the predictor set results in lower (30–70%) explained variances especially in higher latitudes. Regional pressure fields are able to explain ca. 50% of extratropical temperatures, and meridional and zonal wind speed components contribute additional 2–20% and 2–8% respectively in Chile’s temperate zone (supplemental figure S02).

3.2 Downscaling of GCM simulations

The coupling of the 24 models (Sect. 3.1) to twenty-first century global climate simulations forced with RCP2.6, RCP4.5 and RCP8.5 result in twenty-first century monthly time series for each station and RCP emission scenario. Results are presented as deviations from the respective annual temperature means at each station. For clarity, the annual means computed from these time series were low pass filtered using a 3 year moving arithmetic mean with central-year assignment (Fig. 6). The modelled regional responses to different RCPs only show a significant deviation from each other from mid-century onward. On the other hand, differences in 20a end-of-century climatologies (Fig. 7) are very pronounced with predicted responses ranging from ca. − 0.5–2 K, 0.5–3 K and 2–7 K for RCP2.6, RCP4.5 and RCP8.5 respectively. The stations experiencing the highest positive end-of-century temperature deviations also experience higher mid-century temperature deviations than other stations (supplemental figure S04). There is no apparent association between the magnitude of deviations and proximity to the ocean, latitude, altitude or annual means in the observation period. The construction of seasonal (DJF, MAM, JJA and SON) end-of-century 20a climatologies (Fig. 8) reveal uneven warming among seasons in response to RCP forcings. More specifically, several stations (e.g. 7, 8, 11, 13, 19–24) experience accentuated warming in austral summer (DJF) and autumn (MAM), while others show accentuated warming only in autumn (Fig. 8). For example, end-of-century temperature climatologies for RCP8.5 (Fig. 6, bottom panel) have higher values in austral autumn (MAM) than most other seasons, indicating a prolonged summer (or longer warm season). Accentuated warming in winter (JJA) and spring (SON) under scenario RCP8.5 are experienced by stations 2, 5 and 12, which are among the stations modelled to experience the most warming overall (Fig. 8). While the relative seasonal temperature change pattern across stations is similar for all scenarios, the warming modelled for RCP4.5 is less pronounced. Stations and seasons for which relatively little warming is modelled in RCP8.5 and RCP4.5 experience slight cooling in RCP2.6. Stations and seasons for which the strongest warming is predicted in RCP8.5 and RCP4.5 still experience warming in RCP2.6. Mid-century 20a climatologies of seasonal means (supplemental figure S05) show preferential warming in the same seasons.

Fig. 7
figure 7

Climatologies of observed mean annual temperatures (grey) and climatological anomalies of mean annual temperatures based on empirical-statistical downscaling models and GCM simulations forced with scenarios RCP2.6 (blue), RCP4.5 (purple) and RCP8.5 (red). RCP anomalies are based on end-of-century (2080–2100) means and reference climatologies created from 1979 to 2000 observations

Fig. 8
figure 8

End-of-century (2080–2100) climatological anomalies of mean seasonal temperatures based on empirical-statistical downscaling models and GCM simulations forced with scenarios RCP2.6 (top), RCP4.5 (middle) and RCP8.5 (bottom) for each station and seasons DJF (December–January–Febuary), MAM (March–April–May), JJA (June–July–August) and SON (September–October-November). The reference for anomalies are observation based long term means for 1979–2000

4 Discussion

4.1 Model evaluation and controls on regional temperatures

Overall, the explained variances, RMSEs and scores of ESD models for regional temperatures were high (40–90%, 0.3–0.9 and up to 0.9 respectively), indicating a strong dependence between regional temperatures and large-scale atmospheric controls, as well as the ability of ERA-interim and the GCM (MPI-ESM) to adequately represent those. The previous model performance metrics indicate that the ESD-based predictions work best in Chile’s higher latitudes, such as for the stations near Concepción and Punta Arenas. This finding is consistent with reports of downscaling producing better results in wet regions, particularly in the mid-latitudes, than in dry regions (Cavazos and Hewitson 2005; Fowler et al. 2007). The high explained variances in these mid-latitudes by 2 m air temperature alone indicate that implicitly considered conditions and feedbacks in the vicinity of weather stations determine a large part of mid-latitude model parameters and appear to stay constant over the observational period. The scores and RMSEs that were computed for stations south of Santiago Pudahuel (33.4°S) for in training period 1979–2000 are similar to those computed for training period 2001–2015, which indicates very little change in the local geographical conditions and predictor-predictand relationships over the entire observational period. This temporal stationarity lends more confidence in the predicted twenty-first century regional temperature response based on higher-latitude (33.4° S–5° S3) models. The reasons for the highly variable scores and mostly lower explained variances and RMSEs for low-latitude stations in northern Chile (18.3° S–33.4° S) are varied. Generally, it can likely be attributed to non-stationarity of transfer functions due to different conditions in the training and validation periods, and to observational records not being sufficiently long to capture all prediction-relevant dynamics. For example, the selection of the MEI in the model training procedure suggests a more direct impact of ENSO on regional temperatures. The period 1979–2000 experienced 2 very strong (MEI = 2.5–3) and 2 strong (MEI ≈ 2) El Niño years, whereas the period 2001–2015 only experienced one strong El Niño year. Therefore, models for northern stations like Arica Chacalluta (18.3° S), Iquique Diego Aracena (20.5° S) and Antofagasta Cerro Moreno (23.4° S) do not have the same high skill (score = 0.5–0.8) as the higher-latitude models despite the relatively long observation records at these stations.

When coupled to Atmospheric Model Intercomparison Project (AMIP) style GCM experiments (Taylor et al. 2012) for the same observational period, the ESD models reproduce low frequency temperature oscillations well (cf. Examples in supplemental figure S03). Given that the GCM experiments are forced with updated observed sea surface temperatures, it is reasonable to expect temperatures that are controlled by large scale atmospheric oscillations like ENSO and PDO to be well predicted. This is on par with similar findings by Lau and Nath (1994) for ENSO, Hoerling et al. (2001) for NAO and Sutton and Hodson (2005) for AMO.

While 2 m air temperature has been demonstrated to be the most powerful predictor overall, it does not explain the underlying controlling mechanisms. ENSO has a strong direct influence in northern Chile, as apparent by the explained variances (Fig. 5 and supplemental figure S03), low frequency features and timing of minima and maxima (supplemental figure S04) in our models. On the other hand, temperatures in central and southern Chile are governed by frontal systems, as suggested by high explained variance by mean sea level pressure and 700 hPa geopotential height fields (supplemental figure S03). Exclusion of the latter predictor resulted in its replacement by zonal winds at the 850 hPa level (not shown here) for stations between 30°S and 40°S, which further supports the influence of westerlies on Chilean climate (Garreaud et al. 2009). The AAOI has not proven a robust predictor despite the known influence of the AAO (e.g. Garreaud et al. 2009). This may be due to its variability already being captured by the regional means predictors. While the PDO has demonstrated to be of relevance to Chilean climate (e.g. Valdés-Pineda et al. 2018), its influence was not captured in the ESD models. This is likely due to the relatively short observational records that do not adequately cover the PDO’s low-frequency fluctuation on scales of 15, 25, 50 and 70 years (Mantua and Hare 2002).

4.2 Twenty-first century regional temperature response in Chile

The RCP-specific ensembles of predicted temperature anomalies for each station begin to diverge significantly around the mid-century (2040–2050), resulting in end-of-century regional climates with annual mean temperatures that increase by ca. − 0.5–2 K, 0.5–3 K and 2–7 K for RCP2.6, RCP4.5 and RCP8.5 respectively. This warming is unevenly distributed among seasons, so that many of the stations experience accentuated warming during the austral summer (DJF) and autumn (MAM). Some stations show accentuated warming only in autumn, indicating a prolonged warm season. These changes in annual temperature means and seasonality likely impact vegetation, ecosystems, local industry, adaptation and conservation strategies. For example, these predicted changes modify the climatic factors considered in the classification of Chile’s terrestrial ecosystems that are conducted to facilitate conservation efforts (Martínez-Tilleria et al. 2017).

4.3 Comparison to other studies

This study’s selection of predictors is based on previous studies examining large-scale control on Chilean weather and climate (e.g. Rutllant and Fuenzalida 1991; Rutllant et al. 2003; Montecinos and Aceituno 2003; Garreaud et al. 2013; Valdés-Pineda et al. 2018). Our results consolidate their findings with some limitations: (1) The AAOI was not chosen as a robust large-scale predictor for regional temperatures. However, given the climatic setting and findings of previous studies (e.g. Garreaud et al. 2009), this is likely due to multicollinearity among predictors and the relevant AAOI variability already captured by other predictors like regional-scale temperature means. (2) The exclusion of the PDO as a robust predictor is likely caused by the weather station records that do not adequately cover the PDO’s low-frequency fluctuations and its influence on regional climate (e.g. Valdés-Pineda et al. 2018). Souvignet et al. (2010) performed ESD of temperature (and precipitation), but the study was regionally restricted to a watershed in north-central Chile. Prior to this study, no ESD-based modelling of twenty-first century temperature changes has been conducted with the spatial coverage and the 24 stations presented here.

4.4 Assumptions and limitations

ESD is based on the premise that large-scale atmospheric variability leads to local-scale changes in atmospheric conditions (von Storch 1995), and assumes that empirical relationships between predictors and predictands have an underlying physical relationships, so that the statistical transfer functions of the ESD models continue to be valid through time (Zorita and von Storch 1999). The latter is known as the assumption of temporal stationarity of transfer functions. This study has largely demonstrated the validity of this assumption for the completely independent validation period. While other studies demonstrated decent performance of related techniques for other regions in the pre-industrial (Reichert et al. 1999) and Last Glacial Maximum (Vrac et al. 2007), temporal stationarity may not be given for more dramatically different climates and through geological time. Since highly localised effects of vegetation- and topography are only implicitly taken into consideration, changes in such geographical conditions would likely result in different optimal transfer functions.

Another major caveat concerns models, in which ENSO is an important predictor. The accuracy of predictions made by such models depends on the adequacy of ENSO representation and accuracy of ENSO prediction by the GCMs they are coupled to. CMIP5 coupled ocean–atmosphere GCMs only showed modest improvements in ENSO representation compared to CMIP3 models. These include slightly better representations of latent heat flux feedbacks, amplitude and seasonal phase locking (Bellenger et al 2014). However, many challenges in ENSO modelling remain. Persistent problems include inaccuracies in amplitude and phase lock, the simulation of the switch between positive and negative shortwave feedbacks in the eastern equatorial Pacific, which only 30% of CMIP5 models are able to achieve, and a significant, ableit reduced, cold bias in the western Pacific (Bellenger et al 2014). The GCM used in this study shows relatively good ENSO representation, as assessed by scores in primary characteristics of amplitude, structure, spectrum and seasonality (Bellenger et al 2014). Nevertheless, the modelled ENSO response to climate change strongly depends on the GCM and predictions vary even across models with better ENSO representation. The results of this study are therefore biased towards the MPI-ESM representation and prediction of ENSO and will likely improve only with more significant leaps in the accuracy of modelled ENSO characteristics and behaviour.

5 Conclusions

We presented a first step in empirical-statistical downscaling of temperatures across Chile based on digitised records from 24 weather stations provided by the DMC. We employed a cross-validated stepwise multiple linear regression with a literature-informed predictor set and perfect prognosis approach to train the ESD models. We then coupled these to GCM simulations forced with RCP2.6, RCP4.5 and RCP 8.5 scenario conditions to generate estimates for the regional twenty-first century temperature response to said scenarios. The key conclusions of this study are:

  • Regional 2 m temperature means are the most robust predictors for temperatures of mid-latitude Chile, while the Multivariate ENSO Index explains most of the lower-latitude temperature variance in northern Chile.

  • The described approach and predictor set works well for the investigated stations. This is demonstrated by explained variances of up to 70% in northern Chile and up to ca. 90% in southern Chile and other performance metrics. The performance of mid-latitude models is higher than that of low-latitude models, suggesting the approach is more suitable for the westerlies dominated climate in Chile.

  • Circulation based predictors based on pressure fields and winds explain up to ca. 50% and 70% weather station temperature variance in northern Chile and southern Chile respectively when regional temperatures means are removed from the analysis.

  • Downscaling based end-of-century local temperature responses range from − 0.5–2 K, 0.5–3 K and 2–7 K for the RCP2.6, RCP4.5 and RCP8.5 scenarios respectively. The warming for RCP4.5 and RCP8.5 is accentuated in summer and autumn (DJF, MAM) at several stations across Chile, suggesting a prolonged warm season.

There are numerous possible improvements for empirical-statistical downscaling in Chile, and ways to build upon the approach of this study. These improvements could include (1) a refinement of predictors, such as an AAOI based on coastal Chile pressure fields to improve the prediction performance of circulation based predictors, (2) preparation and inclusion of additional weather station records to improve coverage and comparison to RCM based estimates, (3) the inclusion of other GCM simulations and investigation of ensembles to better assess the robustness of predictions and discrepancies in predictions arising from different modelled ENSO responses to climate change, and (4) the application of hybrid models to exploit the superior skills of random forests regressions within the observational range of values and the superior ability of parametric regression based models to predict values outside the observational range (e.g. Pollinger et al. 2017) to improve overall performance.