1 Introduction

Climate conditions are expected to change substantially in the coming decades. Projections discussed in the last report of the Intergovernmental Panel on Climate Change (IPCC Stocker et al. 2013) show estimations that indicate a remarkable warming of the Earth system with significant changes in hydroclimate, sea level and other important climate parameters. In particular, the Iberian Peninsula (IP), as part of the Mediterranean Region, has been pointed out as one of the hot-spots of climate change where a substantial warming and drying, as well as an increase in the occurrence of extreme summer heat and drought events, is projected (Giorgi 2006; Giorgi and Lionello 2008). Projections of regional surface air temperature and precipitation anticipate clear impacts on many sectors, such as agriculture, livestock, health or water management for the end of the 21st century (Giorgi et al. 2004; Field 2014; Hoegh-Guldberg et al. 2018). These impacts will be produced by increases of temperature (especially in the warm season), modifications in the precipitation patterns and a decrease in the number of days with precipitation. Thus, a potential increase in the desertification and aridity is projected at lower latitudes of the IP (Gao and Giorgi 2008).

Global General Circulation Models (GCMs) or nowadays Earth System Models (ESMs) are the tools used to simulate future climate conditions. However, their horizontal resolutions of about 100–300 kms (Randall et al. 2007) limit their capability to reflect regional climate details, due to the simplified representation of surface properties (e.g. albedo, land uses, topography, von Storch 1995; Giorgi 2006). The downscaling approach bridges this gap and can be either of statistical or dynamic nature; hybrid approaches are also available (von Storch 1995). Statistical techniques train some mathematical function during a calibration period that represents the relationship between large scale predictor variables and regional/local scale predictants. Such relationships are later verified with independent data. Dynamical techniques use a Regional Climate Model (RCM) to produce climate simulations over a limited spatial domain with higher spatial resolution. Subgrid scale processes are parameterized as in GCMs, albeit typically with a higher diversity of physical options for a better representation of different climate types (Washington and Parkinson 2005). The RCM is driven with boundary conditions obtained from a GCM in the borders of the target area. In areas of relatively complex topography and great climate diversity like the IP (Tullot 2000), downscaling approaches and in particular RCMs enhance spatial resolution and can potentially improve the representation of surface and low atmosphere physics, with a noticeable added value with respect to GCM output.

Alternatively, if an accurate representation of the actual internal variability of the system is required, GCM runs including data assimilation, i.e. reanalysis runs (Saha et al. 2010), may be used to produce boundary conditions for a RCM, thus incorporating a realistic evolution of both past internal and externally forced variability (Kotlarski et al. 2014). Also, RCMs can be used assimilating observations to obtain a more accurate local representation of the climate conditions, e.g. González-Rojí et al. (2019) for the IP. Within this frame, RCMs can bridge the gap between the global/large scales and the regional scales and thereby provide regional fields, that can be used for a variety of applications like assessing the potential of the wind resource (Carvalho et al. 2012); studying the impact of climate change on ecosystems (Riedo et al. 1999); or driving chemistry-transport models (CTMs) used to estimate the air quality in a specific region (Colette et al. 2013).

Past regional simulation exercises over the European domain assessed the expected changes in relevant surface variables under future climate change scenarios (e.g. Déqué et al. (2005); Giorgi et al. (2009); Jacob et al. (2014)). The most recent examples within the European Coordinated Regional Climate Downscaling Experiment (EURO-CORDEX) initiative (Jacob et al. 2014) reached horizontal resolutions of 12.5 km. They were driven by global GCM outputs under forcing conditions defined by historical and climate change Representative Concentration Pathways (RCPs) scenario specifications (Taylor et al. 2012) within the 5th Phase of the Coupled Model Intercomparison Project (CMIP5). Such exercises require some level of evaluation and assessment of the realism of RCM performance have been done by comparing with observational data simulations of the same models driven by global reanalysis data. This allows for evaluating both model biases and the reproduction of internal variability captured by reanalysis (Jacob et al. 2007). For instance, Kotlarski et al. (2014) analyzed an initial EURO-CORDEX 17 member RCM ensemble driven by ERA-Interim (Dee et al. 2011) by focusing on near surface temperature and precipitation over the 1989–2008 period using the E-OBS dataset (Haylock et al. 2008) as the observational reference. Seasonally and regionally averaged temperature biases are smaller than \(1.5\, ^{\circ }\hbox {C}\), while precipitation biases are typically located in the ± 40\(\%\) range. García-Díez et al. (2015) used a 7 member WRF physics ensemble over Europe at 0.44\(^{\circ }\) resolution driven by ERA-Interim over the 2002–2006 period and compared them also to the E-OBS dataset (Haylock et al. 2008). Some winter negative biases over snow covered regions are due to a poor representation of snow-atmosphere interactions amplified by albedo feedbacks and sometimes alleviated by large cloud cover biases (García-Díez et al. 2015). Such analyses contribute to understand the uncertainties in simulations and help to have more reliable confidence assessments of climate change experiments.

At more reduced regional scales over specific countries, the examples of model evaluation are scarce. Over the Iberian Peninsula, Jiménez-Guerrero et al. (2013) used a variety of RCMs with a horizontal resolution of 25 km driven by ERA-Interim to evaluate simulated temperatures and precipitation variability over the Iberian Peninsula. They found seasonal negative (positive) biases of up to \(-2.5 \, ^{\circ }\hbox {C}\,(2 \,^{\circ }\hbox {C})\) in reproducing maximum (minimum) temperatures, larger during winter (summer). Nearly all models simulated lower precipitation than the observations in the Iberian Mediterranean coast.

In this work, we will focus on the analysis of some of the Weather Research and Forecasting (WRF) model experiments (Skamarock et al. 2008) done in the framework of the EURO-CORDEX initiative. These simulations cover the broader European domain and have been driven by global experiments performed with the Institut Pierre Simon Laplace Coupled Model (IPSL) Earth System Model (Dufresne et al. 2013), developed as part of the CMIP5 under historical and climate change RCP scenarios (Taylor et al. 2012). IPSL presented a intermediate level of performance over the north-east Atlantic region with respect to the rest of the CMIP5 global climate models, improving in winter and getting worse in summer (Perez et al. 2014). Thus, the analysis herein will focus on describing the historical and RCP response of surface temperatures over the area of the Iberian Peninsula using as a reference the E-OBS dataset. However, as these WRF simulations are not driven by reanalyses simulations, their realization of internal variability will differ from that shown by the observations. Therefore, the application of any metrics based on temporal covariance analysis (Kotlarski et al. 2014) is not expected to show any agreement between observations and simulations. Thus, the aim of this article is to assess the realism of the IPSL and the WRF simulations on the reproduction of annual and seasonal spatial characteristics of the temperature (mean, standard deviation and extremes) over the IP. Additionally, the different responses of both regional and global models for 4.5 and 8.5 RCP scenarios are presented and discussed, taking into account the uncertainties of the evaluation before mentioned. However, an accurate model performance evaluated from the present climate does not guarantee reliable predictions of the future scenarios (Reichler and Kim 2008).

Section 2 describes the model simulations, observational data and methods used in this study. Section 3.1 shows how both models perform to reproduce the temperature observed trends during the last decades in the IP. Section 3.2 presents the expected temperature changes in 2006–2100 for the RCP4.5 and RCP8.5 scenarios. Finally, conclusions are wrapped up in Sect. 4.

Fig. 1
figure 1

Domains and model resolution used in this work. Grayscale orography (m) indicates the IPSL gridpoints. Color shading indicates the WRF domain over Europe, with an spatial resolution of 50 km. The area of this study corresponds to the red box over the IP

2 Methods

2.1 Models and simulations

This section presents a general description of the models as well as details of the specific setup configuration.

We have used simulations of the ESM IPSL-CM5A developed as part of the CMIP5 in the Institut Pierre Simon Laplace (Dufresne et al. 2013; Marti et al. 2010).

The model includes the atmosphere, ocean, land-surface and sea-ice components, and also integrates biogeochemical processes for stratospheric and tropospheric chemistry, aerosols, terrestrial and oceanic carbon cycle and can be considered as an ESM. Atmospheric general circulation is simulated by the LMDZ model (Hourdin et al. 2006) in IPSL-CM5. The simulations used herein were produced with the LMDZ5A version, where the dynamical part of the code is based on a conservative finite-difference formulation of primitive equations on a staggered longitude-latitude grid (Sadourny 1972).

The REPROBUS (Reactive Processes Ruling the Ozone Budget in the Stratosphere) is a module coupled to the LMDZ model to compute the global distribution of trace gases, clouds and aerosols within the stratosphere (Jourdain et al. 2008). On the other hand, the distribution of gases and aerosols within the troposphere (Hauglustaine et al. 2004) was estimated with the INCA (INteraction with the Chemistry and Aerosol) model. The energy and water cycles of soil and vegetation, the terrestrial carbon cycle, and the vegetation composition and distribution are simulated with the ORCHIDEE (ORganizing Carbon and Hydrology In Dynamic EcosystEms) (Krinner et al. 2005) module in IPSL-CM5.

The ocean and sea-ice component is based on NEMOv3.2 (Nucleus for European Modelling of the Ocean; Madec 2008), which includes: OPA ocean GCM for the dynamics of the ocean (Madec et al. 1997); PISCES ( Pelagic Interaction Scheme for Carbon and Ecosystem Studies, Aumont and Bopp 2006) for the simulation of carbon, oxygen and the major nutrients determining phytoplankton growth; and LIM2 the two-level thermodynamic-dynamic sea ice model (Louvain-la-Neuve Sea Ice Model, Version 2) (Fichefet and Maqueda 1999).

Table 1 Main schemes and parameterizations adopted in the WRF model simulations used herein

The IPSL historical simulation include CMIP5 annual solar irradiance changes (Lean 2009; Schmidt et al. 2011; Taylor et al. 2012). The \(\hbox {CO}_{2}\), \(\hbox {CH}{_{4}}\), \(\hbox {N}{_{2}}\,\hbox {O}\) and CFCs concentrations follow CMIP5 datasets as described in Meinshausen et al. (2011). Ozone concentrations are simulated with the INCA and the REPROBUS models following Szopa et al. (2013). The radiative impact of dust, sea salt, black carbon and organic carbon aerosols are modeled in LMDZ following Balkanski (2011). The land use changes are also modeled using the transient historical and future crop and pasture datasets developed by Hurtt et al. (2011).

In this study a historical (1851–2005) simulation of IPSL-CM5A medium resolution (MR; 2.5 × 1.25 lat × lon and 39 vertical levels) is used (Dufresne et al. 2013). Additionally, two scenario simulations are used to represent the climate conditions for the period 2006–2100: one under the RCP4.5 scenario and the other under the RCP8.5 scenario.

We use WRF (Skamarock et al. 2008) 3.3.1 simulations performed by the Institut National de l’Environnement Industriel et des Risques (INERIS) in the context of the CORDEX project (Menut et al. 2013). The WRF simulations were driven by the IPSL-CM5A-MR ESM (Dufresne et al. 2013). Figure 1 shows the resolution of the IPSL-CM5A-MR runs and the domain of the WRF simulations over the European region. Table 1 includes a summary with the main parameterizations adopted for the WRF model. Boundary conditions from the IPSL simulations (pressure, temperature, humidity and wind speed and direction) were hourly read by WRF without using nudging techniques. These simulations have a spatial resolution of 50 km and the domain covers the whole of Europe and the north of Africa with 119 × 116 grid points in longitude and latitude; it is used with a Lambert projection (Fig. 1). The model was used with a non-hydrostatic configuration. A total of 32 levels were used in the vertical direction from the surface to 50 hPa where the top of the model is fixed. Menut et al. (2013) presented a detailed evaluation of the IPSL-CM5A-low resolution (LR)-WRF regional climate modelling suite for air quality modelling purpose. An updated, higher resolution version of the ESM (IPSL-CM5A-MR) is used here as boundary conditions; we will refer to this version as IPSL hereafter.

The WRF simulations during the historical period span the interval 1971–2005, that will be considered hereafter as the \({ reference}\) period. The corresponding WRF simulations under the RCP4.5 and RCP8.5 scenarios, driven by the IPSL-CM5A-MR RCP simulations, were also made available for this work (Dufresne et al. 2013). Both the IPSL-CM5A-MR and WRF scenario runs were used as a continuation of the previously described experiments during the historical/reference period.

Fig. 2
figure 2

Spatial distribution of the mean (a), standard deviation (b), \(5{\rm th}\) percentile (c) and \(95{\rm th}\) percentile (d) of E-OBS temperatures in the Iberian peninsula \((^{\circ }\hbox {C})\) during the reference period (1971–2005) for: annual (top) DJF (middle) and JJA (bottom)

2.2 Observations

The European Climate Gridded data set (E-OBS) is used to evaluate the simulated temperatures (Haylock et al. 2008). E-OBS use more than 3600 meteorological stations through Europe (71 stations over the IP) carrying out a strict quality test to develop its gridded data set and to remove errors and unrealistic values. This dataset is available at four resolutions. The \(0.5^{\circ }\) regular grid product has been selected as it presents comparable resolution to the WRF RCM simulations used herein. E-OBS have been selected for this analysis on the basis of its free availability and the quality assurance applied to their development. Nevertheless, other products exist for our area of interest (Herrera et al. 2012) that may provide larger detail at resolutions higher than that used in the WRF simulations, albeit with somewhat reduced stations measurements (Herrera et al. 2012; Hofstra et al. 2009; Rajczak et al. 2013).

2.3 Analysis methods

2.3.1 Model evaluation

In order to assess the ability of the models to represent the observations, several methods are applied to the output. First, the biases between the IPSL and WRF models with respect to the observed values of the E-OBS dataset during the reference period (1971–2005) are calculated. The bias is defined as the difference between the model and the observational values using the nearest model grid point to the E-OBS grid point:

$$\hbox {Bias}=(\overline{f}_{i}-\overline{o}_{i})$$

where \(\overline{f}_{i}\) represents the average of the selected statistic (mean, standard deviation or percentiles) of simulated values and \(\overline{o}_{i}\) that of the observed values for each gridpoint during the reference period.

Also, we evaluated the statistical significance of the biases found between models and observations through significance testing for the difference of means and ratios of variances between normal populations. The significance is evaluated for the difference of means with a Student’s T test and for the ratio of variance with a Fisher’s F test (von Storch and Zwiers 2001); the significance level is set to \(\alpha = 0.05\). For the extremes (percentiles 5th and 95th), the analysis of confidence intervals was carried out using Chi-square testing between the observed and expected absolute frequencies (Yamane 1973).

Taylor diagrams are used to provide a concise statistical summary of how well simulated and observed spatial patterns of temperature match each other in terms of their correlation, their root-mean-square difference (RMSE), and the ratio of their variances (Taylor 2001). The Taylor diagrams constructed herein focus on the comparison of spatial patterns produced by the IPSL and WRF models with observations. Thus, maps of means, standard deviation, and extremes (5th and 95th percentiles) are evaluated for their similarity between models and observations. For these analysis the spatial mean is subtracted for each pattern so that the biases analyzed earlier do not affect the results.

The analysis of the spatial and temporal analysis of climate response is complemented with Empirical Orthogonal Functions (EOFs; Preisendorfer and Mobley 1988). This method is often used to reduce the dimensionality of a dataset describing a complex phenomenon. It replaces the \(\textit{N}\) original variables by a smaller number \(\textit{n}\) of derived variables, the principal components (PCs), that are linear combinations of the original variables. This method decomposes a space-time field into spatial patterns and associated time indices according to the following equation:

$$\begin{aligned} \hbox {T(t, s)}=\sum _{i=1}^{N}\hbox {EOF}_{i}\hbox {(s)PC}_{i}\hbox{(t)} \end{aligned} $$

where N is the dimensionality of the field, \(\hbox {EOF}_{i}\)(s) are herein defined as the product of the \(\textit{i}\)th eigenvalue multiplied by the corresponding eigenvector and \(\hbox {PC}_{i}\)(t) is their associated temporal scores, the principal components (Jolliffe and Cadima 2016). This allows for gaining understanding of the system based on just a few patterns and series as long as its are physically meaningful. The IPSL, WRF and observational EOFs and PCs of the reference period are obtained from the monthly and annual anomaly data after dettrending their long term trends to ensure stationarity of the input data in the determination of the EOFs. Subsequently, the original undetrended data are projected onto the EOFs, thus capturing the long term trends of temperature. The EOF loadings shows the spatial distribution of the trends expressed in the associated PC. Following Zorita et al. (2005) the analysis is extended during the 21st century by projecting the anomalies of each model onto the EOFs obtained during the reference period. For further details about EOF analysis the reader is referred to Preisendorfer and Mobley (1988), and von Storch and Zwiers (2001).

2.3.2 Assessment of the added value of the regional simulation

We use the Brier skill score (Brier 1950; von Storch and Zwiers 2001) to determine the potential added value of WRF with respect to IPSL. The mean squared error is defined as:

$$\hbox {MSE}=\frac{1}{N}\sum _{i=1}^{N}(\hbox {f}_{i}-\hbox {o}_{i})^{2}$$

where N represents the number of total values, \(\hbox {f}_{i}\) the set of simulated values and \(\hbox {o}_{i}\) the observed values. Thus, the Brier skill score is formulated as:

$$\hbox {BSS}=1-\frac{MSE}{MSEref}$$

where \(\textit{MSE}\) is the mean squared error of the WRF model relative to observations and \(\textit{MSE}_{ref}\) is the mean squared error of the reference model (IPSL). If the result is larger than zero, \(\textit{MSE}\) is lower than \(\textit{MSE}_{ref}\) and thus, the WRF model improves the results offered by IPSL. On the other hand, if the result is negative WRF produces larger MSE values than those of the reference model. If the result is 0 the WRF model does not provide any relative improvement (Wilks 2011). Also, for these analysis the spatial mean is subtracted for each pattern so that the biases analyzed previously do not affect the results of the Brier skill score.

Fig. 3
figure 3

Spatial distribution of mean differences (a), ratio of standard deviations (b), differences in \(5{\rm th}\) percentile (c) and differences in \(95{\rm th}\) percentile (d) for WRF temperatures (ºC) relative to observations in the IP during the reference period (1971–2005) for: annual (top) DJF (middle) and JJA (bottom). Dots indicate significant biases \((\alpha < 0.05)\). In c and d dots indicate significant \(\chi ^{2}\) test differences for the sample distribution

2.3.3 Assessment of future scenarios

The evolution of temperatures simulated by IPSL and WRF during the 21st century is analyzed for temperature anomalies relative to reference period 1971–2005. The anomalies are calculated for monthly values with respect to the mean annual cycle by subtracting the monthly long term means. Statistical significance of the changes between the reference period and future scenarios for each model is evaluating accounting for significance. Using reference values computed over the same time period establishes a baseline from which all anomalies are calculated and allows for a continuous perspective of the temperature evolution since 1850–2100.

Finally, we assess the temperature trends in the WRF and IPSL models and their significance for the two scenarios analyzed in this study along the 21st century. Trends are obtained from a simple linear regression of temperature anomalies. The significance of future trends is evaluated by a Student’s T test accounting for autocorrelation (von Storch and Zwiers 2001). All the analysis are performed for both the overall annual averages and for winter (DJF) and summer (JJA) seasons.

Fig. 4
figure 4

As Fig. 3 but for the IPSL model

3 Results

3.1 Evaluation of models performance during the historical reference period

The spatial distribution of the observed temperatures during the reference period (1971–2005) is shown in Fig. 2. The mean annual temperature E-OBS for this period is between 10 and \(17 \, ^{\circ } \hbox {C}\) over a large part of the IP and presents a north–south gradient. The coldest areas are located at the highest altitudes, such as the Pyrenees, with means of \(5 \, ^{\circ } \hbox {C}\). We find the same north-south gradient during DJF and JJA with values ranging from –2 to \(12\, ^{\circ } \hbox {C}\) and 13 to \(24\, ^{\circ } \hbox {C}\), respectively. The annual standard deviation during the reference period varies between 0.4 and \(0.8\, ^{\circ } \hbox {C}\) throughout the IP with the exception of the south east where it reaches \(1.4\, ^{\circ }\hbox {C}\). Temperature variability increases across the IP in DJF ranging from 0.8 to \(1.5\, ^{\circ } \hbox {C}\) and high values are even more widespread during JJA. The 5th percentile of seasonal averages shows a north-south gradient similar to the case for the mean temperatures. This gradient is more pronounced in DJF and JJA. The most extreme cold values are found consistently in the Pyrenees, with the 5th percentile reaching \(-5\, ^{\circ } \hbox {C}\) during winter. The range of temperatures for the annual 5th percentile is similar to those shown during DJF. Similarly, the warmest years or seasons, as described by the 95th percentile, are found in the southern continental lands both in summer \((30\, ^{\circ } \hbox {C})\) and considering annual values \((25\, ^{\circ } \hbox {C})\); winter maxima are attained in the southern coast \((15\, ^{\circ } \hbox {C})\).

Figure 3 shows annual, winter and summer surface temperature biases and ratios of variability for WRF estimates with respect to E-OBS. The annual mean temperature differences between WRF and E-OBS show an important negative bias over almost the entire IP. These values are between –1 and \(-4\, ^{\circ}\,\hbox{C}\). The differences are larger in the center of the IP. Negative biases are closer to zero in DJF, with some balance between positive and negative biases during this season in the east and north respectively. The most negative biases occur during the summer, when WRF can be 4–6 \(\, ^{\circ }\hbox {C}\) colder than observations throughout the IP, except for the south west region where slightly higher values are reached. Biases are significant \((\alpha < 0.05)\) over a large part of the IP for both annual and seasonal values. Several studies showed that mean temperatures are typically underestimated by WRF over large parts of Europe in wintertime (Kotlarski et al. 2014; Katragkou et al. 2015). Also, Knist et al. (2017) found that the cold summers produced by WRF could be due to excessive precipitation and soil moisture. This fact may be explained by soil moisture-temperature feedbacks. For instance, Fernández et al. (2019) observed that the negative temperature biases produced by models during summer were likely related to the biases developed by the same models simulating more soil moisture during this season over the IP. The ratio of annual standard deviation during the reference period is close to 1 in large part of the IP, being higher than 1.2 in the north west and less than 0.8 in the south east. We find the same gradient north west-south east, although more pronounced in JJA. The ratio of standard deviation ranges between 0.4 and 0.8 throughout the study region in DJF, except in the Pyrenees where it is higher than 1.3. Therefore except for the north west, the observed temperature variability is underestimated in summer and annual. The 5th percentile temperature differences between WRF and E-OBS are close to zero, although with a predominance of negative biases at the lowest temperatures. The pattern of bias for the \(5^{th}\) percentile is very similar for annual and winter values, with extreme negative biases in the western IP \((2\, ^{\circ }\hbox {C})\). In summer, biases reach values of \(-6\, ^{\circ }\hbox {C}\) in the continental IP. The 95th percentile differences present strong negative biases, especially in the East of the IP in summer and annual values. Also, negative biases predominate in DJF, but mostly not significant. Jiménez-Guerrero et al. (2013) found that maximum surface temperature simulated by WRF showed negative biases up to \(-2\, ^{\circ }\hbox {C}\) in an annual and seasonal basis for a large part of the IP. Thus, the values obtained herein are in the same or a slightly smaller range. Note that this can also be due to the use of a different observational reference or a different primary ESM, model version and setup. It seems obvious though, that the use of an European wide domain for which boundary conditions are distant from the target subdomain (IP) has not increased biases in this case. Chi-square differences (dots in Fig. 3c, d) between the observed and WRF absolute frequency distributions show widespread significance during the summer and tend to show significance in the north east (south west) quarter of the IP in winter (annual) data. Therefore WRF shows overall a negative bias distribution, shifting the mean and cold and warm temperature extremes to lower temperatures.

Fig. 5
figure 5

Spatial taylor diagrams for: a mean, b standard deviation, c 5th and d 95th percentiles and e EOF1 for the global (IPSL) and regional (WRF) simulations and observations during the reference period for annual (circles), DJF (squares) and JJA (triangles) values in the Iberian Peninsula. Black dots represent the ideal prediction, if simulated and observed values would be identical

Figure 4 shows annual, summer and winter surface temperature biases as well as standard deviation ratios for the IPSL model. IPSL data have been interpolated for comparison to the WRF grid. The annual mean temperature differences between IPSL and E-OBS show negative biases with values close to 0 in most of the IP. However, we observe positive annual biases in eastern and southern Spain and the Pyrenean region. A similar behavior of biases is found for the other seasons, with the exception of the large negative biases over western Portugal during JJA. These differences between model and observation are statistically significant over many grid cells of the IP for annual and seasonal values. This pattern of bias found during the two seasons in this study is similar to that presented in ensemble-mean temperature bias for the global models within CMIP5 (Cattiaux et al. 2013). In particular, Hourdin et al. (2013) found that the IPSL-CM5A-MR model exhibits a remarkable negative bias in the North Atlantic sea surface temperatures due to a strong underestimation of the Atlantic meridional overturning circulation. This finding offers a plausible explanation for the negative biases in most of the IP and their persistence in the west coast during JJA. Mooney et al. (2013) found that European temperatures simulated by WRF are more sensitive to the longwave radiation and land surface schemes than microphysics and PBL parametrizations. In these simulations the Rapid Radiation Transfer Model (RRTM) is chosen as the longwave radiation scheme and the Noah Land Surface Model as the land-surface scheme. Borge et al. (2008) selected in sensitive tests performed over the IP the Noah Land Surface Model as the best option and RRTM as the second best option for each scheme. However, in those experiments (Mooney et al. 2013; Borge et al. 2008) the WRF dynamical downscaling used reanalysis as boundary conditions. In this case the WRF dynamical downscaling may amplify the negative biases of the driving IPSL fields (Fig. 3). This additional cooling was observed in previous experiments performed between both models (Menut et al. 2013; Colette et al. 2012).

The ratio of annual standard deviation between IPSL and E-OBS during the reference period is close to 1 over a large part of the IP. However, we find ratios lower than 0.7 in the north west and higher than 1.2 in the south east of the IP. The behavior is similar to what happens in summer. Deviation ratios are close to 1 during winter, except in the south of the IP where values are higher than 1.2. 5th percentile temperature differences between IPSL and E-OBS present positive biases (greater than \(2\, ^{\circ }\hbox {C}\)) in eastern and southern Spain and the Pyrenean region, both for annual and winter averages. In summer, while the eastern coasts and the Pyrenees also show mostly positive biases, the south west and west coast show negative biases. For the differences in the 95th percentile we see positive bias in the north east Spain and North Africa. A similar pattern is shown in summer. The negative biases predominate in the center of IP during DJF, despite the positive biases continue to appear near the Pyrenees. The Chi-square differences of absolute frequencies are statistically more significant in summer than in winter, showing an irregular distribution over the entire Iberian Peninsula in both seasons. In general biases in WRF are negative everywhere, both for the mean field and the extremes. In turn, IPSL produces smaller biases both in mean and extremes, that tend to be positive in the northern and eastern coast and negative elsewhere. Thus, the dynamical downscaling introduces spatial detail but it does not improve mean biases. This additional detail is considered next.

The Brier skill score shows (Table 2) that after correcting the biases belonging to each model, WRF produces a significant improvement in the spatial pattern over the IPSL results, both for the spatial average and for the 5th and 95th percentiles. This may be a consequence of the increase in the resolution and representation of surface physics in WRF, allowing for the improvement of accuracy when it comes to capture the shape of regional patterns. However, the added value is clearly higher for winter compared with summer (Table 2). In the case of the standard deviation, WRF does not improve the results of IPSL in winter and summer. This is due to the greater regional variability introduced by WRF model in the Pyrenees and the west coast during winter and in the Mediterranean and west coast during summer. A regional variability which is not reflected in the observations. Thus, the improvement in the annual value may be a result of a compensation of errors.

Table 2 Brier skill scores for mean, standard deviation, 5th and 95th percentiles and EOF1 between global (IPSL) and regional (WRF) models during the reference period (1971–2005) for annual, DJF and JJA

The improvement produced by WRF is also observed in the Taylor diagrams of WRF and IPSL with observations (Fig. 5). Spatial correlation of spatial means and percentiles increase in WRF relative to IPSL for both the annual average and for winter and summer, reaching values of 0.95. The RMSE decreases below 0.6 in mean and percentiles for the WRF model, while for IPSL values are around 0.7–0.8. The improvement is not as evident in the case of the standard deviation, that still underestimates the observed values, mainly for winter and annual values. Correlations do not exceed 0.6 in any case with very limited (less than 0.1) improvement by WRF and RMSE increases both for winter and summer. As discussed previously, this result is caused by the regional variability introduced by WRF and that is not present in the observations.

Fig. 6
figure 6

31 years moving averages of temperature anomalies with respect to the 1971–2005 reference period: annual (top), DJF (middle), and JJA (bottom). The shaded grey area indicates the reference period. Time series represent E-OBS observations and IPSL (left) and WRF (right) historical and scenario simulations (see legend). The shaded area indicates the range between the 5th and 95th percentiles of the annual anomaly map

Figure 6 shows the evolution of the mean temperature anomalies for observations and models over the IP. Lines depict the yearly average and shadings the 5th to 95th percentile range of values in the domain of the IP. Both models show a similar warming trend, reaching the highest temperature values at the end of the reference period, amounting to a total change of about \(1\,^{\circ }\hbox {C}\) since 1850 in the IPSL historical simulation. This trend is slightly more (less) pronounced in summer (winter). The observations also indicate a warming trend over time, generally in agreement with that of models. The range between the 5th–95th percentiles indicated by the shaded area is considerably higher in summer than winter for the IPSL during the historical period. The range of variability reaches higher values in IPSL than WRF and the observations during JJA. A phenomenon that is not observed during DJF when the two models show a very similar spatial range of temperatures.

Fig. 7
figure 7

First EOF of annual (top) DJF (middle) and JJA (bottom) temperatures in a the E-OBS observational dataset, b the WRF and c the IPSL during the reference period. The percentages of variance accounted for by each PC are indicated at the bottom left of each panel

The first EOF and PC from E-OBS and model temperature anomalies for annual, DJF and JJA are shown in Figs. 7 and 8. The EOFs and PCs allow for separating the spatial and temporal responses that relate to climate change and to modes of internal variability that also contribute to multidecadal and long term trends under climate change conditions (Stocker et al. 2013; Zorita et al. 2005). The first mode shows an EOF of uniform sign over all the domain in E-OBS and also in both models. The associated PCs are also positive, thus accounting for the temperature rise in Fig. 6. This mode accounts for more than 70% of the variance in all cases. The large amount of variance explained by the first mode implies that these patterns and time series describe a large part of the spatial structure of warming trends. Gómez-Navarro et al. (2010) discussed the shape of these warming patterns and their relationship with geographical factors like the altitude in winter and the distance to the sea in summer. The annual EOF1 of E-OBS indicates that the increase is larger in the center of the IP and lower in the coast. WRF shows a similar pattern, albeit with larger loadings. In DJF and JJA observations also show larger loadings over the interior of the IP, whereas WRF shows larger loadings than the observations over the north east in DJF and also over the west in JJA; IPSL shows less clear patterns but seems to seed WRF with larger loadings over the same areas. Thus, the warming patterns seem to be transmitted from the large to the regional scale during DJF, although these patterns are developed by WRF with greater intensity in some areas such as Pyrenees.

Fig. 8
figure 8

PC1 of annual (top) DJF (middle) and JJA (bottom). Projections of E-OBS observations, IPSL (left) and WRF (right) historical and scenario simulations onto the EOFs. The shaded area indicates the reference period. All time series are 31yr moving averages. See legend for details about the individual datasets

Table 3 Correlation between global (IPSL) and regional (WRF) principal components over the domain of interest during the reference period (1971–2005) for annual, DJF and JJA. Values in bold indicate significance \((\alpha < 0.05)\)
Fig. 9
figure 9

As in Fig. 7 but for the \(2^{nd}\) EOF

The evolution of the first PC during the reference period (Fig. 8) is very similar to that found in the series of temperature anomalies (Fig. 6). This is due to the large percentage of variance explained by the first mode and confirmed by the high correlation values (> 0.99) observed between these series. The highest values of the historical period are attained at the end of the 1971–2005 time interval. The behavior in the evolution of the two models is similar for the entire reference period. It is because although the spatial patterns present substantial differences for each model and observations, the first mode shows an uniform positive sign over all the domain. Both the observed and simulated WRF and IPSL historical trends in the PCs are larger in JJA than in DJF.

The second and the third EOF modes account for lower percentages of variance in all cases and with similar amounts in observations and simulations (Figs. 9, 10, 11 and 12). EOF patterns that show north–south and east-west gradients have been obtained from observations and simulations in the second and third modes. Nevertheless, their influence can be important for subregional variability and trends and show dependencies with large scale modes of circulation. In some cases, the modes can be exchanged between the second and third pairs of EOF and PC because of their similar percentages of explained variance. This situation becomes evident in the analysis of the correlations between the PCs of the two models shown in Table 3. The three WRF PCs are well correlated with their IPSL counterparts for annual and DJF. However, WRF and IPSL PC2 and PC3 correlations are swapped in JJA; also WRF PC1 correlates highly with IPSL PC2 in JJA. Thus, in spite of the regional differences among the patterns in Figs. 9 and 11 the correlations between PCs (Table 3) indicate the same model modes are at play in Fig. 10. Similar patterns were found by Gómez-Navarro et al. (2011) for EOF2 and EOF3 which can be related to different modes of variability in the North Atlantic-European region.

Fig. 10
figure 10

As in Fig. 8 but for the PC2. Note the different ranges in the vertical axis relative to Fig. 8

Fig. 11
figure 11

As in Fig. 7 but for the 3rd EOF

The second and third PCs show positive and negative trends (Figs. 10 and 12). The trends of both models for PC1 are similar during the reference period but for PC2 and PC3 they show differences, particularly in summer. This can be: as noted above a consequence of the exchange of the second and third modes happening during summer (Table 3), a consequence of the differences in their regional EOF patterns, or a consequence of internal variability developing in the large WRF domain detached from the IPSL boundary conditions. Note that IPSL boundary conditions are imposed without nudging and most differences are emphasized during summer when low thermal cells can develop at mesoscale over the IP (Gaertner et al. 1993).

While PC1 trends in Fig. 8 are the dominant contribution to trends over the broader IP domain trough their respective EOF1 patterns in Fig. 7, trends in the lower order modes shape subregional trends with contributions of different sign (Figs. 9, 10, 11 and 12). Such trends can be influenced by changes in large scale circulation modes. Several studies (Rodríguez-Puebla et al. 2010; Brunet et al. 2007; Xoplaki et al. 2003) have addressed the connections between temperature variability and large scale variables over the IP and found that they were linked to teleconnection indices and a preferred mode of geopotential height (500 hPa) over the North Atlantic. In order to explore the possible relationships between the E-OBS temperature trends and the large scale modes of climate variability we use some indices (Table 4) developed by the National Oceanic and Atmospheric Administration (NOAA) that represent modes of circulation variability of the Northern Hemisphere (Barnston and Livezey 1987). The indices are used in monthly resolution and correlated with the PCs of the E-OBS during the reference period. Table 4 shows the results of the correlations for annual, DJF and JJA. PC2 presents a significant correlation with the index of the North Atlantic Oscillation (NAO), the Arctic Oscillation (AO) and the East Atlantic/Western Russia (EAWR) for the annual and DJF cases. In JJA some significant correlations are found with the AO (PC1), East Atlantic (EA) and Pacific Decadal Oscillation (PC3). These correlations are suggestive of links to circulation modes in PC2 and PC3. In the case of the model runs, since their internal variability is not constrained by observations as in the case of reanalysis, they are not expected to correlate with the observed evolution of the modes of large scale circulation. However, Table 4 suggests that the simulated PC2,3 will also relate to the simulated modes of circulation.

Fig. 12
figure 12

As in Fig. 8 but for the PC3. Note the different ranges in the vertical axis relative to Fig. 8

Table 4 Correlation between climatic variability indices (Barnston and Livezey 1987) downloaded from https://www.ncdc.noaa.gov and E-OBS PCs (black lines in Figs. 8, 10 and 12) over the domain of interest during the reference period (1971-2005) for annual, DJF and JJA. Correlations have been obtained from the anomalies of the monthly values with respect to the long term monthly averages. Values in bold indicate significance \((\alpha < 0.05)\)
Fig. 13
figure 13

WRF spatial distribution of mean temperature anomalies (C) in the Iberian peninsula during 2031–2065 (a, b) and 2066–2100 (c, d) for RCP4.5 (a, c) and RCP8.5 (b, d) scenarios with respect to the reference period for: annual (top) DJF (middle) and JJA (bottom). Dots indicate significant changes \((\alpha < 0.05)\)

3.2 Climate change projections

The mean spatial surface temperature changes over the IP during the \(21{\rm st}\) century in WRF and IPSL models for all RCP scenarios are presented in Fig. 6 in comparison to the reference and historical periods. Changes do not exceed \(2\, ^{\circ }\hbox {C}\) with respect to the reference period until the mid of the 21st century for annual and wintertime for RCP8.5; for summer this increase is achieved two decades earlier. Nevertheless, the increase of temperature projected by WRF starting around 2040 is smaller than the one suggested by IPSL for both scenarios. Also, Fernández et al. (2019) found a systematic reduction of the temperature change by the RCM ensembles with respect to their driving GCM in the IP. In these simulations this behavior is more pronounced in summer and the differences between both simulations appear earlier in time. The biggest difference between WRF and IPSL occurs at the end of the century for the RCP8.5 scenario in JJA, when the temperature rise reaches \(5.5\, ^{\circ }\hbox {C}\) in IPSL and not more than \(4\, ^{\circ }\hbox {C}\) in WRF with respect to the reference period. The differences in the simulated temperature evolution for both models during the future scenarios may be an extension of the different biases found for the reference period. WRF favoured an additional cooling over almost the entire IP during the historical period. This negative feedback explains the lower temperature increase simulated by WRF in comparison to IPSL during the future scenarios as a continuation of the results observed during the historical period. Thus, as discussed earlier, if WRF continues simulating an excessive soil moisture in the future scenarios as well as during the historical period (Knist et al. 2017), the extra radiative forcing from future scenarios can be partly used into latent heat to evaporate soil water producing a lower temperature increase. As mentioned earlier, the range of temperature simulated between the different regions of the IP is greater during summer for the future projections, reaching the highest values in IPSL for the scenario RCP8.5. This fact indicates that the spatial range of the temperature simulated by IPSL in the IP is greater during JJA over time. The spatial range of variability of the projections is similar during DJF for both models, as it happened in the historical period. For summer and annual the spatial range of WRF simulated values overlaps with the lower half of the simulated IPSL range. The temperature decrease after 2075 in the RCP4.5 scenario in annual and JJA for WRF is noteworthy. The differences in the evolution of temperature between WRF and IPSL are negligible during winter. The mean temperature increase reaches \(3.5\, ^{\circ }\hbox {C}\) for the RCP8.5 scenario at the end of the century. On the contrary, the increase of temperature is stabilized around 2060 for the RCP4.5 scenario during DJF at about \(1.5 \, ^{\circ }\hbox {C}\) relative to the reference period.

Fig. 14
figure 14

As in Fig. 13 but for the IPSL simulations

Figures 13 and 14 show the spatial distribution of mean temperature changes over the IP for WRF and IPSL between 2031–2065 and 2066–2100 with respect the reference period (1971–2005). The mean WRF temperature anomalies do not exceed \(2.5\, ^{\circ }\hbox {C}\) throughout the IP during 2031–2065 for the RCP4.5 scenario. The greatest anomalies are observed on the northeastern and southeastern coast of the IP during JJA. By contrast, only the largest DJF anomalies reach 1.5 degrees over the Pyrenees. We can see a slight latitudinal gradient during the same period for the RCP8.5 scenario where the biggest anomalies are located in southeastern Spain during summer, reaching close to \(3\, ^{\circ }\hbox {C}\) values. In winter, the latitudinal gradient is not observed and major anomalies up to \(2\,^{\circ }\hbox {C}\) only appear in the Pyrenees. The annual mean temperature anomalies show the lowest values in the northwestern region of the IP during 2066-2100 for RCP4.5 and RCP8.5 scenarios (Fig. 6). Annual temperature anomalies increase to the south east. A similar behavior occurs in summer but with a steeper gradient, especially for the RCP8.5 scenario where the maximum temperature changes are located in the south east with anomalies greater than \(5\, ^{\circ }\hbox {C}\). The mean temperature IPSL anomalies range between 0.5 and \(3.5\,^{\circ }\hbox {C}\) during 2031–2065 for RCP4.5 scenario. The biggest anomalies develop in the center and north east of the IP in JJA. The anomalies show a similar warming pattern for the scenario RCP8.5, but with higher values during this period. IPSL anomalies are maximum at the end of 21st century for the RCP8.5 scenario, larger in summer when anomalies greater than \(6\, ^{\circ }\hbox {C}\) are reached in most of the peninsula. Also, Cattiaux et al. (2013) found that summer warming appears more pronounced than winter warming (\(5\,^{\circ }\hbox {C}\) average) over Mediterranean areas. The IPSL anomalies show no clear spatial pattern for any scenario as shown by WRF except for perhaps more warming over the center of the IP. Also, IPSL anomalies are more homogeneous across the IP as a result of lower resolution in comparison to WRF. The temperature anomalies between the reference period and the future projections are statistically significant over the whole IP in all scenarios, indicating a significant warming over the entire territory for both models. Note that the biggest differences in the evolution of simulated temperature for both models appear in JJA. This is consistent with the greater cooling of WRF during the reference period.

Fig. 15
figure 15

IPSL (a, b) and WRF (c, d) spatial distribution of mean temperature trends \(( ^{\circ }\hbox {C}/\hbox {decade})\) in the Iberian peninsula during 2006–2100 for RCP4.5 (a, c) and RCP8.5 (b, d) scenarios for: The annual (top) DJF (middle) and JJA (bottom). Dots indicate significant changes \((\alpha < 0.05)\)

The evolution of the projections of the first PC for RCP4.5 and RCP8.5 scenarios (Fig. 8) is similar to that found in the series of temperature anomalies (Fig. 6). Also, this was the case in the reference period, reflecting the large percentage of variance accounted for the PC1 and the high correlation between WRF and IPSL PC1; the largest differences occurring in summer also as discussed earlier (Table 3). The behavior in the evolution of the two models is similar in the first two decades. However, IPSL begins to simulate higher annual temperatures over time from 2050, even more in summer for the scenario of maximum forcing as in the case of mean temperatures over IP (Fig. 6). The differences in the evolution of temperature between WRF and IPSL are negligible during winter for both scenarios. The RCP4.5 scenario is stabilized since 2040 during DJF while we see a decrease since 2075 to 2100 in JJA.

The second and third PC projections present variable interdecadal trends, mainly positive. The evolution for the PC2 and PC3 during winter is very similar for both models (Figs. 10 and 12) in spite of the differences in the EOFs used for projections (Figs. 9 and 11). Similar to the historical period the three WRF PCs present significant correlations (>0.60) with their IPSL counterparts in annual and DJF for both RCP scenarios. It suggests that the WRF PC2 and PC3 respond to large modes of circulation variability transmitted from the IPSL model during DJF. WRF and IPSL PC2 and PC3 are significantly correlated (>0.40) in JJA for both scenarios again similar to the results of the historical period (Table 3). The correlations during the RCP scenarios are calculated for unfiltered raw data obtained for projecting the original data onto the EOFs.

Figure 15 shows the spatial distribution of the temperature trends during the 21st century. The IPSL model presents a fairly homogeneous warming rate for RCP4.5 scenario over the IP. However, the growth rate of temperature is slightly lower in the western and southern coast. The warming rate is significantly higher during summer, reaching \(0.3 \, ^{\circ }\hbox {C}\)/decade over the center of the IP. The warming rate is much higher for the scenario RCP8.5, especially during summer when values higher than \(0.6\, ^{\circ }\hbox {C}\)/decade are found in a large area of the IP. The IPSL model has a slight east–west gradient in temperature trends during winter, that broadly agrees with that of WRF, accounting for the differences in resolution; the largest summer WRF trends being simulated over the Pyrenees. By contrast, the WRF model presents a north west-south east gradient clearly visible in the annual and summer values of both scenarios, reaching maximum trend values over the southeastern coast. Trend significance is in general widespread.

4 Conclusions

We have analyzed the capability of the WRF RCM model to represent the observed temperatures over the IP during the period 1971–2005. WRF historical and climate change simulations over Europe, produced in the frame of CORDEX are considered. These simulations are driven by CMIP5 IPSL simulations providing boundary conditions. The IPSL simulations show negative biases in most of the IP, a feature that may be related with an underestimation of the Atlantic meridional overturning circulation (Hourdin et al. 2013). Moreover, WRF favours an additional cooling, both in the spatial distribution of mean and extreme values. The negative feedback between both models increases the negative biases of the forcing fields (Colette et al. 2012). This cooling effect is likely related to moisture-temperature feedbacks, where an excessive precipitation and soil moisture simulated by WRF during this season can produce lower simulated temperatures over the IP (Fernández et al. 2019; Knist et al. 2017). Noticeably, the IPSL simulation shows considerably lower biases than the WRF experiment in the reference period. From this perspective the application of nudging techniques to perform this type of simulations may be advisable (von Storch et al. 2000).

However, observational uncertainties, comparable to the uncertainties within state-of-art RCMs driven by reanalysis boundary conditions, even in areas covered by dense monitoring networks such as Spain (Gómez-Navarro et al. 2012) need to be considered. Also, other WRF simulations over the domain of the IP show biases slightly larger than the ones presented here (Jiménez-Guerrero et al. 2013). In those experiments (Gómez-Navarro et al. 2012; Jiménez-Guerrero et al. 2013), more than one factor changes relative to the case presented herein. For example, GCM boundary conditions, WRF version and configuration as well as the observational references used are different, thus a direct comparison and the possibility to attribute physical causes is difficult. In order to do so experiments including specific changes in model configuration, typically one at time, should be performed.

The approach considered herein involves the evaluation of historical and climate change RCM runs instead of considering reanalysis driven simulations. Therefore, simulations are not expected to capture the temporal evolution of internal variability and covariance based metrics are ruled out of the assessment. After removing the biases belonging to each model, WRF produces a significant improvement in the spatial patterns of the mean and extremes (5th and 95th percentiles) over the IPSL results. In the case of standard deviation the results do not show an improvement of WRF relative to the IPSL results.

In conclusion, this specific configuration of WRF seems to increase the negative biases found in the IPSL driving fields, especially in summer. However, WRF improves the representation of the spatial patterns over the IP in comparison to IPSL once this biases are removed. Nevertheless, projected changes based on the climate model sensitivity and its variability may be meaningful metrics rather than the absolute changes over a given period (Vial et al. 2013).

The first EOF accounts for more than 70\(\%\) variance and identifies a temperature rise throughout the IP for observation and models. The second and third EOFs explain lower percentages of variance in observations and simulations. These EOFs present two spatial patterns that can be exchanged because of their similar percentages of explained variance. Nevertheless, IPSL and WRF EOFs patterns are related, albeit with WRF EOFs presenting enhanced regional detail.

The increase in the evolution of the first PC during the reference period is similar to that found in the series of temperature anomalies, responding to external forcings. This is due to the large percentage of variance explained by EOF1 and the high correlation values observed between these series. Therefore, the first EOF and PC (1st mode of variability) is responsible for most of the warming trends experienced in the reference period over the broader IP. PC2 and PC3 present high multidecadal variability that is suggestive of an influence of large scale dynamics and contribute to shape subregional trends.

We have analyzed the temperature changes in the future over the IP according to the IPSL and WRF simulations. We observe an increase for both models, more pronounced in the RCP8.5 scenario, reaching the \(5\, ^{\circ }\hbox {C}\) at the end of the 21st century. Nevertheless, after 2040 the WRF model simulates a smaller temperature increase than IPSL, except in winter. The additional cooling shown by WRF during the reference period may contribute to explain their lower temperature increase in comparison to IPSL during the future scenarios. Thus, this effect seems persistent over time.

The evolution of the projections of the first PC for RCP4.5 and RCP8.5 scenarios is similar to that found in the series of temperature anomalies for the same reasons that during the reference period. The PC2 and PC3 projections present variable interdecadal trends that may relate to changes in different modes of internal variability.

Also, it should be noted that the use of multimodel ensembles has demonstrated to improve the outcome of climate simulations respect to any individual model (Lambert and Boer 2001; Taylor et al. 2004; Randall et al. 2007). Although the cause of this improvement is not clear, it is assumed that the random noise from the internal variability and the uncertainties in the formulation of each model can be reduced by averaging (Reichler and Kim 2008).