1 Introduction

Snow is a critical component of the Arctic climate system. Over northern Eurasia and North America, the duration of snow cover ranges from 7 to 10 months per year (Brown et al. 2017), with the maximum snow extent covering over 40% of the Northern Hemisphere land area (approximately \( 47 \times 10^{6} \;{\text{km}}^{2} \)) each year (Robinson and Frei 2000; Lemke et al. 2007). Snow affects a variety of high-latitude climate processes and feedbacks (Cohen and Rind 1991; Groisman et al. 1994). The high reflectivity (albedo being 0.8–0.9 for dry snow) and low thermal conductivity of snow have a cooling effect and modulate snow-albedo feedback (Qu and Hall 2007; Fletcher et al. 2015; Wegmann et al. 2018). The contribution of terrestrial snow to the Earth’s radiation budget at the top of the atmosphere through snow-albedo feedback is closely comparable to that of sea ice (Flanner et al. 2011; Singh et al. 2015). Snow also prevents large energy losses from the underlying soil and notably ice growth and the development of seasonal permafrost (Lawrence and Slater 2009; Gouttevin et al. 2012; Koven et al. 2013; Slater et al. 2017). As a natural water reservoir, snow plays a critical role in the high-latitude hydrological cycle, including evaporation and runoff (Groisman et al. 2017). The runoff generated by snow in the Arctic drainage basin contributes up to 75% of the total annual flow in the Northwest Territories in Canada (Woo and Marsh 1978). Mankin et al. (2015) found that 68 Northern Hemisphere river basins providing water availability to approximately 2 billion people are expected to be impacted by a 67% decrease in the snow supply during this century.

Snow is also one of the most variable components of the climate system (Gutzler and Rosen 1992; Henderson et al. 2018). With the Arctic warming twice as fast as the global rate (e.g., Serreze and Barry 2011), the Arctic snow cover duration is declining by 2–4 days per decade (Wang et al. 2013; Brown et al. 2017), with snow melting starting earlier in spring (Cayan et al. 2001; Stewart 2009). However, the snow response to increasing temperature and precipitation is quite complex and depends regionally on the climate regime and elevation (Brown and Mote 2009; Hernández-Henríquez et al. 2015).

The pattern of snow cover duration trends is not homogeneous in all Arctic sectors (Choi et al. 2010; Peng et al. 2013; Brown et al. 2017). Barichivich et al. (2013) stated that the Eurasian Arctic experienced larger reductions in the snow-covered season (12.6 days) than the North American Arctic region (6.2 days) during 1982–2011. Integrated over the Arctic, the snow cover extent (SCE) has considerably decreased in recent decades (e.g., Rupp et al. 2013; Kunkel et al. 2016; Hori et al. 2017), and there is also evidence from multiple data sources that the maximum snow accumulation is decreasing, with the strongest decline identified in spring (Brown 2000). However, trend magnitudes vary spatially in both the Eurasian and Canadian sectors (Bulygina et al. 2011; Atkinson et al. 2006). With increasing temperatures and atmospheric moisture in the midlatitudes, snowfall is projected to decrease partly due to the transition of solid-state precipitation to liquid-state precipitation (Räisänen 2008). The pan-Arctic tendencies in SCE and the duration of the snow period in the Arctic are projected to continue in the 21st century (Collins et al. 2013). However, the manifestation of these large-scale tendencies on the regional scale with different trends in various regions suggests that natural climate variability also contributes to the changes in snow characteristics (Mudryk et al. 2014).

State-of-the-art terrestrial snow-cover models differ in complexity, ranging from those explicitly accounting for detailed snow stratigraphy (e.g., CROCUS, Brun et al. 1997; Vionnet et al. 2012) to models of intermediate complexity with 2–3 layers and to highly simplified configurations with zero-layer (combined with soil) or single-layer snow models (e.g., Slater et al. 2001). Earth system models (ESMs) usually employ zero- and single-layer configurations, which are computationally efficient but involve many limitations (Bokhorst et al. 2016; Krinner et al. 2018). Generally, snow water equivalent (SWE) is a prognostic variable (ES-DOC, https://es-doc.org) resulting from the balance between the snowfall rate, evaporation and snowmelt rate (Manabe 1969; Thackeray et al. 2016). However, snow cover fraction or extent is diagnosed from SWE through different parameterizations (Wu and Wu 2004) and is also mediated by various conservation properties (e.g., water discharge-storage), coupled with the atmosphere and land cover types, which depend essentially on characteristics of the land model integrated in the global climate model (GCM). For example, to diagnose SCE, the ISBA surface model (“Interaction Sol-Biosphère-Atmosphère”) component of CNRM-CM5, uses an asymptotic function of SWE (Douville et al. 1995), whereas the CLM 4 (“Community Land Model”) land model from CCSM4 uses a hyperbolic tangent approximation (Xu and Dirmeyer 2013).

Several studies have evaluated snow characteristics (SWE, SCE and snowfall) in climate models from phase 5 of the Coupled Model Intercomparison Project (CMIP5) (Brutel-Vuilmet et al. 2013; Kapnick and Delworth 2013; Terzago et al. 2014; Connolly et al. 2019). The main finding is the underestimation of the observed decreasing trend in spring SCE in the Northern Hemisphere over 1979–2015 (Derksen and Brown 2012) by CMIP5 models, which is typically explained by the underestimation of the boreal temperature in models (Brutel-Vuilmet et al. 2013). Likewise, the spread in snow-albedo feedback (SAF) was not reduced in CMIP5 with respect to CMIP3, likely due to a broad spread in the approaches used to analyze vegetation in models (Qu and Hall 2013). The association of the structural differences in the snowpack in models with a spread in vegetation and albedo parametrizations was recently confirmed by Thackeray et al. (2018). Notably, models displaying the largest bias in SAF also show clear structural and parametric biases in the representation of snow characteristics. In most works, snow regimes in autumn have received less attention than those in spring.

In this respect, the representation of snow-associated feedbacks in climate models, especially during the shoulder seasons (when Arctic snow cover exhibits the strongest variability), is of special importance. During the offset season, downwelling shortwave and longwave radiation enable snow melt, and during the onset season, temperatures are sufficiently cold to favor solid precipitation and snow accumulation (Sicart et al. 2006). In spring, the SAF is stronger as snow cover starts to age and recede due to increasing temperature and insolation (Qu and Hall 2013; Thackeray et al. 2016). While variations in snow characteristics are smaller in autumn than in spring, they also experience changes that are linked to atmospheric circulation (Henderson et al. 2018). Observational (Cohen et al. 2007) and modeling (Peings et al. 2012) studies established a link between SCE anomalies over Eurasia and the winter phase of the Arctic Oscillation (AO) and the North Atlantic Oscillation (NAO). However, Gastineau et al. (2017), using an ensemble of CMIP5 models, demonstrated that this relationship is simulated by only four models and is largely underestimated by the majority of the CMIP5 ensemble. Douville et al. (2017) questioned the robustness of the snow-NAO relationship and argued for the importance of eastward phases of the Quasi-Biennial Oscillation (QBO) in modulating snow cover variability. There are also reports about other snow-associated large-scale teleconnections. For instance, positive (negative) SCE winter-to-spring anomalies in Eurasia may be followed by negative (positive) anomalies in rainfall during the Indian summer monsoon (Prabhu et al. 2017; Senan et al. 2016). We note, however, that this snow-related teleconnection is quite controversial (Peings and Douville 2010; Zhang et al. 2019).

In summary, there is a need for a comprehensive evaluation of snow characteristics in climate models using available observations over the last decades to demonstrate which Arctic snow features are most robust across different models and which are not well represented. Here, we focus on the representation of Arctic terrestrial snow (SCE and SWE) during the onset (October–November) and melting (March–April) seasons in the CMIP5 model ensemble. The paper is organized as follows. Section 2 briefly describes the snow data sources (CMIP5 models, reanalyses and observations) and the methodology of comparison. An evaluation of the long-term climatology of snow characteristics and annual cycles is presented in Sect. 3. The characteristics of climate variability in SCE and SWE are evaluated in Sect. 4. Section 5 summarizes the results and suggests some possible lines of future research.

2 Data and methods

2.1 Models and observational data sets

We use historical simulations of 16 CMIP5 models (Table 1) and focus on the representation of SCE and SWE in these models. For each model, we analyzed the model ensemble mean derived from all available ensemble realizations for a given model. For some assessments, we considered the multimodel mean derived as the average across all ensemble means for selected CMIP5 models. For validation of SCE and SWE in the CMIP5 models, we used two reanalyses (NOAA 20th Century Reanalysis and NCEP/CFSR), a satellite dataset from Rutgers University and a dataset combined from satellite observations, reanalyses and a snow product (CanSISE). Details of these datasets are given in Table 2.

Table 1 CMIP5 models, modeling center and institute acronym, resolution, resolution and number of members considered (https://cmip.llnl.gov/cmip5/)
Table 2 Major characteristics of observational datasets compared in this study

NOAA-CIRES 20th Century Reanalysis version 2 (‘NOAAV2c’) (Compo et al. 2011) covers the period from 1871 to 2012, with an output temporal and spatial resolution of 3 h and \({2.0^{\circ } \times 1.75^{\circ }}\), respectively. The National Center for Environmental Prediction Climate Forecast System Reanalysis (‘NCEP/CFSR’) (Saha et al. 2010) covers the period from 1979 to 2010 and has a 6-hourly temporal resolution and \({0.5^{\circ }\times 0.5^{\circ }}\) spatial resolution. From these two products, we used SCE only. The Northern Hemisphere weekly SCE Climate Data Record from Rutgers University is freely available at the NOAA (National Oceanic and Atmospheric Administration) National Center for Environmental Information (NCEI) (‘NOAA CDR’) and represents the longest satellite-based record of snow cover for the period from 1966 to the present at an \({89 \times 89}\) cartesian grid with a resolution of 190.6 km at \({60^{\circ }}\)N latitude. The data are binary, with ‘1’ indicating grid cells with at least 50% snow cover and ‘0’ indicating cells considered to be snow free (Robinson and Frei 2000). CanSISE Observation-Based Ensemble Version 2 of the Northern Hemisphere Terrestrial Snow Water Equivalent (Mudryk and Derksen 2017) (’CanSISE’) is a terrestrial SWE dataset based on passive microwave observations and ground-based weather stations, two reanalyses (ERA-Interim/Land, Balsamo et al. (2015) and MERRA, Rienecker et al. (2011)), the Crocus snowpack model (Brun et al. 2013) and the GLDAS product (Global Land Data Assimilation System Version 2) (Rodell et al. 2004; Rodell and Beaudoing 2013). This dataset has a daily temporal resolution and \({1.0^{\circ }}\) spatial resolution and covers the period from 1981 to 2010.

2.2 Preprocessing and methods

We analyzed the period from 1979 to 2005 for SCE and from 1981 to 2005 for SWE. For intercomparison, all models, reanalyses and observational datasets (except for NOAA CDR) have been regridded onto a \({1.0^{\circ } \times 1.0^{\circ }}\) lat-lon grid using bilinear interpolation. NOAA CDR was used on its original grid. We defined the terrestrial Arctic as the region north of \({50^{\circ }{\text{N}}}\). For large-scale averages applied in some analyses, we separately considered the North American sector and the Eurasian sector, which were separated by \({180^{\circ }{\text{E}}}\) (Fig. 1). Being focused on terrestrial snow, we analyzed the grid cells with a land fraction of at least 50% and with a permanent ice fraction of less than 15%. We applied to all analyzed products the land-ice mask of the GISS-E2-R model, which is the most restrictive mask, especially over coastal areas of North America. This ensures effective elimination of ocean regions and ice cells among all models.

Fig. 1
figure 1

Ice-free land Arctic domain and geographical characteristics features

To produce monthly time series of snow characteristics, all reanalyses and CanSISE products were averaged over calendar months. In the NOAA CDR dataset, weekly binary charts are attributed to the fifth day of the week, which was found to be the most representative day of the week regarding snow chart boundaries (Robinson et al. 1993). Furthermore, the monthly averaging of these data is provided by averaging all weekly charts that fall into a particular month. For instance, the chart week from 30 October to 5 November is attributed to October. This procedure introduces some uncertainty, which is, however, small, especially given that for many estimates, we used 2-monthly averaging periods.

We compute the snow margin position (SMP), which is defined as the 50% SCE contour. For both reanalyses, NOAAV2c and NCEP/CFSR, the SMP is computed directly by considering the contour of 50% SCE of the March–April and October–November climatologies. The NOAA CDR data are not included in this analysis due to the incompatibility between grid projections, as mentioned in the previous paragraph. To achieve comparability between CMIP5 model data and reanalyses, we assigned binary values in each model, with ‘1’ assigned to all grid cells with \({\ge }\) 50% and ‘0’ assigned to the remaining grid cells. To construct the binary field for models providing ensemble members, we first applied the conversion of SCE to the binary form for each ensemble member and then set the grid cell to ‘1’ if more than half of the ensemble members for a particular model report ‘1’, otherwise, the grid cell in the ensemble averaged model time series was set to ‘0’. By doing so, for each season, we obtain 16 binary fields for each CMIP5 model. To further evaluate the SMP among CMIP5 models, we summed the number of grid cells with a value of ‘1’ in each season over the model ensemble. For example, a value of ‘4’ in a grid cell implies that 4 out of 16 models display \({\ge }\) 50%. If this grid cell is within the contour of the snow margin in the reanalyses, the SMP in these 4 models is considered to be in agreement with the reanalyses. Taylor diagrams (Taylor 2001) providing an informative graphical summary of the extent to which model patterns are matching observations in terms of correlation, centered root-mean-square (RMS) difference and the amplitude of variability quantified by standard deviations (STD), were built using standard Python routines. RMS errors applied to the analysis of model consistency with data were computed using a standard procedure by taking the square root of the sum of squared residuals.

3 Evaluation of long-term climatology of snow characteristics

3.1 Snow cover extent

The climatology of SCE in the NCEP/CFSR reanalysis (Fig. 2) is used here as a reference for evaluation of SCE in CMIP5 models. The October-November climatology in NCEP/CFSR (Fig. 2a) shows the maximum SCE (80–100%) over northern Alaska, the Canadian Arctic Archipelago and northeastern Eurasia. Over North America, high values of SCE are observed in the Brooks mountain range, in the Mackenzie Mountains and over the Canadian Arctic Archipelago. Over Eurasia, the maximum SCE is observed over northeastern Siberia with maximum values of 90–100% over the Verkoïansk, Tcherski and Koryak mountain ranges.

Fig. 2
figure 2

Climatology of snow cover extent in October–November (a) and March–April (b) from the NCEP/CFSR reanalysis over the period 1979–2005

In March–April (Fig. 2b), complete snow cover (100%) is observed over northeastern Eurasia and northern Canada and Alaska. Note that the areas covered by more than 80% snow match the areas of continuous permafrost well (Brown et al. 1997). In the southern part of Western Eurasia, snow has almost melted, with the SCE being approximately 15%, whereas in Scandinavia, SCE values of 80–100% are observed north of \({60^{\circ }{\text{N}}}\). Over North America, the decrease in SCE from north to south is more pronounced than over Eurasia in both seasons.

Fig. 3
figure 3

Difference of October–November snow cover climatology of each ensemble CMIP5 minus NCEP/CFSR. Only differences with p-values less than 0.01 from the t-Student test are shown

Fig. 4
figure 4

Same as in Fig. 3 but for March–April

We now turn to the evaluation of differences in climatology between the ensemble of CMIP5 models and the NCEP/CFSR reanalysis (Figs. 3, 4). In October-November, 7 of the 16 models (bcc-csm1-1, CSIRO-Mk3-6-0, MIROC’s and MPI’s versions; Fig. 3a, e, i, j, k, l, m) show a general underestimation of SCE of up to 60% over the whole terrestrial Arctic. In contrast, the models CCSM4, inmcm4, MRI-CGCM3 and NorESM1-M/ME (Fig. 3c, h, n, o, p) demonstrate general overestimation of SCE over North America and eastern Eurasia by 50–60%. GISS-E2-H/R (Fig. 3f, g) exhibits overestimation in western Eurasia and eastern North America by 30% and underestimation in the coastal regions of the East Siberian Sea and Bering Sea by 25%. The smallest differences are observed for CanESM2 and CNRM-CM5 (Fig. 3b, c), and they show good agreement with NCEP/CFSR, especially in central Siberia.

Analysis of differences in March–April SCE between selected CMIP5 models and NCEP/CFSR reanalysis identify three groups of models. Compared to NCEP/CFSR, the first group (bcc-csm1-1, CSIRO-Mk3-6-0 and both versions of MPI-ESM-LR/P, Fig. 4a, e, l, m) generally underestimates snow cover by almost 50%, with the largest differences observed in the southern part of western Eurasia. The second group (CanESM2, MIROC-ESM/-CHEM and MRI-CGCM3; Fig. 4b, i, j, n) broadly overestimates SCE, with the largest differences observed in the Rocky Mountains (almost 60% by CanESM2). These four models show a zonal underestimation of 35% along the southern edge of western Eurasia (Fig. 4b, i, j), although the underestimation of MRI-CGCM3 is smaller at 15% (Fig. 4n). The third group (CNRM-CM5, GISS-E2-H/R and MIROC5; Fig. 4d, f, g, k) exhibits a dipole structure of differences, with underestimation of SCE in the north and overestimation in the south, both being within 30%. The model inmcm4 (Fig. 4h) shows overestimation by 15–40% in the southern part of eastern Eurasia and local underestimation by less than 30% in western Canada. Similar patterns are shown by CCSM4 (Fig. 4c) and NorESM1-M/-ME (Fig. 4o, p), with underestimation by up to 30% in western Eurasia and eastern Canada and overestimation by 15% in southeastern Eurasia and western Canada. The smallest differences are observed in both versions of NorESM1-M/ME (Fig. 4o, p) and range within \({\pm }\) 35% in southern Eurasia and in the Rocky Mountains.

The differences between the CMIP5 models and the NOAAV2c reanalysis (not shown) are very similar to those found for NCEP/CFSR in terms of spatial patterns. However, NOAAV2c tends to slightly overestimate the SCE climatology of NCEP/CFSR (Fig. S1). Therefore, when models have shown a positive (negative) bias compared to NCEP/CFSR, these differences are smaller (larger) in magnitude and less (more) significant when using NOAAV2c as a reference.

3.2 Snow water equivalent

The climatology of Arctic SWE in the melting (March–April) and onset (October–November) seasons over 1981–2005 derived from the CanSISE ensemble product is shown in Fig. 5. In October–November (Fig. 5a), maximum values of up to 100 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\) are observed east of the Yenisei River. All over the Arctic, the SWE is approximately 20 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\), with higher values of up to 60 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\) over the Verkhoyansk, Tcherski and Koryak mountain ranges. In March–April (Fig. 5b), SWE values are higher than in October–November. The areas with high SWE include the Kamchatka Peninsula and the region east of the Yenisei River, where the SWE reaches 300 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\). High values of up to 210 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\) are observed over the Ural and Kolyma Mountains and eastern Canada. The remaining part of the terrestrial Arctic is characterized by SWE values of approximately 50 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\).

Fig. 5
figure 5

Climatology of snow water equivalent in October–November (a) and March–April (b) of the CanSISE product over the period 1981–2005

White areas in Fig. 5 correspond to the land-ice mask of the CanSISE ensemble product visible over the western coast of Canada and Alaska and over Scandinavia. This mask is originally from the MERRA reanalysis of Rienecker et al. (2011), which considers a grid cell masked when the land-ice fraction is \({>}\) 0 (L. Mudryk, personal communication). As mentioned in Sect. 2, this mask was not used for the analysis; instead, the land-ice mask from GISS-E2-R was used.

Fig. 6
figure 6

Difference of October–November snow water equivalent climatology of each CMIP5 model minus the CanSISE Ensemble product over the period 1981–2005. Only differences with p-values less than 0.01 from the t-Student test are shown

Fig. 7
figure 7

Same as in Fig. 6 but for March–April. Note that color bar range is multiplied by a factor of 3 from Fig. 6

We now turn to evaluate the difference in SWE climatology between the CMIP5 ensemble and the CanSISE product (Figs. 6, 7). The October-November differences between CanSISE and modeled SWE (Fig. 6) are smaller than the March–April differences (by almost a factor of 3; Fig. 7), with the strongest overestimation found for NorESM1-M and NorESM-ME (Fig. 6o, p) and reaching 80 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\) over far eastern Eurasia. In March–April, CSIRO-Mk3-6-0 and MPI-ESM-LR/P show a broad underestimation of SWE, which is especially strong in Eurasia. In March–April, there is an overall overestimation of SWE across nearly all CMIP5 models compared to the CanSISE ensemble product, with differences ranging from 40 to 120 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\) over most of northern Eurasia and North America. Local underestimation by up to \(-120\) \({\text{kg}}\cdot{ {\text{m}}^{-2}}\) is observed in the majority of models in the delta of the Yenisei River and is especially pronounced in GISS-E2-H/R, MIROC-ESM/-CHEM and MIROC-5 (Fig. 7f, g, i, j, k). This pattern likely results from the lack of capability of large-scale GCMs to capture this local effect. The pattern of underestimation of SWE over western Eurasia is evident in the CCSM4, CSIRO-Mk3-6-0, MPI-ESM-LR/P and NorESM1-M models. Among these, CSIRO-Mk3-6-0 demonstrates a pan-Arctic pattern of underestimation with differences of approximately 26 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\) (Table 4). The relative differences are larger in March–April than in October–November, which suggests that the discrepancies in the SWE representation are more related to the melting process, i.e., SWE changes in March–April are more sensitive to temperature changes.

3.3 Annual cycle

We turn now to the comparative analysis of the annual cycle of snow characteristics in CMIP5 models, reanalyses and observations for different Arctic regions. Regional climatological monthly means were obtained for the period 1979–2005 for SCE and for 1981–2005 for SWE. Note that we applied a model-specific land-ice mask to each CMIP5 model and the GISS-E2-R mask to the reanalyses (NCEP/CFSR, NOAAV2c) and CanSISE product. For the NOAA CDR dataset (which has a binary land mask but does not provide an ice mask), we applied the July SCE climatology (1979–2005) to generate a virtual ice fraction mask. Thus, we masked the grid cells with more than 85% SCE, which is equivalent to a perennial SCE (ice fraction) of less than 15% as we had done with CMIP5.

Fig. 8
figure 8

Annual cycle of snow cover extent (top row; 1979–2005) and of snow water equivalent (bottom row; 1981–2005) for the Arctic (left column; a, d), Eurasia (center column; b, e) and North America (right column; c, f) of the ensemble mean of CMIP5 selection compared to the reanalyses (NCEP/CFSR and NOAAV2c) and NOAA CDR for SCE and to the CanSISE product for SWE

For the whole Arctic (Fig. 8a), the annual cycle of SCE in both NCEP/CFSR and NOAAV2c is in good agreement with the NOAA CDR, with the largest differences not exceeding 10%. Between both reanalyses, the highest differences are only approximately 10% in May, and they are observed in Eurasia. Compared to the NOAA CDR, the multimodel mean generally reproduces the annual cycle, with the largest discrepancies of up to 25% in October and November.

The majority of the CMIP5 models tend to underestimate SCE throughout the year, most notably in autumn and winter (October–February). This underestimation is most distinct in North America (Fig. 8c), while some models (CCSM4, MRI-CGCM3 and NorESM1-M/ME) match the NOAA CDR quite closely over Eurasia (Fig. 8b), with differences within 10%. The largest discrepancies with respect to observational data (close to 40%) are identified in the MIROC versions during the months of October and November and over both subdomains (Fig. 8b, c). When melting starts, discrepancies across the models are also evident. Considering the March-June period, MPI-ESM-LR/P, CSIRO-Mk3-6-0, bcc-csm1-1 and inmcm4 tend to underestimate SCE by up to 30% in May, mainly in Eurasia (Fig. 8b), and report complete snow melt in June. In contrast, the other models maintain some snow cover until July. This underestimation is mostly visible over North America, with values of up to 40% in MPI-ESM-LR (Fig. 8c). Other models (CanESM2, CNRM-CM5 and MRI-CGCM3) tend to slightly overestimate SCE by up to 10% in May primarily due to the Eurasia pattern (Fig. 8b). During the May–August period, compared to NOAA CDR, both versions of NorESM1-M/ME show an overestimation of SCE by up to 15%, implying that these two models are not capable of completely melting snow in summer, especially over Eurasia (Fig. 8b).

Compared to SCE, the annual cycle of SWE (Fig. 8d–f) is characterized by a relatively stronger spread among the models and larger deviations from observations. There is a definite overestimation of SWE in the CMIP5 models with respect to the CanSISE product. The multimodel mean shows a positive bias throughout the annual cycle, with the largest overestimation during March–May, when the difference for the whole Arctic reaches 40 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\) (Fig. 8d), with values of 65 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\) in North America (Fig. 8f) and 30 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\) in Eurasia (Fig. 8e). Considering the whole Arctic (Fig. 8d), the largest differences are observed during the melting season, from March to June, reaching 20–30 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\). Among CMIP5 models, the greatest SWE overestimation is in excess of 100 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\) and occurs in April for MRI-CGCM3 (Fig. 8d). The differences are larger over North America (Fig. 8f) than over Eurasia (Fig. 8e).

Notably, there are three models (CSIRO-Mk3-6-0 and both versions of MPI-ESM-LR/P) that match the annual cycle of the CanSISE product fairly well for the whole Arctic and exhibit differences in SWE from 10 to +20 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\). For the period from August to September, when CanSISE is characterized by complete snow melt in the Arctic (Fig. 8d), some models (MIROC5 and both versions of NorESM1-M/ME) still show values of up to 50 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\) over North America (Fig. 8c). During the snow season (January to May), nearly all models display large overestimations over all subdomains. The only exception is one group of models (MPI-ESM-LR/P, CSIRO-Mk3-6-0 and NorESM1-ME) that shows a negative bias over Eurasia (Fig. 8e).

3.4 Snow margin position

We analyze the SMP for the onset (October–November) and melting (March–April) seasons for the CMIP5 ensemble and the two reanalyses. Both reanalyses are in better agreement with each other in March–April (Fig. 9b) than in October–November (Fig. 9a). In March–April, the snow margins in both products match each other almost perfectly. However, in October–November, the reanalyses exhibit large spatial discrepancies with a deviation between the SMPs of approximately 10\(^\circ \) latitude. These discrepancies are more pronounced over North America than over Eurasia. This is due to a less intense onset of SCE in NOAAV2c than in NCEP/CFSR, which is consistent with the faster rate of increase in SCE in NCEP/CFSR than in NOAAV2c in October–November (Fig. 8c).

Fig. 9
figure 9

The snow margin position (SMP) is defined as the contourline of 50% snow cover fraction averaged over 1979–2005 in the NCEP/CFSR and NOAAV2c reanalyses (purple and pink lines, respectively) for October–November (a) and March–April (b). The color bar shows the number of CMIP5 models that display an ensemble seasonal mean of snow cover fraction equal or superior to 50%

The SMP is also in a better agreement between different models and reanalyses during March–April than in October–November. This suggests that the spatial discrepancies in the SMP are likely due to differences in the starting time of the melting season among the CMIP5 models. The mismatch is accentuated in southwestern Siberia, with only 3–4 models being consistent with the reanalyses. The SMP over North America is more accurately represented, with 10–12 models being in close agreement with the reanalyses. In central and northern Eurasia, over Scandinavia, Canada and Alaska, all models match the reanalyses reasonably well (Fig. 9b). However, in October–November (Fig. 9a), the agreement between the model datasets and the reanalyses becomes less evident. Only three of the 16 models feature SMPs close to that of NCEP/CFSR, whereas the SMPs of 10 other models are closer to that of NOAAV2c. A reasonably good agreement across all datasets is identified only over the Siberian Plateau, the Verkhoyansk mountainous region and northern Canada. The worst agreement in the SMP is observed in the southern part of North America and the region west of Lake Baikal. Some disagreements observed in coastal areas are associated with different ice-land masks.

3.5 Summary of evaluation of climatologies

Analysis of the CMIP5 ensemble spread in the estimates of SCE and SWE is summarized in Tables 3 and 4, which present the long-term means over the terrestrial Arctic. During October–November, the multimodel mean (68% SCE) is in general agreement with observations closely matching the NOAA CDR and overestimating the NCEP/CFSR and NOAAV2c by approximately 15%. Among CMIP5 models, CCSM4, inmcm4, NorESM1-M/ME, and MRI-CGCM3 tend to overestimate the observed SCE by approximately 20–25%, with the remaining models showing SCE values close to those of the observations. In March–April, there is good agreement between the reference datasets (NCEP/CFSR and NOAAV2c), which deviate from each other by less than 10%; however, in October–November, differences may amount to 15%. In March–April, the observed SCE is approximately 80–85%, implying that almost all the terrestrial Arctic is covered in snow (as in Fig. 9b). The multimodel CMIP5 mean strongly underestimates the observed SCE in March–April by approximately 30%. There is a group of models (bcc-csm1-1, CSIRO-Mk3-6-0, inmcm4, and MPI-ESM-LR/P) with March–April SCE values ranging from 30 to 45%. The remaining models yield March–April SCE values of 60–70%, which are closer to the observed values but are nevertheless strongly negatively biased. For SWE, the observed CanSISE mean is 10 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\) smaller than the multimodel mean (Table 4). The best agreement with observations is demonstrated by both versions of MPI-ESM-LR/P and CSIRO-Mk3-6-0. The NorESM1-M model is an obvious outlier and yields SWE values that are nearly twice as high as the observations. In March–April, the observed CanSISE mean is approximately 60% smaller than the multimodel mean. The spread in model SWE in March–April is quite large and ranges from a minimum of 33 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\) (CSIRO-Mk3-6-0) to a maximum of approximately 148 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\) (MRI-CGCM3). Only three models (inmcm4 and both versions of MPI-ESM-LR/P) are in relatively good agreement with the observations.

Table 3 October–November and March–April SCE mean (\(\mu \), %), standard deviation (\(\sigma _{\mu }\), %), trend (T, %/dec) and its associated standard trend error (\(\sigma _{T}\), %/dec) computed over the terrestrial Arctic during the period 1979–2005
Table 4 October–November and March–April SWE mean (\(\mu \), \({\text{kg}}\cdot{ {\text{m}}^{-2}}\)), standard deviation (\(\sigma _{\mu }\), \({\text{kg}}\cdot{ {\text{m}}^{-2}}\)), trend (T, \({\text{kg}}\cdot{ {\text{m}}^{-2}}\)/dec) and its associated standard trend error (\(\sigma _{T}\), \({\text{kg}}\cdot{ {\text{m}}^{-2}}\)/dec) computed over the terrestrial Arctic during the period 1981–2005

4 Evaluation of trends and interannual variability in snow characteristics

We turn now to the evaluation of the past variability in SCE and SWE over the terrestrial Arctic in the CMIP5 model historical runs for the onset (October–November) and melting (March–April) seasons during 1979–2005 (for SCE) and 1981–2005 (for SWE).

4.1 Interannual variability of snow characteristics

Highlighting the assessment presented above (Sect. 3.5), the analysis of seasonal time series of SCE and SWE (Fig. 10) also sheds some light on the origins of the differences revealed by Tables 3 and 4. Thus, disagreement between the observed SCE in October-November is especially evident after 1995 when the NOAA CDR shows an upward tendency, whereas the NCEP/CFSR and NOAAV2c both show a decrease in SCE (Fig. 10a). This may also affect trend estimates, which is considered below.

Fig. 10
figure 10

Time series of snow cover (top row; 1979–2005) and snow water equivalent (bottom row; 1981–2005) averaged over the terrestrial Arctic in October–November (a, c) and in March–April (b, d). Individual CMIP5 models are shown in color dotted lines, multimodel mean in black thick line, SCE reanalyses (NCEP/CFSR and NOAAV2c) in grey and NOAA CDR for SCE and CanSISE ensemble for SWE are in black-squared

Fig. 11
figure 11

Taylor diagram used in the evaluation of the time series of snow cover (top row; 1979–2005) and snow water equivalent (bottom row; 1981–2005) over the Arctic in October–November (a, c) and in March–April (b, d) in CMIP5 models (a–p letters). For SCE a, b, NCEP/CFSR reanalysis (Q) and NOAAV2c reanalysis (R) are included versus the satellite-based NOAA CDR (black star). For SWE c, d we use as reference the CanSISE ensemble product (black star). The radial distance to the origin indicates the standard deviation, the centered root-mean-squared (RMS) is the distance to the reference point (star in x-axis) and the azimuthal positions give the correlation coefficient R. The RMS difference is computed once the overall bias between model and observations has been removed

Figure 11 shows Taylor diagrams (Taylor 2001) built for the seasonal time series. In October–November (Fig. 11a), the magnitude of interannual variability in the NCEP/CFSR reanalysis is close to the observational reference (4.8% and 4.5%, respectively). By contrast, NOAAV2c reanalysis and the majority of models exhibit standard deviations (STDs) that are considerably smaller than in observations. In March–April (Fig. 11b), the discrepancies with the NOAA CDR are smaller than those in October–November, with the centered RMS difference being larger for all models and reanalyses. The characteristics of interannual variability in NCEP/CFSR, NOAAV2c and NOAA CDR SCE are in agreement with each other, with STDs ranging from 2.4 to 2.9% (Fig. 11b). However, only NOAAV2c showed a positive and significant correlation with NOAA CDR. Most of the CMIP5 models show much smaller STDs than the observations and practically no correlation with observations on the interannual scale, except for bcc-csm1-1 and MIROC5, which show weak and marginally significant correlations of 0.4.

In October–November (Fig. 11c), the majority of the CMIP5 models underestimate the magnitude of the interannual variability in SWE reported by the observations, except for inmcm4, NorESM1-ME and all versions of MIROCs (Fig. 11c), which are considered to be outliers with respect to STDs (more than 4 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\)) and are not shown in Fig. 11c. The March–April SWE in the CMIP5 models is characterized by a large spread (Fig. 11d). The observed interannual variability in SWE (4.45 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\)) is well captured by CCSM4, MIROC-ESM and MPI-ESM-P. For the majority of models, the centered RMS differences are approximately 6 \({\text{kg}}\cdot{ {\text{m}}^{-2}}\). The MIROC5, inmcm4 and NorESM1-ME models are characterized by the highest RMS values and STDs. The lowest magnitude of interannual variability in CSIRO-Mk3-6-0 is likely due to the number of ensemble members averaged (10 members, Table 1).

All CMIP5 models display low correlation with the observational reference dataset for both SCE and SWE. This is not surprising considering the fundamental limitations of GCMs in reproducing interannual variability (Taylor et al. 2012). Thus, the simulated temporal evolution in the models may miss the impact of unforced internal variations such as El Niño, the NAO and the Atlantic Multidecadal Oscillation (AMO). As a result, the interannual variability correlates significantly with observations only by chance. The Northern Hemisphere temperature (crucial for snow melting in March–April) and precipitation (the origin of snow during October–November) are dominated to a large extent by internal climate variability, such as the NAO (Hurrell et al. 2003), and the associated positioning of cyclone tracks (Gulev et al. 2002; Popova 2007; Tilinina et al. 2013; Webster et al. 2019).

4.2 Seasonal trend analysis

We estimate SCE and SWE trends in CMIP5 models from the individual model ensemble members, the ensemble of a single model and from the multimodel ensemble mean. In total, we analyze 73 individual model realizations provided by 16 models (Table 1) and compare them with estimates based upon observations.

Fig. 12
figure 12

Land Arctic trend of snow cover (top row; 1979–2005) and snow water equivalent (bottom row; 1981–2005) in October–November (a, c) and March–April (b, d). Trends are computed for each individual model realization (circle), the ensemble model mean (diamond) and the multimodel mean (black square). Shaded regions indicate the standard error of the trend in observation reference dataset when is statistically significant at 90% of confidence level (t-test). Filled markers indicated statistical significance at 90% of confidence level (t-test)

In October–November (Fig. 12a, Table 3), the NOAA CDR displays a trend of \(+3.28 \pm 0.91\%\)/decade and is thus a clear outlier with respect to the other reference datasets. The positive October trend estimated by the NOAA CDR in the recent period has already been reported by Brown and Derksen (2013) and Estilow et al. (2015); however, there is mounting factual evidence that this is misrepresented by GCMs and reanalysis products (Allchin and Déry 2019). NCEP/CFSR and NOAAV2c show SCE trends of \(-1.61 \pm 1.15\%\)/decade and \(-0.76 \pm 0.59\%\)/decade, respectively, but they lack statistical significance. Consequently, we consider NCEP/CFSR as a reference for model evaluation since it displays a p-value slightly lower than that of NOAAV2c (0.17 and 0.21, respectively). The multimodel mean shows an October–November SCE trend of \(-0.72 \pm 0.12\%\)/decade, which falls in the range given by the NCEP/CFSR estimates. We also note generally weaker SCE trends in models with respect to those from NCEP/CFSR. Thirteen of 16 models are within the observed SCE trend range. Considering individual ensemble members, 37 estimates fall into the range implied by observations; thus, approximately 50% of individual model ensemble members present trends that are qualitatively consistent with observations. The largest spread across ensemble members is observed for bcc-csm1-1, CanESM2 and both versions of MPI-ESM-LR/P.

NCEP/CFSR shows an insignificant increase in March–April SCE, disagreeing with NOAA CDR and NOAAV2c, as both consistently display significantly negative trends of \(-1.21 \pm 0.58\%\)/decade and \(-1.26 \pm 0.68\%\)/decade, respectively (Fig. 12b, Table 3). The multimodel mean with a value of \({1.10} \pm 0.13\%\)/decade falls in the range of estimates given by NOAA CDR and NOAAV2c. However, five models (inmcm4, MIROC-ESM-CHEM, MPI-ESM-LR, MRI- CGCM3 and NorESM1-ME) report SCE trends outside of the observation range, but the deviations are statistically significant in only two of them (MIROC-ESM-CHEM and MPI-ESM-LR) (Table 3). Among the individual ensemble members, 39 of 73 (more than 50%) report significant trends consistently with observations. Remarkably, some models (bcc-csm1-1, CNRM-CM5, MIROC5 and NorESM1-M) demonstrate a quite strong spread in trend estimates among their ensemble members.

In October–November (Fig. 12c, Table 4), most CMIP5 models show stronger negative trends in SWE compared to the CanSISE product, which displays a weak and insignificant SWE trend of \(-0.63 \pm 0.76\)   \({\text{kg}}\cdot{ {\text{m}}^{-2}}\)/decade, while the multimodel mean trend is \(-1.65 \pm 0.18\)  \({\text{kg}}\cdot{ {\text{m}}^{-2}}\)/decade. Six out of the 16 CMIP5 models (CanESM2, CSIRO-Mk3-6-0, GISS-E2-H/R and MPI-ESM- LR/P) fall inside the range given by observations; however, only two of them present significant trends (GISS-E2-H and MPI-ESM-LR). None of the individual model members simulate a significant SWE trend inside the range given by the CanSISE product (except for one member of CSIRO-Mk3-6-0).

In March–April (Fig. 12d), SWE in the CanSISE product shows an insignificant downward tendency for the whole Arctic (\(-1.09\pm 1.25\)  \({\text{kg}}\cdot{ {\text{m}}^{-2}}\)/decade). The multimodel model mean, however, shows a strong, negative significant trend in SWE of \(-2.85 \pm 0.34\)  \({\text{kg}}\cdot{ {\text{m}}^{-2}}\)/dec. Five models (bcc-csm1-1, CSIRO-Mk3-6-0, GISS-E2-R, MPI-ESM-LR and MRI-CGCM3) report trends in March-April SWE falling in the range given by observations, but only two of them are statistically significant (CSIRO-Mk3-6-0 and GISS-E2-R). Among individual ensemble members, only 5 of the 73 significant SWE trends fall within the observed range (all belonging to the CSIRO-Mk3-6-0 model). Thus, less than 7% of ensemble simulations are in agreement with the observed SWE trend in March-April. The individual realizations of bcc-csm1-1, CNRM-CM5, MIROC5 and MRI-CGCM3 have the largest spread of trend estimates. Generally, the spread of individual members is smaller for October–November than for March–April, except for the MIROC5 and MRI-CGCM3 models.

Fig. 13
figure 13

Left column: number of CMIP5 models displaying a positive/negative SCE trend in October–November (a) and in March–April (c) during 1979–2005. Only SCE trends statistically significant at 80% from t-Student distribution are counted. Right column: spatial trend of SCE trend (%/dec) from NCEP/CFSR reanalysis in October–November (b) and March–April (d) during 1979–2005. Hatched regions indicate SCE trends statistically significant at 90% from t-Student distribution

Fig. 14
figure 14

Left column: number of CMIP5 models displaying a positive/negative SWE trend in October–November (a) and March–April (c) during 1981–2005. Only SWE trends statistically significant at 80% from t-Student distribution are counted. Right column: spatial trend of SWE trend ((\({\text{kg}}\cdot{ {\text{m}}^{-2}}\))/dec) in October–November (b) and March–April (d) during 1981–2005 of CanSISE ensemble product. Hatched regions indicate SWE trends statistically significant at 90% from t-Student distribution

The spatial patterns of linear trends in individual models are extremely noisy, with trends of different signs coexisting in close proximity. Figures 13a, c and 14a, c quantify the degree of consensus across the 16 models [based on 14 snow depth (SND) models] in terms of the linear trends in SCE and SWE. For each grid cell, we compute the maximum number of models agreeing on the sign of trend and on its significance. For instance, if 8 models in a particular cell show significant positive trends, 5 show significant negative trends and 3 show insignificant trends, this grid cell has a “\(+8\)” value. If the numbers of models with positive and negative trends are equal, the grid cell is set to zero. For this analysis, we used the 80% significance level (t-test) to estimate the statistical significance of individual trends.

Figure 13b shows the NCEP/CFSR trend pattern in SCE in October–November. NCEP/CFSR shows decreasing trends over eastern Europe and north of the Eurasian dry land band (EDLB, Groisman et al. 2018) in Siberia. Over North America, negative trends in SCE (− 8 to 12%/decade) dominate over most regions except the western mountainous regions (Fig. 13b). Locally, positive trends (up to \(+\) 12%/decade) are observed over northeastern Eurasia in the Cherskly Mountain Range. The model ensemble exhibits negative trends dominating over positive ones practically everywhere (Fig. 13a). This coincides with NCEP/CFSR over eastern Europe and north of the EDLB in Siberia but not over the northeastern Eurasia and western North America.

In March–April (Fig. 13d), NCEP/CFSR exhibits a significant positive trend over the eastern part of Siberia, with a maximum of \(+\) 12%/decade observed east of Lake Baikal. Positive SCE trends of \(+\) 10 to 12%/decade are also observed over the mountainous regions in the western part of North America. Most of eastern Europe is characterized by significantly negative trends, amounting to − 12%/decade over eastern Scandinavia and south of the Ural Mountains. The best agreement between CMIP5 models and observations is located over western Eurasia, where 8–12 models (out of 16) display significantly negative trends. Over northeastern Eurasia, 4–8 models exhibit positive trends, in agreement with NCEP/CFSR. The North American pattern of negative trends is captured by 4–8 CMIP5 models; however, the CMIP5 ensemble does not capture the pattern of positive trends over the Rocky Mountains and over the Laurentian Plateau (Fig. 13c).

A similar analysis is carried out for SWE during the period 1981–2005 (Fig. 14). The CanSISE product shows very few locations with statistically significant trends in both seasons. In October–November (Fig. 14b), SWE in the CanSISE product demonstrates negative trends over northwestern Canada and the southern part of Eastern Eurasia and locally positive trends over northern Eurasia along the coast of the Arctic Ocean. In March–April (Fig. 14b), a statistically significant decrease in SWE is observed in northern Eurasia along the Arctic Ocean coast, and a local increase is identified over the southern part of western Siberia. In October–November (Fig. 14a), the model ensemble reports negative trends nearly everywhere, with approximately 8 models showing downward SWE changes in North America and western Eurasia. Eastern Eurasia is characterized by little consensus among the CMIP5 models, with estimates showing mostly insignificant trends and less than 4 of the 16 models demonstrating statistically significant trends. In March–April, most models report primarily negative trends in SWE over eastern Eurasia and primarily positive tendencies over northern Siberia (Fig. 14c).

5 Summary and conclusion

We analyzed the representation of SCE (1979–2005) and SWE (1981–2005) in the Arctic during the onset (October–November) and melting season (March–April) in the historical runs of CMIP5 climate models on the basis of two reanalysis products, one satellite-observed product and an ensemble of different datasets. For SCE, we identified three groups of CMIP5 models: those broadly overestimating the observed SCE (up to 60% regionally), those underestimating the observed climatological SCE (with the greatest differences of 40–50%) and those overestimating the SCE over the northern regions and underestimating the SCE over the southern regions. Differences are especially remarkable in March–April but are also pronounced during October–November. For SWE, we identified a pan-Arctic overestimation in most CMIP5 models except for CSIRO-Mk3-6-0 and MPI-ESM-LR/P, which show either general underestimation or an east–west (underestimation–overestimation) pattern over Eurasia. Again, these patterns are most pronounced during March–April, when positive biases may be on the same order as the mean values. The observed annual cycle of SCE is relatively well captured by CMIP5 models, mainly in autumn, but during the spring–summer season, most models tend to significantly underestimate the observed SCE, especially over North America. At the same time, the annual cycle of SWE is largely positively biased with respect to observations in most CMIP5 models, indicating persistent snow during summer months, thus contradicting observations.

The SMP is better captured by CMIP5 models in March–April, with the best agreement found over Eastern Eurasia and Western North America. In October–November, the CMIP5 ensemble identifies a considerable northward shift in the SMP, and overall agreement occurs only in central Siberia, whereas only 4–6 out of the 16 models match the location of the observed snow margin.

We found that the magnitudes of interannual variability in SCE and SWE are significantly underestimated in most CMIP5 models compared to observations. With respect to pan-Arctic interdecadal trends, most CMIP5 models show weakly negative (but in many cases statistically significant) trends in both SCE (in March–April) and SWE (in both seasons), which is not the case for the observational datasets, which show mostly insignificant trends in pan-Arctic snow characteristics. Regionally, the CMIP5 ensemble captures a relatively good observational trend pattern in SCE, but trends in SWE show no consistency with the regional trend patterns over both continents.

In agreement with previous studies (Brutel-Vuilmet et al. 2013; Mudryk et al. 2014), we found that the SCE annual cycle is relatively well captured by CMIP5 models compared to the three different reference datasets used. Nevertheless, the significant overestimation and the large spread of snow mass over the Northern Hemisphere reported by Roesch (2006) in the previous generation of models (CMIP3) has not been significantly reduced in CMIP5. Regional differences in snow cover duration in autumn (Peng et al. 2013; Brown et al. 2017; Liston and Hiemstra 2011) are likely the cause of the spatial differences in the position of the snow margin in the onset season in both reanalyses and models.

In March–April, we have found that SCE discrepancies in CMIP5 models are observed over the southern part of the Arctic, where temperature changes are substantial and may cause the snow to recede. This is consistent with the inability of CMIP5 models to capture the sensitivity of snow cover to temperature changes (Fletcher et al. 2012; Brutel-Vuilmet et al. 2013; Thackeray et al. 2016). Conversely, during the onset season, snow changes are less dependent on temperature variations. In fact, snow onset is highly coupled with variability in regional precipitation and surface temperature, as both are needed to initiate snow cover (Mudryk et al. 2017; Ye 2019). The accumulation and persistence of snow are both linked to modes of variability in the atmosphere, such as the NAO and AO (Bamzai 2003; Cohen et al. 2012; Liu et al. 2012). However, these modes of variability are not always accurately reproduced in the CMIP5 ensemble (Ning and Bradley 2016; Davini and Cagnazzo 2014), which may result in a poor representation of snow-atmosphere coupling (e.g., Furtado et al. 2015; Gastineau et al. 2017). We hypothesize that the October-November mismatches and the strong overestimation of SWE during the entire snow season may be linked to an inadequate representation of atmospheric internal variability and snow-atmosphere coupling. This is consistent with the relatively low interannual variability found in CMIP5 models (Fig. 11) and the internal limitations of CMIP5 models (Taylor et al. 2012).

An important lesson from our work is that the use of multiple data sources is strongly recommended for model evaluation of snow characteristics. We used multiple datasets to evaluate climate variability in CMIP5 model experiments. We have found that in both the March-April and October–November seasons, CMIP5 snow characteristics display negative but weaker-than-observed trends, consistent with previous studies at the Northern Hemisphere scale (Brutel-Vuilmet et al. 2013; Mudryk et al. 2014; Connolly et al. 2019). Additionally, we confirm that, compared to the other data sources, the October–November trend of the NOAA CDR dataset (\(+3.28 \pm 0.91\%\)/dec) is an outlier, as previously reported by Brown and Derksen (2013) and Estilow et al. (2015). The discrepancies found in SCE trends among the reference datasets imply the importance of a multidataset approach to evaluating snow characteristics in models. As far as evaluation is sensitive to the choice of the reference dataset (e.g., Gómez-Navarro et al. 2012), the use of a single reference dataset may result in biased conclusions. Analysis of multiple datasets (Mudryk et al. 2015) or an ensemble product (CanSISE, Mudryk and Derksen 2017) helps to overcome the limitations of individual datasets and provides a better assessment of the climate state.

Overall, we can conclude that there is still a lack of confidence in climate model simulations of snow in the Arctic partially due to high spatial and temporal variability in snow characteristics but also due to model skill limitations. This raises serious concerns about the robustness of future projections of snow in climate models (Hinzman et al. 2013). Polar amplification (Déry and Brown 2007; Hernández-Henríquez et al. 2015) enhances SAF over northern latitudes and higher elevations and has already surpassed climate projections over the 2008–2012 period (Derksen and Brown 2012). In the context of a warmer and wetter Arctic (Screen and Simmonds 2013; Bintanja and Selten 2014; Dufour et al. 2016), the impact of snow on surface albedo (Thackeray et al. 2018), circulation patterns (Rydzik and Desai 2014), permafrost (Biskaborn et al. 2019) and water resources (Mankin et al. 2015) should be considered a high priority. In this respect, more targeted simulations with ESM-SnowMIP (Krinner et al. 2018) and with a new generation of models (Eyring et al. 2016) are highly desirable for enriching our knowledge of snow-climate interactions and for providing improved future projections of snow characteristics under polar amplification.

Further development of this work will focus on the analysis of CMIP6 simulations, many of which provide a larger number of ensemble members and daily data on snow-associated parameters in a few models. This will allow for the comparative assessment of the progress achieved by CMIP6 compared to CMIP5 in representation of snow parameters in historical simulations and will form the foundation for the evaluation of climate projections in both CMIP5 and CMIP6. This assessment will concentrate on the analysis of the development of new snow parameterizations in many CMIP6 configurations (Eyring et al. 2016; Krinner et al. 2018). On the observational side, given new reference periods in historical simulations in CMIP6, new comparisons should include the extensive use of the Arctic System Reanalysis (ASRv2, Bromwich et al. 2018) potentially in conjunction with experiments with regional configurations developed under Arctic-CORDEX (http://climate-cryosphere.org/activities/targeted/polar-cordex/arctic, Koenigk et al. 2015) performed on rotated polar grids with up to 10 regional climate models.