1 Introduction

The general importance of ocean processes for seasonal forecasts has long been acknowledged, but slow processes with time scales of years or even decades are often assumed not to be of relevance. However, the slow variations of deep density-driven circulations such as the Atlantic Meridional Overturning Circulation (AMOC) modulate the atmospheric surface climate over large regions of the globe. Here, we show that the initialization of the AMOC can play a crucial role in providing skilful seasonal forecasts for the North Atlantic.

The AMOC has a major influence on European climate, because it transports vast amounts of heat from the tropics to high latitudes in the North Atlantic. The AMOC varies strongly on decadal or even longer time scales due to changes in water mass properties (Biastoch et al. 2008), with superimposed higher-frequency wind-driven variability (Balan et al. 2011; Zhao and Johns 2014; Roberts et al. 2013b). There is strong evidence that the AMOC is also a crucial driver for the Atlantic Multidecadal Oscillation (Zhang et al. 2019), a pattern of coherent variability of sea surface temperature in the North Atlantic that affects European climate (Sutton and Hodson 2005).

Continuous observations of the AMOC have only been available since 2004 (Hirschi et al. 2007; Cunningham et al. 2007), but indirect observational evidence and model simulations suggest that it undergoes strong decadal changes (Robson et al. 2012; Hodson et al. 2014), with important implications for North American and European climate.

Because of the long time scale of its thermohaline component, the mean AMOC in seasonal forecasts is extremely close to the AMOC in the initial conditions, whereas other aspects in the forecasts can change more quickly. This can lead to imbalances in the forecast. Most commonly, initial conditions for seasonal forecasts come from an ocean reanalysis, a continuous integration with an ocean model forced by the observed surface conditions that ingests available ocean observations by means of data assimilation. In the absence of a dense in-situ ocean observing system, it is difficult to initialize the AMOC correctly (Karspeck et al. 2017). Even if the ocean reanalysis seems to exhibit a realistic AMOC, it is difficult to prove or disprove whether it is sustained by the correct balance of terms, or whether different errors are compensating for each other.

For instance, the terms in the ocean reanalysis that are designed to bring the model state closer to observations (data assimilation increments as well as flux adjustments or bias corrections) can compensate for model errors but typically disappear in the forecasts (see Luo et al. (2011), Kharin and Scinocca (2012) or Mulholland et al. (2016) for further discussion). The question whether the AMOC in an ocean reanalysis is sustained by the correct balance of terms has rarely been addressed before, but it plays a pivotal role for the study presented here.

Balanced initialization of slow components of the Earth system is particularly important if a long history of reforecasts is to be used for calibrating a seasonal forecasting system and assessing its skill. As pointed out by Kumar et al. (2012), changing balances associated with changing forecast biases over the history of the reforecasts make a reliable calibration and fair skill assessment of real-time forecasts extremely challenging. Hence, the second main point of this study is to document and understand an important case of non-stationary bias in seasonal forecasts, and thus to help users to extract a maximum of information from the real-time seasonal forecasts.

Since the imbalances in the AMOC manifest in SST on seasonal time scales, there is the potential to affect seasonal forecasts in the North Atlantic and its enclosing continents. Numerous studies have documented a winter-time seasonal impact of extra-tropical North Atlantic SST on circulation and surface temperatures over Europe, e.g. Drévillon et al. (2001), Czaja and Frankignoul (2002), Junge and Stephenson (2003), Lin and Derome (2003), Ferreira and Frankignoul (2005). Therefore, an investigation of the atmospheric impact of the non-stationary SST bias is carried out here as well.

The remainder of this paper is structured as follows: Sect. 2 describes the seasonal forecasting system SEAS5 and how it is initialized. In Sect. 3 the non-stationary forecast bias of North Atlantic SST and its connection to the AMOC is characterized, and numerical experimentation is presented to demonstrate its causes and potential remedies. Finally, in Sect. 4 we discuss the impact of North Atlantic SST and AMOC signals on seasonal forecasts of the atmospheric circulation.

2 The seasonal forecasting system SEAS5

SEAS5 is the fifth operational version of the ECMWF seasonal forecasting system. It was implemented in November 2017, replacing the previous forecasting system S4 that had been running since 2011 (Molteni et al. 2011). It features substantial changes in atmospheric model physics, initialization techniques, and atmosphere and ocean resolution (Stockdale et al. 2018; Johnson et al. 2019). The atmospheric model of SEAS5 is ECMWF’s IFS at model cycle 43r1. The IFS is run on a cubic octahedral grid with a T319 spectral resolution, corresponding to approximately 36 km horizontal resolution. There are 91 vertical levels, with the top level at 0.01 hPa. For a comprehensive description of model dynamics, numerical procedures and physical processes see ECMWF (2016a, b).

The ocean model of SEAS5 is NEMO at version 3.4 Madec (2013), and the sea-ice model is LIM2 (Fichefet and Maqueda 1997). The ocean and sea-ice model are run on a tripolar grid with a horizontal resolution that is approximately 25 km in the tropics and increases to approximately 10 km at high latitudes in the Northern Hemisphere. There are 75 vertical levels. The effects of ocean surface waves are modelled explicitly by the ECMWF wave model ECWAM (Breivik et al. 2015). ECWAM provides Stokes drift velocity, turbulent kinetic energy and modified wind stress to NEMO and provides IFS with a sea-state-dependent surface roughness.

The ocean initial conditions for SEAS5 are provided by ORAS5, the fifth operational version of the ECMWF ocean reanalysis system (Zuo et al. 2019). It uses the exact same ocean-sea-ice model configuration as SEAS5. Surface forcing fields for ORAS5 are from ERA-Interim from January 1979 to December 2014, and from operational analyses and forecasts thereafter. ORAS5 assimilates ocean temperature and salinity profiles, sea-level anomalies and sea-ice concentration using a 3DVAR-FGAT scheme. Observational data sets used are as follows: SST is from HadISST version 2 until December 2007 and from the OSTIA analysis thereafter. Sea-ice concentration is from the Reynolds OIv2d SST data set (Reynolds et al. 2007) from September 1981 to December 1984, and the OSTIA product (Donlon et al. 2012) from January 1985. Full details on the ORAS5 ocean reanalysis system are given by Zuo et al. (2018, 2019).

In ORAS5, observations of SST are assimilated by imposing a damping heat flux of \(200\,\mathrm{W}\mathrm{m}^{-2}\mathrm{K}^{-1}\). This strong constraint to observed SST has been proven to work well in ORAS4, the previous version of the ECMWF ocean reanalysis system (Balmaseda et al. 2013). In ORAS5, the SST constraint in general makes a positive contribution to the quality of the reanalysis (as demonstrated by sensitivity experiments; not shown), however, it introduces a problematic feature in the north-west Atlantic, which we are discussing here.

The particular choice of assimilating SST in ORAS5 by a damping heat flux is mostly motivated by the simplicity of the scheme’s implementation and maintenance. For future versions of the ECMWF ocean reanalysis system, it is planned to include SST into the variational data assimilation. However, this still requires a substantial amount of work on data assimilation methodology, for instance on schemes to determine appropriate vertical correlation length-scales, and appropriate pre-processing of raw observations to treat observational biases and ensure a good balance with the much sparser in-situ ocean data.

When comparing SEAS5 reforecasts of near-surface temperatures over the oceans to its predecessor S4, SEAS5 shows improved prediction skill in the tropical Pacific and in the vicinity of the sea-ice edge (Johnson et al. 2019). The former is mainly owing to a reduction of systematic biases, while the latter comes from the introduction of a dynamical instead of a statistical sea-ice model. In the mid-latitudes, one of the most striking changes is a strong degradation of skill in a region of the North Atlantic, north-east of the Grand Banks of Newfoundland – the same region where the SST constraint has a detrimental effect on the reanalysis quality (not shown) in the early period (1981–1996) of the reforecasts.

3 Non-stationary biases in the North Atlantic SST

3.1 Characterization of the bias

In the western part of the North Atlantic Subpolar Gyre, the skill of winter (DJF) surface temperature reforecasts initialized on 1 November is substantially degraded in SEAS5 with respect to its predecessor S4. The affected region is north-east of the Grand Banks of Newfoundland, approximately bounded by the longitudes 50–30 °W and the latitudes 45–55 °N; we refer to it as the North-East-Grand-Banks (NEGB) region. The deterioration of skill in this region can potentially affect forecasts over Europe through advection by the prevailing westerly winds. It is therefore important to understand this problem, to establish its impact on atmospheric variables, and find a remedy when developing and implementing future seasonal forecasting systems.

The atmospheric surface layer in this region in winter is heated by the underlying relatively warm ocean via large turbulent heat fluxes. Ocean mixed layers are up to several hundred metres deep, so any problems with surface temperatures might cause – or be caused by – problems in the properties of the sub-surface ocean waters. The NEGB region contains several large-scale ocean currents that are key to the North Atlantic ocean circulation (e.g.Buckley and Marshall (2016)): The Gulf Stream, the North Atlantic Current, and the Labrador Current. The latter two currents form part of the North Atlantic Subpolar Gyre, whose dynamics drive seasonal to decadal variability in the North Atlantic. It is challenging to represent the complex bathymetry and oceanic fronts that are present in the region with ocean models of spatial resolution of 1/4 degree or less, so most ocean models exhibit large biases in this region. Hence, any skill degradation in surface temperature forecast skill in this region is likely to be rooted in problems with the ocean model.

To illustrate the skill degradation in SEAS5, Fig. 1 shows a map of continuous ranked probability skill score (CRPSS) for 2 m temperature for SEAS5 reforecasts 1981–2016. The expected pattern of relatively high skill in the tropics, and relatively low skill in the mid-latitudes is evident. In general, over the mid-latitudes ocean there is more skill than over the mid-latitudes land areas. A notable exception is the North Atlantic, where SEAS5 exhibits a sizeable area of strongly negative skill, which means that SEAS5 forecasts in this area have larger errors than a climatological forecast. This regional skill minimum was not present in the previous seasonal forecasting system, and it is this skill deterioration that we are discussing here.

Fig. 1
figure 1

Skill of SEAS5 in forecasting winter (DJF) 2 m temperature from November initial conditions as measured by the CRPSS w.r.t. climatology verified against ERA5. The region north-east of the Grand Banks (NEGB region) is marked by a grey rectangle

The principal contribution to this skill degradation comes from a non-stationary bias of SST in SEAS5, rather than an inability to forecast interannual variability. Figure 2 shows time series of SST in the NEGB region in SEAS5 and its predecessor S4. SEAS5 has a strong warm SST bias in the NEGB region of about 2 K from 1981 to 1996, after which the bias transitions to a very slight negative bias. The skill of SEAS5 as shown in Fig. 1 is assessed after an a-posteriori calibration that removes the average bias of reforecasts from 1981 to 2016. Therefore, after calibration, the forecasts in the early period 1981–1996 will be about 1 K too warm, and the later forecasts 1K too cold, which leads to poor skill regardless of the verification metric used. The anomaly correlation for winter SST in the NEGB region is negative for SEAS5 (\(-0.31\)). This contrasts with the behaviour of S4, which had a constant cold bias of about 1 K for SST in the NEGB region. After removing this bias, S4 captures both decadal and interannual variations of SST well, resulting in a positive anomaly correlation of 0.53. It is worth noting though that SEAS5 has a smaller bias than S4 in the late period 2001 to 2016.

Fig. 2
figure 2

Time series of winter (DJF) sea-surface temperature over the NEGB region (50–30 °W, 45–55 °N) from observations (ERA5), and from S4 and SEAS5 reforecasts started on 1 November. The solid lines connect ensemble means, and the error bars denote the ensemble standard deviation. Panel (a) shows the time series before bias correction, and panel (b) afterwards

Given the apparent regime shift in the mid-1990s in SEAS5 errors, we consider the SST biases for the early period 1981–1996 and the late period 2001–2016 separately. Figure 3 shows the spatial pattern of SST bias in the North Atlantic. As already seen in Fig. 2, in the NEGB region S4 has a cold bias in the early period, whereas SEAS5 has a strong warm bias. In the late period, S4 has a cold bias that is similar to but slightly stronger than that in the early period, whereas SEAS5 has a very small negative bias. It is the non-stationary nature of the SEAS5 bias that prevents the computation of meaningful anomalies by removing a constant bias term and leads to the skill degradation discussed earlier. From Fig. 3 it is also evident that there is a positive SST bias in the Gulf Stream region in both the early and the late period in both S4 and SEAS5. This is connected to the long-standing and well-known failure of low-resolution ocean models to simulate the separation of the Gulf Stream from the North American coast correctly (Chassignet and Marshall 2008). Note that this bias has improved throughout in SEAS5, an improvement attributed to higher ocean resolution.

Fig. 3
figure 3

Winter (DJF) SST bias for November reforecasts w.r.t. ERA5 for (a, b) SEAS5 and (c, d) S4. The bias during the early period 1981–1996 is shown in (a, c), the bias during the late period 2001–2016 is shown in (b, d)

3.2 Numerical experimentation and physical processes

The previous version of the ECMWF operational reanalysis and seasonal forecasting system (ORAS4/S4) did not exhibit the non-stationary warm bias in the early period. Here, we perform a series of numerical experiments to investigate the most likely causes for this change in behaviour (details on the operational forecasting systems and experimental reforecasts are given in Table 1).

We will focus on two aspects: first, the ocean resolution of the reanalysis and forecast model has been increased from 1 degree to 1/4 degree between ORAS4/S4 and ORAS5/SEAS5. This has far-reaching implications both for the physics of the forecast model and the data assimilation techniques, so we will test the resolution sensitivity by running two low-resolution reanalysis and reforecast experiments. Second, changes to the data assimilation methods, possibly in conjunction with resolution changes, might have introduced the non-stationary warm bias, so we will look at reference ocean simulations where all data assimilation has been deactivated and the ocean state is influenced by the surface forcing only. Finally, we will demonstrate that small modifications to ORAS5 successfully prevent the non-stationary warm bias in SEAS5.

Table 1 Overview of operational and experimental reforecasts discussed in the text. ACC is the anomaly correlation coefficient of reforecasts for DJF SST in the NEGB region with respect to ERA5 for 1981–2014

3.2.1 Ocean model resolution

During the development of ORAS5, which is an ocean analysis system with a 1/4 degree spatial resolution, a low-resolution analogue at 1 degree spatial resolution was used, with observation and data assimilation settings as similar as possible to the high-resolution analysis. When seasonal forecasts are started from these initial conditions (experiment LR-DA), the North Atlantic warm bias in the early period is not present. In fact, as Fig. 4a shows, seasonal forecasts from the low-resolution ORAS5 analogue outperform S4 forecasts of DJF SST in the NEGB region (anomaly correlation 0.86 versus 0.53 over the period 1981–2014).

Fig. 4
figure 4

Time series of DJF SST in the NEGB region 50–30 °W, 45–55 °N. The black line represents observations (ERA5), and the coloured lines represent reforecast ensemble means. Panel (a) compares SEAS5 to low-resolution reforecasts S4, LR-DA and LR-INT, and panel (b) compares SEAS5 to control reforecasts Ctrl-SST and Ctrl-noSST. The reforecast time series are not bias-corrected

This finding raises the question of whether the warm bias is related to the high-resolution ocean in the forecast model, or whether the problem lies with the ORAS5 initial conditions. To answer this question, we perform another reforecast experiment (LR-INT) with the same low-resolution ocean in the forecast model as in LR-DA, but with initial conditions from ORAS5 interpolated to the lower resolution (see also Roberts et al. (2020)). As evident from Fig. 4a, experiment LR-INT exhibits the same kind of non-stationary behaviour as SEAS5: a warm bias in the early period is followed by a rapid cooling in the 1990s. This means the non-stationary bias can be reproduced with a low-resolution-ocean forecast model that uses ORAS5 initial conditions, which is a clear indication that the problem is with ORAS5 initial conditions rather than the forecast model.

3.2.2 Control simulations without data assimilation

Given that the problem lies with the initial conditions from the high-resolution ORAS5 reanalysis, we carry out two reanalysis control experiments to identify which aspect of ORAS5 gives rise to the problematic behaviour: one without any data assimilation, and a second one which has no data assimilation except the SST relaxation to observations. These experiments allow an assessment of the behaviour of the forced ocean model in the absence of assimilation increments.

As Fig. 4b shows, seasonal reforecasts from these control initial conditions are very revealing. For the reforecasts initialized from the control experiment without any data assimilation (Ctrl-noSST), the early-period warm bias of NEGB SST has disappeared, so that the reforecasts are virtually unbiased. In the following years 1996–2001, there is a shift towards colder SST that resembles the shift in SEAS5 (although it is weaker). This suggests that at least some of the shift towards colder NEGB SST after the early period is inherent to the dynamics of the high resolution ocean forced by Era-Interim, rather than caused by data assimilation or SST relaxation in ORAS5. In the late period 2001–2016, Ctrl-noSST develops a cold SST bias of more than 1 K.

In contrast, reforecasts initialized from the control experiment with SST relaxation (Ctrl-SST) in the early period show an even stronger warm SST bias than SEAS5. This bias gradually decreases after the early period, and SST converges to observations in the late 2000s. The cooling after the early period exhibited by both control experiments Ctrl-noSST and Ctrl-SST is not present in observations. Thus, we conclude that the SST relaxation in the ocean reanalysis is primarily responsible for the strong warm bias of SST in the early period. The rather gradual cooling in Ctrl-SST and Ctrl-noSST in comparison with the rapid cooling in SEAS5 also points towards the interplay between SST relaxation and the assimilation of ocean observations in ORAS5.

3.2.3 The interplay between AMOC and SST relaxation

In the previous section, we determined that the SST relaxation plays an important role for the warm bias in the early period. We now demonstrate that the SST relaxation and warm bias are tightly coupled to the strength of the AMOC in the reanalyses that provide the forecast initial conditions. As Fig. 5 shows, there is a very high (r = 0.88) correlation between the DJF forecast of SST in the NEGB region and the annual-mean simulated AMOC strength in the forecast initial conditions. Since the time scale of the density-driven part of the AMOC is much longer than the lead time of the forecasts, the mean AMOC in the forecasts is practically the same as in the initial conditions (not shown). This suggests that the warm SST bias in the NEGB could be directly caused by excessive heat transport.

Fig. 5
figure 5

Scatter plot of DJF forecast SST in the NEGB region and the annual-mean AMOC in the reanalysis that each reforecast is started from. The Pearson product correlation across all forecasts is 0.88

The strength of the AMOC differs substantially between ORAS5 and the two control simulations with and without SST relaxation, particularly in the early period 1981–1996 (Fig. 6). The AMOC is strong in ORAS5, extremely strong in the initial conditions of Ctrl-SST, and weaker in the initial conditions of Ctrl-noSST.

For the early period, the reconstruction of the observed AMOC is very uncertain, so it is difficult to ascertain which of the simulations is most realistic. However, the Florida Strait transports (FST) have been relatively well observed in the early period by means of voltage in a submarine telecommunications cable (Baringer and Larsen 2001). There is some evidence that the interannual variations of the FST and AMOC are quite coherent (Balan Sarojini et al. 2011), so we choose to use the former as a proxy of the latter. We compute the FST as the meridional transport through a section between Florida and Grand Bahama Island at approximately 27N. Based on this, Fig. 6b suggests that the AMOC in the early period is too strong in ORAS5 (around 24 Sv), much too strong in the initial conditions of Ctrl-SST (around 34 Sv) and about right in the initial conditions of Ctrl-noSST (around 17 Sv).

For the late period, the observed values of AMOC and FST are well reproduced by ORAS5 and the Ctrl-SST reanalysis, but are seriously underestimated by the Ctrl-noSST reanalysis. As with SST, all the model estimates of FST exhibit a clear reduction between the early and the late period which is not present in the observational estimate.

Fig. 6
figure 6

Time series of a AMOC strength and b Florida Strait transport in the initial conditions for Ctrl-SST, Ctrl-noSST (coloured lines) and SEAS5 (red shading indicating ensemble minimum and maximum of ORAS5) together with observational estimates (black triangles). Shown is a 12-month running mean. In the model, AMOC is diagnosed as the maximum of the meridional stream function in the North Atlantic, and Florida Strait transport is diagnosed as the northward transport through a zonal section at \(26.7\hbox {N}\) between Florida and Grand Bahama Island

The excessive AMOC in the initial conditions appears in combination with extremely strong cooling from the relaxation to observed SST in ORAS5. In the core region of the warm bias, the SST relaxation provides an additional cooling of more than of \(150 \mathrm{Wm}^{-2}\) (Fig. 7). The additional heat flux is required to keep ORAS5 close to observed SST. There is a large body of literature linking enhanced buoyancy loss at high latitudes in the North Atlantic to enhanced deep-water formation and increased AMOC (e.g. see Roberts et al. (2013a); Buckley and Marshall (2016); Ortega et al. (2017)). For the case presented here, the obvious hypothesis is an established positive feedback loop of compensating errors in the early period of ORAS5: the NEGB region receives a surplus of heat and salt from an unrealistically strong Gulf Stream, which triggers strong additional buoyancy loss from the relaxation to observed SST. The excessive transport of warm and saline surface waters by the Gulf Stream into the North Atlantic in combination with the surface cooling from the SST relaxation favours dense surface waters in the North Atlantic, which have the potential to enhance deep-water formation, which in turn would invigorate the Gulf Stream. Indeed, the winter mixed-layer depth in the Labrador Sea is deeper in Ctrl-SST than in Ctrl-noSST (not shown), which lends credence to the hypothesis. However, the interaction between North Atlantic transports and mixed-layer depth is complex, and a full investigation into this hypothesis is beyond the scope of this paper.

It is difficult to infer the root cause of this pair of compensating errors once the feedback loop is already established. Nevertheless, it readily explains the forecast bias in SEAS5, since the time scales of the two processes involved are very different: the SST relaxation is switched off immediately when the SEAS5 forecasts start, while the time scale of mean AMOC changes is much longer than the forecast lead time. Therefore, in the SEAS5 and Ctrl-SST reforecasts, the excessive ocean transports still provide excess warming to the NEGB region, while the cooling from the SST relaxation is absent, leading to the warm SST bias. This line of reasoning is well supported by the spatial pattern of the SST relaxation in ORAS5 shown in Fig. 7, which correlates well with the pattern of DJF SST bias in SEAS5 reforecasts in the early period (Fig. 3a). Figure 7 also shows very strong cooling from the SST relaxation close to the ice edge in the Labrador Sea, where the main deep-water formation occurs in the model. This indicates that the SST relaxation in the early period of ORAS5 directly enhances deep-water formation by additional cooling of surface waters.

Fig. 7
figure 7

Additional surface heat flux (Wm−2) imposed in ORAS5 by the relaxation to observed SST during winter (DJF) in the early period 1981–1996. In ocean areas shown as white, fluxes exceed the values of the chosen colour scale

3.2.4 The dominant role of advection

The forecast skill deterioration in the NEGB region of SEAS5 occurs primarily during the winter season, when surface cooling leads to a deepening of the ocean mixed layer to a few hundred meters. It is not a priori clear whether temperature biases at the surface are dominated by erroneous representation of vertical heat exchanges during the seasonal mixed-layer deepening, or whether they are dominated by advection from outside the region. Here, we use vertical profiles of the mean temperature in the NEGB region to answer this question.

From Fig. 8(a,b) it can be seen that SEAS5 is initially close to ORAS5 in the first month after initialization (November). Four months into the forecast (February), the SST bias is fully developed, and temperatures in the upper 200 m are on average about 2 K warmer in the SEAS5 forecasts than they are in ORAS5 at the same time.

Fig. 8
figure 8

Vertical profile of mean ocean temperature in the NEGB region during the early period in (a, b) SEAS5 and ORAS5 and (c, d) SEAS5m and ORAS5m. Shown are (a, c) November and (b, d) February. The horizontal lines denote the standard deviation of the forecast mean when calculated for different ensemble members. The black semi-circle denotes SST from ERA5

It can also be seen from Fig. 8(a,b) that most of the warm bias is not directly from the initial temperatures, but is produced by the forecast model due to insufficient seasonal cooling: whereas surface temperature in ORAS5 decreases by almost 3 K from November to February, it decreases by only 1.5 K in SEAS5. There are two possible reasons for the insufficient cooling: either insufficient heat loss through the surface, or excessive heat transport into the region by horizontal advection. The upward surface heat flux in SEAS5 from November through February for the early period is about \(245\, \mathrm{Wm}^{-2}\), which is considerably larger than the \(180\,\mathrm{Wm}^{-2}\) in S4, the predecessor of SEAS5 (not shown). This is not surprising, given that higher surface temperatures in winter will give rise to larger turbulent and longwave heat loss in winter. However, this means that the excessive surface heat loss must be more than compensated by excessive ocean heat transport into the NEGB region.

Owing to high advection speeds in the Gulf Stream and NEGB region (mean currents of up to 0.5 m/s in the upper 300 m; not shown), temperature anomalies within the region are the result of competing effects of strong surface cooling and fast advection of warm water into the region. Therefore, a small change in the advection speed has the potential to greatly affect the resulting upper-ocean temperature. Figure 9 compares the spatial patterns of barotropic transport in ORAS5 between the early and the late period. Note that the mean transport changes very little over the lead time of the seasonal forecast, therefore the mean transport in the ORAS5 initial conditions is a very good approximation for the mean transport in the SEAS5 reforecasts.

The differences in the barotropic stream function (BSF) shown in Fig. 9 confirm that the Gulf Stream is much stronger in the early than in the late period (compare Fig. 6). Moreover, the overall structure of the Subpolar Gyre is different: in the late period, the eastern part of the Subpolar Gyre is strengthened, while cyclonic circulation in the western part of the Gyre is weaker. This is associated with an anomalous diffuse northward transport across the centre of the Subpolar Gyre (50–35 °W) in the early period, which disappears in the late period. This anomalous northward transport increases inflow into the Labrador Sea at the expense of a weakened North Atlantic Current, and might therefore be responsible for enhanced deep-water formation in the Labrador Sea. The BSF difference between the early and the late period corresponds suspiciously to the pattern of the relaxation surface heat flux in the early period which is needed to keep the SST in ORAS5 close to observations (Fig. 7). Hence, it corroborates the hypothesis that the main reason for the warm SST bias is a Gulf Stream that is both too strong and feeds too much into the western part of the Subpolar Gyre.

Fig. 9
figure 9

Mean barotropic stream function (BSF) in Sverdrup (Sv) of ORAS5 a in the early period 1981–1996, b the late period 2001–2016, and (c) the difference (ba). Note that barotropic flow is along contours of the BSF, positive values of the BSF indicate clockwise rotation, and the magnitude of the barotropic volume transport between any two BSF contours is given by their difference

3.2.5 Towards an improved system

Having established that the strong SST relaxation is responsible for the warm-SST bias in the early period, we re-run the ORAS5 reanalysis with two small modifications: (1) reducing the SST relaxation strength from 200 to \(80\,\mathrm{Wm}^{-2}\mathrm{K}^{-1}\) and (2) increasing the use of observations close to coasts. We call this reanalysis ORAS5m and perform reforecast experiments with initial conditions from ORAS5m. These reforecasts are identical to SEAS5 reforecasts except for the stated changes in their initial conditions (see Table 1); we refer to them as the SEAS5m reforecasts in the following.

The additional change to increase the use of coastal observations is necessary to completely remove the early-period warm-SST bias: without it, there is still a hint of the bias remaining. The change is implemented by (1) reducing the inflation factor for observational uncertainty within 600 km of a coast from 6 to 2 and (2) allowing observations in water less than 500 m deep.

Figure 8c and d show the temperature profiles for SEAS5m reforecasts and their corresponding ocean reanalysis ORAS5m. In November, ORAS5m temperatures in the upper 150 m in the NEGB region are notably more stratified than those in ORAS5. Importantly, the mean discrepancy between reforecast and reanalysis is substantially smaller between SEAS5m and ORAS5m than between SEAS5 and ORAS5 at all lead times. This indicates a more consistent and balanced reanalysis and reforecast system. It is remarkable that, in spite of the weaker SST relaxation, SST in the NEGB region are closer to observations in ORAS5m than in ORAS5.

In the SEAS5m reforecasts, the NEGB SST biases are substantially reduced in the early period, while they are as small as in SEAS5 for the late period (Fig. 10a, b). The relatively high correlation of forecast DJF SST with observations in the NEGB region that was present in S4 is restored and even improved upon (from 0.53 for S4 to 0.62 for SEAS5m, see Fig. 10c and Table 1).

Fig. 10
figure 10

Spatial pattern of DJF SST bias of SEAS5m in the NEGB region in (a) the early period and (b) the late period. Panel (c) compares the time series of DJF SST in the NEGB region for SEAS5m, SEAS5 and ERA5

A global assessment of surface temperature skill of SEAS5m reveals that reforecast skill in the NEGB region is dramatically improved, but is very similar to SEAS5 elsewhere. Figure 11 shows the difference in continuous ranked probability skill score (CRPSS) for DJF 2m temperature forecasts between SEAS5m and SEAS5 for the reforecast period 1981–2016. The change in the NEGB stands out, whereas elsewhere there are mixed changes of slight improvements and degradations. A statistical significance test after DelSole and Tippett (2016) shows that the entire skill improvement pattern in the North Atlantic is significant at the 95% level, whereas most of the other weaker changes are not statistically significant.

Fig. 11
figure 11

Difference in CRPSS for forecasting DJF 2m temperature from November between SEAS5m and SEAS5. Verification is with respect to ERA5

The experimental reanalysis ORAS5m and reforecasts SEAS5m demonstrate that two simple changes to the ORAS5 ocean reanalysis would have sufficed to avoid the loss of skill in the North Atlantic for SEAS5. Thus, they provide an important milestone in developing the operational ocean analysis system that succeeds ORAS5. However, preliminary experimentation with extended-range (sub-seasonal) reforecasts using initial conditions from ORAS5m shows that the reduced observational constraint on SST in ORAS5m leads to some deterioration in extended-range reforecasts in other regions. Therefore, work is currently ongoing to further refine the changes to the SST relaxation. An eventual successor of ORAS5 will contain an improved treatment of SST alongside many other changes and improvements, in order to deliver improved initial conditions not only for seasonal but also for medium- and extended-range forecasts.

4 Atmospheric forecast impact

While the previous sections were addressing the cause for the skill deterioration in parts of the North Atlantic, we now discuss the potential for wider impact on the atmospheric forecasts. Observation-based and modelling studies suggest that both North Atlantic absolute SST as well as their gradients play a role in forcing the atmosphere (e.g. Booth et al. 2012; Parfitt et al. 2016). However, the mechanisms are complex, and long time series are required to obtain statistically robust results (e.g.Czaja and Frankignoul 2002).

To assess the atmospheric impact of the warm SST bias in the NEGB region, we compare the change in mean state between SEAS5 and SEAS5m in the early (1981–1995) period. Figure 12 presents the mean-state changes for various fields in the DJF reforecasts initialized on 1 November of SEAS5 and SEAS5m during the early period. The SST bias of SEAS5 has a clear imprint on the mean change of 2m temperature, which is shown in Fig. 12a. There are widespread positive differences in the same regions where SST are different (compare Figs. 3a and  10a), with magnitudes of around 1.5 K. This is consistent with enhanced air-sea fluxes into the atmosphere in the early period of SEAS5 in the NEGB region (not shown).

Figure 12b shows the mean change for sea ice concentration. Comparatively lower sea ice concentration in SEAS5 is consistent with higher SST. We suspect that sea ice changes react to and reinforce the changes 2m temperature, the latter showing positive differences extending far north into Baffin Bay. Mean-state differences for mean sea-level pressure (MSLP, see Fig. 12c) exhibit significantly lower values in the NEGB region. This indicates that SEAS5 produces lower MSLP than SEAS5m over the North Atlantic during the early period, consistent with Booth et al. (2012), who report more intense cyclogenesis in the North Atlantic associated with warmer SST.

The mean changes in MSLP and sea-ice concentration found here are also consistent with earlier studies that focused on the impact of sea-ice concentration anomalies on atmospheric circulation. For example, Alexander et al. (2004) report reduced MSLP in the Labrador Sea as a response to reduced sea ice in this region. Thus, it is likely that the sea ice and SST biases in SEAS5 are associated, and act together to reinforce the atmospheric response.

Differences of 500hPa geopotential height (Z500, Fig. 12d) between SEAS5 and SEAS5m are positive in the NEGB region and north-west of it, although for this variable the differences are not significant at the 95% level. Comparison to MSLP differences indicates that the atmospheric response in the North Atlantic is baroclinic. The mean change in Z500 exhibits some similarity with the results by Alexander et al. (2004) when they prescribed a low Arctic sea-ice extent anomaly to an atmospheric model. The positive Z500 differences over Canada indicate a tendency towards stronger mid-tropospheric ridges in SEAS5 in the early period and associated warm air advection on their rear side, which could explain the weakly positive 2m temperature difference over the Canadian Archipelago.

Although, strictly speaking, the results described in this subsection cannot solely be attributed to the changing SST bias in the NEGB region, most features seem dynamically consistent and in agreement with the literature. The difference patterns are deemed robust, as results are very similar when comparing mean differences of SEAS5 and LR-DA in the same manner. However, a 15-year period (as 1981–1995) is relatively short to obtain robust results, especially for highly variable fields such as Z500. We also note that the differences between SEAS5 and SEAS5m do not entirely vanish when considering the late period after 2000.

Another measure for the impact of the non-stationary SST bias in SEAS5 on precipitation reforecast errors (w.r.t. ERA5) can be derived from Canonical Correlation Analysis (CCA; Wilks 2006), which represents a more objective alternative to the simple mean differences shown in Fig.  12. Before performing the CCA, we first apply a three-point running average to SEAS5 DJF forecast errors (three points mean three DJF seasons) to emphasize the low-frequency variations in precipitation forecast error and then filter the fields by performing an Empirical Orthogonal Function (EOF; Wilks 2006) analysis, retaining the first five EOFs and associated Principal Components (PC). The CCA is then applied to the five respective PCs of SST and precipitation forecast errors (the predictor and predictand fields), yielding two sets of five canonical weights, which are used to compute linear combinations of the EOFs of SST and precipitation with maximally correlated forecast errors, resulting in five canonical pairs of predictor and predictand patterns, respectively.

Figure 13a and b show the first canonical pair, which exhibits the highest temporal correlation (r = 0.97, significant at the 95% level; see Fig. 13c). The predictor pattern (SEAS5 DJF SST forecast errors) clearly resembles the pattern of the non-stationary SST bias in SEAS5 and explains 47% of total SST forecast error variance. The predictand pattern shows the strongest positive precipitation forecast errors in regions of positive SST forecast errors and vice versa. This explains 16% of total precipitation forecast error variance. Note that, like with EOF, there is no straightforward way of formally testing the point-wise statistical significance of a CCA pattern.

Temporal evolution of the canonical patterns shows the expected change in sign in the mid-1990s. Figure 13c additionally shows the time evolution of SEAS5 DJF precipitation forecast error in the NEGB region, which exhibits an exceedingly large drop by 30 mm/month in the mid-1990s, co-varying with changes in SST errors.

The CCA predictand pattern in Fig. 13b is consistent with the mean difference in precipitation (not shown), which is statistically significant over the North Atlantic. The pattern also suggests some remote impacts in the Mediterranean and along the Norwegian coast. However, these might be unrelated to the non-stationary SST bias in the NEGB region: in the Mediterranean and along the Norwegian coast, the loading in the CCA pattern is not associated with a statistically significant mean precipitation difference between the early and the late period, and the experiment SEAS5m exhibits a similar pattern (not shown).

Thus, the linear impact of the non-stationary NEGB SST bias in SEAS5 on precipitation errors over Europe is probably small, at least for seasonal means. Assessing impact on other moments of the distribution would require longer records. Furthermore, the present analysis does not preclude that there is impact on specific events. Earlier studies also suggest that the atmosphere is more sensitive to SST perturbations in other regions than the NEGB region. For example, Ferreira and Frankignoul (2005) investigated the atmospheric response to SST anomaly patterns associated with the NAO and thereby found that the SST anomalies south of ca. 40N (i.e. south of the NEGB region) play a more important role in the forcing compared to the anomalies north of 40N. Finally, it has been suggested that, for winter Western Europe surface temperatures at least, the absence of significant impact from North Atlantic SST changes can be explained by associated circulation changes that mask the expected response (Yamamoto et al. 2015; Yamamoto and Palter 2016).

Fig. 12
figure 12

Difference between 1981–1995 DJF reforecast averages \(\mathrm{SEAS5} - \mathrm{SEAS5m}\) in (a) 2 m temperature, (b) sea-ice concentration, (c) mean sea level pressure (contours) and 10m winds (vectors), and (d) 500 hPa geopotential height. White stippling denotes regions where differences are significant on the 95% level based on a two-tailed t-test

Fig. 13
figure 13

Canonical Correlation Analysis using SEAS5 SST forecast error in DJF as predictor and SEAS5 precipitation error in DJF as predictand. Verification is done against ERA5. The resulting predictor pattern is shown in (a), the predictand pattern is shown in (b), and associated time series are shown in (c). The time series of SEAS5 precipitation forecast error in the NEGB region in DJF is shown in (c) as well

5 Conclusions

We have discussed low frequency variations of winter SST errors over a region north-east of the Grand Banks of Newfoundland (the NEGB region) that occur in reforecasts with SEAS5. SEAS5 is the currently operational seasonal forecasting system of the European Weather Centre for Medium-Range Weather Forecasts. During the 1980s and early 1990s, SEAS5 SST reforecasts show a strong warm bias in a region approximately bounded by 50–30 °W, 45–55 °N, where seasonal-mean forecast SST are up to 3 K higher than observed values.

The bias is non-stationary: it disappears from the 2000s onwards. As a result, the observed decadal variability in the region, a slight gradual warming from the early 1980s until about 2010, is not reproduced in the SEAS5 reforecasts. Although there is skill in forecasting the year-to-year variability of SST, skill assessments of SEAS5 that depend on successful calibration against stationary forecast biases show a total loss of skill in surface fields for the affected region.

A range of sensitivity experiments show that the problem lies with the ocean initial conditions provided by ORAS5, the currently operational ECMWF ocean reanalysis. ORAS5 is constrained to observed SST by imposing an additional heat flux of \(-200\,\mathrm{Wm}^{-2}\) per Kelvin of SST deviation from observations. In the early period 1981–1996, in the absence of sub-surface ocean observations, this SST relaxation term imposes an additional cooling of more than \(200\,\mathrm{Wm}^{-2}\) over some areas in the western North Atlantic, including the sites of deep-water formation.

The warm bias coincides with an overly strong Gulf Stream and Atlantic Meridional Overturning Circulation (AMOC), which strongly suggests that in ORAS5 the ocean state in the reanalysis is kept reasonably close to observations only because the exaggerated heat transport into the NEGB region is compensated by strong surface cooling provided by the relaxation to observed SST. In fact, the two errors should be expected to reinforce each other: on the one hand, many studies have shown the effectiveness of anomalous surface cooling at high latitudes to invigorate the Atlantic overturning circulation, while on the other hand, the increased circulation brings excessive amounts of heat and salt into the North Atlantic.

For the seasonal reforecasts initialized from the reanalysis, the additional cooling is suddenly not present any more, thus creating an imbalance. The established thermohaline ocean transports have a response time scale of years or decades, so a seasonal forecast of DJF started from November essentially has the same thermohaline ocean transports as the reanalysis. This is the root cause for the too high SST in the NEGB region. Corroborating evidence can be derived from the very high correlation (r = 0.88) between NEGB SST and AMOC across all reforecasts, and high anti-correlation between the patterns of imposed heat flux from the SST relaxation on the one hand, and the SST bias on the other hand.

We present evidence that reducing the strength of the SST relaxation combined with using more coastal observations in the reanalysis avoids the non-stationary SST bias in the NEGB region. This suggests that future reanalysis products need better ways to ingest SST observations. In the short term, small modifications to the simple SST relaxation scheme can be applied to avoid the degradation of the ocean reanalysis discussed here. In the longer term, a more refined treatment of SST observations is desirable by including them into the variational data assimilation scheme employed for all other observations in the ocean reanalysis.

The SST biases in the NEGB region have a detectable impact on the mean Northern Hemisphere atmospheric circulation and surface temperatures. The largest signals are over the North Atlantic itself. The impact over Europe is not as large, confirming earlier studies on the impact of North Atlantic SST alone on atmospheric circulation over Europe. Seasonal forecast skill is obviously degraded in the NEGB region, but the remote impact in the northern hemisphere is difficult to quantify, mainly because overall skill levels are low and often not statistically significant.

The current analysis has unveiled additional questions which need to be explored further. The reasons for the non-stationary bias are not well understood, although there are some plausible (yet competing) hypotheses: the reduction of bias after 2000 could be related to increased assimilation of in-situ observations for the recent period. However, a non-stationary bias is also present when the reforecasts are initialized from control simulations without ocean observations. Furthermore, it is intriguing that the non-stationary bias is not present when the ocean initial conditions are produced with a lower resolution ocean model. This may be due to the increased ability of the 1/4 degree ocean model to sustain high values of Gulf Stream transports, and therefore being more sensitive to its variations. It might also be due to the fact that the existing ocean observations are more effective in constraining the ocean state in the low resolution ocean (less degrees of freedom). Lastly, due to the lack of AMOC observations in the early period, we cannot exclude the possibility that at least some aspects of the regime shift simulated by ORAS5 represent real changes in the climate system.

This study highlights the importance of a balanced ocean initialization. A balanced initialization implies that the observational information is retained during the forecast, which is only possible when assimilation increments are consistent with the model dynamics. In the presence of large model errors, a balanced initialization is difficult. Still, one can aim to produce a more balanced state by including constraints that limit the growth of fast errors. These constraints may consist of better formulation of background error covariances, or more sophisticated techniques such as 4D-VAR that take into account the temporal as well as the spatial dimension. Most of the operational oceanographic applications today assume that error growth is slow, and therefore prioritize minimizing the magnitude of errors over minimizing the growth rate of errors. The results presented here demonstrate that the error growth in sea surface temperature resulting from an unbalanced AMOC initialization can – somewhat surprisingly – be fast enough to have substantial impact on seasonal forecasts, affecting not only the forecast performance but also the state estimation because the estimated AMOC depends very much on the weight given to the SST observations. The balancing of processes that act on different time scales presents a significant challenge for ocean data assimilation methodology. This challenge needs to be tackled in order to improve ocean reanalyses and the performance of coupled forecasts initialized from them.