1 Introduction

Extreme heat will become more likely and more intense under global warming (Meehl and Tebaldi 2004; IPCC 2013; Mora et al. 2017; Lewis et al. 2019). Extreme heat leads to increased heat-related mortality and illness, worsening the risk of heat exhaustion, dehydration, and cardiovascular and kidney diseases (Kjellstrom et al. 2010; De Blois et al. 2015; Guo et al. 2017; Lee et al. 2019). Furthermore, it can lead to ecological and socio-economic impacts, such as decreased labour productivity, increased risk of wildfires, decreased agricultural efficiency, habitat loss, and ecosystem and crop failure, and can render some regions partially uninhabitable (Kjellstrom et al. 2010; Sherwood and Huber 2010; Dunne et al. 2013; Gourdji et al. 2013; IPCC 2014; Bowman et al. 2017). Already under current global warming, extreme temperatures and high humidity together with insufficient infrastructure caused the death of thousands in the 2015 heatwaves in India and Pakistan (Wehner et al. 2016). Similarly, the combination of extreme daytime temperatures and lack of nighttime cooling caused more than 70,000 additional deaths over 16 European countries during the 2003 summer (Robine et al. 2008; Laaidi et al. 2012; Mitchell et al. 2016). By 2 °C of global warming above pre-industrial levels, one out of every two summer months is projected to be on average warmer than the extreme 2010 summer over Europe (Suarez-Gutierrez et al. 2018); and over parts of India and Pakistan, conditions equivalent to the 2015 heatwave could occur every year (Matthews et al. 2017). Here we investigate how global warming aggravates extreme summertime heat based on a robust sampling of high-impact, low-probability extremes for five different heat metrics, simulated by the largest ensemble of a comprehensive coupled model. Our ultimate goal is twofold: to identify which regions become major risk hotspots for different forms of extreme heat, and to determine which maximum global warming level allows us to avert these risks, once the irreducible uncertainty that arises from internal variability is taken into account.

Extreme events are by definition rare, and occur intrinsically by chance due to chaotic internal variability. Therefore, to evaluate how the strength and frequency of such extremes change under warming, it is crucial to sufficiently sample internal variability. This sufficient sampling is necessary to capture the low-probability extremes at the tails of the distribution robustly, as the distribution itself changes under warming, and eliminates the need for the parametrizations of extreme value statistics. Furthermore, probability distributions that are well defined up to their tails for different levels of global warming are crucial to distinguish which events could be avoided by limiting global warming to fixed target temperature levels, versus which events are within the irreducible range of possibilities for such warming levels (Suarez-Gutierrez et al. 2018). Single-model large ensembles of simulations from comprehensive, fully-coupled climate models are the best available tools to sample internal variability and thus to evaluate the changing characteristics of low-probability extreme events in a warming world. Here, we use the currently largest of such ensembles: the 100-member Max Planck Institute Grand Ensemble (MPI-GE; Maher et al. 2019).

To evaluate how global warming aggravates heat extremes it is also crucial to consider the most relevant elements that define our vulnerability to extreme heat. The foremost element of extreme heat events is maximum temperature. Yet some of the events with the largest impacts to date combined the effect of extreme temperatures with other conditions that exacerbate heat stress, such as high humidity or high nighttime temperatures (Laaidi et al. 2012; Wehner et al. 2016). To combine all these aspects, we evaluate the risk of extreme summertime heat with five different metrics: (i) maximum absolute temperatures, (ii) return periods of very extreme temperatures, (iii) maximum temperature variability, (iv) sustained tropical night temperatures, and (v) wet bulb temperatures.

To identify where summertime temperatures become most extreme under warming we investigate (i) maximum absolute temperatures. In addition, we assess how frequent temperature levels that were very extreme under pre-industrial conditions become, in terms of (ii) return periods of maximum temperatures. Another key aspect of our vulnerability to extreme heat that we investigate is (iii) the absolute maximum temperature variability range, and how it changes in a warmer world. In climates of low temperature variability, such as the tropics, a shift towards a warmer mean state may imply conditions outside the historical range of adaptation and acclimatization, with relatively small temperature fluctuations resulting in large impacts (Mahlstein et al. 2011; Harrington et al. 2016; Gasparrini et al. 2017; King and Harrington 2018; Samset et al. 2019). Climates of high temperature variability, such as mid-latitude continental interiors, face a broad range of possible conditions, thus low-probability extremes characteristic of higher warming levels may happen before adaptation is possible (Suarez-Gutierrez et al. 2018). Moreover, heat stress can worsen if variability changes under warming. An increase in variability, as projected over certain regions (Fischer et al. 2012; Donat et al. 2017; Bathiany et al. 2018; Suarez-Gutierrez et al. 2020), leads to heat extremes with increased amplitude and frequency, and to overall larger deviations from the mean state that impose bigger adaptational challenges. On the other hand, a decrease in variability implies that the effects of global warming are less likely to be temporarily counteracted by internal variability in any given summer, resulting in summer temperatures that are consistently warmer.

Alongside extreme maximum daytime temperature, heat stress can be exacerbated by an absence of nighttime cooling that prevents organisms from recovering from extreme daytime heat (Laaidi et al. 2012). These tropical night conditions occur when nighttime minimum temperatures are above \(20\,^\circ \hbox {C}\) to \(28\,^\circ \hbox {C}\), depending on the average climatic conditions of the region. Our bodies can, with time and within physiological limits, acclimatize to these conditions (Kjellstrom et al. 2010; Vicedo-Cabrera et al. 2018). However, for unadapted individuals in unadapted environments, this lack of restorative cooling is directly linked to increased heat-related hospitalizations and mortality, particularly if sustained over several days (Basu and Samet 2002; Laaidi et al. 2012; Royé 2017; Murage et al. 2017; Guo et al. 2017; Russo et al. 2019). There is overall agreement that a shift towards a warmer mean state results in higher minimum temperatures (Russo and Sterl 2011). Yet it remains unclear to what extent sustained tropical night conditions can be averted by limiting global warming. We investigate how (iv) the probability of tropical night conditions sustained for an entire month changes under warming, and quantify the maximum global warming level that allows us to avoid these conditions over non-adapted areas.

Lastly, we investigate extreme heat conditions involving simultaneous high temperature and humidity. Under high humidity, evaporative cooling loses efficiency, and combined with extreme temperatures, we become unable to maintain a stable body temperature (Sherwood and Huber 2010). Some evidence indicates that high humidity during extreme high temperature events in recent years does not significantly increase observed total mortality (Armstrong et al. 2019). On the other hand, studies find aggravated heat stress and associated mortality for observed high humidity and simultaneous high temperatures above certain thresholds (Mora et al. 2017); and that these thresholds are rarely surpassed under current global warming levels (Sherwood and Huber 2010; Im et al. 2017). The combined effect of temperature and humidity can be quantified using a variety of indices (Willett and Sherwood 2012; Buzan et al. 2015). We focus on one of the most commonly used, (v) wet bulb temperature (W; Sherwood and Huber 2010; Pal and Eltahir 2015; Im et al. 2017; Coffel et al. 2018; Brouillet and Joussaume 2019). W is defined as the temperature that an air parcel would reach through evaporative cooling once it is fully saturated. As opposed to comfort-based heat indices (Russo et al. 2017; Matthews et al. 2017; Li et al. 2018) or more complex heat stress measures considering the effect of wind chill and solar irradiation (Dunne et al. 2013; Fischer and Knutti 2013; Newth and Gunasekera 2018), W establishes the clear thermodynamic threshold of \(35\,^\circ \hbox {C}\) beyond which health impacts cannot be overcome by adaptation (Sherwood and Huber 2010). Exposure to instantaneous W values above \(35\,^\circ \hbox {C}\) during periods as short as a few hours may be lethal even for acclimated healthy individuals. However, harmful to deadly levels of heat stress can occur at instantaneous W levels as low as \(28\,^\circ \hbox {C}\), depending on individual conditions and level of physical activity (Dunne et al. 2013; Buzan et al. 2015; Coffel et al. 2018). Here we investigate how the most extreme W events change under warming, and determine the maximum global warming for which dangerous levels of W can be avoided.

Most previous studies evaluating how these heat stress metrics change under global warming are based on smaller multi-model ensembles (Dunne et al. 2013; Fischer and Knutti 2013; Seneviratne et al. 2016; Donat et al. 2017; Mishra et al. 2017; Matthews et al. 2017; Russo et al. 2017; Mora et al. 2017; Newth and Gunasekera 2018; Coffel et al. 2018; Li et al. 2018; Bathiany et al. 2018; Brouillet and Joussaume 2019), smaller regional model ensembles (Pal and Eltahir 2015; Im et al. 2017), atmosphere-only ensembles (Lewis et al. 2019), or smaller single-model ensembles (Sherwood and Huber 2010; Willett and Sherwood 2012; Mishra et al. 2017). These smaller ensemble sizes imply a potential misrepresentation of the severity of future extremes in these studies. In particular, and in addition to being based on multi-model ensembles, most studies of variability change under warming are constrained to using detrended quantities and standard deviation changes as a proxy for variability (Fischer et al. 2012; Donat et al. 2017; Bathiany et al. 2018). However, this combination does not allow a clean separation between internal variability and forced warming, and can lead to misleading results. Lastly, most studies explore changes linked to different forcing scenarios (e.g., Dunne et al. 2013; Fischer and Knutti 2013; Matthews et al. 2017; Coffel et al. 2018; Li et al. 2018; Bathiany et al. 2018; Brouillet and Joussaume 2019), complicating the link to changes under different global warming levels (e.g., Russo et al. 2017; Mishra et al. 2017).

In contrast, we base our analysis on the currently largest existing initial-condition ensemble of a comprehensive, fully-coupled Earth System Model—largest both in terms of forcing scenarios represented and in terms of independent members (Maher et al. 2019). MPI-GE consists of sets of 100 independent realizations under the same forcing conditions that start from different initial states, and allows for 1-in-100-years events to occur on average every simulated year. This design enables the characterization of internal variability that is well-sampled and not confounded by different responses to forcing or model configurations. In turn, these requirements are crucial to empirically evaluate the statistical significance of changes in extreme events, as well as in the range of events that are possible under different levels of global warming. In addition, MPI-GE presents a unique diversity of forcing conditions. This diversity allows us to robustly characterize and compare the well-sampled climates at \(0\,^\circ \hbox {C}\), \(1.5\,^\circ \hbox {C}\), \(2\,^\circ \hbox {C}\), \(3\,^\circ \hbox {C}\) and \(4\,^\circ \hbox {C}\) of global warming above pre-industrial conditions.

2 Data and methods

We use transient climate MPI-GE simulations under historical forcing and three future representative concentration pathways (RCP), namely RCP2.6, RCP4.5, and RCP8.5 (Bittner et al. 2016; Suarez-Gutierrez et al. 2017; Maher et al. 2019). The ensemble consists of sets of 100 realizations based on the same model physics and parametrizations and driven by the same external forcings, which start from different initial climate states taken from different points of the model’s pre-industrial control run. MPI-GE uses the model version MPI-ESM1.1 in the low resolution (LR) configuration, with resolution T63 and 47 vertical levels in the atmosphere (Giorgetta et al. 2013) and \(1.5\,^\circ\) resolution and 40 vertical levels in the ocean (Jungclaus et al. 2013). MPI-ESM1.1 is fairly similar to the CMIP5 version of MPI-ESM (Taylor et al. 2012), but has a slightly lower equilibrium climate sensitivity of \(2.8\,^\circ \hbox {C}\) (Flato et al. 2013; Mauritsen et al. 2019). The relatively low resolution in MPI-ESM-LR is comparable to most of the models in the CMIP5 ensemble, and can influence the model’s ability to simulate small-scale processes and affect the reliability of our projections. Despite this, MPI-GE is capable of simulating extreme heat events as extreme as observed, i.e., the 2003 and 2010 European summers (Suarez-Gutierrez et al. 2018, 2020), unlike other large ensemble experiments of smaller size (Schaller et al. 2018).

We define changing global warming levels using global mean surface temperature (GMST), defined as the annually averaged, global mean, near-surface 2m air temperature anomaly with respect to pre-industrial conditions defined by the period 1851–1880. We focus on summer months defined as JJA for the Northern Hemisphere and DJF for the Southern Hemisphere. Probabilities, return levels, and return periods are calculated based on non-parametric empirical distributions. We use monthly-frequency output data available from MPI-GE to define the heat metrics used. We use the summer maximum value of daily maximum temperature (TXx), defined as the block maximum temperature reached each summer at each grid cell. We evaluate the likelihood of experiencing sustained tropical night conditions during an entire month, for summer months with block minimum values of daily minimum temperatures (TNn) above the tropical night threshold, defined as the 95th percentile level of pre-industrial TNn at each grid cell. We construct monthly wet-bulb temperature estimates (W) using monthly averages of near-surface 2m air temperatures and relative humidity based on the method described in Stull (2011). Ideally, to obtain the most accurate results W should be calculated instantaneously at the model time step. However, this is not possible in MPI-GE, with only monthly mean relative humidity output available. Calculating monthly W using monthly mean temperature and humidity, as opposed to calculating monthly W averages based on instantaneous data, can lead to a maximum overestimation of up to \(1.5\,^\circ \hbox {C}\) (Buzan et al. 2015). Although this overestimation varies with temperature, its 90% confidence range remains below \(0.5\,^\circ \hbox {C}\), and its median is in the 0.005–\(0.2\,^\circ \hbox {C}\) range for all temperatures considered (Buzan et al. 2015). To counteract this potential bias, we subtract the maximum median overestimation of \(0.2\,^\circ \hbox {C}\) from the monthly W estimates in this study. This correction does not alter our conclusions, and results for uncorrected W values are shown in the Supporting Information (SI) Fig. S2 and S3.
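As an illustration of the monthly W estimate, the following is a minimal sketch of the empirical fit published by Stull (2011), followed by the \(0.2\,^\circ \hbox {C}\) correction described above. It is not the code used in this study; the function and constant names are hypothetical, and the fit is assumed to be applied within its stated validity range (relative humidity of roughly 5–99%, air temperature of roughly −20 to 50 °C, near sea-level pressure).

```python
import numpy as np

# Maximum median overestimation quoted in the text (Buzan et al. 2015), in deg C.
# The constant name is hypothetical.
MONTHLY_BIAS_C = 0.2


def stull_wet_bulb(t_c, rh_pct):
    """Wet-bulb temperature (deg C) from the empirical fit of Stull (2011).

    t_c    : near-surface air temperature in deg C
    rh_pct : relative humidity in percent (e.g. 50.0, not 0.5)
    """
    t = np.asarray(t_c, dtype=float)
    rh = np.asarray(rh_pct, dtype=float)
    return (
        t * np.arctan(0.151977 * np.sqrt(rh + 8.313659))
        + np.arctan(t + rh)
        - np.arctan(rh - 1.676331)
        + 0.00391838 * rh**1.5 * np.arctan(0.023101 * rh)
        - 4.686035
    )


def monthly_w(t_mean_c, rh_mean_pct, bias_c=MONTHLY_BIAS_C):
    """Monthly W from monthly-mean inputs, minus the assumed median bias."""
    return stull_wet_bulb(t_mean_c, rh_mean_pct) - bias_c
```

Both functions accept scalars or arrays, so they can be applied to a full field of monthly means in one call, for example `monthly_w(tas_celsius, hurs_percent)`, where the two array names are placeholders.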

We construct representative samples of the climate conditions at \(0\,^\circ \hbox {C}\), \(1.5\,^\circ \hbox {C}\), \(2\,^\circ \hbox {C}\), \(3\,^\circ \hbox {C}\) and \(4\,^\circ \hbox {C}\) of mean global warming with respect to pre-industrial levels using MPI-GE transient climate simulations. GMST deviates from the long-term mean state on year-to-year timescales due to the effect of internal variability. Therefore, we calculate centered decadal-averaged GMST to robustly define each global warming level (Suarez-Gutierrez et al. 2018). We define years of \(0\,^\circ \hbox {C}\) of global warming as those years in which the centered decadal-averaged GMST is in the range of \(0 \pm 0.25\,^\circ \hbox {C}\) in the historical MPI-GE simulations. Analogously, for the remaining global warming levels we select years in which the centered decadal-averaged GMST is in the range of \(1.5\pm 0.25\,^\circ \hbox {C}\) from RCP2.6 simulations, \(2\pm 0.25\,^\circ \hbox {C}\) from RCP4.5 simulations, and \(3\pm 0.25\,^\circ \hbox {C}\) and \(4\pm 0.25\,^\circ \hbox {C}\) from RCP8.5 simulations. This time-slice method to define global warming levels from transient simulations is similar to the methods used in Schleussner et al. (2016); James et al. (2017); King and Karoly (2017) or Suarez-Gutierrez et al. (2018). However, we define each level based on a slightly larger range of decadal averaged GMST, to reach an adequate and homogeneous sample size of around 1000 simulated years for each warming level.
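The time-slice selection described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used in this study: the function names and the `(members, years)` array layout are hypothetical, and the even 10-year window is taken to cover the five preceding years, the year itself, and the four following years.

```python
import numpy as np


def centered_decadal_mean(gmst, window=10):
    """Centered running mean of an annual GMST series (1-D).

    Years too close to either end to fill the window are set to NaN.
    """
    gmst = np.asarray(gmst, dtype=float)
    out = np.full(gmst.shape, np.nan)
    half = window // 2
    for i in range(half, gmst.size - (window - half) + 1):
        out[i] = gmst[i - half:i - half + window].mean()
    return out


def select_warming_level(gmst_members, target, tol=0.25):
    """Boolean mask of simulated years belonging to a warming level.

    gmst_members : array of shape (members, years) with annual GMST anomalies
    target       : warming level in deg C (e.g. 1.5)
    tol          : half-width of the accepted range in deg C
    """
    smoothed = np.apply_along_axis(centered_decadal_mean, -1, gmst_members)
    # Comparisons involving NaN evaluate to False, so edge years are never selected.
    return np.abs(smoothed - target) <= tol
```

Summing the mask, `select_warming_level(gmst, 2.0).sum()`, gives the number of simulated years in the sample for that warming level.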

The climates at \(0\,^\circ \hbox {C}\), \(1.5\,^\circ \hbox {C}\) and \(2\,^\circ \hbox {C}\) of global warming are defined from transient simulations that are in a near-equilibrium state. On the other hand, the climates at \(3\,^\circ \hbox {C}\) and \(4\,^\circ \hbox {C}\) of global warming are defined from highly transient simulations, due to the lack of near-equilibrium simulations for higher warming levels. Similar highly transient simulations are commonly used to define fixed global warming levels (Schleussner et al. 2016; King and Karoly 2017). However, the climate conditions sampled from transient runs may differ from the near-equilibrium conditions at said warming level, such as in different warming patterns or different ocean heat content distributions (Gregory et al. 2015; Rogelj et al. 2017; Tebaldi and Knutti 2018; Rugenstein et al. 2019; King et al. 2020). On average, mean summer temperatures could be around 0.1–0.8 °C higher over some land areas in transient climates compared to their GMST-equivalent equilibrium states (King et al. 2020). The use of highly transient runs also implies that climates with slightly higher or lower levels of warming may be oversampled. Additionally, part of the differences between each warming level sampled from different RCPs may arise from differences beyond \(\hbox {CO}{_2}\) atmospheric concentrations, such as different land use changes or aerosol forcings.

2.1 MPI-GE evaluation

To evaluate the ability of MPI-GE to simulate observed current climate conditions globally, we compare it to the \(1{\,^\circ } \times 1{\,^\circ }\) gridded data from Berkeley Earth Surface Temperatures (BEST) climatology and monthly maximum temperature anomaly for the period 1850–2018 (Rohde et al. 2012). The average summertime monthly absolute maximum temperatures for current climate conditions defined by the period of 1990–2018 in MPI-GE are in good agreement with BEST observations in most regions of the world. MPI-GE average absolute maximum temperatures are larger than observations for parts of Australia, West Asia, and North and South America. In contrast, the simulated average is smaller than observations over parts of East Asia and most tropical regions. These differences highlight areas where MPI-GE projections may over- or underestimate, respectively, the risk of summertime heat extremes (Fig. 1).

Fig. 1

Maximum temperatures in MPI-GE vs. observations. Absolute summertime monthly maximum temperatures averaged for the period 1990–2018 for MPI-GE simulations compared to observed maximum temperatures in the BEST dataset (Rohde et al. 2012). The observed estimates represent the maximum value of the spatial average of maximum temperature anomaly plus the climatology for the respective month in each grid cell in its original grid for the period 1951–1980. MPI-GE simulations are historical runs for the period 1990–2005 and RCP4.5 runs for the period 2006–2018. Summer months are defined as JJA for the Northern Hemisphere and DJF for the Southern Hemisphere

We evaluate how MPI-GE captures the variability and forced changes in observed maximum summer temperatures using the evaluation framework in Suarez-Gutierrez et al. (2018) and Maher et al. (2019). This framework allows us to evaluate the whole simulated distribution, including its tails, and offers a more appropriate assessment of the simulated representation of the magnitude and frequency of observed extremes (Suarez-Gutierrez et al. 2018). We find that more than 85% of observed estimates occur within the central 75th percentile range of the ensemble over large regions, indicating that MPI-GE may overestimate the observed variability range in maximum temperatures in some regions (Fig. 2). In regions such as Europe or North America, this overestimation of variability translates into warm extremes that are adequately represented, while the magnitude of cold extremes appears to be overestimated (Fig. 2a, b), highlighting a bias in the shape or skewness of the simulated distribution. On the other hand, in regions such as the Persian Gulf or South Africa, we find that variability is overestimated both in the upper and lower tails of the simulated distribution. This tendency to overestimate the variability in observed maximum temperatures may indicate that MPI-GE also overestimates future projections of maximum temperatures. However, summer maximum temperatures exhibit large internal variability (Suarez-Gutierrez et al. 2018), and the observational record may be too short to sample long return period events at the same rate as the ensemble. This hinders our ability to determine whether the amplitude and frequency of extreme events are truly overestimated in MPI-GE. Nevertheless, we find that observations occur within the ensemble spread over most land surfaces, with the exceptions of Central Africa and East Asia (Fig. 2). The spread in MPI-GE captures the magnitude of observed extremes adequately both under pre-industrial conditions and under historical global warming levels. This indicates that MPI-GE does not underestimate the observed variability in maximum temperatures, and that it neither under- nor overestimates the forced warming response. Therefore, our evaluation shows that MPI-GE provides an adequate, albeit potentially overestimated, projection of the risk of future summertime heat.
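The counts behind this evaluation can be illustrated with a short sketch for a single grid cell. It is an assumption-laden illustration, not the framework's actual implementation: the function name and the `(members, time)` layout are hypothetical. For an ensemble whose spread matches the observed variability, about 75% of observations would be expected to fall within the central 75th percentile range (12.5th to 87.5th percentiles); a clearly larger fraction points to overestimated variability.

```python
import numpy as np


def evaluate_against_ensemble(obs, ens, central=75.0):
    """Fractions of observations inside/outside the ensemble range.

    obs     : array of shape (time,), NaN where no observation exists
    ens     : array of shape (members, time) of simulated values
    central : width of the central percentile range, in percent
    """
    obs = np.asarray(obs, dtype=float)
    ens = np.asarray(ens, dtype=float)
    valid = ~np.isnan(obs)                      # skip months without observations
    obs, ens = obs[valid], ens[:, valid]
    tail = (100.0 - central) / 2.0              # 12.5 for the central 75% range
    lower, upper = np.percentile(ens, [tail, 100.0 - tail], axis=0)
    return {
        "within_central": float(np.mean((obs >= lower) & (obs <= upper))),
        "above_max": float(np.mean(obs > ens.max(axis=0))),
        "below_min": float(np.mean(obs < ens.min(axis=0))),
    }
```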

Fig. 2

Summer maximum temperature variability and forced response in MPI-GE vs. observations. Global (top) and time series (bottom, a–d) evaluation of the variability and forced response in maximum summertime monthly temperature anomalies simulated by MPI-GE against BEST maximum temperature anomalies (Rohde et al. 2012) for the period 1850–2018. Red shading represents regions where observed anomalies are higher than the ensemble maximum, while blue shading represents regions where observed anomalies are lower than the ensemble minimum. Gray hatching represents regions where the observations cluster within the central 75th percentile bounds of the ensembles (12.5th to 87.5th percentiles). Dotted areas represent regions where observations are available for less than 10 summer months, and are therefore excluded from our evaluation. Frequencies are normalized to percentage. Time series for specific grid points (a–d) show ensemble maximum and minimum (red lines) and central \(75^{\mathrm{th}}\) percentile range (red shading) of simulated summertime monthly maximum temperature anomalies compared to observed anomalies (black points). All observed and simulated anomalies are calculated with respect to the climatology baseline, defined in BEST data by the period of 1951–1980. Simulations are historical runs for the period 1850–2005 and RCP4.5 runs for the period 2006–2018. BEST data are adapted to the coarser resolution MPI-GE grid. Summer months are defined as JJA for the Northern Hemisphere and DJF for the Southern Hemisphere

3 Results and discussion

We construct the range of the climate conditions at five different global warming levels from MPI-GE simulations under historical, RCP2.6, RCP4.5, and RCP8.5 forcings (Fig. 3). To achieve homogeneous sample sizes of around 1000 simulated years for each climate condition, we restrict our selection to the periods marked by the dashed black lines in Fig. 3a. The calculated probability distributions of GMST for the simulated years selected show the effect of sampling near-equilibrium conditions, as for GMST levels of \(0\,^\circ \hbox {C}\), \(1.5\,^\circ \hbox {C}\), or \(2\,^\circ \hbox {C}\), in comparison to sampling highly transient conditions, as for \(3\,^\circ \hbox {C}\) and \(4\,^\circ \hbox {C}\) of GMST (Fig. 3b). The latter exhibit GMST values that are more variable, resulting in wider probability distributions. However, the distributions are adequately centered around the represented GMST levels and present no substantial overlap, indicating that each sample distribution is distinguishable from the others and offers an adequate representation of the climate conditions of each warming level.

Fig. 3

Global mean surface temperature (GMST) in MPI-GE. (a) Time series of annually averaged GMST anomalies (colored thin lines) and centered decadal-averaged GMST anomalies (colored thick lines) for the period 1850–2099, simulated by MPI-GE. Simulations are historical runs for the period 1850–2005 (gray), and RCP2.6 (orange), RCP4.5 (red) and RCP8.5 (dark red) for the period 2006–2099. The black dashed lines show the periods of sampling for each warming level. (b) Probability distribution of GMST anomalies for pre-industrial conditions at \(0\,^\circ \hbox {C}\) of warming (gray; sample size \(\hbox {n}=1300\)), and for future global warming levels of \(1.5\,^\circ \hbox {C}\) (yellow; \(\hbox {n}=1300\)), \(2\,^\circ \hbox {C}\) (orange; \(\hbox {n}=1297\)), \(3\,^\circ \hbox {C}\) (red; \(\hbox {n}=1225\)) and \(4\,^\circ \hbox {C}\) (dark red; \(\hbox {n}= 997\)). The shaded bars represent the range of \(\pm \, 0.25 \,^\circ \hbox {C}\) around each GMST mean state. Bin size is \(0.05\,^\circ \hbox {C}\); frequencies are normalized to unity and translated to percentage

3.1 Maximum absolute temperatures

Maximum summer values of maximum monthly temperatures (TXx) in MPI-GE are comparable to the maximum temperatures observed at our current warming level conditions of around \(1\,^\circ \hbox {C}\) above pre-industrial levels (Hawkins et al. 2017) (Fig. 4). Observations represent the maximum value of the spatial average of maximum temperatures in each grid cell. Thus, localized maxima may be smoothed within each grid cell, leading to maximum temperatures that are slightly lower than the maximum temperatures for specific locations (Fischer et al. 2013). With this consideration in mind, we find that although the observed temperature patterns are well represented in MPI-GE, the maximum absolute temperatures under pre-industrial conditions in the MPI-GE simulations are similar to or higher than those observed under current global warming levels (Fig. 4, top). In regions such as North America, Argentina, Western Asia or Australia, this may occur because MPI-GE simulates maximum temperatures on average warmer than those observed (Fig. 1). Alternatively, this can occur due to a potential overestimation of temperature extremes over these regions in MPI-GE, which may result from an overestimation of temperature variability over some continental areas (Fig. 2). However, the large ensemble size in MPI-GE allows for simulated low-probability extremes with return periods over hundreds of years, and the observational record may just be too short to determine whether the ensemble overestimates very extreme temperatures or whether the Time of Emergence (Hawkins and Sutton 2012) has not yet been reached for TXx over these regions.

For warming levels of \(1.5\,^\circ \hbox {C}\) and \(2\,^\circ \hbox {C}\), TXx remains mostly below \(50\,^\circ \hbox {C}\), with some exceptions in the Arabian Peninsula, Northern India and Pakistan; while the areas where TXx reaches \(45\,^\circ \hbox {C}\) to \(50\,^\circ \hbox {C}\) expand relative to pre-industrial conditions over North and West Africa (Fig. 4). In contrast, for \(3\,^\circ \hbox {C}\) and \(4\,^\circ \hbox {C}\) of global warming, TXx is projected to exceed \(50\,^\circ \hbox {C}\) across all continents (Fig. 4). By 4 °C of warming, MPI-GE projects the highest summer maximum temperatures over West Asia, surpassing the \(60\,^\circ \hbox {C}\) threshold over Pakistan, Iraq and Saudi Arabia. Whereas hotspots of absolute TXx occur over India, the Arabian Peninsula and the Western Sahara, hotspots of TXx increase relative to pre-industrial levels occur in MPI-GE projections in regions such as Central Europe and Central North and South America (SI Fig. S1), in agreement with the relative TXx increase hotspots highlighted in previous studies (Seneviratne et al. 2016; Lewis et al. 2019).

Fig. 4

Maximum absolute summer maximum temperatures. Absolute maximum summer maximum value of monthly maximum temperature (TXx) under different global warming levels simulated by MPI-GE, compared to observed maximum temperatures in the BEST dataset. The observed estimates represent the maximum value of the spatial average of maximum temperature anomaly plus the climatology for the respective month in each grid cell for the period 1850–2018. The simulated maximum temperatures represent the 99.5th percentile value for each distribution at each grid cell

3.2 Return periods of very extreme temperatures

In this section we investigate how the frequency of events that are extreme under pre-industrial conditions changes with global warming. As reference we choose extreme maximum summertime temperatures that occur on average once every hundred years under pre-industrial conditions (1-in-100-years events; Fig. 5, top). Under warming, these temperature levels characteristic of 1-in-100-years events occur more frequently than once every hundred years, thus becoming 1-in-x-years events for increasing global warming levels. Already at \(1.5\,^\circ \hbox {C}\) of global warming, these 1-in-100-years events could occur every 10 to 25 years in most regions of the globe; while in North Africa, extreme temperatures between 40 and \(50\,^\circ \hbox {C}\) are projected to occur every one to two years (Fig. 5). At \(2\,^\circ \hbox {C}\) of warming, these extreme temperatures are projected to occur more often than every 10 years over most of the world; while at \(3\,^\circ \hbox {C}\) of warming, they are projected to occur every 2 to 5 years in most regions, and every year in East Asia, North Africa and North America. By the point global warming reaches \(4\,^\circ \hbox {C}\) above pre-industrial levels, these very extreme events could occur more often than every one to two years almost all over the world (Fig. 5).
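The change from a 1-in-100-years event to a 1-in-x-years event can be illustrated with a minimal non-parametric sketch for one grid cell. It is a sketch under assumptions, not the code used in this study: the function names are hypothetical, each sample is assumed to hold one TXx value per simulated year pooled across members, and the quantile uses NumPy's default linear interpolation.

```python
import numpy as np


def return_level(sample, period_years=100.0):
    """Empirical level exceeded on average once every `period_years` years."""
    sample = np.asarray(sample, dtype=float)
    return np.quantile(sample, 1.0 - 1.0 / period_years)


def return_period(sample, level):
    """Empirical return period (years) of exceeding `level` in `sample`.

    Returns infinity if the level is never exceeded in the sample.
    """
    sample = np.asarray(sample, dtype=float)
    p_exceed = np.mean(sample > level)          # fraction of years above the level
    return np.inf if p_exceed == 0 else 1.0 / p_exceed
```

For a given grid cell, `return_period(txx_warm, return_level(txx_preindustrial))` then gives the value of x for that warming level, with both array names being placeholders. With samples of roughly 1000 simulated years, return periods much longer than the sample size cannot be resolved empirically.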

Fig. 5

Return periods of very extreme summer maximum temperatures at different global warming levels. Return levels of TXx for events with return periods of 100 years under pre-industrial conditions defined at \(0\,^\circ \hbox {C}\) GMST (top row). Return periods of TXx levels of pre-industrial 1-in-100 years events under different levels of global warming (middle and bottom rows)

3.3 Maximum temperature variability

In this section we evaluate the range of year-to-year variability in summer maximum temperatures under global warming. Under pre-industrial conditions, absolute TXx variability simulated by MPI-GE is larger at mid and high latitudes, particularly in the Northern Hemisphere, and smaller in tropical regions (Fig. 6, top). Pre-industrial TXx variability reaches values well above \(10\,^\circ \hbox {C}\) in regions such as Eastern India, central Eurasia and other mid-latitude continental interiors. Most equatorial and tropical regions exhibit lower pre-industrial TXx variability, ranging from below \(4\,^\circ \hbox {C}\) to around \(8\,^\circ \hbox {C}\).

Fig. 6

Variability in summer maximum temperatures at different global warming levels. Variability in TXx under pre-industrial conditions measured as the difference between the 97.5th and the 2.5th percentiles in the TXx distribution at \(0\,^\circ \hbox {C}\) GMST (top row). Relative change in variability based on change in TXx probability distribution width (2.5th–97.5th percentiles) at different global warming levels relative to pre-industrial conditions at \(0\,^\circ \hbox {C}\) GMST (middle and bottom rows)

For global warming of \(1.5\,^\circ \hbox {C}\) and \(2\,^\circ \hbox {C}\), the change in TXx variability is dominated by a relative increase. The relative TXx variability increase is similar for both warming levels and remains mostly under 50%, reaching its maximum values in Central South America, North America and India (Fig. 6). For warming beyond \(2\,^\circ \hbox {C}\), TXx variability increases in these regions by more than 50%. Apart from the exceptions of Australia and some parts of Africa and East Asia, where TXx variability does not change substantially, at \(4\,^\circ \hbox {C}\) of global warming we find a substantial increase in the variability of summer maximum temperatures in large continental areas all across the globe, with maximum relative increase well above 100% (Fig. 6). For regions that exhibit low pre-industrial TXx variability, such as Central South America, this doubling of variability results in absolute TXx variability mostly below \(8\,^\circ \hbox {C}\) at \(4\,^\circ \hbox {C}\) of global warming (not shown). However, in regions of larger pre-industrial TXx variability, the doubling in TXx variability translates into a range of maximum year-to-year variations of summer maximum temperatures of up to \(14\,^\circ \hbox {C}\) in North America or Central Europe, and up to \(18\,^\circ \hbox {C}\) in India (not shown).
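The variability measure used in Fig. 6, the width of the TXx distribution between its 2.5th and 97.5th percentiles, and its relative change can be sketched as follows. This is a simplified illustration under the assumption that TXx samples at one grid cell are given as plain lists; the percentile interpolation, the function names and the synthetic values are assumptions made for the example.

```python
def percentile(sample, pct):
    """Empirical percentile with linear interpolation between order statistics."""
    ordered = sorted(sample)
    pos = pct / 100.0 * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def txx_variability(sample):
    """Distribution width between the 2.5th and 97.5th percentiles."""
    return percentile(sample, 97.5) - percentile(sample, 2.5)


def relative_variability_change(pre_industrial, warmer):
    """Change in distribution width, in percent of the pre-industrial width."""
    base = txx_variability(pre_industrial)
    return 100.0 * (txx_variability(warmer) - base) / base


# Synthetic illustration only (assumed values, not model output):
# the warmer sample is both shifted and twice as wide.
pre_industrial = [30.0 + 0.1 * i for i in range(41)]
warmer = [32.0 + 0.2 * i for i in range(41)]

width_pre = txx_variability(pre_industrial)                   # about 3.8
change = relative_variability_change(pre_industrial, warmer)  # about 100 %
```

A relative change of 100% corresponds to the doubling of variability discussed above: the same relative increase implies a much larger absolute range where the pre-industrial width is already large.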

We also find a decrease in TXx variability ranging from 10 to 35% at \(4\,^\circ \hbox {C}\), most prominent in Southern Europe, North America, and high latitude regions such as Greenland (Fig. 6). The variability decrease over high latitude regions is most likely dominated by ice melt. Over mid latitude regions, TXx variability decrease occurs in the vicinity of transition zones between dry and wet climates, and is accompanied by strong variability increases (e.g., Central Europe, North America, Central Africa). This may occur due to decreasing moisture variability caused by overall drier conditions as the high moisture variability area in the transition zones shifts polewards. A decrease in the frequency of wetter-than-normal summers may reduce temperature variability in the lower tail of the distribution due to decreased evaporative cooling. These results stand in contrast to results from previous studies evaluating standard deviation changes in multi-model ensembles, which find a consistent increase in summertime monthly mean temperature variability over land under global warming, particularly in the Northern Hemisphere (Bathiany et al. 2018).

3.4 Sustained tropical night temperatures

In this section we evaluate the risk of sustained tropical night temperatures, expressed as minimum temperatures that remain above the tropical night threshold, defined as the 95th percentile of pre-industrial monthly minimum temperature at each grid cell, for the entirety of the month. These sustained tropical night conditions correspond to monthly minimum temperatures above \(20\,^\circ \hbox {C}\) over most tropical regions and some mid-latitude areas in the pre-industrial climate, and to temperatures higher than \(32\,^\circ \hbox {C}\) over the Western Sahara, parts of India and parts of the Arabian Peninsula (Fig. 7, top). Already at \(1.5\,^\circ \hbox {C}\) of global warming, the probability of these extreme sustained tropical night conditions exceeds 25% over most of the world, and probabilities above 50% spread polewards at \(2\,^\circ \hbox {C}\) of global warming (Fig. 7, middle). At \(3\,^\circ \hbox {C}\) of global warming, MPI-GE projects these extreme tropical night temperatures to be sustained during half or more of the summer months over most regions, with some exceptions over Australia, India, the Sahel, and the high latitudes, where probabilities remain in the 10–50% range (Fig. 7, bottom). By \(4\,^\circ \hbox {C}\) of global warming, our findings show that temperatures are projected to surpass these extreme conditions during every hour in every summer month over vast regions, including South East Asia, the Maritime Continent, Central Africa, and Central North and South America (Fig. 7, bottom). Averting the risk of common-place summertime tropical night conditions sustained during 25% or more of the summer months is not possible over most of the world, even by limiting global warming to \(1.5\,^\circ \hbox {C}\) (Fig. 8).

Fig. 7

Risk of sustained tropical night temperatures at different global warming levels. Minimum monthly temperature TNn for Tropical Night threshold at \(0\,^\circ \hbox {C}\) GMST, as the 95th percentile of the pre-industrial TNn distribution (top row). Probability of sustained exceedance of the tropical night threshold for monthly minimum temperatures (TNn > Pre-industrial TNn 95th percentile) at different global warming levels (middle and bottom rows)
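The probability of sustained tropical night conditions can be sketched as the fraction of summer months whose monthly minimum temperature TNn exceeds the pre-industrial 95th percentile. The example below is a minimal illustration that assumes TNn samples at one grid cell are available as plain lists, one value per simulated summer month; the function names and the synthetic values are hypothetical.

```python
def percentile(sample, pct):
    """Empirical percentile with linear interpolation between order statistics."""
    ordered = sorted(sample)
    pos = pct / 100.0 * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def tropical_night_threshold(pre_industrial_tnn):
    """Tropical night threshold: 95th percentile of pre-industrial TNn."""
    return percentile(pre_industrial_tnn, 95.0)


def sustained_exceedance_probability(tnn_sample, threshold):
    """Fraction of summer months whose monthly minimum stays above the threshold."""
    return sum(1 for value in tnn_sample if value > threshold) / len(tnn_sample)


# Synthetic illustration only (assumed values, not model output):
# 100 pre-industrial TNn values and the same sample shifted by 2 degrees.
pre_industrial_tnn = [18.0 + 0.05 * i for i in range(100)]
warmer_tnn = [value + 2.0 for value in pre_industrial_tnn]

threshold = tropical_night_threshold(pre_industrial_tnn)
p_pre = sustained_exceedance_probability(pre_industrial_tnn, threshold)
p_warm = sustained_exceedance_probability(warmer_tnn, threshold)
```

By construction the pre-industrial probability is about 5%, so any value well above this in a warmer sample indicates that conditions that were rare before have become common.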

Fig. 8

Global warming level of sustained tropical night temperatures. Global warming level measured as GMST that exhibits sustained exceedance of the tropical night threshold (minimum monthly temperature TNn > 95th percentile Preindustrial TNn) with probability of 25% or higher

3.5 Extreme wet bulb temperatures

In this section we evaluate how the combination of simultaneous high temperature and high humidity, measured as monthly wet bulb temperature (W), changes under global warming. Under pre-industrial climate conditions in MPI-GE, monthly W reaches maximum values above \(26\,^\circ \hbox {C}\) over Northern India and Pakistan, while remaining below \(24\,^\circ \hbox {C}\) over the majority of the world (Fig. 9, top). Beyond \(1.5\,^\circ \hbox {C}\) of global warming, projections of maximum absolute monthly W above \(26\,^\circ \hbox {C}\) spread over Northern India and East Asia, while monthly W above \(24\,^\circ \hbox {C}\) are projected to occur across all continents (Fig. 9). At \(4\,^\circ \hbox {C}\) of global warming, maximum monthly W levels above \(26\,^\circ \hbox {C}\) could occur over large land fractions across all continents, while projections surpass the \(28\,^\circ \hbox {C}\) danger threshold for vulnerable individuals on average for an entire month over parts of East China, the Arabian Peninsula, Pakistan and Northern India (Fig. 9).

Fig. 9

Maximum absolute extreme Wet Bulb temperatures at different global warming levels. Maximum absolute monthly Wet Bulb temperatures at different global warming levels. The simulated maximum represents the 99.5th percentile value for each distribution at each grid cell

In addition, instantaneous W values could exceed these monthly values by several degrees. The maximum monthly W estimates of \(26\,^\circ \hbox {C}\) from MPI-GE under pre-industrial climates and under \(1.5\,^\circ \hbox {C}\) of global warming are indeed several degrees lower than current observed estimates of maximum instantaneous W of \(31\,^\circ \hbox {C}\) (Sherwood and Huber 2010; Im et al. 2017). Furthermore, the differences between monthly mean and instantaneous values are particularly large over extreme W hotspots such as South East and West Asia (Sherwood and Huber 2010; Im et al. 2017). Over these regions, daily W estimates under RCP8.5 forcing are projected to exceed the fatal \(35\,^\circ \hbox {C}\) threshold by the end of the century (Im et al. 2017). This comparison indicates that our projections using monthly W estimates may be somewhat conservative. However, previous maximum daily or instantaneous W projections are based on ensembles much smaller than MPI-GE, and thus have a smaller probability of capturing very extreme events. Therefore, comparisons based on larger ensembles may yield even larger differences between maximum daily and monthly W estimates. On the other hand, we find good agreement between the regions of largest W increase under warming in MPI-GE and in previous studies (Sherwood and Huber 2010; Im et al. 2017; Brouillet and Joussaume 2019). This indicates that, although our monthly W estimates may underestimate the risk of reaching harmful instantaneous W levels within a month, the good agreement on the regions of largest W increase supports our conclusions regarding which regions become major heat-stress hotspots due to the combination of extreme temperature and humidity in a warmer world.

Lastly, we calculate the maximum global warming level that allows us to avert the risk of extreme hot and humid conditions characterized by monthly mean W above \(26\,^\circ \hbox {C}\) (Fig. 10). These extreme W conditions, which occur rarely in the pre-industrial climate of MPI-GE, can be averted over most regions of the world by maintaining global warming below \(2\,^\circ \hbox {C}\) (99% confidence), with some exceptions over Northern India or East China. For warming levels higher than \(2\,^\circ \hbox {C}\), the risk of extreme W conditions spreads over large land fractions across almost all continents. At \(4\,^\circ \hbox {C}\) of global warming, extreme W conditions above \(26\,^\circ \hbox {C}\) spread not only over large low-latitude and tropical regions, but also over mid-latitude regions in North America, East Asia, and Australia. Our results show that limiting global warming to \(2\,^\circ \hbox {C}\) above pre-industrial conditions is vital to avoid the exposure of large non-adapted regions to the combination of extreme temperature and humidity, one of the factors that exacerbates heat stress the most.

Fig. 10

Global warming level of exceedance of extreme Wet Bulb temperature threshold. Global warming level measured as GMST that exhibits exceedance of the monthly wet bulb temperature threshold (\(\hbox {W} > 26\,^\circ \hbox {C}\)) with probability of 1% or higher
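The warming-level maps (Figs. 8 and 10) assign to each grid cell the global warming level at which the exceedance probability of a threshold reaches a given value. A minimal sketch of this selection is given below, assuming that the samples at one grid cell are stored in a dictionary keyed by warming level; the function names, the data layout and the synthetic values are assumptions made for illustration.

```python
def exceedance_probability(sample, threshold):
    """Fraction of values in `sample` that lie above `threshold`."""
    return sum(1 for value in sample if value > threshold) / len(sample)


def lowest_warming_level_at_risk(samples_by_level, threshold, min_probability):
    """Lowest warming level whose exceedance probability reaches `min_probability`.

    Returns None if the threshold is not reached at any sampled warming level.
    """
    for level in sorted(samples_by_level):
        probability = exceedance_probability(samples_by_level[level], threshold)
        if probability >= min_probability:
            return level
    return None


# Synthetic illustration only (assumed values, not model output):
# monthly wet bulb temperature samples that warm by 0.8 degrees per degree of
# global warming, for the five warming levels considered in the text.
warming_levels = (0.0, 1.5, 2.0, 3.0, 4.0)
samples_by_level = {
    gwl: [21.505 + 0.8 * gwl + 0.01 * i for i in range(300)]
    for gwl in warming_levels
}

# Wet bulb case: W > 26 degrees with probability of 1% or higher.
risk_level = lowest_warming_level_at_risk(samples_by_level, 26.0, 0.01)
```

The same selection with a probability of 0.25 and the tropical night threshold in place of the wet bulb threshold corresponds to the criterion described for Fig. 8. Staying below the returned warming level is what averts the risk at that grid cell under the chosen probability.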

4 Summary and conclusions

We identify global risk hotspots of extreme summertime heat for five different metrics under five different levels of global warming, based on the robust sampling of low-probability extremes in the 100-member MPI-GE, currently the largest ensemble from a comprehensive climate model. These warming levels reflect near-equilibrium states for the climates at \(0\,^\circ \hbox {C}\), \(1.5\,^\circ \hbox {C}\) and \(2\,^\circ \hbox {C}\) of global warming, and transient conditions for the climates at \(3\,^\circ \hbox {C}\) and \(4\,^\circ \hbox {C}\) of global warming. We determine the maximum global warming level for which the risk of extreme heat conditions can be averted, as opposed to which conditions are part of the irreducible range of possibilities for each warming level. While MPI-GE adequately simulates the pattern of observed maximum absolute temperatures, the absolute maximum temperatures under pre-industrial conditions are larger over some regions than those observed at the current warming level of \(1\,^\circ \hbox {C}\). This may indicate that MPI-GE overestimates either average maximum temperatures or maximum temperature variability in some areas. However, due to the relatively short length of the observational record, this can also indicate that extreme temperatures with return periods as long as those simulated by MPI-GE have not yet been recorded over these regions.

MPI-GE projects that, for global warming levels below \(2\,^\circ \hbox {C}\), maximum absolute summer temperatures stay below \(50\,^\circ \hbox {C}\) over most of the world, with some exceptions over maximum temperature hotspots in the Arabian Peninsula, Northern India and Pakistan. However, for warming levels beyond \(2\,^\circ \hbox {C}\), this threshold is overshot on all continents, with projected temperatures above \(60\,^\circ \hbox {C}\) in hotspots over Pakistan, Iraq or Saudi Arabia. We find that very extreme events that occur on average once every 100 years under pre-industrial conditions are projected to occur every 10 to 25 years at \(1.5\,^\circ \hbox {C}\) of warming, and more often than once every 10 years at \(2\,^\circ \hbox {C}\). At \(4\,^\circ \hbox {C}\) of warming, these 1-in-100-years events are projected to happen every one to two years over most of the world.

Our results also show that maximum temperature variability changes substantially under warming in large regions of the globe. Summer maximum temperature variability increases by up to 50% relative to pre-industrial levels under \(2\,^\circ \hbox {C}\) of global warming, mostly in Central South America and North America, Central Europe and India. At \(4\,^\circ \hbox {C}\) of global warming we find hotspots of large maximum temperature variability increase over large continental areas, with maximum relative increase well above 100%. This 100% increase translates into maximum year-to-year variations of maximum temperatures of up to \(14\,^\circ \hbox {C}\) in North America and Central Europe, and up to \(18\,^\circ \hbox {C}\) in India. For regions such as Australia and large parts of Africa and East Asia, maximum temperature variability does not change substantially under warming. For high latitude regions and parts of Southern Europe, North America and South Africa, maximum temperature variability decreases by 10–35% at \(4\,^\circ \hbox {C}\) of global warming. Our results stand in contrast to previous studies, which indicate a consistent and substantial increase in summertime monthly mean temperature variability with global warming, especially in the Northern Hemisphere (Bathiany et al. 2018).

We find that for warming levels of \(1.5\,^\circ \hbox {C}\) above pre-industrial conditions, heat stress could be substantially aggravated by common-place sustained tropical night temperatures over non-adapted regions that rarely experience these conditions in a pre-industrial climate. At \(4\,^\circ \hbox {C}\) of global warming, extreme tropical night conditions are projected to occur almost everywhere in the world during more than a quarter of the summer months; while being sustained for the entirety of every summer month in tropical night hotspots over South East Asia, the Maritime Continent, Central Africa, and Central North and South America. Similarly, hot and humid conditions with extreme monthly wet bulb temperatures above \(26\,^\circ \hbox {C}\), that occur very rarely under pre-industrial conditions, are projected to occur over hotspots in North India and China already by \(1.5\,^\circ \hbox {C}\) of global warming. At \(4\,^\circ \hbox {C}\) of warming, these extreme wet bulb temperatures are projected to spread across most tropical regions as well as over mid latitude regions such as North America, Australia and South Asia. Over hotspot areas in India, Pakistan, China and the Arabian Peninsula, the most extreme monthly wet bulb temperatures are projected to exceed the \(28\,^\circ \hbox {C}\) danger threshold for vulnerable individuals on average for entire months.

We assess to what level global warming above pre-industrial conditions must be limited in order to avert the risk of extreme heat in five different metrics, considering the irreducible range of possibilities at each level of warming defined by well-sampled internal variability. Based on large samples of low-probability extremes for each of these heat metrics, we identify major hotspots over different regions, ranging from maximum summertime temperatures of \(60\,^\circ \hbox {C}\) over hotspots in the Persian Gulf and maximum temperature variability increases beyond 100% over India or North America to hotspots of extreme hot and humid conditions over India and China. These different heat-stress hotspots highlight the different potential risks and related adaptation measures that are necessary over different regions. With time and within limits, our society, economy, ecosystems, and even our bodies, are able to adapt to a warmer mean climate state. However, chaotic low-probability deviations from these mean climate conditions in the form of extreme events challenge our range of adaptability, potentially to its limits.