Introduction

Heat waves cause more deaths than any other weather-related natural disaster1. Extreme heat, particularly when combined with high humidity, can create extremely dangerous conditions for humans, leading to heat stroke and fatalities2,3. Cities are particularly vulnerable to heat waves, due to the urban heat island effect4,5,6. In addition to human health, heat waves can negatively impact the energy sector (additional air conditioning), agriculture (crop failures, increased pests, animal deaths), local ecologies (pond and lake health, plant and tree health), and infrastructure (including roads and railroad tracks). Moreover, long-duration or multiple heat waves can be associated with drought7 and flash drought8, which have additional long-term negative impacts for humans, animals, plants, resources, and the economy. Under climate change, heat waves are expected to increase in intensity, duration, and frequency9, exacerbating these negative impacts.

Though heat waves and climate change projections of heat waves have received attention for other regions across the globe10,11,12, here we examine heat waves for a region that has received less attention, the US Northeast. Heat waves in this region merit closer consideration, as several Northeast coastal cities, including the major population centers of Boston and New York City, are associated with some of the highest US anomalous mortality rates (over 1.5 deaths per standardized million) during oppressive June–August heat days (defined as days with apparent temperature, based on both temperature and relative humidity, at least one standard deviation above seasonal means)13. The most deadly heat wave to impact the Northeast occurred in 1911, when temperatures surged for 11 straight days, killing at least 340 people14.

Previously, trends and future projections for Northeast heat waves have been identified within the context of US-wide studies. For example, researchers found that heat waves (two or more consecutive days exceeding the local 85th-percentile minimum apparent temperature, based on temperature and water vapor pressure) in 50 US cities (1950–2010) increased in frequency by 0.6 times per decade, in intensity by 0.1 °C per decade, and in duration by 0.2 days per decade, while for Boston the trend in frequency and intensity increased even faster6. Among 55 US locations (1948–2012), both Boston and New York City experienced significant increases in decadal heat days (95th-percentile 3-day mean apparent temperature, based on temperature, water vapor pressure, and wind speed) and heat events (multiple heat days), and Boston saw an increase in heat event duration4. Among 187 US locations from 1979 to 2013, significant positive trends were found in the Northeast for May–September daily minimum temperature and equivalent temperature (a measure that incorporates both dry-bulb temperature and specific humidity)15. Statistically significant upward trends in equivalent temperature were also found for high-humidity-related heat wave days (defined as equivalent temperature exceeding both daily and 3-day 90th percentiles), which account for about 80% of Northeast heat waves, for the period 1981–201516.

These upward trends are expected to continue in the future. For example, based on even a relatively modest warming of 1.5–2.0 °C and an associated moisture increase based on the Clausius–Clapeyron relationship17, highly populated areas such as the Eastern US could regularly experience heat waves with a greater apparent heat wave index (based on CMIP5 daily maximum temperature and daily minimum relative humidity) than the Russia 2010 heat wave in near-future scenarios18. In addition, high-mortality heat waves (those with over 20% increased mortality, accounting for about 1% of all heat waves) are projected to increase for a number of US communities, including six in the Northeast, for the time period 2061–2080 under the RCP4.5 and RCP8.5 warming scenarios19.

Because of these recent and projected upward trends in Northeast heat waves, it is important to understand the meteorological conditions and circulation associated with heat waves in this region. However, an in-depth analysis focusing on heat wave circulations for the Northeast has not yet been done, and previous analyses that have included the Northeast have considered all regional heat waves in a single category, that of a typical Peterson Type 2 heat wave high pressure system7, defined by a Gulf of Mexico or tropical Atlantic airmass associated with a mid-level high-pressure system located near or slightly downstream of the Northeast US, and anomalously high sea-level pressure (SLP) similarly located downstream9,20. However, heat waves presumably also occur in association with other circulation patterns, and have different magnitudes, impacts, predictability, and relationships with other phenomena such as surface moisture and radiation feedback. By considering only a single category of heat wave, we limit our understanding of these associated characteristics.

Here, we perform k-means clustering21 on daily circulation fields (500-hPa heights and 900-hPa winds) to separate Northeast US heat wave days into several flavors, or large-scale meteorological patterns22, defined as circulation systems that have a spatial scale anywhere between mesoscale systems and near-global scale climate variabilities. Although heat wave definitions can vary widely from study to study and in application, and often rely on different formulations of heat indices and thresholds, here we define heat wave days in terms of maximum daily temperature, as is done in many previous studies9,15,20. This provides a first step in understanding the circulation and meteorological implications associated with extreme heat in the Northeast, and allows us to consider both dry and humid heat waves. Within each set of pattern days, we analyze the associations with warm air advection, subsidence, surface humidity, and soil moisture, all of which can affect localized surface heating. We also examine transitions between the pattern days, as well as trends in the patterns over the 1980–2018 period to assess whether certain heat wave patterns have become more or less common in response to recent warming. Finally, we examine electrical energy usage within the context of the four patterns to provide one perspective on the societal impact of the different patterns. The heat wave identification and clustering techniques presented here can be applied to study heat wave variants and their impacts in any region.

Results

Identifying heat wave patterns

For this study, heat waves are defined at the station level, based on 35 Global Historical Climatology Network23 (GHCN) stations, using a definition of three or more consecutive days of maximum daily temperature (Tmax) above the 95th percentile for all days 1980–2018. The individual station heat wave days are combined to form the set of dates on which k-means is performed (repeated dates are included only once). For clarity, we list several terms in Table 1 that are used throughout the study and have specific meanings. In particular, we distinguish between station extreme heat days (any date with above-95th-percentile Tmax for the station), station heat wave events (three or more consecutive station extreme heat days), station heat wave days (individual days within station heat wave events), and the combined regional heat wave days, which are the unique set of days when a station heat wave day occurs at one or more stations concurrently. Regional heat wave days do not necessarily represent individual synoptic-scale events, since heat wave days at one station may or may not coincide with or partially overlap with heat wave days at another station. Instead, regional heat wave days can be thought of as snapshots of regional conditions that lead to extreme heat at specific station locations.
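The definitions above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, the linear-interpolation percentile, and the assumption of a gap-free daily Tmax series indexed by day number are all illustrative choices.

```python
from itertools import groupby

def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100); an assumed convention."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def station_heat_wave_events(tmax, threshold, min_length=3):
    """Return inclusive (start, end) day indices of runs of at least
    min_length consecutive days with Tmax above the threshold."""
    events = []
    day = 0
    for is_hot, run in groupby(tmax, key=lambda t: t > threshold):
        length = len(list(run))
        if is_hot and length >= min_length:
            events.append((day, day + length - 1))
        day += length
    return events

def regional_heat_wave_days(events_by_station):
    """Union of station heat wave days; each day is included only once."""
    days = set()
    for events in events_by_station.values():
        for start, end in events:
            days.update(range(start, end + 1))
    return sorted(days)

# Toy series (hypothetical values, in degrees C) with a fixed threshold.
toy_tmax = [25, 31, 32, 33, 26, 31, 31, 24, 32, 33, 34, 35]
toy_events = station_heat_wave_events(toy_tmax, threshold=30)
toy_regional = regional_heat_wave_days({"A": [(1, 3)], "B": [(2, 5)]})
```

In practice the threshold would come from `percentile(all_station_tmax, 95)` over the full 1980–2018 record for that station, and a two-day hot spell (days 5–6 in the toy series) is an extreme heat run but not a heat wave event.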

Table 1 Terms used in this study.

From 1980–2018 there are 1693 Northeast regional heat wave days where at least one of the 35 GHCN stations simultaneously experiences a station heat wave day. All regional heat wave days occur during the months of April–September (that is, no station experiences 95th-percentile Tmax outside of that time period). The annual mean number of station extreme heat days, station heat wave events, and duration of station heat wave events is shown in Table 2 for each of the 35 stations, along with the 95th-percentile Tmax threshold for that station. Stations overall average 2.8 heat wave events per year, with a mean duration of 4.3 days. The mean 95th-percentile Tmax is 29.6 °C (85.3°F). Regionally, just 20.4% of the 1693 regional heat wave days represent heat wave days that occur at a single station only. The balance reflects regional heat wave days that occur at multiple stations concurrently, with over 56% of the regional heat wave days occurring at five or more stations concurrently.
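As a rough illustration of how the concurrency percentages could be tallied, the sketch below counts, for each regional heat wave day, how many stations share it. The input layout (a dictionary of station name to a set of day labels) and the function name are assumptions made for the example.

```python
from collections import Counter

def concurrency_summary(station_days):
    """station_days maps a station name to the set of its heat wave days.
    Returns the number of regional heat wave days and the percent of them
    that occur at exactly one station and at five or more stations."""
    counts = Counter()
    for days in station_days.values():
        counts.update(days)  # one count per station per day
    n_regional = len(counts)
    if n_regional == 0:
        raise ValueError("no heat wave days supplied")
    single = sum(1 for c in counts.values() if c == 1)
    five_plus = sum(1 for c in counts.values() if c >= 5)
    return {
        "regional_days": n_regional,
        "pct_single_station": 100.0 * single / n_regional,
        "pct_five_or_more": 100.0 * five_plus / n_regional,
    }

# Hypothetical toy input: day 3 is shared by five stations.
toy_summary = concurrency_summary({
    "s1": {1, 2, 3, 4},
    "s2": {2, 3},
    "s3": {3},
    "s4": {3},
    "s5": {3},
})
```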

Table 2 Northeast GHCN stations used in the study and their mean 95th-percentile Tmax, mean annual extreme heat days (days with Tmax over the 95th-percentile), mean annual number of heat wave events (three or more consecutive extreme heat days), and mean duration of heat wave events 1980–2018.

Non-hierarchical k-means clustering21 is used to separate the 1693 regional heat wave days into four circulation patterns (P1–P4) based on daily reanalysis 500-hPa geopotential height anomalies and 900-hPa wind anomalies. Figure 1a shows composites of P1–P4 500-hPa geopotential heights and anomalies and mean sea-level pressure (MSLP), while Fig. 1b shows 900-hPa winds and surface temperature anomalies. The monthly frequency (April–September) of pattern days is shown in Fig. 1c. The relative frequency of station heat wave days within each pattern is shown in Fig. 1d. Additional reanalysis composites are shown in Fig. 2a–d, including 850-hPa temperature advection anomalies, 800-hPa vertical velocity anomalies, 2-m specific humidity anomalies, and top layer soil moisture anomalies, respectively. All anomalies shown are based on climatological daily means.
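For readers unfamiliar with the technique, the following is a minimal pure-Python sketch of Lloyd's k-means algorithm applied to per-day feature vectors. It is illustrative only: the way the 500-hPa height and 900-hPa wind anomalies are concatenated into one vector, the absence of any weighting or standardization, the random initialization, and all function names are assumptions, not details taken from the study.

```python
import random

def day_vector(z500_anom, u900_anom, v900_anom):
    """Concatenate flattened anomaly fields for one day (assumed layout)."""
    return list(z500_anom) + list(u900_anom) + list(v900_anom)

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))

def kmeans(points, k, init=None, n_iter=100, seed=0):
    """Lloyd's algorithm. Returns (labels, centroids).
    init optionally supplies k starting centroids; otherwise k of the
    points are drawn at random using the given seed."""
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(points, k)
    centroids = [list(c) for c in start]
    labels = None
    for _ in range(n_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy two-cluster example with hypothetical 2-element "fields".
toy_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [20.0, 20.0], [21.0, 20.0], [20.0, 22.0]]
toy_labels, toy_centroids = kmeans(toy_points, 2,
                                   init=[[0.0, 0.0], [20.0, 20.0]])
```

With real data, `points` would hold one vector per regional heat wave day (1693 in this study) and `k` would be 4; the composite maps for each pattern are then the means of the fields over the days sharing a label.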

Fig. 1: Regional heat wave day patterns.

Composites for P1–P4 days of a 500-hPa geopotential heights (thick contours every 6 dam, anomalies shaded every 4 dam) and MSLP (thin contours every 2 hPa), and b 2-m temperature anomalies (shaded, in 0.6 K increments) and 900-hPa winds (quivers, largest quiver 8.5 m s−1). c April–September frequency of P1–P4 days with gray shading indicating the 95% confidence interval using random sampling of all regional heat wave days (n = 1693), and red, blue, and black bars indicating, respectively, higher-than-normal, lower-than-normal, and normal monthly values at the 0.05 level of significance. d Relative frequency (percent) of station heat wave days assigned to P1–P4 (for each station, number of heat wave days in specific pattern divided by total number of all-station heat wave days for specific pattern, represented by black dots, where size of dot is proportional to frequency). Heat wave patterns are from k-means clustering of MERRA-2 1980–2018 500-hPa geopotential height and 900-hPa wind daily anomalies (based on the climatological daily means).

Fig. 2: Composites of other meteorological fields on regional heat wave days.

Composites for P1–P4 days of a 850-hPa temperature advection anomalies (shaded, in 1 × 10−5 K s−1 increments) with 850-hPa temperature (contours, in 2 K increments) and 850-hPa winds (quivers, largest quiver 10 m s−1), b 800-hPa vertical velocity anomalies (shaded, in 0.025 Pa s−1 increments, such that anomalous subsidence is red), c 2-m specific humidity anomalies (shaded, in 0.7 kg/kg increments), and d top surface layer soil moisture anomalies (shaded, in 0.02 increments). The black dots in each panel are identical to those in Fig. 1d, showing relative frequency of station heat wave days for each pattern.

Table 3 includes a summary of characteristics for the four patterns, including the percent of regional heat wave days assigned to each pattern, the mean percent of stations that experience simultaneous heat waves for the pattern days (the hit rate), the mean duration of regional heat wave days within the patterns, and the mean areal-averaged near-surface temperature, vertical velocity, wind speed, and wind direction for each pattern. The mean duration of each pattern requires some clarification: this is the average number of consecutive regional heat wave days within each set of pattern days. As such it is an estimation of how long the region is subject to a particular circulation regime that is linked to heat waves at one or more stations, as opposed to the pattern duration at individual stations experiencing heat waves. Following is a detailed description of each heat wave circulation pattern.
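One possible reading of the mean pattern duration is sketched below: consecutive calendar days that are all regional heat wave days and share the same pattern form one run, and the durations of those runs are averaged per pattern. This interpretation, the input layout (a dictionary of date to pattern label), and the function name are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

def mean_pattern_duration(day_patterns):
    """day_patterns maps each regional heat wave date to its pattern label.
    Returns the mean length of same-pattern runs of consecutive dates."""
    runs = defaultdict(list)
    prev_day = prev_pattern = None
    length = 0
    for day in sorted(day_patterns):
        pattern = day_patterns[day]
        contiguous = (prev_day is not None
                      and day - prev_day == timedelta(days=1))
        if contiguous and pattern == prev_pattern:
            length += 1
        else:
            if prev_pattern is not None:
                runs[prev_pattern].append(length)  # close the previous run
            length = 1
        prev_day, prev_pattern = day, pattern
    if prev_pattern is not None:
        runs[prev_pattern].append(length)  # close the final run
    return {p: sum(r) / len(r) for p, r in runs.items()}

# Hypothetical dates: P2 runs of 3 and 1 days, P1 runs of 1 and 2 days.
toy_durations = mean_pattern_duration({
    date(2018, 7, 1): "P2", date(2018, 7, 2): "P2", date(2018, 7, 3): "P2",
    date(2018, 7, 4): "P1",
    date(2018, 7, 6): "P1", date(2018, 7, 7): "P1",
    date(2018, 7, 10): "P2",
})
```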

Table 3 Characteristics of the four patterns associated with Northeast heat wave events.

Pattern descriptions

P1 features a shallow upper-level trough located across the Northeast, with its axis east of the Atlantic seaboard, and anomalously low 500-hPa geopotential heights and MSLP (Fig. 1a) over the Canadian Maritimes. Low level winds are northwesterly, and the near-surface temperature is slightly above the climatological normal, particularly in southern portions of the domain (Fig. 1b). The pattern occurs preferentially in July (Fig. 1c), and the majority of the stations experiencing heat waves in this pattern are located towards the south and southeast of the domain (Fig. 1d). Temperature advection is predominantly negative (the most negative of the four patterns) due to low-level northwesterly winds (Fig. 2a), but subsidence is anomalously high in regions experiencing extremes (Fig. 2b). Near-surface humidity is average (Fig. 2c), but the ground is anomalously dry, particularly to the south, potentially suppressing evapotranspiration (Fig. 2d). The mean pattern duration (mean number of consecutive regional heat wave days within P1 days) is the lowest of the four patterns (Table 3).

P2 represents what we often think of as a classic Northeast summer heat wave, although here it occurs preferentially in late spring and early summer (Fig. 1c). An anomalously high upper-level ridge is in place over the Northeast, with high MSLP over the waters to the east of the southeast US (Fig. 1a). High temperature anomalies (the highest of the four patterns) are centered over the Great Lakes region but extend throughout the Northeast (Fig. 1b). Low-level winds are predominantly westerly and southwesterly, leading to anomalously high warm air advection (Fig. 2a). The majority of heat waves in this pattern occur in western portions of the domain, and in particular New York (Fig. 1d). The airmass in P2 is particularly moist, with the most positive and widespread specific humidity anomalies (Fig. 2c), while soil moisture is anomalously low (Fig. 2d). P2 features the largest proportion of simultaneously-occurring station heat wave days (Table 3), reflecting the widespread nature of this pattern of heat wave. This pattern also features the longest mean duration (mean number of consecutive regional heat wave days within P2 days) of 2.5 days.

P3 features a shallow upper-level trough located across the Ohio Valley, with anomalously high ridging to the east of Maine, and high MSLP well to the east of the Southeastern US (Fig. 1a). Anomalously high surface temperatures are centered across New Hampshire and Maine, with low-level winds predominantly from the southwest (Fig. 1b). The pattern occurs preferentially in mid-summer to late-summer (Fig. 1c). Heat waves occur mostly in the eastern portion of the domain for this pattern, with fewer heat waves in western New York (Fig. 1d). This pattern features anomalously high near-surface humidity (Fig. 2c) and the smallest surface moisture anomalies (Fig. 2d), and is the most anomalously rainy of the patterns (not shown). Subsidence (Fig. 2b) is anomalously low for this pattern, consistent with anomalously high precipitation.

P4 is similar to P2, but with two distinctions: the upper-level ridge is situated more northerly, and there is a slight trough in the flow over the coast of the Carolinas, with high MSLP located farther to the east (Fig. 1a). P4 tends to occur throughout the warm season (Fig. 1c), in contrast to P2, which tends to occur in late spring and early summer. The majority of the heat waves occur in the northwest regions (Fig. 1d), where anomalous high surface temperatures extend from Canada well into the Northeast (Fig. 1b). In the western regions, low-level winds are westerly and southwesterly, while in the eastern portions of the domain, low-level winds are predominantly northwesterly (due to the slight trough to the south) (Fig. 1b). Both warm air advection (Fig. 2a) and subsidence (Fig. 2b) are anomalously high to the northwest, consistent with where heat waves occur in this pattern. P4 features anomalously moist air only in the northwest region (Fig. 2c), and the driest soil moisture of the four patterns (Fig. 2d), particularly to the south, potentially suppressing evapotranspiration. Although P4 has the fewest regional heat wave days assigned to it, the percent of stations experiencing extreme heat is the second highest after P2 (Table 3).

The unique characteristics of each pattern suggest that extreme surface temperatures in the Northeast are often related to a combination of circulation features and mechanisms. The following sections examine additional temporal characteristics of the patterns.

Pattern transitions

Figure 3a–c shows, for each station, the percent of P1–P4 days that make up day 1, day 2, and day 3 of the station heat wave days (relative frequency is indicated by dot size). Most station day 1 patterns are split between P2 and P4. While P2 continues to dominate day 2 patterns, there are fewer P4 days and more P1 and P3 days. This trend continues to day 3, resulting in a more equal distribution between P1–P4.

Fig. 3: Pattern transitions for first three days of station heat wave events.

Percent (dots, where dot size is proportional to frequency) of station heat wave events that occur in pattern P1–P4, and where red, blue, and black dots indicate, respectively, higher-than-normal, lower-than-normal, and normal values at the 0.05 level of significance, for a day 1 b day 2 and c day 3 of station heat wave events. d Percent of pattern assignments P1–P4 for (left to right) combined and separate days 1–3 of station heat wave events. e Frequency of transitions from P1–P4 on day 1 to next-day P1–P4 for station heat wave events. f Frequency of transitions from P1–P4 on day 2 to next-day P1–P4 for station heat wave events. For d–f, all station heat wave days 1–3 are counted, regardless of whether other stations share the same heat wave day. Red, blue, and black bars indicate, respectively, higher-than-normal, lower-than-normal, and normal values at the 0.05 level of significance, and gray shading indicates the 95% confidence interval of background frequency based on random sampling.

While this graphic provides granularity at the station level, the results are more easily seen by combining the station data (Fig. 3d–f). However, widespread events (where multiple stations experience extreme heat simultaneously) can bias the combined results. Given this caveat, Fig. 3d shows the frequency of P1–P4 for the first 3 days of combined station heat wave events (the background frequency). P2 comprises the majority of these days, which may be partially related to its relatively long duration (Table 3). The relative pattern frequency (compared to the background frequency) shifts from P2/P4 on day 1 to a greater percentage of P1/P3 by day 3, consistent with Fig. 3a–c. Figure 3e, f shows in detail how day 1 (day 2) patterns transition to all other patterns on day 2 (day 3) of station heat wave events. For most station heat wave events, there is strong pattern persistence. That is, for a particular heat wave event at a station, the first 3 days tend to occur under the same pattern assignment. Only one pattern, P3, shows a tendency to transition to P1 on day 2 or day 3, although the vast majority remain in P3.
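The day-to-day transition frequencies described above can be tabulated along the following lines. This is a minimal sketch, assuming each station heat wave event is represented as a list of pattern labels for its successive days; the function name and input layout are hypothetical.

```python
from collections import Counter

def transition_percentages(event_patterns, from_day=0):
    """event_patterns is a list of per-event pattern sequences,
    e.g. ["P2", "P2", "P1"] for days 1-3 of one station event.
    Returns {source pattern: {next-day pattern: percent}} for the
    transition from index from_day to from_day + 1."""
    counts = {}
    for seq in event_patterns:
        if len(seq) > from_day + 1:
            src, dst = seq[from_day], seq[from_day + 1]
            counts.setdefault(src, Counter())[dst] += 1
    return {
        src: {dst: 100.0 * n / sum(c.values()) for dst, n in c.items()}
        for src, c in counts.items()
    }

# Hypothetical events pooled over stations.
toy_events = [
    ["P2", "P2", "P2"], ["P2", "P2", "P1"], ["P3", "P1", "P1"],
    ["P3", "P3", "P3"], ["P2", "P4", "P4"], ["P2", "P2", "P2"],
]
day1_to_day2 = transition_percentages(toy_events, from_day=0)
day2_to_day3 = transition_percentages(toy_events, from_day=1)
```

Pattern persistence shows up as large diagonal entries, for example `day1_to_day2["P2"]["P2"]` in the toy data.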

Similarity to all-day patterns

To gain perspective on whether P1–P4 are unique in producing extreme heat, we use Self-Organizing Maps (SOMs) to produce a set of circulation patterns based on 500-hPa geopotential height anomalies and 900-hPa wind anomalies as for k-means, but for all days 1980–2018. The results, arranged into a 4 × 5 pattern-space, are shown in Fig. 4a, where each pattern comprises 2.5–8.3% of the days 1980–2018. Deep troughs (enhanced ridges) occupy the lower left (upper right) of the pattern-space. The 1693 regional heat wave days fall predominantly into four SOM patterns, with SOMs 13, 9, 7, and 15 representing the majority of P1, P2, P3, and P4 days, respectively (Fig. 4b).

Fig. 4: Self-organizing map of circulation patterns.

a The results of SOM analysis on the same fields as used for k-means typing, but for all days 1980–2018, separated into a 4 × 5 pattern space, with the text above each panel showing the SOM number, the percent of all days assigned to the SOM, and the percent of regional heat wave days that belong to the SOM (in parenthesis, with the value in bold if significantly higher or lower than expected based on random sampling). b The relative frequency (proportional to dot size) of P1, P2, P3, and P4 days assigned to each SOM.

The P2-like pattern, SOM 9, has the largest number of days assigned to it (8.3%). However, 21.5% of the regional heat wave days fall into SOM 9, which is significantly higher than expected based on the background frequency. P1, P3, and P4 days also occur with much higher frequency than expected in SOMs 13, 7, and 15, respectively. This suggests that only certain circulation patterns are associated with extreme heat, but these same patterns can and do produce non-extreme surface temperatures also.
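A comparison against background frequency of this kind can be approximated with a simple Monte Carlo test. The sketch below is an assumption about how such a test might look, not the procedure used in the study: it repeatedly draws a random subset of all days (without replacement), records the percent falling in a target SOM, and reports the central 95% range of those percentages. The function name, number of draws, and toy label pool are illustrative.

```python
import random

def background_interval(all_day_labels, target, sample_size,
                        n_draws=1000, seed=0):
    """Approximate 95% interval for the percent of randomly chosen days
    that carry the target label, given the labels of all days."""
    rng = random.Random(seed)
    pcts = []
    for _ in range(n_draws):
        sample = rng.sample(all_day_labels, sample_size)
        pcts.append(100.0 * sample.count(target) / sample_size)
    pcts.sort()
    lower = pcts[int(0.025 * (n_draws - 1))]
    upper = pcts[int(0.975 * (n_draws - 1))]
    return lower, upper

# Toy pool: 8.3% of 1000 days carry label 9, echoing SOM 9's share.
toy_pool = [9] * 83 + [0] * 917
toy_lower, toy_upper = background_interval(toy_pool, target=9,
                                           sample_size=200)
```

An observed share well outside the interval, such as the 21.5% of regional heat wave days in SOM 9 against a background of 8.3%, would be flagged as significantly higher than expected.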

Trends

Some individual stations show statistically significant positive annual trends (1980–2018) in extreme heat days, heat wave days, and heat wave events (Fig. 5a), based on linear least squares fit. Monthly analysis shows that the majority of positive trends occur in May and September, including an increase in heat wave duration at several stations in September (Supplementary Fig. 1). A number of stations show significant positive annual trends for station heat wave days assigned to P3, while one station shows a negative trend for both P1 and P4 days (Fig. 5b). Monthly analysis (Supplementary Fig. 2) shows P1 days decreasing at some stations in August, P2 days increasing at many stations in September, P3 days increasing at a number of stations July–September, and P4 days increasing at several stations in September. While trend analysis is limited by the short time period and station-level sample sizes, these results suggest that Northeast extreme heat days are increasing predominantly in May and September, and much of this increase is associated with P2 and P3 days (and to a lesser extent with P4 days in September).

Fig. 5: Trends in heat waves and patterns.

Trends in 1980–2018 a station extreme heat days, heat wave days, heat wave events, and heat wave duration, and b station heat wave days assigned to P1–P4, where enlarged dots indicate statistically significant positive (red) or negative (blue) trends (at 0.05 level). c Frequency (bars) and linear trends (dashed lines) of regional heat wave days assigned to patterns P1–P4 and d same as c except for the SOMs that are most similar to P1–P4 days, where dashed lines are red (blue) to indicate statistically significant positive (negative) trends at the 0.05 level. Trends are determined by linear least squares fit and significance is based on the two-sided t-distribution of the Wald Statistic. Additionally, the Kendall’s tau-b rank test is considered for c, d, and any significant trends at the 0.05 level are indicated by text at the top of the panels.

Overall annual trends, using linear least squares fit as well as rank correlation, in P1–P4 regional heat wave days (where one or more stations experience heat wave days concurrently) are shown in Fig. 5c. Only P3 shows a statistically significant upward trend for the annual period. Separated by month (Supplementary Fig. 3), there is a statistically significant downward trend in P1 in May, and statistically significant upward trends in P2 in May, P3 in August and September (by rank correlation only), and P4 in September, consistent with the station-level results. This can be compared to annual and monthly trends for the corresponding SOM patterns (Fig. 5d and Supplementary Fig. 4), to determine if the patterns themselves are changing in frequency, or if the number of extreme heat days within the patterns is changing. From Fig. 5d, P1-like and P4-like patterns are increasing in frequency, while P2-like and P3-like patterns are decreasing (opposite the trends for P1–P4). This intriguing result can be interpreted in a number of ways. It is possible that while the circulation favorable for heat waves is becoming less frequent, the heat waves themselves are (1) becoming more likely due to increased warming, (2) becoming more intense, or (3) simply lasting longer.
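The two trend measures mentioned here, a least squares slope and Kendall's tau-b rank correlation, can be computed as in the sketch below. It is a plain-Python illustration with hypothetical function names; it returns the slope's t statistic but not p-values, which in practice would typically come from a statistics library.

```python
import math

def linear_trend(years, counts):
    """Ordinary least squares fit of counts on years.
    Returns (slope, intercept, t statistic of the slope)."""
    n = len(years)
    mx = sum(years) / n
    my = sum(counts) / n
    sxx = sum((x - mx) ** 2 for x in years)
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = sum((y - (intercept + slope * x)) ** 2
                for x, y in zip(years, counts))
    stderr = math.sqrt(resid / (n - 2) / sxx)
    t_stat = slope / stderr if stderr > 0 else math.inf
    return slope, intercept, t_stat

def kendall_tau_b(x, y):
    """Kendall's tau-b, which adjusts for ties in either variable."""
    concordant = discordant = ties_x = ties_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: counted in neither term
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else 0.0

# Hypothetical annual counts of heat wave days for one pattern.
toy_years = [2000, 2001, 2002, 2003, 2004, 2005]
toy_counts = [1, 2, 2, 4, 5, 7]
toy_slope, toy_intercept, toy_t = linear_trend(toy_years, toy_counts)
toy_tau = kendall_tau_b(toy_years, toy_counts)
```

Using both measures is a reasonable safeguard for short count series: the rank-based statistic is less sensitive to a few unusually active years than the least squares slope.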

Links to energy use

Different heat wave patterns may result in varied societal impacts (a key motivator for examining heat wave patterns). Here we examine one potential impact: energy use. Heat waves can have detrimental effects on human health, and cooling measures are often necessary to mitigate the threat. As such, heat waves can be linked to high energy usage in the summer, when air conditioning is more frequently used. Additionally, the Northeast has a greater percentage of less-efficient window units24, and with it an expected higher energy demand during heat waves. Figure 6a shows a scatter plot of the mean maximum daily electrical demand and the mean near-surface dry-bulb temperature at time of peak demand for six Northeast states (Maine, New Hampshire, Vermont, Massachusetts, Connecticut, and Rhode Island) for 2013–2018. New York is not included here, as the mean near-surface dry-bulb temperature associated with peak demand is not available. For reference, two least squares fit lines are added for fit below and above the 65°F cooling/heating degree day definition. The below 65°F (above 65°F) fit line has slope, intercept, and coefficient of determination equal to −95, 20,642, and 0.50 (333, −7167, 0.64). Extremes of temperature (both cold and warm) generate the largest peak demands, as one would expect, with the highest-temperature peak demand exceeding the lowest-temperature peak demand. In Fig. 6b, mean peak demand standardized anomalies are shown for each set of pattern days, divided by state. In general, usage is anomalously high for regions experiencing heat waves (for example, southern regions in P1). The P2 pattern shows the highest anomalous energy use. This pattern occurs more often in late spring and early summer and is anomalously hot and humid. The anomalous energy demand may be related to the timing of the heat waves in this pattern: early season anomalous heat in conjunction with a humid airmass (even if raw temperatures are considerably below mid-summer levels) is associated with increased mortality13. The second highest anomalous use pattern is P3, which tends to occur in July and August, and less often in the spring. This pattern is anomalously humid throughout the region. Energy use is highest in the southeast portion of the domain, where there is both anomalous humidity and heat. Energy usage in P4 is slightly higher than for P1, but not near the levels of P2 and P3, and may be related to lower humidity levels (Fig. 2c).
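As a worked example of the two fit lines quoted above, the short sketch below evaluates them at a given temperature. The slopes and intercepts are the reported values; treating demand as MW and temperature as °F follows the Fig. 6 caption, while the function name and the choice to assign exactly 65°F to the upper line are assumptions.

```python
CHANGE_POINT_F = 65.0
BELOW_65 = (-95.0, 20642.0)  # (slope, intercept) reported for T < 65 F
ABOVE_65 = (333.0, -7167.0)  # (slope, intercept) reported for T > 65 F

def fitted_peak_demand(temp_f):
    """Peak demand implied by the reported piecewise least squares fit."""
    slope, intercept = ABOVE_65 if temp_f >= CHANGE_POINT_F else BELOW_65
    return slope * temp_f + intercept

hot_day = fitted_peak_demand(90.0)   # 333 * 90 - 7167
cool_day = fitted_peak_demand(40.0)  # -95 * 40 + 20642
```

The two lines nearly meet at the change point (−95 × 65 + 20,642 = 14,467 versus 333 × 65 − 7167 = 14,478), and the warm-side slope is about 3.5 times steeper in magnitude than the cold-side slope, consistent with demand rising faster with heat than with cold.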

Fig. 6: Electrical demand associated with heat wave patterns.

a Maximum daily electrical demand (in MW), 1980–2018, for mean regional daily dry-bulb temperature at peak load (°F), according to ISO New England, for the New England Control Area (NECA) encompassing the states of Maine, New Hampshire, Vermont, Massachusetts, Connecticut, and Rhode Island. The red lines show the least squares fit for data below and above 65°F. b Mean usage by state (shaded, standardized anomalies) for heat wave days in patterns P1–P4. Standardized anomalies are derived by removing the long-term daily mean (e.g., Jan 01, Jan 02, etc.) from each day’s peak demand in the full record, and dividing by the temporal standard deviation.
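The standardized anomaly calculation described in the caption can be sketched as follows. The caption does not say whether the standard deviation is taken per calendar day or over the whole record; this sketch assumes a single standard deviation of the anomaly series over the full record, and the function name and input layout (a dictionary of date to peak demand) are likewise illustrative.

```python
from collections import defaultdict
from datetime import date
import statistics

def standardized_anomalies(daily_peak):
    """daily_peak maps a date to that day's peak demand.
    Removes the long-term mean for each calendar day (Jan 01, Jan 02, ...)
    and divides by the standard deviation of the resulting anomalies."""
    by_calendar_day = defaultdict(list)
    for day, value in daily_peak.items():
        by_calendar_day[(day.month, day.day)].append(value)
    climatology = {key: statistics.fmean(vals)
                   for key, vals in by_calendar_day.items()}
    anomalies = {day: value - climatology[(day.month, day.day)]
                 for day, value in daily_peak.items()}
    spread = statistics.pstdev(list(anomalies.values()))
    return {day: anom / spread for day, anom in anomalies.items()}

# Hypothetical two-year record for two calendar days (values in MW).
toy_anoms = standardized_anomalies({
    date(2013, 7, 1): 100.0, date(2014, 7, 1): 120.0,
    date(2013, 7, 2): 90.0, date(2014, 7, 2): 110.0,
})
```

The pattern composites in Fig. 6b would then correspond to averaging these values over the days assigned to each of P1–P4, state by state.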

Discussion

Heat waves in the Northeast occur in four distinct flavors, or large-scale meteorological patterns, based on the underlying circulation and mechanisms for generating or supplying surface heat. These mechanisms, which are closely related to the circulation, include subsidence, advection of warm air, and suppressed evapotranspiration (which supplies a heating effect). Here, we define heat waves as three or more consecutive days of 95th-percentile maximum daily temperature at 35 Northeast GHCN stations, 1980–2018. This results in 1693 regional heat wave days where a heat wave occurs at one or more stations in the Northeast concurrently. We separate these 1693 regional heat wave days into four circulation patterns P1–P4 based on k-means clustering of daily upper-level circulation and lower-level winds, in order to evaluate several of these mechanisms within the framework of large-scale synoptic circulation. While we present results for the Northeast US, this technique can be used to examine heat waves and their impacts in any region.

Pattern P1 represents a July/August shallow trough with extreme heat to the south and southeast. Winds are predominantly from the northwest in this pattern, and surface heating may be linked to anomalously high subsidence as opposed to temperature advection. Pattern P2 represents an early summer ridge with anomalous surface heat in the western regions. Low-level winds transport anomalously hot and humid air from the southwest for this pattern, which features the largest number of 95th-percentile Tmax days and the longest duration of pattern days. Pattern P3, where most extremes occur in the eastern portions of the domain, reflects a shallow trough across the Ohio Valley, with southwest winds transporting hot and humid air towards the Northeast. Pattern P4 represents a summer ridge with extremes in the northwest portion of the domain, with evidence of both temperature advection from the southwest and subsidence leading to anomalous surface heating. In addition, P1, P2, and P4 feature anomalously dry conditions that may tend to exacerbate surface heating due to suppressed latent heating via evapotranspiration. While both P1 and P4 feature subsidence, the differing location of station heat waves in the patterns may indicate a topographical influence on subsidence in P1, where downsloping winds may contribute to heating.

While the regional heat wave days are somewhat evenly distributed between P1–P4 (21–28%), P2 and P4 days make up the majority of the first and second days of station heat wave events. The relative proportion of P1 and P3 station heat wave days begins to increase by the third day. To some extent this likely reflects synoptic-scale ridges moving eastward as the station heat wave events transpire. However, station extreme heat can occur in all four patterns (even as the first day of an event), and there is a strong tendency for station heat wave event patterns to persist. For example, if the first day of a station heat wave event is a P1 pattern, the most likely pattern for the second and third day is also P1.

Despite recent warming throughout the summer season15, annual trend analysis shows that only P3 heat wave days exhibit widespread and statistically significant increases in frequency since 1980. Monthly trends are more informative. Increases in P3 heat wave days are particularly pronounced in late summer and early autumn months, and predominantly occur at inland stations. P2 heat wave days exhibit statistically significant increases in frequency during May for several stations. These trends in P2 and P3 are likely related to statistically significant increasing trends in extreme heat days during May and September. Additionally, P4 heat wave days show significant upward trends during September while P1 heat wave days show significant downward trends during May. The identification of increasing or decreasing frequency in only certain circulation patterns associated with heat waves, and only at certain times, is an important addition to our knowledge of Northeast heat waves, and underscores the value of performing pattern analyses such as done here for specific regions.

The most anomalous energy use occurs for the early summer P2 pattern (which coincidentally shows a statistically significant increase in frequency since 1980 in May). This is consistent with anomalously high mortality associated with April–May high-humidity heat waves in the Northeast, despite daily high temperatures being below the peak values experienced during the summer months13. Anticipating these early-cooling-season episodes, based on forecasts of heat wave patterns such as P2, can potentially save lives, and eliminate surprise electrical demand.

This study provides an initial assessment of the mechanisms associated with extreme heat under several Northeast circulation regimes, and is intended as a first step to a more in-depth analysis of Northeast heat waves. Importantly, it outlines a technique that can be used for other study regions, and may assist in identifying circulation trends that may not be readily apparent when evaluating heat waves as a singular phenomenon. It also serves as a useful basis for future research into heat wave predictability, model assessment, and climate projections. Future work will build on our definition of a heat wave (based on Tmax) to incorporate additional heat measures, including heat stress indices that account for humidity and overnight temperatures. In addition, future analysis will include a complete surface energy budget, analysis of evapotranspiration, and back-trajectory analysis of the near-surface air, to examine the development of the heat wave. This is of particular relevance in pattern P1, which features trough-like conditions and cold air advection. It may also help to better understand pattern P3, which has anomalously high precipitation associated with it. Because this pattern occurs mainly in July and August, it is possible that the advection of extremely warm air and surface heating outweighs the cooling effects of clouds and precipitation. It is also possible that the precipitation occurs over a limited time period and spatial scale, such as that associated with afternoon convective storms. Assessing the heat wave patterns in terms of wet years and dry years may help to better understand the feedback roles of precipitation and soil moisture in the circulation regimes. Additional analysis will also examine the relationship between trends in heat wave-generating circulation regimes and trends in actual heat wave events occurring under those regimes.

Methods

Temperature observations

To identify station heat wave days, we use daily maximum temperature (Tmax, units of 0.1 °C), for 35 Global Historical Climatology Network23 (GHCN) stations in Maine, New Hampshire, Vermont, Massachusetts, Rhode Island, Connecticut, and New York that are missing no more than 3% of daily data, 1980–2018. We define a station extreme heat day as any day in the station record where Tmax exceeds the 95th percentile of all days 1980–2018, and a station heat wave event to be any occurrence of three or more consecutive extreme heat days at a particular station. Using this definition, we identify 1693 unique dates where heat wave days occur at one or more stations concurrently, and we call these regional heat wave days. This particular terminology is repeated throughout the study, and is further defined in Table 1. In particular, a station heat wave day is defined at a particular location, while regional heat wave days consist of the collective set of station heat wave days, with each date listed only once.
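The station and regional definitions above can be illustrated with a minimal sketch. It assumes one station's Tmax record is available as a one-dimensional array with missing days stored as NaN; the function names (heat_wave_days, regional_heat_wave_days) are hypothetical and are not taken from the study's code.

```python
import numpy as np

def heat_wave_days(tmax, pct=95.0, min_run=3):
    """Flag days that belong to a run of at least `min_run` consecutive
    days with Tmax above the station's `pct`-th percentile."""
    tmax = np.asarray(tmax, dtype=float)
    threshold = np.nanpercentile(tmax, pct)   # percentile over all days in the record
    extreme = tmax > threshold                # NaN compares False, so a missing day ends a run
    mask = np.zeros(tmax.size, dtype=bool)
    start = None
    # Append a trailing False so that a run reaching the last day is closed.
    for i, flag in enumerate(np.append(extreme, False)):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_run:
                mask[start:i] = True
            start = None
    return mask

def regional_heat_wave_days(station_masks):
    """A date is a regional heat wave day if any station has a heat wave day."""
    return np.any(np.asarray(station_masks, dtype=bool), axis=0)
```

Because the regional mask is a logical union, each date is counted once no matter how many stations record a heat wave day on it.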

We also consider station heat wave days where Tmax exceeds 90 °F (32.2 °C) for three or more consecutive days at a station. This has the effect of increasing the frequency of station heat wave days in the southern portion of the domain. Because we are ultimately more interested in anomalous heat, as opposed to a strict threshold, we choose to use the 95th-percentile criterion. However, we note that our results are largely insensitive to this choice.

Reanalysis data

To capture circulation and other atmospheric features on regional heat wave days, we use the National Aeronautics and Space Administration (NASA) Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2)25 daily mean fields, including 500-hPa geopotential heights, 900-hPa winds, MSLP, 2-m temperature, 2-m specific humidity, surface soil wetness, 850-hPa temperature advection, and 800-hPa vertical motion for 1980–2018. Anomalies are created by removing the long-term daily mean at each grid point, where the long-term daily mean is created by taking the mean of each year-day (Jan-01, Jan-02, etc. through Dec-31) for the 39 years, and smoothing the results with a 21-day running mean.
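The anomaly calculation can be sketched as follows. This is a minimal illustration, not the study's code. It assumes that Feb 29 receives its own calendar slot and that the 21-day running mean wraps around the year end; the text specifies neither choice. The function names are hypothetical.

```python
import numpy as np

def calendar_day_index(dates):
    """Map dates to 0..365 so that Feb 29 always has index 59 and Mar 1 index 60."""
    dates = np.asarray(dates, dtype="datetime64[D]")
    years = dates.astype("datetime64[Y]")
    doy = (dates - years.astype("datetime64[D]")).astype(int)  # 0-based day of year
    year_num = years.astype(int) + 1970
    leap = (year_num % 4 == 0) & ((year_num % 100 != 0) | (year_num % 400 == 0))
    # In non-leap years, shift Mar 1 onward by one day to skip the Feb 29 slot.
    return np.where(~leap & (doy >= 59), doy + 1, doy)

def daily_anomalies(values, dates, window=21):
    """Subtract a smoothed calendar-day climatology from `values`.

    `values` has time as its first axis; any remaining axes (e.g. grid
    points) are treated independently. `window` is assumed to be odd.
    """
    values = np.asarray(values, dtype=float)
    idx = calendar_day_index(dates)
    clim = np.full((366,) + values.shape[1:], np.nan)
    for d in range(366):
        sel = idx == d
        if sel.any():
            clim[d] = values[sel].mean(axis=0)
    half = window // 2
    # Assumption: circular padding so the running mean is defined at the year ends.
    padded = np.concatenate([clim[-half:], clim, clim[:half]], axis=0)
    smooth = np.stack(
        [np.nanmean(padded[i:i + window], axis=0) for i in range(366)], axis=0
    )
    return values - smooth[idx]
```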

Energy use data

Energy use data is provided by several independent system operators (ISO). ISO New England provides online access to a daily summary of regional mean peak electrical demand and maximum temperature for load zones in the states of Maine, New Hampshire, Vermont, Massachusetts, Connecticut, and Rhode Island. Maximum daily temperature is also available for a select station within each load zone. The New England Control Area (NECA) load zone data is the total of the individual load zones, and the NECA maximum daily temperature is the mean area-weighted temperature for the representative stations. Separate weighting is applied for summer and winter months. Data is available from March 2003 to the near-present.

New York ISO provides online access to hourly load data for New York State, separated by load zones. We use the Integrated Real-Time Archive to determine daily peak demand. Although maximum hourly temperatures are also archived for a number of stations (Load Forecast Weather Data Archives) in certain load zones, a regional area-weighted maximum temperature similar to that provided by ISO New England is not available. The data is available from June 2001 to the present, but to match ISO New England data availability, we use 2003–2018 data.
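As a minimal sketch of deriving daily peak demand from hourly zone loads, the example below assumes that the statewide load at each timestamp is the sum of the zone loads and that records arrive as (timestamp, zone, load in MW) tuples. The record layout, the timestamp format, and the function name are hypothetical and do not describe the New York ISO archive format.

```python
from collections import defaultdict
from datetime import datetime

def daily_peak_demand(records):
    """Return {date: peak statewide load} from (timestamp, zone, load_mw) records."""
    total_by_time = defaultdict(float)
    for stamp, _zone, load_mw in records:
        # Assumption: statewide load is the sum over zones at each timestamp.
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        total_by_time[when] += load_mw
    peak_by_day = {}
    for when, total in total_by_time.items():
        day = when.date()
        peak_by_day[day] = max(total, peak_by_day.get(day, float("-inf")))
    return peak_by_day
```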

Typing methodology

Circulation on the 1693 regional heat wave days is captured by performing non-hierarchical k-means clustering21 of daily mean MERRA-2 500-hPa geopotential height anomalies and daily mean MERRA-2 900-hPa wind anomalies over the domain prescribed by 30–50°N and 90–60°W. Anomalies are created by removing the climatological daily mean at each grid point. Before processing, the input fields are standardized and reduced through empirical orthogonal function (EOF) analysis to retain 95% of the variance.
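The standardization and EOF reduction can be sketched with a singular value decomposition. The example assumes that the height and wind anomaly fields have been flattened and concatenated into one (days × features) matrix and that each column is standardized separately. The text does not state how the standardization was applied or whether area weighting was used, so both are assumptions, and eof_reduce is a hypothetical name.

```python
import numpy as np

def eof_reduce(fields, variance_to_keep=0.95):
    """Standardize columns, then keep the leading EOF modes that together
    explain at least `variance_to_keep` of the total variance.

    Returns (principal components, EOF patterns, explained variance fractions).
    """
    x = np.asarray(fields, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0                       # leave constant columns at zero
    z = (x - x.mean(axis=0)) / std
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    n_modes = int(np.searchsorted(cumulative, variance_to_keep) + 1)
    n_modes = min(n_modes, len(s))
    pcs = u[:, :n_modes] * s[:n_modes]        # one row of PC scores per day
    return pcs, vt[:n_modes], explained[:n_modes]
```

The principal component scores, rather than the full gridded fields, would then be passed to the clustering step.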

K-means clustering is an unsupervised machine-learning algorithm that separates data into a pre-specified (k) number of non-overlapping clusters, where each data point (in this case, the daily mean anomaly field) can only belong to one cluster, based on the nearest fit (smallest Euclidean distance) to a cluster centroid (the mean of all data points assigned to the cluster). The process is iterative, with initial centroids randomly selected from the set of data points, and data points continually re-assigned based on least Euclidean distance.
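The iteration described above can be written compactly. The following is a minimal NumPy sketch of the basic algorithm with random initial centroids drawn from the data points. It is not the MATLAB kmeans function or the custom wrapper used in this study, and the function name and defaults are illustrative.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Basic k-means: assign each point to the nearest centroid (Euclidean
    distance), recompute centroids as cluster means, repeat until stable."""
    rng = np.random.default_rng(seed)
    x = np.asarray(points, dtype=float)
    # Initial centroids are k distinct data points chosen at random.
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    labels = np.full(len(x), -1)
    for _ in range(n_iter):
        dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                              # assignments no longer change
        labels = new_labels
        for j in range(k):
            members = x[labels == j]
            if len(members):                   # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

Because the result depends on the random initial centroids, repeated runs with different seeds can converge to different partitions, which is why a repeatability criterion is useful when selecting k.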

Here, we use the MATLAB kmeans function, with a custom wrapper26,27 to determine an optimal number of clusters k and the most repeatable clustering using that optimal k. Using this methodology, an optimal number of clusters for this data is either 4 or 6. We use k = 4 for simplicity, as the k = 6 solution has a similar set of four patterns, but with two of the four patterns split into two sub-patterns each.

K-means clustering was also performed for various other combinations of meteorological fields on regional heat wave days, including MERRA-2 2-m temperature, 700-hPa and 800-hPa vertical velocity, 10-m winds, and MSLP anomalies. Typing with MSLP anomalies or vertical velocity anomalies was problematic in that no optimal k emerged for the heat wave days, so these fields were avoided. The temperature field gave no additional information, and the 10-m wind results were similar to those using 900-hPa winds. We also performed typing using individual station heat wave days, such as the first or second day of a station heat wave event, as opposed to all station heat wave days. This reduced the sample size considerably, and led to less stable clustering. Ultimately, we determined that using 900-hPa wind and 500-hPa geopotential height anomalies on all regional heat wave days resulted in consistent and reliable typing results.

Additionally, we type using self-organizing maps (SOMs)28 on the same fields as for k-means clustering (500-hPa geopotential heights and 900-hPa winds, made into anomalies by subtracting the climatological daily mean, and then standardizing), but for all days 1980–2018. SOMs use neural network classification and unsupervised learning to separate input fields into a prescribed number of nodes of a pattern-space, where each node is defined by a set of weights that correspond to the input field size. As input fields are processed, both node weights and the surrounding node weights are adjusted until the nodes represent the best fit of the data. Here we prescribe a 4 × 5 rectangular pattern-space using the MATLAB SOM Toolbox 2.0 from http://www.cis.hut.fi/somtoolbox/, with parameters set for linear initialization, 200 initial training iterations, and 1200 secondary training iterations. The regional heat wave days within the SOM patterns are identified and compared to the results of k-means clustering.

Frequency analysis

Statistical significance of the seasonal frequency of the patterns (Fig. 1c) is based on a Monte-Carlo approach, where the set of pattern assignments is randomly shuffled 1000 times among the regional heat wave days, and the seasonal frequency of each pattern is recalculated to identify the background frequency. Actual frequencies outside the bottom 2.5% and top 2.5% of the random values for each season represent statistically (at the 0.05 level) different values from the background frequency. Similar methodology is used to establish significance for the first 3 days of station heat wave pattern frequency and pattern transitions (Fig. 3). In these cases, the pattern assignments of the first three days of the station heat wave events are randomly shuffled before recalculating the frequencies to create the 95% confidence interval of background frequencies.
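A minimal sketch of this shuffling test is given below. It assumes one pattern label and one group label (for example, a season or month) per regional heat wave day; the function name and return layout are hypothetical.

```python
import numpy as np

def shuffle_frequency_test(patterns, groups, n_shuffles=1000, alpha=0.05, seed=0):
    """Compare observed pattern-by-group counts against counts obtained by
    randomly shuffling the pattern labels among the days."""
    rng = np.random.default_rng(seed)
    patterns = np.asarray(patterns)
    groups = np.asarray(groups)
    pattern_ids = np.unique(patterns)
    group_ids = np.unique(groups)

    def counts(labels):
        return np.array([[np.sum((labels == p) & (groups == g)) for g in group_ids]
                         for p in pattern_ids])

    observed = counts(patterns)
    null = np.stack([counts(rng.permutation(patterns)) for _ in range(n_shuffles)])
    lower = np.percentile(null, 100 * alpha / 2, axis=0)        # bottom 2.5% for alpha=0.05
    upper = np.percentile(null, 100 * (1 - alpha / 2), axis=0)  # top 2.5% for alpha=0.05
    return {
        "pattern_ids": pattern_ids,
        "group_ids": group_ids,
        "observed": observed,
        "lower": lower,
        "upper": upper,
        "higher_than_expected": observed > upper,
        "lower_than_expected": observed < lower,
    }
```

Shuffling the labels keeps the total number of days in each pattern and in each group fixed, so the null distribution reflects only the association between the two.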

Statistically higher-than-expected or lower-than-expected frequencies are indicated by red and blue coloring, respectively, for bars or dots in all associated figures. For all bar charts, the gray shading behind the bars represents the 95% confidence interval of background frequencies.

Trend analysis

Trends (Fig. 5 and Supplementary Figs. 14) are assessed using both linear least squares fit and rank correlation of the counts per year of station extreme heat days, station heat wave days, station heat wave events, station heat wave duration, station pattern days, and regional heat wave days. Statistical significance (at the 0.05 level) of linear trends is identified using a two-sided t-test of the Wald statistic (scipy.stats.linregress). Statistical significance of rank correlation is established using Kendall’s tau-b (scipy.stats.kendalltau).
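The two tests can be applied to a series of annual counts as in the sketch below. It uses the two SciPy functions named above; the wrapper function and its output keys are hypothetical.

```python
import numpy as np
from scipy import stats

def annual_trend(years, counts, alpha=0.05):
    """Linear least-squares trend and Kendall rank correlation for counts per year."""
    years = np.asarray(years, dtype=float)
    counts = np.asarray(counts, dtype=float)
    fit = stats.linregress(years, counts)         # p-value: two-sided test of zero slope
    tau, tau_p = stats.kendalltau(years, counts)  # tau-b is the default variant
    return {
        "slope_per_year": fit.slope,
        "linear_p": fit.pvalue,
        "linear_significant": bool(fit.pvalue < alpha),
        "kendall_tau": tau,
        "kendall_p": tau_p,
        "kendall_significant": bool(tau_p < alpha),
    }
```

Reporting both tests guards against a trend that appears significant only because of a few outlying years, since the rank-based test is less sensitive to extreme counts.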