1 Introduction

Hydraulic fracturing-induced seismicity (HF-IS) has become a significant concern for the unconventional gas industry (e.g. Schultz et al., 2020a). The seismicity response has been highly variable between different basins and regions. Many major plays, such as the Barnett (Texas), Bakken (North Dakota), and Marcellus (Pennsylvannia), have experienced little to no HF-IS despite thousands of wells being drilled and hydraulically stimulated (Verdon et al., 2016; van der Baan and Calixto, 2017; Skoumal et al., 2018a). Conversely, some plays have experienced well-publicised cases of HF-IS.

In the West Canadian Sedimentary Basin (WCSB), stimulation of the Horn River, Duvernay and Montney Formations has produced several notable cases of HF-IS (Kao et al., 2018). The first cases of HF-IS in the WCSB were associated with the Horn River Shale in northeastern British Columbia between 2009 and 2011, where magnitudes reached M 3.8 (BC Oil and Gas Commission, 2012; Farahbod et al., 2015). The largest hydraulic fracturing induced events in the WCSB have occurred in the Montney and Duvernay Formations. The BC Oil and Gas Commission (2014) documented nearly 200 events with magnitudes ranging from 1.0 < M < 4.4 attributed to hydraulic fracturing of the Montney Shale from late 2013 to early 2014. Further cases of hydraulic fracturing induced seismicity in the Montney play have been recorded by Babaie Mahani et al. (2017) and Babaie Mahani et al. (2019); in both cases, the largest reported magnitudes were M 4.6. Roth et al. (2020) and Peña-Castro et al. (2020) have provided a detailed analysis of events from the 2018 Septimus sequence.

The Fox Creek area, in which seismicity is induced by hydraulic fracturing of the Duvernay Formation, is probably one of the most extensively studied for induced seismicity anywhere in the world. Schultz et al. (2015a) and Bao and Eaton (2016) identified clusters of seismicity spatially and temporally correlated with stimulated wells. Continued activities in this area produced several events with M > 4 in 2015 and 2016 (Schultz et al., 2017). For the Fox Creek Duvernay cases of induced seismicity, high-resolution microseismic monitoring networks combined with 3D reflection seismic data have revealed in detail the interactions between hydraulic fractures, pre-existing natural fracture networks and faults (e.g. Eaton et al., 2018; Verdon et al., 2019; Eyre et al., 2019). While HF-IS in the Duvernay had previously been observed in the Fox Creek area, Schultz and Wang (2020) have identified several sequences of HF-IS from stimulation of the Duvernay near to the town of Red Deer, the largest of which had a magnitude of M 4.2.

In the Sichuan Basin, China, hydraulic fracturing for shale gas has produced M > 4 events from 2014 onwards (Meng et al., 2019), and several events have exceeded M > 5, including an M 5.7 event in December 2018, followed by an M 5.3 in January 2019 (Lei et al., 2019a), and an M 6 earthquake in June 2019 (Liu and Zahradník, 2020), which produced several M > 5 aftershocks. The M 5.7 December 2018 event was reported to have injured 17 people and caused extensive damage to roughly 400 houses (Lei et al., 2019a), while the M 6 event in June 2019 is reported to have killed 13 people, injured over 200 people, and damaged a large number of buildings (Yi et al., 2019). However, the causative factor for the June 2019 M 6 event is unclear, with several water injection wells associated with salt mining also found in close proximity to the epicentre (Lei et al., 2019b; Jia et al., 2020).

In the mid-continental USA, disentangling the signal of HF-IS from events caused by wastewater disposal has also proved challenging at times (e.g. Yoon et al., 2017). The first reported case of HF-IS identified in the USA occurred in the Woodford Shale, Oklahoma (Holland, 2013). Induced seismicity in Oklahoma has been dominated by disposal of produced fluids from conventional hydrocarbon operations (Rubenstein and Babaie Mahani, 2015), with the largest magnitude events associated with such activities (e.g. Keranan et al., 2013; Yeck et al., 2017). Nonetheless, Darold et al. (2014) reported a second case of HF-IS in Carter County, southern Oklahoma, and more recently, a re-analysis of earthquake catalogues by Skoumal et al. (2018b) identified 274 hydraulic fracturing wells, primarily in the Anadarko Basin, that correlate spatially and temporally with sequences of seismicity, with over 700 events with M > 2, and 12 events with magnitudes between M 3.0–M 3.5.

The largest earthquake in the USA to have been linked to hydraulic fracturing occurred in the Eagle Ford Shale, southwest Texas. Fasola et al. (2019) identified 94 earthquakes with M > 2.0 that could be correlated with hydraulic fracturing wells, with the largest having a magnitude of M 3.5 located near to Karnes City. While the primary cause of the Guy-Greenbrier sequence, where the largest event reached M 4.7, is believed to be high-volume wastewater disposal wells (Horton, 2012), Yoon et al. (2017) were able to identify clusters of events, with magnitudes below M 3, which appeared to be caused by nearby hydraulic fracturing activities in the Fayetteville Shale.

In the Appalachian Basin, hydraulic fracturing has predominantly taken place in the Marcellus Shale, which has been observed to be almost entirely aseismic (van der Baan and Calixto, 2017; Skoumal et al., 2018a). In contrast, hydraulic fracturing in the deeper Utica Shale has produced several cases of HF-IS, including an M 2.1 event in Harrison County, Ohio in October 2013 (Friberg et al., 2014), and an M 3.0 event in Mahoning County, Ohio, in March 2014 (Skoumal et al., 2015). Skoumal et al. (2018a) argue that the different responses between the Marcellus and Utica formations are produced by their differing depths, with the Utica Formation being much closer to the underlying basement. Again, cases of HF-IS may overlap with cases of wastewater disposal-induced seismicity (e.g. Kim, 2013).

In the UK, hydraulic fracturing of the Preese Hall well in 2011 generated an M 2.3 event: this was the first widely reported case of HF-IS (Clarke et al., 2014). In December 2018, stimulation of the Preston New Road PNR-1 well produced an M 1.5 event that was felt by nearby residents (Clarke et al., 2019), while stimulation of the adjacent PNR-2 well in August 2019 produced an M 2.9 event that was felt in the nearby towns of Blackpool and Preston (Kettlety et al. 2020; Cremen et al., 2020). While the magnitudes of events in the UK have not been as large as those in the WCSB or the Sichuan Basin, all 3 stimulated wells in the Bowland Shale have produced HF-IS of sufficient magnitude to be felt by nearby residents, although it should be noted that these three wells are within 5 km of each other, and so may not be representative of the Bowland as a whole. The UK government imposed an 18-month moratorium on shale gas hydraulic fracturing after the 2011 event and has re-imposed a moratorium following the August 2019 event, significantly hampering the development of this industry.

The concept of a Traffic Light Scheme (TLS) to mitigate induced seismicity was first developed by Bommer et al. (2006) for application to an enhanced geothermal project in El Salvador. To date, TLSs have been the primary means by which HF-IS has been regulated and they have been generally recommended as an appropriate tool for this purpose for HF-IS as well as induced seismicity due to several other injection-related activities (NRC, 2012; Majer et al., 2012; Zoback, 2012). TLSs have the advantage of being conceptually simple, easy to explain to non-expert stakeholders and the general public, and relatively immune to model-based assumptions or parameterisation.

Given an expectation that TLSs will continue to play a role in regulation and mitigation of HF-IS, in this study we examine the key assumptions that underpin the successful operation of TLSs, and the extent to which they are met by cases of HF-IS. Before doing so, we begin by examining the potential hazard and risk posed by HF-IS, since any attempt to mitigate a hazard must start with a reasonable appreciation of the risk it poses. We follow this by an analysis of cases of HF-IS, spreading our net as widely as possible to capture examples of HF-IS from different plays and basins around the globe, focussing on the temporal evolution of event magnitudes, and the extent to which TLSs did, or could have, mitigated the occurrence of seismicity.

Although we focus on HF-IS in the paper, the same approach and many of our conclusions could be extended to other injection-based operations, including wastewater injection, enhanced geothermal systems and carbon capture and storage.

2 TLSs as seismic risk mitigation tools

TLSs have been very widely adopted and applied to the management of induced seismicity related to processes involving fluid injection. The objective of a TLS in such operations, whether explicitly stated or not, is to mitigate the potential seismic risk associated with induced earthquakes. Seismic risk can be defined as the possibility of undesirable consequences of earthquake effects on people and on the built environment (although serious consequences for a population are generally a direct result of damage to buildings). Conceptually, seismic risk is the product of four factors: the seismic hazard, which is the quantification of the earthquake effect, primarily ground shaking but also fault rupture, landslides and liquefaction; the exposure, which refers to the elements of the built environment in the locations where the shaking could happen; the fragility, which defines the likelihood of different degrees of damage being experienced under different levels of shaking; and the consequences of the building damage, which can be measured in economic terms, loss of function, or injury to inhabitants. When fragility and consequences are directly combined in a single function, it is referred to as vulnerability.

In conventional earthquake engineering for natural seismicity, the starting point is usually to characterise the seismic hazard and then to apply this to the structural design of new buildings or retrofit of existing buildings to reduce fragility in order to achieve tolerable risk goals. The approach to induced seismicity has generally been the opposite, namely to assess the vulnerability of the area surrounding the industrial activity that could potentially cause induced seismicity and then to apply controls to operations in order to maintain the hazard below thresholds that will result in acceptable or tolerable risk levels. In many cases, some of these steps have been implicit rather than explicit, but the focus has been clearly placed on mitigating risk through control of hazard, an option that is not available when dealing with natural earthquakes (other than by relocation, which in effect is the removal of the exposure). Bommer et al. (2015) have proposed that conventional earthquake engineering approaches could be invoked in the mitigation of induced seismic risk rather than relying exclusively on mechanisms for controlling the hazard, especially since the latter could pose a threat to the viability of the industrial activities in question. This concept was applied to conventional gas production in the Groningen gas field in the Netherlands through the development of an induced seismic risk model that is capable of estimating the risk impact both of changes in production and structural upgrading of the most susceptible buildings (van Elk et al., 2019). However, the viability of such an approach depends on whether the economic benefits of the operation warrant the investment in building strengthening. For many operations, including most hydraulic fracturing wells for oil and gas recovery, it will often be the case that the only option will be to mitigate risk through the design and implementation of an appropriate TLS.

Descriptions of TLSs for induced seismicity have often focussed on the monitoring systems and the response protocols when different thresholds are passed, and even more advanced concepts such as systems that adapt to the observed seismicity (e.g. Mena et al., 2013). All of these considerations are very important, but the actual risk goals of the TLS are often not stated explicitly. From the perspective of risk mitigation, the design of a TLS should consider three questions:

  1. 1.

    What is the risk target in terms of consequences to be avoided?

  2. 2.

    Which hazard metric is the best indicator of the risk potential?

  3. 3.

    What threshold of the hazard metric will ensure risk targets are not exceeded?

The risk could be considered in terms of disturbance to the population from felt shaking episodes. Such an approach was considered by Douglas and Aochi (2014), who defined risk in terms of the probability of induced events due to an enhanced geothermal system generating felt levels of shaking. Such a framework does not seem to have been used widely in practice although there is precedent for control of vibrations of anthropogenic origin in terms of limiting the disturbance to the exposed population. For example, the British Standards guidelines on tolerable levels of blast-induced vibrations (BSI, 2008) specify diurnal and nocturnal levels that apply for up to three blasts per day. These standards also relate the tolerable level of shaking to the frequency of shaking episodes: a factor is defined to reduce the tolerable levels when the number of events is greater than 3, such that there is 25% reduction for 5 blasts per day, and almost a 50% decrease for 10 blasts. However, a TLS based on perceptible levels of shaking could be considered excessively conservative given that ground motions are typically felt at amplitudes far below those that might potentially cause damage. Moreover, there are strategies for dealing with felt shaking episodes, including communication—before and after such occasions—with the local population, which can be employed to avoid interruption of operations on this basis.

The risk targets are more likely, and more usefully, to be set in terms of avoiding damage to exposed buildings. A brief review of a few well-documented cases of risk considerations in the design of TLS can provide some useful insights into responses to the three questions posed above (although the specific thresholds are of less interest to the present discussion). In the design of the TLS developed by Bommer et al. (2006) for an enhanced geothermal system at Berlín, El Salvador, consideration was given to published guidelines on tolerable levels of anthropogenic vibrations, correlations between ground-motion amplitudes and intensity, and the fragility of the most vulnerable local buildings, which were constructed from adobe, to infer thresholds in terms of peak ground velocity (PGV). Although the amber and red thresholds of the TLS were defined in terms of PGV, the system was calibrated in terms of magnitude, based on the size of the earthquake at the injection depth required to generate median values of PGV matching these thresholds at the epicentre. The magnitudes were calibrated using a ground-motion prediction equation (GMPE) based on recordings from small-magnitude shallow seismic events recorded in the region (and updated during the operation using recordings of the induced seismicity). The original intention was to take account of the number of induced events and possible cumulative impacts on the local building stock by defining a decay of the PGV thresholds with increasing values of the cumulative Arias intensity (integral of the squared acceleration) from recorded motions. The decay rate was to have been calibrated against observations and recordings from seismic swarms in the same region, but the constraint was found to be insufficient. The alternative approach adopted was to superimpose the thresholds on a Gutenberg and Richter (1944) (G-R hereafter) recurrence plot, with the sloping boundary between the green and amber zones determined by the natural seismicity rates in the area for a 30-day period (the planned period of injection).

The Berlín TLS approach was subsequently adapted for the Basel Deep Heat Mining geothermal project in Switzerland (Häring et al., 2008). The risk targets were not explicitly stated but four levels (green, yellow, orange, and red) were defined for the TLS, each characterised by a threshold of local magnitude and a PGV value; it was not stated if both or either one needed to be exceeded to trigger the TLS level. The PGV thresholds were set to very low levels compared to those adopted by Bommer et al. (2006), suggesting that the implicit risk goals were rather conservative.

Westaway and Younger (2014) proposed the adoption of nuisance levels of PGV as the basis for controlling HF-induced ground vibrations in the UK, the thresholds being adopted from a variety of sources, including correlations between PGV and intensity and guidelines for the control of vibrations of anthropogenic origin. These authors discuss both empirical and stochastic GMPEs as tools for relating the PGV thresholds to magnitude but opt instead for a purely theoretical relationship between seismic moment, corner frequency and particle velocity.

Ader et al. (2019) discussed the design of a TLS for a deep geothermal project in Helsinki, highlighting some of the challenges in responding to the question of the best hazard indicator for the risk potential of induced seismic events. The design began by defining PGV levels such that green would correspond to motions that are not felt, amber to amplitudes of motion that might be felt but not cause damage, and the red light corresponding to the threshold level of PGV for cosmetic damage. Ader et al. (2019) identified two potential problems with using a PGV criterion for triggering different levels of the TLS. Firstly, false positives may arise due to ground vibrations from other sources, and secondly, false negatives may arise if the highest PGV should occur at a location other than where seismic instruments are installed to monitor the shaking. To address these potential pitfalls, Ader et al. (2019) proposed two sets of criteria for triggering of the amber level of the TLS: either the exceedance of the PGV threshold (1 mm/s) at any sensor in association with an event of at least ML 1.0, or an event of ML ≥ 1.2. The red light was triggered simply by an event of ML ≥ 2.1. The magnitude levels were determined using GMPEs for PGV but considering exceedance probabilities other than the 50-percentile (medians). One other innovative feature of the Helsinki TLS that is worthy of note was the identification of vibration-sensitive receptors in the region surrounding the injection well, which included a hospital and some research laboratories. Site-specific thresholds were established for these facilities, sometimes in terms of peak ground acceleration (PGA) rather than PGV, and these were maintained when found to be more conservative than the generic thresholds adopted for the TLS.

Schultz et al. (2020b) extended these concepts to a fully probabilistic risk framework for the design of a TLS to be deployed for hydraulic fracturing. They noted the merits of ground-motion amplitudes being more closely related to earthquake impact than magnitude, but also acknowledged that reliably monitoring the former is more challenging. Therefore, in common with earlier examples, they set thresholds in terms of PGV but then translated them into magnitudes. In this case, however, their conversion accounts for the depths of events, the separation between the injection wells and exposed buildings, and variability in the thresholds for nuisance and damage (i.e. fragility functions). Schultz et al. (2020b) proposed that red lights should be set at the level corresponding to shaking that is considered a nuisance, since this will provide a margin against motions that are potentially damaging. Once the red-light threshold has been established, they proposed that the amber light should be set two magnitude units lower.

It is therefore clear that a key issue is whether to use ground-motion amplitudes or earthquake magnitude as the hazard metric used to define TLS thresholds. The impact of an earthquake on a building or infrastructure facility depends on the severity of the ground shaking at the site of that facility. This observation would lead one to conclude that a ground-motion parameter is a preferable basis for fixing the thresholds of an effective TLS instead of using the earthquake magnitude. PGV has been frequently chosen as the ideal ground-motion parameter because it is widely used to define vibration limits, both for human perception and damage control, and it is generally considered to be a better indicator of the damage potential of earthquake shaking than PGA (Bommer and Alarcón, 2006). However, there are challenges in using PGV, some of which have been noted above. From an operational perspective, the main challenge is determining where and how PGV levels should be monitored, especially in view of the appreciable spatially variability observed in ground-motion fields generated by earthquakes. Instruments may be placed in close proximity to the injection well (where noise may be an issue), close to the nearest or most vulnerable buildings, or where surface ground conditions are most likely to lead to amplification of the shaking. Without a dense network of ground-motion instruments, it must be acknowledged that the largest PGV may not be recorded. Reliance on a critically-located instrument can make a PGV-based TLS susceptible should that specific instrument malfunction. Another important factor is that no single ground-motion parameter can be considered as a very reliable indicator of the damage potential of the ground motion. Exceeding a given threshold of PGV, for example, may be a necessary condition for a damaging motion, but it is not sufficient. This was demonstrated in the Berlín geothermal project for which the first TLS was developed: a magnitude ML 4.4 earthquake (which occurred after pumping had been shut in) caused ground shaking that was registered with a PGA on the order of 0.8 g and a PGV value that exceeded the red-light threshold of the TLS, but no damage at all was reported (Bommer et al., 2006). Large amplitudes of PGA and PGV from small-magnitude earthquakes have been recognised for some time (e.g. Hanks and Johnson, 1976). However, it has also been recognised that these short-lived peaks will often contain insufficient energy to pose any threat in terms of structural damage. Indeed, the reasoning behind the imposition of a minimum magnitude in probabilistic seismic hazard analysis (PSHA) is to avoid inflating the hazard estimates with such peaks that do not contribute to risk (Bommer and Crowley, 2017). The degree of damage caused by earthquake ground shaking will ultimately depend on several factors, depending on the characteristics of the specific structure in question, including the frequency content of motion as well as the duration and energy content. A more effective way to capture the real damage potential of induced ground shaking could be through the use of a vector of ground-motion parameters but such an approach would not avoid the operational challenges associated with amplitude-based thresholds.

If the risk-based thresholds are to be calibrated to the specific conditions of an HF operation, appropriate magnitude levels can be inferred from the depth of the injection wells and reasonable assumptions regarding the separation of induced seismicity from the wells, together with the locations and characteristics of the exposed elements and the local ground conditions (provided a suitable GMPE can be adopted or developed). All of the relevant ground-motion characteristics can be estimated from the magnitude, depth, distance, and site conditions. From this perspective, if thresholds are to be defined in terms of an individual parameter, then given that in any particular application the approximate depth and location of induced earthquakes can be estimated a priori, magnitude may be the optimal parameter for a TLS. Notwithstanding the challenges of quantifying seismic event magnitudes accurately (e.g. Butcher et al., 2017; Kendall et al., 2019), magnitude is a parameter that can be calculated rapidly after an event, and this holds even if an individual instrument in the monitoring network fails to operate. If operations are monitored by regional networks, as opposed to dedicated site-specific arrays, which will often be the case if large numbers of hydraulic stimulations are taking place across a region, the magnitudes of larger events will be the best-constrained datum available, since locations may be poorly constrained and smaller events may be below the detection threshold. An optimal system may be based on magnitude thresholds for rapid calculation and response, with ground-motion thresholds, including but not necessarily limited to PGV, used to make decisions regarding resumption of full operations following an amber- or red-light trigger.

There are several options for calibration of magnitude thresholds, which could be based on location-specific risk modelling or else from appropriate PGV thresholds, as was done in several of the case studies reviewed above. Another approach is to make inferences from global observations regarding the impacts of small-to-moderate magnitude earthquakes. Several studies have inferred that epicentral ground motions from induced and tectonic earthquakes of the same magnitude are comparable (e.g. Atkinson, 2020), the shallower depths that result in shorter travel paths for induced earthquakes being offset by lower stress drops (Hough, 2014; Atkinson et al., 2018).

This being the case, global observations of earthquake damage (e.g. Nievas et al., 2020a) can provide a basis for calibrating magnitude thresholds, provided certain assumptions are made regarding the completeness of reporting for damaging events (Nievas et al. 2020b) and that the exposed building stock is analogous to that affected by the earthquakes for which damage was reported. However, a critical question is whether the red-light threshold of a TLS should be fixed at the target risk level, or whether it is necessary to set the red light at a lower threshold if the risk target is to be achieved.

3 Assumptions underpinning Traffic Light Schemes

TLSs thresholds are typically assigned as “green”, where operations proceed as normal, “yellow” or “amber”, where operations proceed with caution, and “red”, where injection is stopped, although additional threshold levels have been used (e.g. Häring et al., 2008). TLS thresholds have varied significantly between different jurisdictions (e.g. Kendall et al., 2019): in the UK the red-light threshold was M 0.5 (Clarke et al., 2019), whereas in Fox Creek area of the WCSB it is M 4.0 (Kao et al., 2018). Within Alberta, the regulator has also imposed red-light thresholds of M 2.5 in the vicinity of the Brazeau Dam, and M 3.0 near to the town of Red Deer. Other thresholds for HF-IS include Illinois, M 4; Oklahoma, M3.5; California, M 2.7; and Ohio, M 1.0.

Unlike the statistical modelling approaches demonstrated by, for example Kwiatek et al. (2019) and Clarke et al. (2019), TLSs are retroactive, with actions being taken after a threshold is reached. As such, the effectiveness of TLSs depends primarily on two assumptions. Firstly, TLSs presume the levels of seismicity will increase gradually during operations, triggering the yellow and then the red-light thresholds, allowing mitigation actions to be taken as these thresholds are reached. The alternative to this case is if large events occur “out of the blue” without precursory, smaller-magnitude seismicity, and therefore no mitigation would have been taken prior to a potentially damaging event.

Secondly, TLSs presume that mitigation actions (e.g. reducing or stopping injection) are effective in preventing further seismicity from occurring. Typically, rates of seismicity are observed to decrease once injection ceases (e.g. Clarke et al., 2019). However, while rates do decrease, seismicity does not stop instantaneously. Moreover, in many cases, the post-injection seismicity has continued to increase in magnitude (e.g. Majer et al., 2007; Häring et al., 2008; Clarke et al., 2014, 2019; Bao and Eaton, 2016; Kao et al., 2018; Ellsworth et al., 2019; Cremen et al., 2020; Kettlety et al., 2020). Such earthquakes are often referred to as “trailing events”.

The occurrence of magnitude “jumps” (where the next event to occur is significantly larger than any previous events during a sequence) and of trailing events poses challenges to the effectiveness of TLSs (e.g. Baisch et al., 2019). If the objective of a TLS is to preclude any events reaching a proscribed magnitude, then the red-light threshold must be set below this level, by an amount that is determined by the extent to which magnitude jumps, and trailing event magnitude increases, are likely to occur. Therefore, in this study, we examine case histories of HF-IS sequences to characterise the magnitude jumps and trailing events that have occurred, and therefore whether the assumptions inherent to the successful operation of TLSs are met.

The data compiled here also provide an opportunity to examine the timescales of induced seismicity relative to injection, and in particular, for how long after injection ceases are magnitudes be able to increase. Hydraulic fracturing represents a relatively abrupt intervention in the subsurface, with a high intensity of deformation around the well, but of limited extent both spatially and temporally: where local monitoring arrays with low location uncertainties have been used, induced seismicity is typically observed to occur within 1 km, and at most 2 km, from the well (e.g. Eaton et al., 2018; Eyre et al., 2019; Clarke et al., 2019), and hydraulic fracturing of a well typically takes place over a few days or weeks, after which the perturbations caused by stimulation will rapidly dissipate. This contrasts with wastewater disposal, for example, where injection takes place over many years, and so one might expect the resulting perturbations to be similarly long-standing (e.g. Keranen et al., 2013; Langenbruch and Zoback, 2016). Therefore, from a conceptual perspective, we would expect a closer temporal relationship between the well stimulation and induced seismicity for hydraulic fracturing.

4 HF-IS case histories

In this study, we evaluate HF-IS from a range of case studies across the globe. Our primary focus is on cases of HF-IS where magnitudes exceeded M 4, since this is a level at which some degree of nonstructural damage might be expected in some building types: for example, Baird et al. (2020) propose this as a threshold for light-frame wood buildings. However, to broaden the global breadth of our analyses, we also include examples that did not reach this threshold. We only consider cases of seismicity induced by hydraulic fracturing for shale gas. The case studies we examine are listed in Table 1. Where available, data were taken from published catalogues. Where digital catalogues were not available (for 11 of the 35 cases), we derived data by digitising figures provided in the papers mentioned herein. As an aside, we note that the limited data availability in many cases of HF-IS continues to pose a barrier to the scientific study of this phenomenon by the academic community.

Table 1 Case studies, including the observed maximum magnitude, the largest magnitude “jump”, any trailing event magnitude increase, and the number of days after injection for a trailing event magnitude increase to occur

We note that the cases selected represent a biased sample because cases where high-magnitude events occur draw wider attention. Even in relatively seismogenic areas, the majority of wells do not produce induced seismicity of sufficient magnitude to be detected, or produce relatively small events that do not attract the attention of regulators, the public, or academics, and therefore tend not to appear in publications. In such “well-behaved” examples, by definition magnitudes do not increase significantly, and hence there will be no significant magnitude jumps during operations or trailing magnitude increases after them.

4.1 Western Canadian Sedimentary Basin

The WCSB has experienced some of the highest levels of HF-IS, with several cases where magnitudes exceeded M 4. Kao et al. (2018) provided an overview of many of the largest magnitude HF-IS cases in the region, although where possible we add to these summaries with more detailed data provided by site-focussed papers. Bao and Eaton (2016) documented six clusters of HF-IS associated with the Duvernay Formation in the vicinity of Crooked Lake, Alberta, between December 2014 to March 2015, where the largest magnitude reached M 4.4. Schultz et al. (2017) showed that seismicity has continued in this area, including an M 4.6 event in June 2015 and an M 4.5 event in January 2016 (also described by Wang et al., 2016). Eaton et al. (2018) described another site in the area that was monitored in detail using a dedicated microseismic monitoring array, providing a significant improvement in the detection threshold, and thereby our ability to image and understand the interaction between hydraulic fractures, pre-existing natural fractures, and faults (Verdon et al., 2019). The maximum magnitude at this site was M 3.2. Similar monitoring was used at a second site in the area, described by Eyre et al. (2019), at which an M 4.1 event occurred. More recently, Schultz and Wang (2020) have identified clusters of induced seismicity associated with stimulation of the Duvernay near to the city of Red Deer, from which we examine the two largest cases, with magnitudes of M 4.2 and M 3.1. For the cluster ESB10 described by Schultz and Wang (2020), we note that this sequence is adjacent to the ESB09 cluster, which was linked to a nearby well that had ceased stimulation only 4 days before the start of operations at the ESB10 well. However, we follow Schultz and Wang (2020) in treating these sequences as unrelated (note that the ESB09 cluster did not produce magnitudes significantly above M 2.0, and so is not included in our study). We also include the Cardston HF-IS sequence in southern Alberta, where an M 3.0 event was caused by stimulation of the Exshaw Formation (Schultz et al., 2015b).

For the Montney, we use the cases described by Kao et al. (2018) for the sequences in August 2014 and in July 2016, with maximum magnitudes of M 4.5 and M 3.9, respectively. To these, we add the HF-IS sequence from August 2015 described by Babaie Mahani et al. (2017), and the sequence near to Septimus described by Peña-Castro et al. (2020). We note that the largest event in this sequence is described as M 3.9 in the Composite Alberta Seismicity Catalog (Fereidoni and Cui, 2015), whereas Peña-Castro et al. (2020) assigned a moment magnitude of M 4.2. This discrepancy highlights the challenge of assigning accurate and consistent magnitudes to HF-IS sequences (Kendall et al., 2019). We also note that more detailed analysis of seismic data can reveal precursory events not identified in initial studies, as can be seen by a comparison between the event catalogues for the Septimus sequence reported by Babaie Mahani et al. (2019) and those of Peña-Castro et al. (2020) and Roth et al. (2020).

4.2 Sichuan Basin, China

Meng et al. (2019) documented the increase of HF-IS in the Sichuan Basin from 2014 onwards, but no single case studies are presented in sufficient detail for our analysis. Lei et al. (2017) documented increasing seismicity across the Sichuan Basin, and in particular, provided detailed analysis of HF-IS that was associated with two pads: pad N4, which produced an M 3.0 event in April 2016, and pads Y6-8, which produced an M 4.7 event in January 2017. Lei et al. (2019) reported further HF-IS in the same region, focussing on pad N201, which produced an M 5.7 earthquake in December 2018. We note that caution may be required with these events, since recent studies have linked some of the induced seismicity that has occurred in the Sichuan Basin with salt mining processes rather than hydraulic fracturing (Lei et al., 2019b; Jia et al., 2020).

4.3 Bowland Shale, UK

To date, three wells have been hydraulic fractured in the Bowland Shale in northwest England. The first was the Preese Hall well in Lancashire in 2011, which produced an M 2.3 event (Clarke et al., 2014). In 2018, the Preston New Road PNR-1z well was stimulated, approximately 4 km from the Preese Hall site, producing an M 1.5 event (Clarke et al., 2019), and in 2019 the adjacent PNR-2 well was stimulated, producing an M 2.9 event (Cremen et al., 2020; Kettlety et al., 2020).

4.4 United States

In the mid-continental USA, we use the clusters of HF-IS identified by Yoon et al. (2017) from stimulation of the Fayetteville Shale in Arkansas in 2010, where magnitudes reached M 2.9. To these, we add the 2011 sequence from Garvin County, OK, where magnitudes again reached M 2.9 (Holland, 2013); the 2014 sequence from Carter County, OK, where magnitudes reached M 3.2 (Darold et al. 2014), and the 2018 M 3.5 sequence near to Karnes City, TX (Fasola et al., 2019). In the Appalachian Basin, we consider the 2013 Harrison County sequence identified by Friberg et al. (2014), and the Mahoning County sequence identified by Skoumal et al. (2015), both of which were associated with stimulation of the Utica Shale.

5 Analyses of HF-IS sequences

Table 1 lists the observed magnitudes, magnitude jumps, and trailing event increases for each of the cases described in Section 4. Figure 1 shows the temporal evolution for each of these sequences.

Fig. 1
figure 1

Temporal evolution of maximum magnitudes for the HF-IS sequences detailed in Table 1, relative to the start of injection. For ease of visibility, we divide the cases by region, showing a UK Bowland Shale, b WCSB Duvernay cases, c other WCSB cases, d Sichuan Basin, China, and e USA. Lines are labelled according to the case studies listed in Table 1. Coloured squares along each line denote the point at which injection stopped in each case

5.1 Magnitude jumps and trailing events

Figure 2 shows a histogram and cumulative distribution of the observed magnitude jumps. We find that for 60% of cases studied, the largest jump in maximum event size was less than 1 magnitude unit, implying a relatively gradual evolution of event size during injection. In 23% of cases, there was a magnitude jump of between 1 and 2 magnitude units, while 4 cases (11%) had a magnitude jump of more than 2 units. Similarly, one case in the Montney Shale (Babaie Mahani et al., 2017), and the Carter County, OK, case (Darold et al., 2014) did not appear to have any precursory events. However, we note that monitoring in both of these regions is provided only by regional networks rather than by dedicated local microseismic arrays (e.g. Eaton et al., 2018; Clarke et al., 2019), and we do not have detailed information about the detection thresholds (i.e. at what level precursory event might have occurred but was not detectable from the available arrays).

Fig. 2
figure 2

Histogram (bars) and cumulative percent (circles) of observed magnitude jumps for the cases listed in Table 1

Figure 3 shows a histogram and cumulative distribution of the observed post-injection trailing magnitude increases. We find that 74% of cases showed no post-injection increase in magnitudes, 17% showed a post-injection magnitude increase of between 0 and 1 magnitude units, and 9% had an increase greater than 1 unit, with the largest being a 1.6 magnitude unit increase at SS10 cluster of the Crooked Lake sequences described by Schultz et al. (2017).

Fig. 3
figure 3

Histogram (bars) and cumulative percent (circles) of observed magnitude jumps for the cases listed in Table 1. We also show the simulated magnitude increase based on the concept outlined by Schultz et al. (2020b) (dashed line)

Schultz et al. (2020b) used a statistical approach to estimate the likely size of post-injection trailing events using an assumption that 20% of events typically occur after injection. Given a G-R distribution with b = 1, from the increased event count, Schultz et al. (2020b) estimated that the probabilities of trailing event magnitude increases being less than 1, 2, and 3 magnitude units were 90%, 99%, and 99.9%, respectively. We perform a similar analysis, albeit differing from Schultz et al. (2020b) in that we do not assume the occurrence of a red-light event. We use a stochastic approach, drawing populations of 1000 events from a G-R distribution with b = 1, and then drawing an additional 200 events (i.e. an additional 20% of events occurring post-injection) from the same distribution, and comparing the maximum magnitudes from the two populations. The results are shown in Fig. 3 (dashed line): the probability of a trailing event magnitude increase is slightly below 20%, and the probabilities of trailing increases of more than 1 and 2 magnitude units are 2% and 1%, respectively. Our observed population has a slightly higher proportion of cases with a trailing event increase (26%), possibly representing the bias described above that published cases of HF-IS are more likely to focus on the “badly-behaved” cases, or perhaps simply representing the uncertainties inherent in assuming that 20% of events occur after injection.

5.2 Timescales for trailing events

Table 1 also lists the number of days post-injection on which a trailing event magnitude increase occurred. Figure 1 also shows the post-injection evolution of seismicity for each case study. Figure 4 shows a histogram and cumulative distribution of the observed times in which post-injection trailing magnitude increases occur. As described above, for 74% of cases, the largest event occurred during stimulation, and therefore there was no trailing event magnitude increase. Hence, 26% of cases experienced increases in magnitude for post-injection trailing events. In 17% of cases, the largest event occurred within 3–4 days of stimulation, while in 9% of cases the lag between injection and the largest event was greater than 1 week. The longest observed delay between injection and the occurrence of a larger event was 23 days. Note that these times define the window within which magnitudes continue to increase post-injection, it is not the time in which all seismicity associated with a HF-IS has ceased.

Fig. 4
figure 4

Histogram (bars) and cumulative percent (circles) of observed times in which post-injection magnitude increases occur. Cases for which no post-injection increase occur are plotted at 0

The observed time windows in which trailing events occur confirm our hypothesis with respect to the tight correlation between stimulation and induced seismicity: long lag times between stimulation and increases in seismic magnitude are not observed to occur. These observations characterise the time period for which an otherwise quiescent site may need to be monitored before it can be assumed that the risk of induced seismicity has passed. Moreover, they also define a suitable time window that might be used when performing data-mining exercises to identify potential HF-IS candidates (e.g. Atkinson et al., 2016).

The compiled datasets also allow us to investigate the timing of the largest event within each sequence. Van der Elst et al. (2016) argued that the timing of the largest event within a sequence will be random, since the occurrence of seismicity should be controlled entirely by tectonic processes. In their cases studies, which included induced seismicity from a variety of industrial activities, van der Elst et al. (2016) did not find evidence to rule out the random occurrence of the largest events within the sequence. If true, this would pose a challenge for TLSs, since it implies that a large event could be the first to occur (or may occur relatively early in the sequence) and that magnitudes are controlled solely by tectonic factors. In contrast, Skoumal et al. (2018b) examined sequences of HF-IS in Oklahoma and found a statistically significant trend (p = 2.5%) for the largest events to occur in the latter half of their sequences and for the largest events to occur towards the end of injection periods.

We performed this analysis for the 24 cases where we had access to full event catalogues. Figure 5 shows the number of events prior to the largest event, as a percentage of the total number of events, Nprior/Ntotal, for each case. We find that for 17 of the 24 cases examined the largest event occurred in the latter part of the sequence, indicating that the occurrence of the largest event is not randomly distributed, with p = 4%. Moreover, we note that the cases where large events occurred early within sequences were all cases with relatively small numbers of events in the overall catalogue, which may indicate an impact of detection limitations failing to identify precursory events that are reduced when larger catalogues are produced by better monitoring systems, or by more detailed analysis of recorded data using more advanced event detection algorithms (e.g. Verdon et al., 2017). This impact can be understood by comparing the report of the Montney Septimus sequence made by Babaie Mahani et al. (2019), which did not identify any precursory events, and the catalogues produced by re-analysis of the data by Peña-Castro et al. (2020) and Roth et al. (2020).

Fig. 5
figure 5

Order of occurrence of the largest earthquake within each sequence. In a, we show each case in Table 1 for which catalogue data is available, with the position along the x-axis showing the number of events prior to the occurrence of the largest event in the sequence as a percentage of the total number of events, while the y-axis shows the number of total number of events within each catalogue. In b, we show a histogram (in 10% bins) of the Nprior/Ntotal percentage results for the cases shown in a

Our results therefore are in agreement with the results found by Skoumal et al. (2018b), indicating that the largest events are unlikely to occur at an early point within HF-IS sequences, but instead, magnitudes tend to increase as injection progresses.

6 Next record-breaking event magnitudes

The concept of magnitude jumps has been more commonly used by the mining industry to address induced seismicity (Cao et al., 2020). Mendecki (2016) developed the concept of the Next Record-Breaking Event (NRBE), which represents the expected size of the next event that is larger than all previous events, based on record-breaking statistics theory (Tata, 1969). Cao et al. (2020) applied this approach to oil and gas operations, including the Groningen field, and an unnamed hydraulic fracturing dataset from North America, finding that the NRBE approach performs well in estimating the observed magnitude jumps. The maximum expected magnitude jump, ΔMMAX, is determined from the observed event population as:

$$ \Delta {M}_{\mathrm{MAX}}=2\Delta {M}_{\mathrm{MAX}}^O-\sum \limits_{i=1}^{n-1}\left[{\left(1-\frac{i}{n}\right)}^n-{\left(1+\frac{i+1}{n}\right)}^n\right]\Delta {M}_i^O, $$
(1)

where \( {\Delta M}_{\mathrm{MAX}}^O \) is the largest magnitude jump observed to date, and \( \Delta {M}_i^O \) are all observed magnitude jumps ordered by size. The maximum expected magnitude jump can be added to the current largest event, \( {M}_{\mathrm{MAX}}^O \), to determine the modelled magnitude of the NRBE, \( {M}_{\mathrm{MAX}}^M \). Recent attempts to model the largest expected event magnitude have been based on the correlation between injection volumes and earthquake rates (e.g. Shapiro et al., 2010; Hallo et al., 2014; Mignan et al., 2017; Verdon and Budge, 2018). These methods have shown some success when applied in real time (e.g. Kwiatek et al., 2019; Clarke et al., 2019). The NRBE approach described by Mendecki (2016) provides an alternative approach that allows us to evaluate whether the observed magnitude jumps are in line with expectations given the preceding seismicity. It has the advantage of not requiring detailed injection data against which to correlate rates of seismicity, but like the methods described above, it requires a sufficient number of events to be recorded in order to establish the minimum magnitude of completeness, MMIN.

In Figs. 6, 7, and 8, we track the modelled NRBE magnitudes using Equation 1. We perform these calculations for cases where full event catalogues are available, as opposed to the cases where we were required to digitise figures to extract datasets (see Section 4). We use the Kolmogorov-Smirnov test, with a 20% significance threshold (Clauset et al., 2009) to determine MMIN, requiring at least 30 events larger than this magnitude. This limits our analysis to the 3 UK Bowland Shale cases; the Crooked Lake clusters 1, 2, 3, and 6 (Bao and Eaton, 2016); Tony Creek (Eaton et al., 2018); the Red Deer ESB02 and ESB10 clusters (Schultz and Wang, 2020); the 2015 Montney sequence described by Babaie Mahani et al. (2017); the Septimus sequence described by Peña-Castro et al. (2020); Garvin County, OK (Holland, 2013); Karnes City (Fasola et al., 2019); and the Guy-Greenbrier clusters (Yoon et al., 2017).

figure 6
Fig. 7
figure 7

Next record-breaking event model applied to cases of HF-IS in the WCSB. Figure format as per Fig. 6

Fig. 8
figure 8

Next record-breaking event model applied to cases of HF-IS in the USA. Figure format as per Fig. 6

In each case, we track the modelled ΔMMAX, using Equation 1, as each sequence progresses, from which we estimate \( {M}_{\mathrm{MAX}}^M \). Comparing the observed and modelled maximum magnitudes, we find that this approach does a good job of forecasting the observed magnitude jumps, with the modelled \( {M}_{\mathrm{MAX}}^M \) tracking the growth in magnitudes in advance. Figure 9 compares the observed \( {M}_{\mathrm{MAX}}^O \) for each of the cases with the modelled \( {M}_{\mathrm{MAX}}^M \) at the timestep prior to the occurrence of this largest event, finding good agreement between the modelled and observed magnitudes.

Fig. 9
figure 9

Comparison of observed \( {M}_{\mathrm{MAX}}^O \) (largest event size) for each of the sequences shown in Figs. 6, 7, and 8, versus the NRBE \( {M}_{\mathrm{MAX}}^M \) that was modelled prior to the occurrence of this event. The dashed line shows a 1:1 relationship—the only case which does not fall close to this line is the Red Deer ESB10 cluster

There are, however, some exceptions to this success—for the Red Deer ESB02 and Montney 2015 cases, there were an insufficient number of events prior to the occurrence of the largest event, and hence \( {M}_{\mathrm{MAX}}^M \) was undetermined at this time. For the Red Deer ESB10 case, the observed magnitude jump to the M 4.2 event is far larger than that modelled by the NRBE approach, even when the events from the preceding ESB09 cluster are included. This is the only case for which a large number of precursory events were observed and yet the NRBE approach produced a significant underestimate of ΔMMAX.

These observations show that this method has promise for estimating magnitude jumps and, therefore, provides a viable alternative to injection volume-based methods to forecast \( {M}_{\mathrm{MAX}}^M \). However, much like these volume-based methods, monitoring arrays with sufficiently low detection thresholds must be available in order to record sufficient numbers of events.

7 Implications for TLS design and operation

The essence of a TLS is to use monitoring of induced seismicity and pre-established thresholds for invoking modifications (reductions or suspension) of fluid injections in order to avoid shaking levels that could result in unacceptable consequences. From this perspective, the fact that for many geothermal systems and some hydraulic fracturing wells the largest induced events have occurred after shut-in has led to questioning of how effective TLS can actually be as a tool for controlling the impact of induced earthquakes (e.g. Baisch et al., 2019). However, if the patterns of jumps in the magnitude of induced events and the occurrence of trailing events are taken into account in TLS design, these systems can provide an effective means of mitigating induced seismic risk—and in many cases, offer the only possibility for rational risk management where hydraulic fracturing takes place.

7.1 Magnitude thresholds and observation intervals

The occurrence of magnitude jumps and trailing events appears to vary by play. For example, the Bowland Shale seems to be particularly prone to post-injection trailing event increases, while such events have also been observed in the Duvernay. In contrast, the Sichuan Basin and the Montney have produced large magnitude jumps during operations (or a lack of precursory events), but have not produced trailing events of any significance. While we do not have a physical explanation for this geographical variability, it is likely to be explained by different geological conditions (rock properties, structural features, in situ stresses) between plays (see Section 7.3).

We note that cases that produced large magnitude jumps during injection did not produce large trailing event magnitude increases, and vice versa. A “double whammy” effect of a large jump in magnitude during injection, followed by further increases after injection, which would pose the greatest challenge for a TLS, has not been observed in any of our case studies. If the objective of a TLS is to preclude the occurrence of a single event above a given magnitude, as opposed to preventing the occurrence of multiple large events (see Section 2), then the combined population of observed jumps and trailing event increases may provide a guide as to an appropriate separation between a red-light TLS threshold and the event magnitude that is to be avoided. We find that for 85% of the 35 cases we studied, there was no jump or trailing event larger than 2 magnitude units, while the largest confirmed jump was 2.7 magnitude units. We again note that these statistics are drawn from a biased sample because our focus is on cases where large increases in magnitude have occurred, whereas most wells undergo hydraulic stimulation without any increase in observed magnitudes at all. Nevertheless, a separation of 2–2.5 magnitude units between the TLS threshold and the magnitude to be avoided seems reasonable. Schultz et al. (2020b) reached a similar conclusion based on the theoretical consideration of potential jumps as described above, and their estimate that jumps of more than 2 units will occur in less than 1% of cases. As a result, Schultz et al. (2020b) suggested that red-light thresholds be set at a level corresponding to nuisance levels of shaking to ensure an adequate gap to account for jumps and trailing events, such that damaging levels of shaking are not reached.

Kao et al. (2018) discuss the performance of TLSs imposed to mitigate induced seismicity in the WSCB. An assumption implicit to their assessment of the TLS performance is that the objective of the TLSs was or is to prevent events that are larger than the red-light threshold of M 4. However, as we have shown, we should not expect the red-light threshold to define the limit of event magnitudes that might occur. No HF-IS events with M > 5 have occurred in the WCSB. However, since some of the M > 4 HF-IS cases occurred prior to the imposition of TLSs, this may simply represent a threshold to event magnitudes imposed by the geomechanical and tectonic setting, or a statistical product of the G-R distribution of HF-IS in the WCSB, rather than any operational mitigation actions.

The UK represents a contrasting situation, where TLS red-light threshold is set at M 0.5, yet an M 2.9 event occurred under this scheme. It is therefore worth examining the operation of this scheme in more detail. While the M 0.5 limit is the most conservative of any TLS applied to hydraulic fracturing, the consequences of breaching the red light are the least onerous, with a pause in injection of a minimum of 18 h being required after which, once seismicity levels have decreased below M 0.5, further injection may occur (Clarke et al., 2019). This was the case at the Preston New Road wells, where the red light was triggered on multiple occasions. Most notably, during PNR-2 the red-light threshold was reached with an M 1.6 event on the 21st August 2019, but further injection took place on the 23rd August 2019, after which the M 2.9 event occurred (Kettlety et al. 2020). Evidently, the pause in injection did little to reduce the cumulative impact of injection, with event magnitudes continuing to rise after injection re-started.

A more extreme example of this effect took place at the Pohang geothermal operation in South Korea, where an M 5.5 event occurred (Ellsworth et al., 2019), despite a TLS with a red light of M 2.0 (Kim et al., 2018). However, while the red-light threshold was reached with an M 3.2 event in April 2017, further injection took place in August and September 2017, after which the M 5.5 event occurred. Observations from sites such as Preston New Road and Pohang indicate that once a fault has been activated by fluid injection, it may remain prone to continued activation by further injection, even after a substantial period of time, with magnitude levels continuing to develop from where they were during previous injection phases, as opposed to resetting to much reduced levels. While it might be anticipated that extensive flowback could act to reduce the levels of stress acting on a reactivated fault, and thereby reduce the levels of seismicity should further injection take place, we are not aware of any studies that have yet demonstrated this in practice.

7.2 TLSs and maximum magnitudes

In order to obtain a licence for a HF operation, it is likely that operators will need to present a safety case that demonstrates that the injections will not lead to unacceptable levels of seismic risk. This may include hazard or risk assessments, in which a key parameter to be defined would be the largest magnitude event expected to occur. If the assessments are made deterministically, then the size of this postulated largest event will largely control the impacts. If a probabilistic approach is adopted, the maximum magnitude will define the upper limits of the hazard integration. In either case, the value will be difficult to estimate with a high degree of certainty and, at the same time, the perception of the viability of the project will be strongly influenced by the value of maximum magnitude, MMAX. Considerable effort has been invested over recent decades to define MMAX for probabilistic seismic hazard analysis (PSHA) for natural seismicity, despite the fact that this parameter is generally found to exert relatively little influence on hazard estimates. For induced seismicity, MMAX can be more influential but it could also be argued that MMAX has different meanings in hazard assessments for natural and induced seismicity. In long-term PSHA, the value corresponds to the largest earthquake that could ever occur within a given seismic source during the current tectonic regime. For induced earthquakes, it corresponds to the largest event that could be triggered by the fluid injection, and therefore must occur during or shortly after injection which, for hydraulic fracturing, will correspond to a period typically measured in days or weeks (Fig. 1). While the distinction is subtle, it is very important: we believe that simply assuming the same MMAX values used in PSHA for natural seismicity when estimating seismic hazard and risk due to induced earthquakes could lead to gross overestimation in many cases. In PSHA for tectonic seismicity, it is perfectly reasonable to constrain MMAX based on the dimensions of the largest seismogenic faults, but for induced seismicity from short-lived operations, the largest magnitude earthquake may depend on numerous operational factors and hydraulic connectivity between the injection wells and any critically stressed faults, as well as the dimensions of such faults. We would argue, therefore, that approaches to estimating MMAX for induced and natural seismicity should be different.

McGarr (2014) proposed that the largest earthquake is controlled by the total volume of injected fluid, whereas van der Elst et al. (2016) have argued that this volume controls the number of earthquakes and that the largest earthquake is actually controlled by the tectonics of the region (although they also note that the largest event is proportional to the total number of earthquakes, thereby linking MMAX back to volume). The findings of van der Elst et al. (2016) would seem to support those who propose that MMAX for induced earthquakes is the same as for tectonic earthquakes in a given region (e.g. Petersen et al., 2015), but this point of view has not been universally accepted. Although related to long-term conventional natural gas production rather than short-lived fluid injection, induced seismicity in the Groningen gas field in the Netherlands serves as an example of how the two estimates of MMAX may vary. In the SHARE project that produced seismic hazard maps for Europe, the logic-tree branches for MMAX in the region containing the Groningen field were all ≥ 6.5 (Woessner et al., 2015), whereas studies focussed on induced seismicity in the field proposed maximum magnitudes in the range of 4.0–4.5 (Zöller and Holschneider, 2016; Dempsey and Suckale, 2017; Beirlant et al., 2019); the largest observed earthquake to date was of magnitude M 3.6. A panel of international experts charged with making an assessment of MMAX in Groningen developed a logic-tree that assigned a probability of almost 75% that MMAX would be no more than 1.5 magnitude units greater than the largest observed event and a probability of less than 10% to MMAX being equal to that assigned for tectonic seismicity (Bommer and van Elk, 2017). Moreover, at the time of writing, the expert panel has been reconvened for a second workshop to consider new information regarding the possibility of the compaction of the gas reservoir triggering tectonic earthquakes.

The depth of earthquake nucleation is an important issue in the consideration of appropriate MMAX values. Large tectonic earthquakes tend to occur at mid-crustal depths, with a large portion of the rupture propagating upwards from the hypocentre (e.g. Mai and Thingbaijam, 2014). Events induced by hydrocarbon activities typically occur in the sedimentary column, or within the upper km of the underlying basement (e.g. Verdon, 2014). Therefore, the triggering of large-magnitude earthquakes would require that a fault rupture is initiated which proceeds to propagate mainly downwards into the crust. Whether such a mechanism is likely to occur is open to debate but the available evidence suggests that fault ruptures that propagate exclusively downwards are very rare. Mai et al. (2005) present a database of more than 50 finite rupture models for both strike-slip and dip-slip earthquakes: in only 6 of the cases is the hypocentre located in the upper third of the rupture width and none in the top 15% of the rupture width. Lomax (2020) reports a focal depth of just 4 km for the M 7.1 Ridgecrest mainshock “implying nucleation in a zone not conducive to spontaneous, large earthquake rupture nucleation and growth.” Lomax (2020) argues that this shallow hypocentre was the result of stress transfer from a deeper (12 km) foreshock of M 6.4, without which rupture initiation of a large event at such shallow depth would not have occurred.

The onus on operators to assess a priori the largest earthquake magnitude that might be expected is reduced when a TLS is implemented, especially if such a system is designed to take magnitude jumps and trailing events into account, rather than considering these as cases of TLS failing in their objectives. By establishing an appropriate red-light threshold and allowing for the potential jump above this level, a system can be designed with reasonable confidence of maintaining induced seismicity at an acceptable level. The operation of a TLS can be further refined and improved through the use of observational data in real time to continuously update estimates of the largest expected magnitudes using either methods based on correlations between volume and earthquake rate (e.g. Shapiro et al., 2010; Hallo et al., 2014) or methods based on the NRBE concept (Mendecki, 2016; Cao et al., 2020).

7.3 Influence of stress conditions on the evolution of HF-IS sequences

We have shown throughout this study that there is large variability in the induced seismicity response between different formations and between different geographical regions. It is well established that the occurrence of HF-IS requires certain necessary conditions, including the presence of pre-existing tectonic faults that are near to their Mohr-Coulomb critical stress threshold at which slip can occur. Hence, both the abundance of faulting and the in situ stress conditions are likely to play important roles in the evolution of HF-IS sequences.

As an example, at the Preston New Road site in the UK, Kettlety et al. (2020) identified that the different orientations of the two reactivated faults may account for their different responses. The PNR-1z fault, which was moderately well orientated in the in situ stress field, saw a gradual increase in magnitudes over time, whereas the PNR-2 fault, which was very well orientated in the in situ stress field, reactivated after a much smaller volume had been injected, and produced larger magnitude jumps and trailing events.

While detailed stress information on a site by site basis is often not available, it is of interest to consider the general stress conditions for the different plays and formations identified in Table 1, as revealed by the focal mechanisms of recorded events. The only HF-IS case study from Table 1 which appears to have extensional stress conditions (Shmin < SHmax < SV, where Shmin, SHmax, and SV are the minimum horizontal, maximum horizontal, and vertical principal stress components, respectively) was the Karnes case from the Eagle Ford Shale (Fasola et al., 2019). The majority of cases of HF-IS appear to be from sites with strike-slip stress conditions (Shmin < SV < SHmax), including the UK Bowland Shale (e.g. Clarke et al., 2014), the Duvernay (e.g. Eaton et al., 2018), the Appalachian Basin (e.g. Skoumal et al., 2015), Arkansas (e.g. Yoon et al., 2017), and Oklahoma (e.g. McNamara et al., 2015).

Two cases of HF-IS show evidence of compressional stress conditions (Shmin < SHmax < SV): the Montney shale (e.g. Peña-Castro et al., 2020) and the Sichuan Basin (e.g. Jia et al., 2020). We note that these two plays represent the two most extreme cases of HF-IS, with the Sichuan Basin hosting the largest HF-IS events yet recorded, with the Montney shale being the second-largest, and also showing the most extreme examples of magnitude jumps during injection.

This behaviour might be expected when these stress regimes are considered in more detail. It is widely held that the crust is critically stressed (Zoback et al., 2002). While it may be argued that this concept does not apply in all settings (e.g. Vilarrasa and Carrera, 2015), it is reasonable to assume that sites that are susceptible to induced seismicity will be near to the critical stress state. As such, the stress conditions will be controlled by the frictional properties of faults, as shown schematically in Fig. 10. The vertical stress is generated by the weight of the overburden, and so is primarily controlled by depth. In an extensional regime, the horizontal stresses must be lower than SV. Given the positive slope of the Mohr-Coulomb failure envelope, the shear stresses must therefore be relatively small. In contrast, in a compressional regime, the horizontal stresses must be larger than SV. As such, for the same depth (and therefore the same value of SV), the shear stresses will be significantly larger. A strike-slip regime, where the SV is interposed between maximum and minimum horizontal stresses, results in shear stresses that will be lower than compressional regimes, but higher than extensional regimes.

Fig. 10
figure 10

Schematic depiction of the impact of stress regime (extensional, blue; strike-slip, green; compressional, orange) on the shear stress that might be experienced by faults. We show Mohr circles for each stress condition, on the condition that the vertical stress, SV, is fixed by depth, and that stress conditions are close to the Mohr-Coulomb failure criteria (red line). As such, the extensional regime produces lower shear stresses, while the compressional regime produces higher shear stresses

It can be surmised that HF-IS will generate larger magnitudes, and larger magnitude jumps, in conditions where shear stresses acting on faults are higher. The observations presented above are consistent with this concept. While hydraulic fracturing has been conducted in many formations with extensional stress conditions, such as the Barnett Shale, cases of HF-IS are rare. Cases of HF-IS become more common in strike-slip settings, although magnitudes have been moderate, and a gradual evolution of magnitudes is typically observed, with few large magnitude jumps. Hydraulic fracturing in formations with compressional stress conditions is less common. However, where such activities have taken place, event magnitudes have been larger, and more extreme jumps in magnitudes have been observed.

8 Conclusions

Hydraulic fracturing-induced seismicity continues to pose an issue for the shale gas industry in various regions and plays around the world. Traffic Light Schemes are widely used as a tool to regulate and mitigate HF-IS hazards. The effective use of TLSs requires that appropriate thresholds be set, both with respect to the risk target in terms of consequences to be avoided, and with respect to the ability of TLSs to ensure that events with potential to produce said risks do not occur. TLS thresholds may be based on event magnitudes or observed ground motions (or a combination of the two). Whereas ground motions may be more closely related to earthquake impacts, accurate characterisations of ground motion may be more challenging to make unless dense local networks are available. Equally, the impacts of jumps in event size and post-injection trailing effects are more easily quantified with respect to magnitude. Hence, magnitudes may be the optimal parameter on which to base TLS thresholds. However, choices of magnitude thresholds should reflect the expected ground motions that would result given the anticipated event locations, depths, and near-surface conditions, relative to the vulnerability of nearby buildings and infrastructure.

TLSs are underpinned by two key assumptions: that magnitudes increase relatively gradually during injection, allowing mitigative actions to be taken as magnitudes proceed from green to yellow to red and that the mitigating action (e.g. ceasing injection) will stop or reduce the earthquake magnitudes. These assumptions are challenged by observations of magnitude jumps, whereby an event occurs that is significantly larger than any event in the preceding sequence (“out-of-the-blue”), and by trailing events, where magnitudes continue to increase after injection has ceased. The extent to which these phenomena are able to occur will determine the appropriate separation between TLS thresholds and the levels that are to be avoided.

To address this question, we have assembled HF-IS datasets from around the world. We find that magnitude jumps of up to 1 unit are common, while jumps that are significantly larger than 2 magnitude units are rare, with the largest observed jumps being 2.7. Trailing event magnitude increases occur in approximately one quarter of cases, with very few cases having a trailing event increase larger than 1 unit. Sequences where large jumps occurred during injection did not produce trailing events, and vice versa, meaning that the double impact of these two effects has not been observed. As such, based on the observed cases, a separation of 2–2.5 units between the red-light threshold and the magnitudes that are to be avoided would seem reasonable.

The concept of magnitude jumps during induced seismicity sequences has also been addressed by the mining industry, where the concept of the next record-breaking event has been developed. We applied this method to our compiled datasets and found that the NRBE approach was generally successful in forecasting event magnitudes. However, there remained a minority of cases where either the largest event either had too few precursors to make an NRBE forecast or where the forecast underpredicted the largest event. Therefore, while such forecasting schemes will undoubtably provide useful guidance for regulators and operators during injection (e.g. Kwiatek et al., 2019; Clarke et al., 2019), we anticipate that TLSs, so long as they are appropriately designed, will continue to play an important role in the successful mitigation of HF-IS.