Introduction

The Environment Agency (EA) of England and the Water Services Regulation Authority of England and Wales (Ofwat) are, respectively, the environmental and economic regulators of the water sector. In recent years, there have been increasingly significant financial penalties and criminal prosecutions following major incidents of sewage pollution of watercourses1,2,3,4,5,6,7. In 2018, there were 48 category 1&2 (‘major’&‘serious’) and 1527 category 3 (‘minimal’) pollution incidents impacting river water quality related to sewerage networks and Waste Water Treatment Plants (WWTPs) in England1,8. Individual WWTP operators self-reported between 62% and 84% of identified pollution incidents in England in 2018; the public and third parties were responsible for reporting the remaining 3951.

The non-trivial role of public reporting of pollution incidents in England reveals that: (1) operators may significantly under-report pollution incidents; (2) the public, unknowingly, plays an important role in water industry oversight; and (3) annual reporting of wastewater pollution incidents is very likely incomplete9. Uncertainty regarding the frequency, duration and impact of wastewater pollution incidents perpetuates the gap in evidence needed to inform intervention, capital investment, and prosecution. Here, we have applied machine learning (ML) techniques to leverage available data streams to highlight putative sewage pollution incidents. We also employed rainfall, river flow and WWTP alarm data to contextualise potential polluting effects and possible non-compliance with EA permits to discharge untreated sewage.

Environmental Information Regulation (EIR) requests were used, under UK regulations enacting European Council Directive 2003/4/EC10, to obtain daily treated effluent flow patterns and event duration monitoring (EDM) data, the latter being the start and stop times of untreated wastewater discharges from storm tanks. During periods of rainfall, storm tanks are employed at many WWTPs to hold excess sewage temporarily when inflow is swollen by surface water runoff. The contents of storm tanks are required, under permits to discharge to watercourses, to be transferred for treatment as soon as inflow recedes and WWTP capacity returns. During heavy rainfall, storm tank capacity can be breached, leading to permitted spills of untreated, but partially screened, sewage via storm tank overflows, resulting in pollution of receiving watercourses.

In 2013, the UK Department for Environment, Food and Rural Affairs (DEFRA) declared that, by 2020, EDM devices should be installed on the “vast majority” of combined sewer overflows (CSOs) on sewerage networks, WWTP storm tank overflows, and sewage pumping station (SPS) emergency overflows in England and Wales. Although the term storm tank overflow is commonly used by the water industry, by the EA and in discharge permits, a storm tank overflow is subject to a combination of groundwater, surface runoff and wastewater and is therefore clearly a combined sewer overflow.

Since 2016, the number and length of EDM detected spills have been reported annually to the EA by WWTP operators. In 2020, EDM reports were made for about 70% of such overflows. A discharge of untreated sewage during exceptional rainfall would be compliant with a permit and until the introduction of EDM devices would not have needed self-reporting to the EA by an operator. During periods of sub-exceptional rainfall, such discharges are non-compliant and potentially illegal under UK and European law11,12.

Operators of WWTPs are also required to continue to treat a minimum flow of sewage that is at least a plant-specific “storm overflow rate” defined in its EA permit, even when excess flow is diverted to, or spilled from, a storm tank. Therefore, it would also be non-compliant for a storm tank to receive untreated sewage or to overflow to a watercourse when treatment flow is below the storm overflow rate. Despite the flow passed to treatment being essential to checking such compliance, there is typically no permit requirement to record it and many WWTPs only record effluent flow. Effluent flow is obviously closely related to flow passed to treatment but may not be a legally acceptable surrogate for validating compliance. Henceforth, we use ‘storm’ and ‘spill’ in quotes to mean putative storm discharges detected by our flow analysis but possibly undetected by EDM devices, storm tank diversion alarms and even occurring during non-exceptional rainfall.

We selected two WWTPs operated by the same water company because of the variation in the population size served and the availability of flow data. Summary attributes of WWTP1 and WWTP2 are provided in Table 1, where anonymity has been intentionally preserved. Responses to EIRs confirmed that both WWTPs recorded only treated effluent flow and hence ruled out direct validation of compliance with minimum treatment rates for flow passed to treatment during spills. Monitoring Certification Scheme (MCERTS) 15-min daily effluent flow patterns were provided for 8000+ days over an 11-year period (2009–2020) and EDM data defining spill intervals, also requested by EIR, for 900+ days (2018–2020).

Table 1 Metadata for WWTP1 and WWTP2.

Our objective was to develop techniques for analysing daily flow patterns and EDM data that could detect spills of untreated “storm tank overflow” discharges into watercourses. We believe that the retrospective detection of such spills would benefit both water companies and regulators as well as citizen and professional scientists interested in sewage-related pollution of watercourses. We adopted a machine learning (ML) approach that used flow patterns during EDM recorded storm tank overflow spills to train pattern recognition algorithms to detect similar flow patterns when the occurrence of wastewater discharges was unreported or unknown. Artificial Intelligence (AI) techniques based on explicit, symbolic representations of regulations and legislation have been successfully developed and applied for several decades13. The use of quantitative AI methods such as Machine Learning and Pattern Recognition in regulatory compliance checking is now receiving more attention as industry and government accrue large databases accessible to lay, scientific and regulatory scrutiny14,15. For example, ML techniques were recently used to predict the likelihood of an organisation failing a US government agency inspection of compliance with environmental regulations16. The authors used a public database of environmental enforcement and compliance (https://echo.epa.gov/) to predict compliance based on location, industrial sector and previous inspection history over 5 years.

As a precursor to the ML component of the study, we undertook shape analysis of 3038 daily flow patterns from 2016 to 2020 to identify a compact flow representation. Then, we used supervised learning with 20 variations of standard ML algorithms on 917 flow patterns from 2018 to 2020, with EDM data, to develop classifiers able to discriminate between those affected and those unaffected by untreated sewage spills. Optimal classifiers, one for each WWTP, were subsequently verified in a semi-blinded manner on 2121 flows from 2016 to 2018 used for shape analysis of flow patterns but not supervised learning. The classifiers were then applied retrospectively, and fully blinded, to 5039 daily flow patterns from 2009 to 2015 not used for shape analysis nor supervised learning. Finally, publicly accessible rainfall, river flow and river level data as well as telemetry alarm data, obtained through EIR from the operator, were introduced to contextualise and to corroborate potential ‘spill’ days identified by the statistical or ML approaches and to inform discussion of compliance with EA discharge permits.

The primary contribution of this study is the use of machine learning models to detect unreported spills of untreated sewage from wastewater treatment plants having previously trained the models to determine relationships between known spilling events and associated perturbations of effluent flow. The addition of telemetry alarm and rainfall data enables waste treatment plant operators and regulators to detect equipment malfunction and permit non-compliance, respectively. Professional and citizen scientists also benefit from improved identification of putative spills that might affect their study of potentially polluted watercourses.

Results

Shape analysis of 3038 daily flow patterns (2016–2020) for WWTP1 and WWTP2

An example effluent flow pattern for a 10-day period at WWTP1 is shown in Fig. 1. EDM detected spilling intervals are overlaid to demonstrate the flattening effect of spilling on the profile of the flow pattern.

Fig. 1: WWTP1: example effluent flow pattern for 10 days annotated with EDM confirmed spilling intervals.

A 24-h (midnight to midnight) daily flow pattern of 96 15-min-interval average flow rates (litres/second) of treated effluent is shown in blue. The black horizontal, linear annotations represent EDM recorded intervals denoting a discharge from a storm tank (i.e., consented spill or potentially unconsented spill of untreated sewage), the shortest being 15 min and the longest over 24-h. Total daily rainfall (mm/d) is provided in green. The first two days, with no detected spills, show diurnal patterns of low flow between midnight (previous day) and the first peak after mid-morning, followed by a lull until a second, smaller peak in the evening. The next seven days (15/12/18 through 21/12/18) involve spill intervals of various length (black EDM line), showing a flattening of flow, which is typical of storm discharge during heavy rainfall. The last day shows elevated flows and a partial return to a diurnal flow pattern with no spills reported.

For WWTP1 (resp. WWTP2), EDM data were available for 446 (resp. 471) consecutive days for 2018–2020 during which untreated sewage spill intervals of varying lengths had been recorded. For each day, spill intervals were aggregated to the total number of hours of discharge. Of the days used for machine learning for WWTP1 (resp. WWTP2), 339 (resp. 346) involved no EDM recorded spilling incidents and 107 (resp. 125) days had spills with various lengths of which over a third were for 24-h. For WWTP1 (resp. WWTP2), 97 (resp. 117) days with an aggregated ‘spill’ length of at least 3-h were labelled as ‘spill’ and 349 (resp. 354) with an aggregated ‘spill’ length of below 3-h as ‘normal’. A 3-h aggregation period was selected because it guaranteed a reasonable number of ‘spill’ days on which to base the supervised learning and preliminary attempts to predict spilling hours per day were weakest for aggregated daily spills under 3-h. Where no EDM data was available, days were labelled as ‘unknown’. The average ‘normal’ (blue line) and ‘spill’ (black line) daily flow patterns as a proportion of storm overflow rates (red line) are shown in Fig. 2 for each WWTP. The storm overflow rates mark the minimum flow that should be treated before untreated sewage spills can be made in compliance with EA permits to discharge to watercourses.
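The labelling rule described above can be sketched in a few lines. This is a minimal illustration and not the authors' code: it assumes EDM records are available as (start, stop) timestamp pairs, splits any interval that crosses midnight, and applies the 3-h threshold stated in the text. The function names, the example intervals and the set of EDM-covered days are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, time, timedelta

SPILL_THRESHOLD_HOURS = 3.0  # aggregation threshold stated in the text


def daily_spill_hours(intervals):
    """Sum EDM spill hours per calendar day, splitting intervals at midnight."""
    hours = defaultdict(float)
    for start, stop in intervals:
        cursor = start
        while cursor < stop:
            next_midnight = datetime.combine(cursor.date(), time.min) + timedelta(days=1)
            segment_end = min(stop, next_midnight)
            hours[cursor.date()] += (segment_end - cursor).total_seconds() / 3600.0
            cursor = segment_end
    return dict(hours)


def label_day(day, hours_by_day, edm_days):
    """Return 'unknown' for days without EDM coverage, else 'spill' or 'normal'."""
    if day not in edm_days:
        return "unknown"
    if hours_by_day.get(day, 0.0) >= SPILL_THRESHOLD_HOURS:
        return "spill"
    return "normal"


# Hypothetical EDM intervals (not taken from the study data).
example_intervals = [
    (datetime(2018, 12, 15, 22, 0), datetime(2018, 12, 16, 2, 30)),
    (datetime(2018, 12, 16, 10, 0), datetime(2018, 12, 16, 11, 0)),
]
example_edm_days = {date(2018, 12, 15), date(2018, 12, 16), date(2018, 12, 17)}
example_hours = daily_spill_hours(example_intervals)
```

In this sketch a day with EDM coverage but no recorded interval is labelled ‘normal’, whereas a day outside the EDM period is labelled ‘unknown’, mirroring the three labels used in the text.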

Fig. 2: Average daily flow patterns.

a WWTP1: black curve for ‘spill’ days (n = 97) and blue curve for ‘normal’ days (n = 349); b WWTP2: black curve for ‘spill’ days (n = 117) and blue curve for ‘normal’ days (n = 354).

Separate shape models were generated for flow patterns from 2016 to 2020 for WWTP1 (n = 1511) and WWTP2 (n = 1527). The first principal component of shape variation, PCA1, in both models, is associated with the magnitude and temporal shifting of the morning flow peak (see Supplementary Video 1.mp4) as well as “seasonal” changes related to daylight saving, public holidays and vacation periods (Supplementary Fig. 1). Despite differences in the population served by WWTP1 and WWTP2, Fig. 3a, b shows similar distributions for scatter plots of PCA1 vs PCA2 for 2121 flows for 2016–2018 without EDM data. Analogous plots of PCA1 vs PCA2 for 917 flows for 2018–2020 with EDM data (Fig. 3c, d) suggest that, for both WWTPs, PCA2 is correlated with the shape difference between ‘normal’ flow (open circles) and ‘spill’ affected flow (filled triangles). This spill-related flattening is illustrated by morphing the overall average daily flow pattern for WWTP1 between −1 and +1 standard deviations of PCA2 (Supplementary Video 2.mp4). Interestingly, the area under the receiver-operating characteristic curve associated with using PCA2 alone for ‘normal’/‘spill’ discrimination is 0.88 and 0.91 for WWTP1 and WWTP2, respectively (this is the estimated probability of correctly classifying a pair of flow patterns selected randomly, one each, from the ‘normal’ and ‘spill’ labelled subsets).
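The pairwise interpretation of the area under the receiver-operating characteristic curve given above can be made concrete with a short sketch. It assumes that each daily flow pattern has already been reduced to a single score (for instance its PCA2 coordinate) and that larger scores indicate ‘spill’; since the sign of a principal component is arbitrary, that orientation is an assumption, and the function name and example scores are hypothetical.

```python
def pairwise_auc(normal_scores, spill_scores):
    """Share of (normal, spill) pairs ranked correctly; ties count as half."""
    correct = 0.0
    for spill in spill_scores:
        for normal in normal_scores:
            if spill > normal:
                correct += 1.0
            elif spill == normal:
                correct += 0.5
    return correct / (len(normal_scores) * len(spill_scores))


# Hypothetical scores, not values from the study.
example_auc = pairwise_auc([0.0, 1.0, 2.0], [1.5, 3.0])
```

Here five of the six possible pairs are ordered correctly, so the estimate is 5/6. The exhaustive double loop is quadratic in the number of days, which is acceptable for a few hundred labelled patterns per WWTP.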

Fig. 3: PCA1 vs PCA2 for daily flow patterns.

Unknown spill status (grey filled circles); spill confirmed by EDM (filled black triangles); confirmed as normal by EDM (unfilled grey circles). 2016–2018 without EDM data: a WWTP1 (n = 1065); b WWTP2 (n = 1056). 2018–2020 with EDM data: c WWTP1 (n = 446); d WWTP2 (n = 471).

Supervised learning of the effect of sewage spills on 917 effluent flow patterns

The performance of 20-fold cross-validation of supervised learning for labelled flow patterns for WWTP1 and WWTP2 is shown in Supplementary Tables 3 and 4 for 20 support vector machine (SVM) variations while retaining up to 15 PCA modes for flow pattern synthesis. The number of PCA modes retained for shape synthesis affects the validity of the reconstruction of each daily flow pattern and hence classification accuracy. For the three best-performing algorithms, Supplementary Fig. 2 shows the variation in classification accuracy of daily flow patterns for different numbers of retained PCA modes, estimated as the average area under the 20 receiver-operating characteristic curves associated with the cross-validation folds. For the optimal classifiers, the average area under the receiver-operating characteristic curve was 0.97 for WWTP1 and 0.96 for WWTP2.

For verification, prior to wider application, the optimal ML classifiers defined for each WWTP were used to reclassify the flow patterns used in their derivation. Figure 4 shows these flow patterns in contiguous temporal sequence with annotations for each day reflecting EDM detected spill intervals (horizontal black segments) and ML confirmation of ‘spill’ (unfilled gold circles). During this period there were 97 (resp. 117) days with an EDM confirmed aggregated spill of at least 3-h at WWTP1 (resp. WWTP2). The agreement between optimal ML classification and spill day labels derived from EDM data was extremely high (WWTP1: sensitivity = 0.91, specificity = 0.95; WWTP2: sensitivity = 0.98, specificity = 0.98), as would be expected for such “training” data.
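For readers less familiar with the two agreement measures quoted above, the following sketch shows how sensitivity and specificity could be computed from EDM-derived day labels and classifier output. It is illustrative only; the function name and the example labels are hypothetical and do not reproduce the study data.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive="spill"):
    """Return (sensitivity, specificity), treating `positive` as the target class."""
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in true_labels")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels: four EDM 'spill' days and five 'normal' days.
example_truth = ["spill"] * 4 + ["normal"] * 5
example_predicted = ["spill", "spill", "spill", "normal",
                     "normal", "normal", "normal", "normal", "spill"]
example_sens, example_spec = sensitivity_specificity(example_truth, example_predicted)
```

Sensitivity is the share of EDM ‘spill’ days that the classifier also flags, and specificity is the share of EDM ‘normal’ days that it leaves unflagged.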

Fig. 4: Daily effluent flow patterns and event duration monitor (EDM) detected spill intervals at WWTP1 and WWTP2 used as training data (Dec 2018–Mar 2020).

The daily flow and EDM spill data are measured at 15-min intervals. Flow is coloured (orange/blue/pink) to distinguish different years. Black horizontal lines delimit EDM detected spill intervals. Daily flows with an aggregated spill length of at least/less than 3-h are labelled as ‘spill’/‘normal’ prior to the supervised learning. Gold circles indicate days classified as ‘spill’ following the training of the machine learning (ML) algorithms to produce an optimal classifier for each WWTP. The grey dashed line represents the storm overflow rate, which defines the minimum sewage flow that should be treated even during storm filling or overflow. Additional annotations are telemetry alarms provided by the operator. These alarms have the potential to corroborate ML predictions of ‘spill’ days for the unseen flow patterns from 2009 to 2018 for which there is no EDM data. Similar charts showing the unseen ML classification of the 2009–2018 daily flow patterns overlaid with rainfall and river level data are provided in Supplementary Figs 5–10.

Figure 4 also includes data from other alarms related to untreated sewage discharges that have the potential to corroborate ML flow pattern classification for historical periods without EDM data. For WWTP1, there is near-perfect agreement (Cohen’s kappa: 0.81–1.00) between the EDM, STO (Storm Tank Overflow) and COL (Consented Overflow Level) alarms and ML classification for Feb ‘19–Feb ‘20 (Fig. 4 and Table 2). For just two months, Dec ‘18 and Jan ‘19, the EDM and COL devices concur with near-perfect agreement (Cohen’s kappa = 0.95), the STO device was largely at odds (Cohen’s kappa ≤ 0), and the ML classifier flagged incidents detected by all three. These results suggest that the STO is a good candidate and the COL alarm is an excellent candidate for corroborating ML detected putative spills at WWTP1 when EDM data is unavailable.

Table 2 Agreement of ML classification, EDM, COL and STO alarms for the supervised learning.

For WWTP2, there is almost perfect agreement between EDM and COL alarms (Cohen’s kappa = 0.87) and with ML classification (Cohen’s kappa = 0.78) (Fig. 4 and Table 2). No STO alarm data were provided for 2020 and between Dec ‘18 and Dec ‘19 STO showed only chance agreement with other devices and the ML classifier (Cohen’s kappa < 0.1). These results suggest that STO is a poor candidate while COL is an excellent candidate for corroborating ML detected putative spills at WWTP2 when EDM data is unavailable.
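Cohen’s kappa, used above to compare devices and the classifier, corrects the observed day-by-day agreement for the agreement expected by chance. A minimal sketch follows, assuming each source has been reduced to one label per day over the same sequence of days; the function name and example sequences are hypothetical.

```python
def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equally long label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both sources used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical daily flags (1 = spill alarm, 0 = no alarm) from two sources.
example_edm = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
example_col = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
example_kappa = cohens_kappa(example_edm, example_col)
```

In this example the two sources agree on 8 of 10 days, but 0.58 agreement would be expected by chance given how rarely each flags a spill, so kappa is (0.80 − 0.58)/(1 − 0.58) ≈ 0.52. Values near zero, as reported for the STO alarm at WWTP2, indicate agreement no better than chance.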

Detection of spills in 7160 daily flow patterns (2009–2018) not used to train ML algorithms

The classification of 2121 flow patterns from Jan 2016 to Nov 2018 was considered semi-blinded as they were used in shape analysis but not in the ML “training”, whereas the 5039 flow patterns from 2009 to 2015 were classified fully blinded as they were not used in either. Table 3 summarises the annual number of potential ‘spill’ days detected by the ML algorithms.

Table 3 Number of potential ‘spill’ days detected by machine learning.

A subset of 327 ‘spill’ days detected by the ML analysis between 2009 and 2018 at WWTP1 were corroborated by STO or COL alarm data. For the same period, a subset of 128 ‘spill’ days detected at WWTP2 were corroborated by STO or COL alarm data. The COL alarm corroborated all detected spills for which it was available, while the unreliability shown earlier for the STO alarm at WWTP2 suggested an alternative approach to corroborate spills detected by ML analysis. For both WWTPs, approximately three additional months of flow and EDM data (87 days from 7 March 2020 to 1 June 2020) were available after the end of the ML training data. This period was omitted from the original ML training data because the daily flow volume at WWTP1 was zero or less than 1% of expectation for more than 50% of the time and hence unusable (Supplementary Fig. 12). Such data anomalies are in any case a breach of the EA permit requirement that only 37 days in total in each year be missing or suspicious. However, it was possible to perform blinded testing of the 87 daily flow patterns from WWTP2 against the classification models constructed for Dec 2018–Mar 2020 and demonstrate corroborative agreement with the EDM data 93% of the time (Supplementary Fig. 12).

When WWTP1 spilled untreated sewage, whether detected by COL/EDM alarms or by ML classification, it typically did so at an effluent flow rate that was considerably below the storm overflow level (50.52 l/s) stipulated in its EA permit as the minimum flow rate for continued treatment (pass forward flow or PFF). This can also be seen in the 2018–2020 EDM monitored period. A comparison of average ‘spill’ and ‘normal’ flow patterns (Fig. 2) shows that the average effluent flow for ‘spill’ days at WWTP1 is never above the storm overflow rate, whereas at WWTP2 it is always above. Specifically, 141 of 274 (51.5%) non-aggregated (i.e., individual) spills detected by EDM at WWTP1 start when the effluent rate is less than 80% of the storm overflow rate, compared to none at WWTP2 (Supplementary Fig. 11).
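The ‘early’ spill proportion quoted above is a simple threshold count, sketched below. The storm overflow rate of 50.52 l/s and the 80% fraction come from the text; the function name and the list of effluent flows at spill start are hypothetical. Effluent flow is used here as a proxy for flow passed to treatment, as elsewhere in the study.

```python
WWTP1_STORM_OVERFLOW_LPS = 50.52  # permit value quoted in the text (l/s)


def early_spill_share(start_flows_lps, storm_overflow_lps, fraction=0.8):
    """Share of spills that start while effluent flow is below fraction * rate."""
    if not start_flows_lps:
        raise ValueError("at least one spill is required")
    threshold = fraction * storm_overflow_lps
    early = sum(1 for flow in start_flows_lps if flow < threshold)
    return early / len(start_flows_lps)


# Hypothetical effluent flows (l/s) at the start of four individual spills.
example_share = early_spill_share([30.0, 38.0, 41.0, 52.0], WWTP1_STORM_OVERFLOW_LPS)

# The proportion reported in the text: 141 of 274 individual EDM spills.
reported_percentage = round(100 * 141 / 274, 1)
```

With a threshold of 0.8 × 50.52 ≈ 40.4 l/s, two of the four hypothetical spills count as ‘early’.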

Due to the COVID-19 related lockdown from March 2020, permits for both WWTPs valid for the period prior to 2018 could not be provided in response to an EIR request to the Environment Agency because they were not in electronic format and premises were inaccessible. However, for both WWTPs, the current permits, which include historical amendments, suggest that the storm overflow settings have remained unaltered since before 2009. It appears, therefore, that WWTP1 has been spilling ‘early’ for more than 12 years whereas WWTP2 has rarely done so and, even then, only marginally.

ML detection of isolated and contiguous series of 24-h spills

For each WWTP, the daily flow patterns detected by EDM or ML analysis were ordered by the degree of flattening of the flow pattern as measured by the standard deviation of the 96 constituent 15-min interval flow rates. For ML detected ‘spills’ at WWTP1 without EDM data, the 20 most “flattened” daily effluent flow patterns are compared in Fig. 5 to the average dry weather flow. Each flow reflects persistent 24-h spilling at an effluent flow between 60% and 80% of the minimum required. In contrast, the twenty most “flattened” daily flows at WWTP2 without EDM data have an effluent rate greater than or equal to the corresponding storm overflow rate and so are likely to comply with the minimum flow to treatment condition. Nevertheless, two of these “top twenty” 24-h spills at WWTP2 in Fig. 5b, on 05/05/2012 and 12/05/2012, occur on a rainless day following a dry previous 24-h. Therefore, they are likely to be due to groundwater ingress which the EA considers to be unpermitted. It is widely recognised that groundwater ingress into sewer networks does occur, especially in England where many sewerage networks have been in place for more than 100 years (www.swig.org.uk/wp-content/uploads/2014/10/David-Walters-2015.pdf; https://wwtonline.co.uk/news/thames-water-trials-sewer-infiltration-survey-system; www.theguardian.com/environment/2020/oct/09/oxford-stop-thames-water-firm-dumping-sewage-river; www.southernwater.co.uk/help-advice/sewers/combined-sewer-overflows-csos). It is difficult to obtain groundwater level data for specific locations and for specific days when spills have occurred. Moreover, the underlying geology for the sewerage networks and sewage pumping stations (SPSs) feeding the two WWTPs in this study varies quite considerably without borehole data local to each SPS.
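The flattening measure used to order the daily flow patterns can be sketched as follows. The text specifies only the standard deviation of the 96 interval flow rates; the population form is assumed here, which gives the same ordering as the sample form when every pattern has the same length. The function names, day keys and flow values are hypothetical.

```python
from statistics import mean, pstdev


def rank_by_flatness(patterns):
    """Return day keys ordered from the most to the least flattened pattern."""
    return sorted(patterns, key=lambda day: pstdev(patterns[day]))


def mean_fraction_of_overflow_rate(pattern, storm_overflow_lps):
    """Average effluent flow expressed as a fraction of the storm overflow rate."""
    return mean(pattern) / storm_overflow_lps


# Hypothetical daily patterns of 96 15-min average flows (l/s).
example_patterns = {
    "flat_day": [35.0] * 96,
    "slightly_varying_day": [34.0, 36.0] * 48,
    "diurnal_day": [20.0] * 48 + [40.0] * 48,
}
example_ranking = rank_by_flatness(example_patterns)
example_fraction = mean_fraction_of_overflow_rate(example_patterns["flat_day"], 50.52)
```

The second helper expresses a flattened day's flow relative to the permit rate, which is the comparison made for WWTP1 (between 60% and 80%) and WWTP2 (at or above 100%).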

Fig. 5: The 20 daily effluent flows most flattened by 24-h spilling compared to the average daily dry weather flow.

For WWTP1, each spill lasts 24-h, during which the effluent rate is between 60% and 80% of the storm overflow rate. For WWTP2, in contrast, the effluent rate is at or above the corresponding storm overflow rate. Also, two 24-h spills (5.5.12 and 12.5.12) are highlighted as “Dry Spills” because there was no rainfall on the day they occurred nor on the previous day.

An isolated 24-h spill of untreated sewage covers a complete diurnal sewage cycle and so includes the twin peaks of maximum inflow when spilled sewage dilution is likely to be least and risk of pollution damage greatest. But, worse still, is the pollution potential caused by an unbroken series of 24-h untreated sewage spills during which a receiving watercourse has no respite nor opportunity to recover.

2009–2018: The ML analysis detected over 160 24-h spills at WWTP1, of which 105 were corroborated by STO or COL alarm alerts. Similarly, 200 24-h spills were detected at WWTP2. These involved multiple examples of contiguous 24-h spills of more than 10 days.

At WWTP2, a notable near-continuous ‘spill’ of 60 days was detected by the ML classifier between 21/12/2013 and 22/02/2014 (see Supplementary Figs 8 and 9). Extensive sewage fungus in the receiving watercourse had been reported to the EA (27/01/2014 and 03/02/2014) by a member of the public before the EA visited the works on 06/02/2014 to investigate. The EA Compliance Assessment Report concluded that

“There is extensive sewage fungus over 1.5 km of watercourse with a corresponding negative impact on the aquatic environment. Our fisheries and biodiversity teams are very concerned by the impact which we have classified as an ongoing Category 2 incident”.

No prosecution was made. On more than 20 days during this 60-day spill, rainfall was below 2 mm. Similar series of contiguous 24-h spills were detected by the ML analysis in 2012 (14 days), 2013 (16 days, 8 days), 2015–2016 (17 days). Each of these spills also contained subseries of 2 or more consecutive days without rainfall.
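Contiguous series of spill days, and rainless subseries within them, can be extracted with a simple grouping of consecutive dates. The sketch below is an assumption about how such series might be found, not the authors' procedure: it requires strictly adjacent calendar days (so a ‘near-continuous’ spill with gaps would be split), and it treats a day as dry only when a rainfall total of exactly zero is recorded. All names and example values are hypothetical.

```python
from datetime import date, timedelta


def contiguous_runs(days):
    """Group dates into lists of consecutive calendar days."""
    runs = []
    for day in sorted(set(days)):
        if runs and day - runs[-1][-1] == timedelta(days=1):
            runs[-1].append(day)
        else:
            runs.append([day])
    return runs


def dry_subseries(run, rainfall_mm, min_length=2):
    """Within one run, find sub-runs of consecutive days with zero recorded rainfall."""
    dry_days = [day for day in run if rainfall_mm.get(day) == 0.0]
    return [sub for sub in contiguous_runs(dry_days) if len(sub) >= min_length]


# Hypothetical ML-detected 24-h spill days and daily rainfall totals (mm).
example_spill_days = [date(2014, 1, d) for d in (1, 2, 3, 5, 6)]
example_rainfall = {
    date(2014, 1, 1): 4.0,
    date(2014, 1, 2): 0.0,
    date(2014, 1, 3): 0.0,
    date(2014, 1, 5): 1.2,
    date(2014, 1, 6): 0.0,
}
example_runs = contiguous_runs(example_spill_days)
example_dry = dry_subseries(example_runs[0], example_rainfall)
```

Days with no rainfall record are deliberately not counted as dry, so missing data cannot create a spurious dry subseries.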

2018–2020: EIR requests established that in 2019, WWTP1 spilled for over 1000 h on 72 days (mean: 15 h/spilling day) including 21 ML detected 24-h spills with contiguous series of 2–11 days; similarly, WWTP2 spilled for over 1390 h on 76 days (mean: 18.3 h/spilling day) including 32 ML detected 24-h spills with multiple contiguous series of 2–14 days. A near-continuous spill at WWTP2, for ~30 days in November 2019, included 14 days during which, at most, 2 mm of rainfall had occurred (see Fig. 4). As was the case in 2014, the spills resulted in extensive sewage fungus that was reported to the EA by a member of the public (Fig. 6).

Fig. 6: Photograph of sewage fungus.

Sewage fungus resulting from 30 day spill of untreated sewage from WWTP2 in November 2019.

These long spills and sewage fungal growth involved periods of unexceptional rainfall. Our analysis suggests that, for at least nine years, WWTP2 is likely to have been, and continues to be, subject to groundwater ingress—a driver of sewage spills that the EA considers to be unpermitted.

Discussion

In England and Wales, there are ~17,000 combined sewer overflows, all with the potential to discharge untreated sewage to rivers and coastal waters. At WWTPs, such discharges have a finite number of causes: excessive inflow due to rainfall or infiltration, inadequate capacity, equipment malfunction, poor maintenance and, occasionally, deliberate diversion from treatment2,17,18,19. Despite considerable environmental regulation and environmental impact, there remain gaps in our knowledge of the frequency, volume, and polluting effects from untreated sewage spills. The water industry self-reports sewage pollution incidents as part of EA compliance checking and the annual performance review required by Ofwat, e.g., Thames Water20. Recognition of under-reporting is acknowledged in the EA “Water and sewerage companies’ performance: Annual report”1. Prior to our machine learning approach, the only insight into the unreported ‘spills’ were those reported by the public, as many as 38% of sewage ‘spills’ in 2018, e.g., Anglian Water region9. In this manuscript, we report 926 putative ‘spills’ as determined by ML in only two of the 3817 or so WWTPs in England1. Depending on the characteristics of the receiving river and the weather, these might be highly impactful discharges. It remains to be determined whether these ‘dark’ discharges (i.e., previously unknown) could help to explain why 80% of surface water bodies in England are assigned a bad, poor or moderate status classification within the Water Framework Directive21.

We have shown that the machine learning approach developed here can detect untreated sewage ‘spills’ retrospectively. This can assist the water industry in identifying assets that need better management, help regulatory bodies improve compliance checking, and facilitate public oversight of WWTPs. The benefits of a rapid, automated machine learning approach to reporting ‘spills’ would extend to catchment managers, conservation groups, special interest groups (e.g., angling society), recreational users and clubs (e.g., kayaking and open swimming), and consultants and academics focused on modelling and measuring water quality and wildlife health. The lack of data detailing the frequency and length of ‘spills’ can impact society through the regulation of new builds and the requisite planning permissions. A new housing/commercial development can contribute substantially to the flow in a sewage network and WWTP. As such, any new developments within WWTP catchments that are already underperforming, will further exacerbate the frequency and length of ‘spills’ and constrain any progress made through other catchment management initiatives, e.g., habitat restoration.

A potential inadequacy of the machine learning approach is its reliance on a plentiful supply of accurate data. The introduction of EDM devices at WWTPs has been relatively recent and, according to the operator of the WWTPs in this study, they have required repositioning at least once at all WWTPs for which we enquired. This recommissioning disrupted and limited the data collected, with implications for the ML and corroborative analyses. With a view to reusing EDM data, we cross-tested the flow pattern classifiers induced from one WWTP on daily flow patterns of the other after normalising to adjust for magnitude differences. The classification results were not convincing and so, in future, it may be that the approach taken in this study needs to be customised to an individual WWTP or possibly to a class of WWTPs with similar characteristics yet to be identified.

The ML approach presented here was only possible after a protracted period of EIR submissions. There is no standard protocol in England for making EIR requests to sewerage companies for flow, EDM and alarm data. A written request is required at one sewerage company, while others support requests via email with subsequent electronic transmission or data download. The statutory default period for fulfilling an EIR request, in England and Wales, is 20 working days but, in practice, this can be doubled. Two UK sewerage companies have initiated open, cloud-based access to modest amounts of flow and EDM alarm data for their WWTPs and sewage pumping stations (https://marketplace.wessexwater.co.uk/dataset; https://www.southernwater.co.uk/our-performance/flow-and-spill-reporting). The EA Public Register of permits relating to discharges to watercourses offers minimal summary data online and acknowledges requests with an anonymous, untagged email that complicates follow-up unnecessarily. A PDF of a permit is usually supplied within ten working days but, on receipt, might turn out to be a generic amendment relating to several hundred sites without details specific to the site of interest. Online perusal of permits before immediate download would be a more practical and effective approach.

EA permits allowing storm tank related discharges of untreated sewage to watercourses stipulate that a minimum treatment flow has to be attained even when a storm tank is filling or overflowing. However, the permits do not require the flow sent to treatment to be measured, recorded or reported. EIR requests to WWTP operators have established that flow passed forward to treatment is not routinely recorded as “permits do not require it”. Therefore, it is unlikely that this permit condition has been easily verifiable except by using treated effluent flow as a proxy.

As far as the authors are aware, there has not been a similar study applying machine learning to wastewater treatment flow patterns combined with rainfall and telemetry alarm data over a long time series. Using EDM validated ‘storm’ discharge data and treated effluent flow patterns for two contrastingly sized WWTPs, we applied standard machine learning algorithms to construct classifiers that performed exceptionally well at identifying spills previously detected by EDM devices and reported under permit obligation by the WWTP operator. Their application to 10 additional years of daily flow patterns, distinct from the training data, identified 926 potential ‘spill’ days. The ML analysis has provided insight into “early” non-compliant storm tank overflow discharges between 2009 and 2018, revealing none at WWTP2 but hundreds at WWTP1. ML analysis also detected many spills at WWTP2 during periods of unexceptional rainfall suggesting that groundwater ingress has been exacerbating untreated sewage spills there for at least nine years. In 2012, the European Commission ruled that the UK had failed to fulfil its obligation under the Urban Wastewater Directive 91/271/EEC and that untreated sewage discharges were only permitted in exceptional circumstances. Furthermore, as a result of our analysis, there is evidence to suggest that between 2009 and 2020 the rivers downstream of WWTP1 and WWTP2 may have received more than 360 spills of untreated sewage lasting a whole day, often in extensive contiguous series of more than 10 days.

The likely correlation between rainfall and spill occurrence suggests its inclusion in ML studies. However, there is general acceptance that groundwater infiltration contributes to increased WWTP influent and consequent spills of untreated sewage. Groundwater can be surprisingly delayed following rainfall and so some spills would not be predicted by local rainfall or river level alone. Moreover, WWTP2 in this study often treats all influent and does not spill even when sewage flow exceeds its storm overflow level. In this study, we have focused on WWTP treatment flow and EDM data but plan to address the use of ML for more predictive detection of untreated sewage spills from river quality parameters gathered by multi-parameter sondes deployed upstream and downstream of WWTPs.

From a technology transfer point of view, valuable lessons were learned in the accumulation and analysis of large amounts of multi-parameter data and marshalling of appropriate visualisations to support interpretation and presentation of results. Our experience and analysis methodology might be of use to the sewerage industry and regulatory authorities. We hope the results will help to improve WWTP management and compliance oversight and, ultimately, contribute to a reduction in the discharge of untreated sewage to rivers and coastal waters.

Methods

Data related to discharges of untreated sewage and treated effluent

Individual permits governing permitted discharges to watercourses were obtained from the Public Register of the Environment Agency for England and Wales (EA)22. In general, such permits determine:

  1. Minimum sewage flow rate to be passed forward to treatment (PFF) before a ‘storm’ discharge is permitted.
  2. Capacity of storm tanks used to hold untreated sewage during severe rainfall and/or snow melt.
  3. Permitted discharge of untreated sewage to storm tanks due to “rainfall and/or snow melt”.
  4. Requirement for event duration monitoring (EDM) equipment to record untreated sewage spills.
  5. Quality standards for treated effluent.
  6. Reporting frequency to the EA, by the WWTP operator, of effluent quality, EDM and treated flow data.

Untreated sewage and treated effluent flow data

Through EIRs to the company managing WWTP1 and WWTP2, we obtained “Monitoring emissions to air, land and water Certification Scheme” (MCERTS) treated effluent flow data for 2009–2020. The data were provided as .csv files of time-stamped average flows for 15-min intervals in litres or m³ per second. Very occasionally, a whole or significant part of a day’s flow was missing. Such days were excluded from analysis; where data were complete, the pattern of 15-min interval daily flow comprised 96 values. The total number of flow values available for analysis, covering both WWTPs, was about 800,000, corresponding to more than 8000 daily flow patterns. An extract of such MCERTS 15-min treated effluent flow data is provided in Supplementary Table 1.
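As an illustration of the preprocessing described above, the following minimal sketch groups 15-min flow values into daily patterns of 96 values and drops incomplete days. It is not the software used in this study. The function name `daily_patterns`, the integer day index and the rule that a day is dropped if any value is missing are assumptions made for the example; the study excluded days with a whole or significant part of the flow missing.

```python
import numpy as np

SAMPLES_PER_DAY = 96  # 24 h of 15-min intervals


def daily_patterns(day_index, flow):
    """Group 15-min flows by day and keep only complete days.

    day_index: integer day label for every 15-min sample (hypothetical input).
    flow:      average flow for each 15-min interval.
    Returns the kept day labels and an array of shape (n_days, 96).
    """
    day_index = np.asarray(day_index)
    flow = np.asarray(flow, dtype=float)
    kept_days, patterns = [], []
    for day in np.unique(day_index):
        values = flow[day_index == day]
        # Assumed exclusion rule: a day needs all 96 finite values to be kept.
        if values.size == SAMPLES_PER_DAY and np.all(np.isfinite(values)):
            kept_days.append(day)
            patterns.append(values)
    return np.array(kept_days), np.array(patterns).reshape(-1, SAMPLES_PER_DAY)
```

Each row of the returned array is one daily flow pattern of the kind passed on to the shape analysis.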

Data analysis protocol

  1. Build shape model from daily flow patterns for 2016–2020 with and without EDM data.
  2. Build classifiers using supervised learning on 2018–2020 flow patterns with EDM data. Select optimal classifier and verify semi-blinded on 2016–2018 flow patterns without EDM data.
  3. Test optimal classifier retrospectively and fully blinded on flow patterns for 2009–2015.

Rainfall and river levels

Heavy rainfall and snow melt can have a deleterious effect on wastewater treatment through surface water runoff causing overload at a WWTP inlet. The EA allows for this by permitting excess raw sewage above a specified overflow rate to be diverted to a storm tank which, when full, is permitted to spill to the linked watercourse. The UK Environment Agency regulations state that a “storm tank must settle out solids and have a minimum capacity of 68 litres/population head served or a storage equivalent of 2 hours at the maximum flow rate to the storm tanks”23. During some storm sewage spills, rivers in full spate may further dilute combined discharges of untreated sewage and surface water runoff. In anticipation of checking the ML classification of daily flows as involving sewage spills, we obtained average daily rainfall, river flows and/or river levels, when available, through publicly accessible sources (www.accuweather.com/; https://nrfa.ceh.ac.uk/; https://riverlevels.uk/). More detail is provided in the supplementary information.

Telemetry communications between WWTPs and Waste Operating Control Centre

In order to identify discharges of untreated sewage to watercourses, EIRs were made to the sewerage company for EDM records and telemetry alarm exchanges between each WWTP and the company’s Waste Operating Control Centre (WOCC). These were supplied as .csv files cataloguing times/dates of untreated discharges; WWTP id; times/dates of alarm messages; level of alarm severity (reflecting internal company standards for associated minimum response and intervention times); message source (e.g. equipment/device/asset involved); state or change of state of device involved.

Particularly relevant are alarms for storm tanks and EDM devices installed on storm overflows. The consented overflow level (COL) alarm measures the level of untreated sewage diverted to a storm tank for which the EDM detects intervals of overflow to the receiving watercourse. As would be expected, COL and EDM are closely correlated and, as COL was installed before EDM at both WWTPs, it is a reasonable surrogate to use when EDM is unavailable to corroborate potential ‘spill’ days detected by ML classification. Illustrative extracts of such data are provided in Supplementary Table 2.

Flow shape analysis

The shape analysis methods and associated software used in this study were developed over the last two decades and have been applied extensively to the detection of shape differences in 3D surfaces, notably those representing anatomical structures. So, rather than redevelop the software, each daily flow ‘curve’ of 96 15-min interval values was converted automatically to a thin flow ‘ribbon’ comprising 190 triangular facets annotated (automatically) with 192 landmarks (Supplementary Fig. 3) to enable correspondence to be established between the daily flow patterns for which a shape model is computed.

A dense surface model (DSM) of a set of landmarked surfaces, described in detail elsewhere24,25, comprises shape variation modes arising from a principal component analysis (PCA) of differences of the surface points’ positions from those of the average surface in the dataset. Prior to the PCA, using a completely flat “base” flow ribbon, a dense correspondence of surface points across all flow ribbons is induced with no manual interaction. During the shape model building, the PCA modes are computed in terms of decreasing variance coverage (defined as the ratio of the eigenvalue corresponding to a PCA mode to the sum of all eigenvalues of the diagonalised covariance matrix). Sufficient modes in the dense surface model were retained to cover 99% of shape variance. The effluent flow data acquired for each WWTP, for daily flow patterns from 2016 to 2020, were used to build separate dense surface models of flow shape.
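The mode-retention rule described above can be sketched on plain flow vectors. This is a simplified illustration and not the dense surface modelling software: it applies PCA directly to the 96-value daily patterns rather than to landmarked flow ribbons, and the function names `pca_modes` and `project` are hypothetical.

```python
import numpy as np


def pca_modes(X, coverage=0.99):
    """Return the mean pattern, the retained PCA modes and their variance ratios.

    X: array of shape (n_days, n_values), one daily flow pattern per row.
    Modes are ordered by decreasing variance coverage, where coverage is the
    eigenvalue of a mode divided by the sum of all eigenvalues.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    centred = X - mean
    # Squared singular values of the centred data, divided by (n - 1),
    # are the eigenvalues of the sample covariance matrix.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    eigvals = s**2 / (X.shape[0] - 1)
    ratio = eigvals / eigvals.sum()
    # Smallest number of leading modes whose cumulative coverage reaches the target.
    k = int(np.searchsorted(np.cumsum(ratio), coverage)) + 1
    k = min(k, ratio.size)
    return mean, vt[:k], ratio[:k]


def project(X, mean, modes):
    """Express each pattern as coordinates along the retained modes."""
    return (np.asarray(X, dtype=float) - mean) @ modes.T
```

The projected coordinates are the low-dimensional description of each day that a classifier would then receive.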

Supervised learning for identifying daily flow patterns associated with sewage spills

As for the shape analysis described above, software employed in this paper incorporates supervised learning techniques for building classifiers for shape discrimination and has been used in a wide range of neurofacial applications: altered face shape in genetic syndromes26; premature skull fusion27; tissue engineering of face-skull shape28; facial asymmetry associated with epilepsy29 and early childhood cancer30; and, correlated face-brain shape changes arising from foetal alcohol exposure during pregnancy31.

The WWTP operator reported that during an initial period of installation EDM results were unreliable and both devices were recommissioned in Nov/Dec 2018. For this reason, EDM results for both WWTPs earlier than the recommissioning date were excluded from ML analysis.

The supervised learning used the classical ML technique of Support Vector Machines (SVM). The neural network based SVM, or large margin classifier, approach focuses on individual cases in the overlap of the subgroups to be classified that help to define a separating surface with largest margin between the subgroups. The SVM-based classification here employed a radial basis function kernel with 5 heuristics for determining margin width (Hinton; median separation; mean separation; Jaakkola; Jaakkola-mean), each with 4 parametric variations. Thus, 20 variations of “machine learning” algorithms were used to construct flow pattern classifiers for each WWTP.
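The width heuristics are named but not defined in the text, so the sketch below shows only commonly used formulations of two of them and should be read as an assumption rather than as the implementation used here. In this sketch, "median separation" is taken to be the median pairwise distance between patterns, and "Jaakkola" the median distance from each pattern to its nearest neighbour in the opposite class. All function names are hypothetical.

```python
import numpy as np


def pairwise_distances(A, B):
    """Euclidean distances between every row of A and every row of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff**2).sum(axis=2))


def median_separation_width(X):
    """Assumed 'median separation' heuristic: median of all pairwise distances."""
    d = pairwise_distances(X, X)
    return float(np.median(d[np.triu_indices(len(X), k=1)]))


def jaakkola_width(X, y):
    """Assumed 'Jaakkola' heuristic: median nearest opposite-class distance."""
    d = pairwise_distances(X[y == 1], X[y == 0])
    nearest = np.concatenate([d.min(axis=1), d.min(axis=0)])
    return float(np.median(nearest))


def rbf_kernel(A, B, sigma):
    """Radial basis function kernel exp(-||a - b||^2 / (2 sigma^2))."""
    d = pairwise_distances(A, B)
    return np.exp(-(d**2) / (2.0 * sigma**2))
```

Under these assumptions, each heuristic yields one kernel width per training set, and the parametric variations mentioned above would correspond to further adjustments of that width.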

To estimate the accuracy of each variation, and to avoid overfitting, we used 20-fold cross-validation with 90%-10% training-test splits of stratified, randomly selected subsets of the ‘spill’ and ‘normal’ labelled flow patterns. After classification, we estimated the overall classification accuracy as the mean area under the corresponding receiver operating characteristic curves of the 20 splits. For each SVM variation, a final classification was ‘spill’ if the lower value of a 95% confidence interval (CI) for the estimated classification was positive. The best-performing combination of SVM kernels (median, Jaakkola and Jaakkola-mean for both plants) and number of PCA modes (2 for WWTP1 and 10 for WWTP2) was identified (see Supplementary Tables 3 and 4 for complete results). For the classification testing, we adopted a conservative approach and defined an optimal classifier that labelled a flow pattern as ‘spill’ if and only if these three best-performing algorithms all classified it as ‘spill’.
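The scoring and consensus rules in this paragraph can be sketched as follows. The sketch assumes that each algorithm produces a signed output per pattern and per split (positive meaning ‘spill’) and that the 95% CI is a normal approximation around the mean of those outputs; neither detail is stated in the text, and the function names are hypothetical.

```python
import numpy as np


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def ci_lower(values, z=1.96):
    """Lower bound of an assumed normal-approximation 95% CI over the splits."""
    values = np.asarray(values, dtype=float)
    sem = values.std(axis=0, ddof=1) / np.sqrt(values.shape[0])
    return values.mean(axis=0) - z * sem


def consensus_spill(scores_by_algorithm):
    """Label a pattern 'spill' only if every algorithm's CI lower bound is positive.

    scores_by_algorithm: array of shape (n_algorithms, n_splits, n_patterns).
    """
    scores = np.asarray(scores_by_algorithm, dtype=float)
    votes = np.stack([ci_lower(s) > 0 for s in scores])
    return votes.all(axis=0)
```

Averaging `auc` over the 20 splits gives the accuracy estimate for one SVM variation. Requiring agreement of all three algorithms trades some sensitivity for fewer false ‘spill’ labels, which matches the conservative intent described above.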

Corroboration of ML classification of historic pre-EDM flow patterns with telemetry alarm devices

Supervised learning was used to produce classifiers from flow patterns labelled as ‘spill’ or ‘normal’ using spilling interval data provided by the WWTP operator and detected by EDM devices from Dec ‘18 to Mar ‘20. Without the availability of EDM detected spilling intervals for the period January 2009 to early 2018, there is apparently no “gold standard” with which to compare and hence validate the ML based classification of flow patterns. However, both WWTPs were fitted with other devices that could be used to corroborate ML “predictions”. For example, shortly before EDM installation (WWTP1: Nov ‘17; WWTP2: Feb ‘18), both WWTPs were fitted with analogue “Consented Overflow Level” (COL) alarms that both record the sewage level in storm tanks monitored by EDM and send raised/cleared messages to a central control centre. When asked, through an EIR, for data for a “storm tank filling” alarm deployed in many WWTPs, the operator reported no such device to be in use but provided data for a “Storm Tank Overflow” alarm (STO), employing the less reliable float switch technology, that was used at WWTP1 throughout the study period. A similarly named alarm was also in use at WWTP2.

Detection of isolated and contiguous series of 24-h spills

Sewage spills vary considerably in length but typically feature a flattening of flow pattern due to the diversion of excess flow via a storm tank or directly to a river. This reduces variation about the mean of the individual 15-min flow rates representing either the flow passed forward or the treated effluent. As a result, a flow pattern associated with a compliant storm discharge of 24 h (or more) often has a low standard deviation and a mean close to, if not above, the storm overflow rate specified in a discharge permit. 24-h spills with a mean flow much less than the storm overflow rate are potentially non-compliant with the minimum PFF permit condition. A long contiguous series of 24-h spills inhibits recovery from sewage exposure and is more likely to result in sewage fungus pollution that is harmful to both fish and macroinvertebrates32. Short spills, much less than a day in length, may coincide with low flow periods in the typical diurnal pattern of wastewater generation and potentially cause less significant pollution or at least allow recovery. However, even short spills can be extremely polluting in their first flush if storm tanks contain settled solids from previous spills because they have not been emptied in a timely manner.
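A minimal sketch of how contiguous series of 24-h spills might be extracted from a sequence of daily labels, and how days with a mean flow well below the storm overflow rate might be flagged, is given below. The function names and the `tolerance` fraction standing in for "much less than" are hypothetical, and the run detection assumes one label per consecutive calendar day with no excluded days in between.

```python
import numpy as np


def contiguous_runs(spill_days):
    """Return (start_index, length) for each run of consecutive 'spill' days."""
    flags = np.asarray(spill_days, dtype=bool).astype(int)
    # Pad with zeros so runs touching either end still produce a start and an end.
    padded = np.concatenate([[0], flags, [0]])
    change = np.diff(padded)
    starts = np.flatnonzero(change == 1)
    ends = np.flatnonzero(change == -1)
    return [(int(s), int(e - s)) for s, e in zip(starts, ends)]


def below_overflow_rate(patterns, overflow_rate, tolerance=0.9):
    """Flag days whose mean flow is below an assumed fraction of the overflow rate."""
    means = np.asarray(patterns, dtype=float).mean(axis=1)
    return means < tolerance * overflow_rate
```

Runs longer than a chosen length, for example 10 days, could then be listed together with the flag for potentially non-compliant mean flow on each day in the run.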