Introduction

Whether for drinking, agricultural, or recreational purposes, access to safe, clean water is of paramount importance to public health. With the incredible burden that waterborne pathogens place on human health, the risks associated with exposure to contaminated water sources need to be properly evaluated. For instance, an estimated 1.6 million deaths and 105 million disability-adjusted life years lost were associated with water quality or sanitation issues in 2016, of which roughly half were directly attributable to waterborne diarrheal disease1. While the public health benefits of access to clean water are most apparent with drinking water2,3, effective wastewater sanitation also represents an important public health intervention. Indeed, across 39 nations, inadequate wastewater treatment was found to correlate with increased disease mortality, irrespective of national income, development, and overall sanitation4. As such, the treatment and sanitation of all urban water supplies represent one of the most important preventative strategies against infectious diseases.

The ability to reliably eliminate pathogens from the water supply is key for infectious disease prevention; however, what would happen if pathogens could evolve resistance against water treatment? Microbes are incredibly adaptable, and they have a remarkable ability to tolerate normally lethal stressors. For instance, the model bacterium Escherichia coli possesses a myriad of adaptive stress response mechanisms5,6 and stress-specific transcription factors5,7,8,9,10,11, many of which have been found to confer resistance against water treatment-associated stressors including chlorination12, osmotic stress, and oxidant- and UV-mediated oxidative stress5. Beyond these stress-specific response systems, cross-resistance may also be conferred by other stress resistance strategies in E. coli, such as through the production of heat shock proteins13 and a recently characterized genetic island known as the locus of heat resistance (LHR) which, alongside heat resistance, also appears to provide resistance against chlorination and oxidative stress14,15. As such, microbes may already possess the genetic capacity to survive water treatment.

Reflecting this, growing evidence suggests that water treatment may be selecting for the evolution of resistance to water disinfection processes. Indeed, certain subpopulations of E. coli have been found to be particularly resistant to water treatment, including naturalized wastewater strains that appear to have adapted to survive in wastewater treatment plants16. Interestingly, these strains appear to be enriched across wastewater treatment17, and demonstrate enhanced resistance to chlorine, oxidants, and heat, as well as an increased capacity for biofilm formation15,18. Beyond these naturalized strains, however, concerning evidence suggests that extraintestinal pathogenic E. coli (ExPEC), including strains causing urinary tract infections, also appear to survive wastewater treatment. Indeed, E. coli isolates harboring typical uropathogenic E. coli (UPEC) virulence genes and pathogenicity islands have been recovered from finished wastewater effluents19, and it has been estimated that anywhere from 41.7% to 59.5% of E. coli isolates surviving wastewater treatment represent potential UPEC according to virulence gene screening20,21,22.

More recently, comparative genomic analyses performed by Zhi and colleagues similarly identified several E. coli strains isolated from chlorinated sewage and treated effluents as clinically-relevant UPEC23. E. coli strains recovered from wastewater matrices clustered in clinically relevant sequence types, such as ST131 and ST95, and were found to be almost identical to clinical UPEC strains across the whole and core genome, as they shared 96.00–99.49% whole genome similarity and differed by as few as 2 single nucleotide polymorphisms (SNPs) in a ~0.4 Mb core genome backbone compared to clinical UPEC strains. Many of the wastewater isolates also possessed the exact complement of virulence and antibiotic resistance genes harbored by their closest clinical UPEC match. Concerningly, five wastewater isolates belonged to the emerging pandemic O25b-ST131 clonal group and were characterized as extended-spectrum beta-lactamase (ESBL) producing strains. Collectively, the finding that ExPEC appear to dominate the population of E. coli surviving wastewater treatment has led to the suggestion that either ‘higher-than-expected’ levels of ExPEC infection exist in the community, or that ExPEC strains naturally occur in wastewater24.

Importantly, the evolution towards water treatment resistance may extend to other ExPEC pathotypes, including the bloodborne E. coli (BBEC) and neonatal meningitic E. coli (NMEC). For instance, E. coli strains isolated from treated wastewater have been shown to cluster within sequence types such as ST95 and ST131 (refs. 19,23,25), lineages that have recently been associated with NMEC26 and BBEC26,27 outbreaks, respectively. Furthermore, while Adefisoye and Okoh characterized the majority of E. coli isolates recovered from wastewater effluent as potential UPEC, 14.8% were identified as potential NMEC22. This points to the possibility that, similar to the findings described for UPEC, the NMEC and BBEC strains may also have evolved resistance against water treatment. Herein, we demonstrate through a comprehensive comparative whole-genome analysis that NMEC and BBEC strains are common constituents in full-scale treated wastewater effluents and chlorinated sewage, raising the worrying prospect that diseases such as urinary tract infections, septicemia, and meningitis could be water transmissible.

Results

Identification of presumptive ExPEC from chlorinated sewage and final wastewater effluent

Among all 637 wastewater E. coli isolates collected in this study, 247 possessed at least one screened ExPEC-associated virulence gene, of which 7 harbored all seven (Fig. 1a). Of the seven screened ExPEC virulence genes, fyuA and chuA were the most prevalent, while sfa/foc and ibeA were the least common (Fig. 1b). All isolates were also screened for the major ExPEC pandemic lineage ST131 marker, of which 22 were positive. According to the screening criteria used, 86 isolates were identified as presumptive wastewater ExPEC (W-ExPEC) and were selected for genome sequencing. The genome characteristics of the sequenced isolates are summarized in Supplementary Table 1.

Fig. 1: ExPEC virulence marker screening panel results.

Number (A) of ExPEC virulence genes harbored by each wastewater E. coli isolate and (B) their frequency across all wastewater E. coli isolates as determined by genetic screening with ExPEC PCR marker panel and targeted gene identification in the genome sequences of the wastewater E. coli isolates.

Core genome similarity between wastewater and clinical ExPEC (C-ExPEC) strains

Based on an upper maximum difference of 250 core genome SNPs across a ~417 kbp backbone, 37 W-ExPEC strains were found to share close core genome similarity with anywhere from 1 to 48 clinical BBEC and NMEC strains (Supplementary Table 2). Some W-ExPEC strains in particular displayed a remarkable level of similarity to a clinical counterpart at the core genome level, including the wastewater strains 2F5 and 2F6, which both varied from BBEC_156 by only 5 SNPs (Table 1); and wastewater strain 3G9, which could be distinguished from BBEC_211 by only 3 SNPs (Table 1), and wastewater strain 1G6, which differed from the clinical strain NMEC_9 by only 7 SNPs (Table 2). Some wastewater strains were even found to exhibit close core genome similarity with multiple clinical strains, such as strains 1F2A and 2B4, which differed from BBEC_268 by 6 and 4 SNPs, respectively, and BBEC_267 by 5 SNPs each. Interestingly, several wastewater strains previously identified to be genetically similar to clinical UPEC strains23 also shared high core genome similarity with a clinical NMEC or BBEC strain in this analysis. Of note were WU1030, WU1036, WU1155, WU1265, and WU1266, as each differed from BBEC_265 and BBEC_158 by 6 or fewer core genome SNPs; WU1274, which differed from BBEC_162 by 8 SNPs; and WU1157, which could be distinguished from BBEC_71 by only 7 SNPs (Table 1).
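The core genome comparison described above amounts to counting differing sites between aligned core genome sequences and keeping pairs at or below the 250-SNP limit. The following is a minimal sketch of that idea, not the pipeline used in this study: the function names, the toy sequences, the toy strain labels, and the choice to skip gapped or ambiguous sites are all assumptions made for illustration.

```python
from itertools import product

SNP_THRESHOLD = 250  # upper limit on core genome SNPs, taken from the text


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned core genome sequences differ.

    Sites with a gap or ambiguous base in either sequence are skipped
    (an assumption; the text does not say how such sites were handled).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


def close_pairs(wastewater: dict, clinical: dict, threshold: int = SNP_THRESHOLD):
    """Return (wastewater_id, clinical_id, snps) for pairs within the threshold."""
    hits = []
    for (w_id, w_seq), (c_id, c_seq) in product(wastewater.items(), clinical.items()):
        distance = snp_distance(w_seq, c_seq)
        if distance <= threshold:  # inclusive cutoff is an assumption
            hits.append((w_id, c_id, distance))
    return sorted(hits, key=lambda hit: hit[2])


# Toy 12-bp "alignments" with made-up strain labels, for illustration only.
wastewater = {"W_toy1": "ACGTACGTACGT", "W_toy2": "ACGTTTTTACGT"}
clinical = {"C_toy1": "ACGTACGAACGT"}

# A threshold of 2 is used here only because the toy sequences are tiny.
print(close_pairs(wastewater, clinical, threshold=2))
```

With real data the inputs would be rows of a core genome alignment covering the ~417 kbp backbone, and the default threshold of 250 would apply.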

Table 1 Pairwise whole genome similarity and core genome SNP distances of W-ExPEC strains and the two closest clinical BBEC counterparts.
Table 2 Pairwise whole genome similarity and core genome SNP distance matrix of W-ExPEC strains and their closest clinical NMEC counterparts.

Pairwise whole-genome similarity between W-ExPEC and C-ExPEC strains

Identified W-ExPECs demonstrating high core genome similarity with clinical BBEC and NMEC strains (n = 37) were selected for pairwise whole-genome comparisons against the local genome repository of 320 clinical NMEC and BBEC isolates. Additional comparisons were also performed for each W-ExPEC strain against 46 representative intestinal pathogenic E. coli and naturalized wastewater E. coli strains as further evidence that any significant similarities observed between W-ExPEC and C-ExPEC strains were reflective of a shared extraintestinal pathogenic phenotype. All pairwise comparisons were evaluated against an upper median whole genome similarity value of 96.03%, which was previously found to be reflective of a shared pathogenic phenotype23.

Interestingly, all 37 W-ExPEC strains sharing close core genome similarity with a clinical counterpart also exhibited >96.03% whole genome similarity with at least 1 clinical BBEC (Table 1) or NMEC (Table 2). Whole-genome similarity values ranged from 96.04% to 99.74%, with some W-ExPEC strains exhibiting high similarity with up to 48 C-ExPEC counterparts (Supplementary Table 2). Of note was the wastewater strain 1G6, which exhibited 99.72% whole-genome similarity with NMEC_9; and wastewater strains 3B9 and 4G1, which shared 99.62% and 99.58% whole genome similarity, respectively, with BBEC_38 (Table 1), and 99.59% and 99.55% whole genome similarity, respectively, with NMEC_4 (Table 2). Interestingly, wastewater strains previously identified to share close genetic similarity to clinical UPEC strains23 were found to exhibit greater similarity to clinical BBEC strains in this analysis. For instance, the wastewater strains WU1030, WU1036, WU1155, WU1265, and WU1266 all shared >99.10% whole genome similarity with the clinical strain BBEC_265. Similarly, WU1030, WU1155, WU1265, and WU1266 were also all found to exhibit >99.40% whole genome similarity with BBEC_158 (Table 1).

In contrast, W-ExPEC isolates exhibited significantly lower similarity with intestinal pathogenic and naturalized wastewater E. coli strains, with similarity values ranging from 65.9% to 95.3%, with a mean of 84.9% (Supplementary Table 3). Indeed, none of the 86 presumptive W-ExPEC isolates shared >96.03% whole genome similarity with any intestinal pathogenic or naturalized wastewater E. coli strains.
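The two cutoffs used in this analysis (at most 250 core genome SNPs and more than 96.03% whole genome similarity) can be read as a simple joint filter. The sketch below illustrates that reading under stated assumptions: the `Comparison` class and `is_close_match` function are hypothetical names, the first three rows use values reported in this article, and the last row is an invented stand-in for a non-ExPEC comparison.

```python
from dataclasses import dataclass

MAX_CORE_SNPS = 250        # core genome cutoff stated in the text
MIN_WG_SIMILARITY = 96.03  # whole genome similarity cutoff (%) stated in the text


@dataclass(frozen=True)
class Comparison:
    wastewater: str
    reference: str
    core_snps: int
    wg_similarity: float  # percent


def is_close_match(c: Comparison) -> bool:
    """True when a pair passes both cutoffs described in the text."""
    return c.core_snps <= MAX_CORE_SNPS and c.wg_similarity > MIN_WG_SIMILARITY


comparisons = [
    # Values reported in this article for wastewater/clinical pairs.
    Comparison("2F5", "BBEC_156", 5, 98.82),
    Comparison("3G9", "BBEC_211", 3, 98.80),
    Comparison("3B9", "NMEC_4", 13, 99.59),
    # Hypothetical row: the SNP count is invented and the similarity is
    # simply the reported mean for non-ExPEC comparisons (84.9%).
    Comparison("W_example", "enteric_example", 4000, 84.9),
]

for c in comparisons:
    label = "close match" if is_close_match(c) else "no match"
    print(f"{c.wastewater} vs {c.reference}: {label}")
```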

Core genome phylogenetic and sequence typing of W-ExPEC and C-ExPEC strains

To understand the evolutionary relationship between the W-ExPEC strains and their clinical counterparts, a maximum-likelihood core genome phylogenetic analysis was performed with the 37 W-ExPEC isolates, 38 of the closest clinical ExPEC strains at the whole and core genome levels, and E. coli strains from various known phylogroups28,29,30. Of the 37 W-ExPEC strains, 34 were distributed amongst clinical NMEC and BBEC strains throughout phylogroup B2, a major phylogroup known to harbor ExPEC pathotypes31,32, whereas the remaining 3 clustered with their closest clinical matches in phylogroup A (Fig. 2). Although phylogroup A is classically considered to be non-pathogenic33,34 and negatively associated with the UPEC pathotype35, studies have reported clinical ExPEC strains belonging to this phylogroup36. Reflecting this, the latter three W-ExPEC isolates grouped separately from commensal E. coli strains and into two separate sub-clusters alongside clinical BBEC strains within phylogroup A (Fig. 2). Interestingly, although phylogroup D has been positively associated with ExPEC pathotypes other than UPEC35, none of the ExPEC strains clustered within this phylogroup.

Fig. 2: Core genome maximum likelihood phylogenetic tree of wastewater ExPEC strains, their closest clinical counterparts and E. coli strains of known phylogroups.

Presumptive wastewater ExPEC strains were collected from samples of chlorinated sewage, partially-treated wastewater effluent and finished wastewater effluent (indicated by the colored circles) from five wastewater treatment plants across Alberta, Canada (highlighted according to the lower legend). The core genome sequence variation of wastewater ExPEC strains (colored black in the outer circle) was compared to their closest clinical counterparts at the whole and core genome levels (colored gray in the outer circle) and various E. coli strains of known phylogroups (inner colored circle according to the upper legend). The main sequence type lineages of the wastewater and clinical ExPEC strains are indicated in the outermost circle. The tree is rooted with E. albertii as the outgroup.

W-ExPEC and C-ExPEC strains were found to sub-structure extensively by sequence type within the phylogenetic tree (Fig. 2). Although 2 of the 37 W-ExPEC strains were designated as unknown sequence types (ST), the remaining 35 were distributed across 8 STs, including those of clinical importance. The major ExPEC lineage-associated sequence type ST131 (ref. 37) was the most represented amongst the W-ExPEC strains, with 18 of 37 isolates clustering within this sequence type. Consistent with previous analyses, these 18 W-ExPEC isolates were also positive for the ST131 marker in the PCR screening panel. The NMEC-associated lineage ST95 (ref. 27) was also well represented, identified in 8 wastewater isolates. Of the remaining W-ExPEC isolates, 2 each were designated as ST73, ST10, and ST127, while 1 isolate each was designated as ST357, ST538, and ST44. The 38 C-ExPEC strains were distributed in a similar manner across the 8 STs represented, as each W-ExPEC strain that was assigned a sequence type belonged to the same lineage as its closest clinical match.

Pan-genome similarity and accessory genome phylogenetics of W-ExPEC and C-ExPEC strains

Across the core and whole-genome, W-ExPEC and C-ExPEC strains appear to be highly similar; however, measures of genetic similarity alone may not completely reflect whether the W-ExPEC strains share a similar pathogenic phenotype with their clinical counterparts. In contrast to the core genome, which encodes for essential housekeeping genes, the accessory genome includes genes linked to adaptation, virulence, and antibiotic resistance, which are likely reflective of the predominant lifestyle of a given strain38. As such, a pan-genomic analysis was performed to determine whether the W-ExPEC strains shared a similar accessory gene profile, and thus a similar pathogenic potential, with their clinical counterparts.

A pan-genome was calculated for all 86 presumptive W-ExPEC strains, 38 of the closest clinical NMEC/BBEC matches, 9 reference UPEC strains, 5 naturalized wastewater strains17,39, and 4 laboratory reference strains (Supplementary Table 4). The pan-genome consisted of 26,865 genes, including 2133 core genes, indicating a high level of pan-genomic diversity across the strains analyzed. Reflecting this, the core genes comprised only 8% of the pan-genome, whereas 77% consisted of genes that were shared by fewer than 15% of the strains analyzed (Supplementary Fig. 1). As the W-ExPEC strains possessed an average of 4729 genes, the core genes comprised roughly 45% of the W-ExPEC genome. Although this percentage is higher than typical estimates for E. coli core genomes40, the majority of the strains analyzed either shared a similar ecological niche (i.e., treated wastewater matrices) or a presumed similar extraintestinal pathogenic potential (i.e., wastewater and clinical ExPEC).
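The two percentages quoted above follow directly from the reported gene counts. A short check, using only numbers given in the text (the variable names are ours):

```python
pan_genome_genes = 26_865    # total genes in the pan-genome
core_genes = 2_133           # genes classified as core
mean_w_expec_genes = 4_729   # average gene count per W-ExPEC strain

# Share of the pan-genome made up by core genes.
core_share_of_pan = core_genes / pan_genome_genes * 100

# Share of an average W-ExPEC genome made up by core genes.
core_share_of_genome = core_genes / mean_w_expec_genes * 100

print(f"core genes / pan-genome:      {core_share_of_pan:.1f}%")
print(f"core genes / W-ExPEC average: {core_share_of_genome:.1f}%")
```

The first ratio is about 7.9%, which the text rounds to 8%, and the second is about 45.1%.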

To assess the similarity of wastewater and clinical ExPEC strains at the accessory genome level, all isolates were clustered based on the binary presence and absence of accessory genes within the estimated pan-genome. According to the generated clustering tree, the isolates were grouped into three main clusters (Fig. 3a). The most basal cluster (Cluster 1) included E. coli isolates that did not share any significant similarities with any clinical ExPEC, wastewater ExPEC, laboratory reference or naturalized wastewater E. coli strains. As this cluster included strains harboring comparatively large numbers of virulence genes (Fig. 3b), Cluster 1 may represent enteric E. coli strains, which would be expected to be found in wastewater and treated sewage. In contrast, the naturalized wastewater E. coli (all ST635 strains), wastewater ExPEC, and clinical ExPEC strains were distributed throughout Clusters 2 and 3. Cluster 2 isolates were distributed in two main subclusters designated 2a and 2b (Fig. 3a), with 2 of the 86 sequenced wastewater E. coli isolates (2F11 and 1H6) grouping with previously characterized naturalized wastewater E. coli strains including WW223, WW10, and ABWA45 (refs. 17,39). With no clinical ExPEC matches in the local repository, 2F11 and 1H6 may represent additional naturalized wastewater E. coli strains. Subcluster 2b consisted of a small group of W-ExPEC strains (4H1, 2C8, 2F12, 2H7, and 1F4) that grouped with clinical ExPEC strains (BBEC_29, BBEC_88, BBEC_285 and the UPEC strain E. coli 219) representing relatively minor ExPEC lineages including ST10 and ST44 (Fig. 3a). Although some wastewater strains (4E10 and 4G8) present in this cluster did not have a direct clinical match, this could be due to the under-representation of other minor ExPEC lineages in the local repository. The last and largest cluster included most of the wastewater E. coli isolates and their closest clinical NMEC and BBEC matches representing the major ExPEC lineages, including ST131, ST95, ST73, and ST127, as well as other important ExPEC lineages such as ST538 and ST357 (ref. 26). While most W-ExPEC strains clustered closely with their closest clinical BBEC or NMEC counterpart in this analysis, some strains grouped closer to a reference UPEC strain, including WU664 (W-ExPEC) with U059 (C-ExPEC) and 4F9 (W-ExPEC) with CFT073 (C-ExPEC), suggesting that some of these wastewater isolates are more likely to exhibit a uropathogenic phenotype23. Interestingly, even W-ExPEC strains in Cluster 3 without a clinical match in this analysis were found to group within clusters dominated by clinical ExPEC strains (Fig. 3a).
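Grouping isolates by the binary presence and absence of accessory genes can be illustrated with a pairwise distance on gene sets. The sketch below uses the Jaccard distance and reports each isolate's nearest neighbour. This is an assumption chosen for illustration: the text does not state which distance or tree-building method produced the clustering tree, and the isolate labels and gene names here are placeholders.

```python
def jaccard_distance(genes_a: set, genes_b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B| for two accessory gene sets."""
    union = genes_a | genes_b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(genes_a & genes_b) / len(union)


def nearest_neighbours(profiles: dict) -> dict:
    """Map each isolate to (closest other isolate, Jaccard distance)."""
    result = {}
    for name, genes in profiles.items():
        candidates = [
            (jaccard_distance(genes, other_genes), other)
            for other, other_genes in profiles.items()
            if other != name
        ]
        distance, closest = min(candidates)  # ties broken alphabetically
        result[name] = (closest, round(distance, 3))
    return result


# Placeholder accessory gene profiles (present genes only).
profiles = {
    "wastewater_toy": {"geneA", "geneB", "geneC", "geneD"},
    "clinical_toy": {"geneA", "geneB", "geneC", "geneE"},
    "naturalized_toy": {"geneA", "geneF", "geneG"},
}

for isolate, (closest, distance) in nearest_neighbours(profiles).items():
    print(f"{isolate}: closest to {closest} (distance {distance})")
```

In this toy input the wastewater and clinical profiles share three of five genes and pair with each other, while the naturalized profile sits further from both. A full analysis would build a tree from the complete distance matrix rather than stopping at nearest neighbours.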

Fig. 3: Accessory genome clustering and virulence and antibiotic resistance gene screening of wastewater and clinical ExPEC strains.

Binary accessory gene presence–absence clustering tree (A) of all original presumptive wastewater ExPEC strains (n = 86) and including those closely matching clinical ExPEC strains (n = 37 of the 86), as well as clinical ExPEC strains, naturalized wastewater E. coli strains, and laboratory reference strains. Included in this analysis are the number of antibiotic resistance genes (ARG) and virulence genes (VG) present in each isolate (B). The major sequence types represented are indicated in highlighted boxes in the clustering tree. Any original presumptive wastewater ExPEC strains that did not exhibit high genetic similarity with a clinical ExPEC strain at the whole and core genome levels are indicated in the clustering tree with an asterisk (*).

Virulence and antibiotic resistance gene composition of W-ExPEC and C-ExPEC strains

To better clarify the pathogenic potential of W-ExPEC strains, the composition of virulence genes (VG) and antibiotic resistance genes (ARG) were compared between the wastewater strains and their clinical counterparts. Virulence gene composition was roughly bimodally distributed across the strains analyzed (Fig. 3b), with subcluster 2a (i.e., naturalized wastewater strains) harboring the fewest virulence genes compared to the flanking strains in Clusters 1 and 3. Interestingly, the comparatively low abundance of virulence genes in this subcluster supports previous hypotheses that these naturalized wastewater strains are non-pathogenic and have become endogenous to wastewater treatment plants17. In contrast, each W-ExPEC strain with at least one close clinical ExPEC counterpart in this analysis possessed an extensive VG repertoire, ranging from 195 to 278 VGs (Supplementary Table 5). W-ExPEC strains from sequence types ST95, ST127, and ST73 consistently harbored the highest number of VGs, though there did not appear to be any discernable pattern of annotated VGs that could consistently discriminate a particular ExPEC ST lineage (Supplementary Table 5). Importantly, none of the 37 W-ExPEC strains possessed VGs characteristic of the major E. coli intestinal pathotypes, including the enteropathogenic E. coli (EPEC)-associated eaeA gene41, the Shiga-toxin producing E. coli (STEC)-associated stx1 and stx2 genes42, and the enterohaemorrhagic E. coli (EHEC) O157:H7-associated rfbE gene43, suggesting that the W-ExPEC strains, if pathogenic, specifically possess extraintestinal pathogenic potential.

On a pairwise basis, the wastewater strains generally harbored a very similar repertoire of virulence genes as their clinical counterparts (Supplementary Table 5). For instance, the ST131 W-ExPEC strain 3G11 and clinical strain BBEC_211 shared 214 VGs, only differing by one VG uniquely harbored in each strain. Indeed, only 3G11 possessed cah, which encodes a calcium-binding and heat-extractable autotransporter protein associated with biofilm formation and colonization44, while BBEC_211 uniquely harbored the int gene, encoding an integrase. Similar observations were made for various ST95 W-ExPEC and C-ExPEC strains. For example, while the wastewater strains 3B9 and 4G1 shared an astounding 276 virulence genes with the neonatal meningitic E. coli strain NMEC_4, the wastewater strains additionally possessed neuE, which is thought to encode for a K1 polysialic acid capsule biosynthesis protein, whereas NMEC_4 uniquely harbored aaiW, an uncharacterized protein.
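The pairwise comparisons above reduce to set operations on each strain's virulence gene list: the intersection gives the shared genes and the two set differences give the genes unique to each strain. The sketch below reproduces the 3G11 and BBEC_211 example. Only the counts and the genes cah and int come from the text; the `vg_000` style names are placeholders for the 214 shared genes, and `compare_repertoires` is a hypothetical helper.

```python
def compare_repertoires(genes_a, genes_b) -> dict:
    """Split two gene repertoires into shared and strain-specific genes."""
    a, b = set(genes_a), set(genes_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Placeholder names standing in for the 214 shared virulence genes.
shared_placeholder = {f"vg_{i:03d}" for i in range(214)}

strain_3g11 = shared_placeholder | {"cah"}      # wastewater strain
strain_bbec_211 = shared_placeholder | {"int"}  # clinical strain

result = compare_repertoires(strain_3g11, strain_bbec_211)
print(len(result["shared"]), sorted(result["only_a"]), sorted(result["only_b"]))
# prints: 214 ['cah'] ['int']
```

The same comparison applies unchanged to antibiotic resistance gene lists.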

In terms of antibiotic resistance genes, W-ExPEC strains possessed anywhere from 43 to 56 ARGs, though generally, their clinical counterparts harbored more. In particular, the ST131 W-ExPEC strains possessed the highest number of ARGs, followed by ST44 and ST73 strains (Supplementary Table 6). Interestingly, several ARGs of major clinical importance were found in the W-ExPEC strains. For instance, aminoglycoside modification enzymes (AME), representing the most common and important resistance mechanism against aminoglycoside antibiotics45,46, were well-represented amongst the wastewater strains. Indeed, seven W-ExPEC ST131 strains isolated from treated wastewater effluents (2B4, 1G10A, 3E4, 4C7, 3G8, 3G9, 4C1) harbored the AAC(3)-IId gene, whereas five ST131 strains isolated from chlorinated sewage (WU1030, WU1036, WU1155, WU1265, WU1266) harbored both AAC(3)-IIe and AAC(6’)-Ib-cr. Other W-ExPEC strains across the ST131 (2F5, 2F6, 2B4, 1G10A, 3E4), ST95 (1G6), ST73 (3H3, 4F9), and ST10 (2F12) lineages were also found to harbor APH(3”)-Ib and APH(6)-Id. Beyond AME genes, several W-ExPEC strains also possessed various beta-lactamase genes, including the ST131 W-ExPEC strains WU1030, WU1036, WU1155, WU1265, and WU1266, which all possessed ampC, ampH, blaCTX-M-15 and blaOXA-1; W-ExPEC strain 4C1, which possessed ampC, ampH, blaCTX-M-15 and blaTEM-181; and W-ExPEC strain 5A5, which harbored ampC, ampH, blaCTX-M-14 and blaOXA-1. As these strains possessed various combinations of beta-lactamases, they may represent ESBL-producing E. coli, supporting previous analyses23. Additional ARGs of clinical concern represented amongst the W-ExPEC strains include the sulfonamide-resistance genes sul1, sul2 and sul3 (ref. 47), and the tetracycline-efflux genes tet(A) and tet(B)48.

On a pairwise basis, while some W-ExPEC strains differed from their corresponding C-ExPEC strains by as many as 11 antibiotic resistance genes, others shared incredibly similar ARG profiles. For instance, the ST131 W-ExPEC strains 2F5 and 2F6 differed from BBEC_156 by only 1 ARG, blaCTX-M-9 (Supplementary Table 6), whereas the ST95 W-ExPEC strains 3B9, 4G1, WU1151 and WU1274, shared the same repertoire of ARGs as the clinical strain BBEC_38 aside from blaTEM-181 (Supplementary Table 6). Remarkably, some wastewater strains shared an identical ARG composition with their clinical counterparts, including the W-ExPEC isolate 3E4 with BBEC_267; W-ExPEC strain 3G9 with BBEC_211; and W-ExPEC strains 3B9 and 4G1 with NMEC_4 (Supplementary Table 6).

Discussion

The remarkable similarity between W-ExPEC and C-ExPEC strains across the core, whole, and accessory genome suggests that many strains surviving wastewater treatment may be highly pathogenic with septicemic and meningitic potential. Indeed, growing evidence indicates that, across the wastewater treatment train, certain pathogens appear to differentially survive the disinfection processes designed to eliminate them. Reflecting this, E. coli strains harboring UPEC-associated virulence genes and pathogenicity islands have been recovered from treated wastewater19,20,21,22,23, making up anywhere from 40% (ref. 22) to 60% of the surviving E. coli population following wastewater treatment20,21. Concerningly, some strains isolated from treated wastewater matrices have even been found to belong to clinically relevant ExPEC lineages, including the pandemic O25b-ST131 clonal group23. While this collectively indicates that UPEC strains have specifically evolved resistance to water treatment, growing evidence suggests that this observation may extend to other ExPEC pathotypes. Indeed, virulence gene screening has previously identified potential ExPEC strains in treated effluents49, and it has been estimated that up to ~15% of E. coli strains surviving wastewater treatment represent potential NMEC based on the presence of the ibeA gene22.

Unfortunately, virulence gene screening approaches are limited for characterizing potential ExPEC in wastewater since no single VG (or set of VGs) clearly differentiates the ExPEC pathotypes50. This is further complicated by the identification of ExPEC virulence genes within commensal strains of E. coli51,52, suggesting that these genes are generally required for adhesion and survival by all E. coli within the gut. Despite this, several recent comparative whole-genome studies23,24, including the present study, provide very compelling evidence to suggest that ExPEC are particularly adept at surviving wastewater treatment, including strains known to cause urinary tract infections, septicemia and meningitis in humans. Concerningly, these surviving strains also appear to carry an abundance of ARGs that may confer resistance to antibiotics of concern for human medicine, and may include ESBL-producing strains23,53,54.

In this analysis, of the 637 E. coli isolates collected from chlorinated sewage and treated wastewater, 86 presumptive ExPEC isolates were identified according to a virulence gene screening panel. Multilocus sequence typing clustered the W-ExPEC isolates into several major ExPEC-associated lineages including ST131, ST95, and ST73, a finding observed in other studies19,23,25,55. Several other ExPEC-associated sequence types of growing concern were also represented among the isolates, including ST10, ST44, ST127, ST357, and ST538 (ref. 26). As several sequence types were represented in the surviving E. coli populations recovered from chlorinated sewage and full-scale treated effluents, water treatment resistance may have independently evolved in multiple ExPEC lineages. Through additional comparative genomic approaches, several wastewater isolates were found to be virtually identical to a clinical BBEC or NMEC strain across the core, whole, and accessory genome, suggesting E. coli strains surviving wastewater treatment may possess the capacity to cause extraintestinal disease. For instance, compared to the septicemic strain BBEC_156, wastewater strains 2F5 and 2F6 shared 98.82% whole genome similarity and differed by only 5 SNPs across a ~417 kbp core genome backbone. These wastewater strains also shared 203 VGs with BBEC_156 and harbored an ARG repertoire that differed only by the blaCTX-M-9 gene. Similarly, compared to BBEC_211, 3G9 shared 98.80% whole genome similarity, differed by only 3 SNPs across the core genome, and shared 212 VGs and an identical ARG repertoire. A remarkably high degree of genetic similarity was also observed between select W-ExPEC and clinical NMEC strains. Indeed, compared to NMEC_4, wastewater strains 3B9 and 4G1 both shared over 99.5% whole genome similarity, differed by only 13 and 15 core genome SNPs respectively, while sharing 276 VGs and an identical ARG profile.

The observation that wastewater ExPEC strains demonstrate resistance to wastewater treatment, similar to the recently characterized naturalized wastewater E. coli, warrants further study. Naturalized wastewater E. coli strains appear to differentially survive the wastewater treatment process16, exhibit resistance to chlorine, heat, and oxidants15,18, and possess an extensive repertoire of stress resistance genes compared to enteric strains17. Compared to ExPEC strains, naturalized wastewater strains possess far fewer virulence genes (Fig. 3); however, these strains do appear to share virulence genes in common with their clinical counterparts23, suggesting that these ‘virulence’ factors may play an important role in survival in non-host environments. Reflecting this, the shared virulence genes between UPEC and naturalized wastewater strains include iron-transport proteins and siderophores that may facilitate the uptake of iron at dilute concentrations typically found in urine and wastewater23. Considering that naturalized wastewater E. coli are also resistant to water treatment, the shared accessory genes between ExPEC and naturalized wastewater strains may explain the differential survival of ExPEC during water treatment19,20,21,22,23. Indeed, the finding that naturalized wastewater E. coli and ExPEC strains share a common resistance to wastewater treatment and appear to exhibit some degree of similarity at the accessory genome raises questions concerning their relationship. Interestingly, previous whole genome phylogenetic analyses demonstrated that naturalized wastewater E. coli appear to have evolved from phylogroup A (ref. 17), and it is notable that some ExPEC strains clustered within this phylogroup in this analysis (Fig. 2). While the exact evolutionary relationship between ExPEC and naturalized wastewater E. coli strains is outside the scope of this analysis, this finding does suggest that these distinct E. coli lineages may share a common evolutionary origin, offering an interesting direction for future study.

Regardless of the exact nature of the genetic relationship between naturalized wastewater E. coli and ExPEC strains, the underpinnings of their evolutionary emergence are critical to understand from a public health viewpoint. A prerequisite for the co-selection of water treatment resistance and pathogenesis in ExPEC strains is that some level of sustained waterborne ExPEC transmission must occur, particularly from wastewater-contaminated environments. Presently, urinary tract infections (UTIs) account for more than 10 million physician visits each year in the US, of which the vast majority are caused by UPEC56,57,58. As with many clinical infections, the number of reported cases likely underestimates the true prevalence of the health issue. Consequently, UTI rates in the community are likely higher than expected, which both: (a) exerts selective pressure on the pathogenesis (i.e., virulence genes) of ExPEC pathotypes in the population; and (b) provides a constant influx of ExPEC into wastewater treatment plants. Once within the wastewater treatment plant, microbial disinfection processes likely drive the selection of treatment resistance in ExPEC; however, and most importantly, the ability to maintain both phenotypes of water treatment resistance and pathogenesis requires that ExPEC surviving the wastewater treatment processes must cycle back into the human population. Importantly, recreational exposure to contaminated natural water bodies (e.g., swimming) has already been epidemiologically linked to an increased prevalence of UTIs, largely caused by UPEC59. Considering that ESBL E. coli isolated from clinical, wastewater and recreational water samples from the same geographical location have been shown to be of the same clonal lineage60, wastewater-contaminated recreational water may be an important route of ExPEC transmission.

Concerningly, the impact of wastewater contamination on ExPEC transmission may reach far beyond recreational water sources. It has been estimated that 50% of drinking water treatment plants in the US are impacted by wastewater effluents and that wastewater discharge may account for 50% of the volume of water flowing in US rivers during the year, which can reach as high as 90% under low streamflow conditions61,62. This is important to consider as these rivers may be used for drinking, irrigation, and recreational purposes, providing further routes of transmission for pathogens back into the population. Indeed, ESBL-producing Enterobacteriaceae, including E. coli, have been found in 6.4% of drinking water samples that failed bacteriological drinking water quality parameters in the US, suggesting that drinking water may be an underestimated vehicle for transmission of ExPEC in the community63. Similarly, Paulshus and colleagues demonstrated that 10% of E. coli isolates recovered from community wastewater pumping stations represented ESBL-producing strains, of which 44% clustered within sequence types ST131 and ST648, suggesting that wastewater systems may be contributing considerably to the dissemination of antibiotic-resistant ExPEC throughout the community24. Despite this, most research efforts have classically focused on linking ExPEC transmission to the food supply. Indeed, E. coli isolates harboring ExPEC virulence genes64,65, as well as those phylogenetically linked to extraintestinal disease in the clinic66, have been isolated from food products including retail meat, eggs, milk, and produce, implicating food as an important source of ExPEC transmission. The present analysis, however, adds to a growing body of research that indicates that water may represent an important, yet overlooked, source of exposure for these pathogens67,68,69. As far as we are aware, no epidemiological studies have examined the relationship between exposure to, or consumption of, contaminated water and meningitis or septicemia caused by ExPEC. In light of these findings, we would strongly encourage such studies to be undertaken.

Interestingly, several wastewater strains that were previously found to share high genetic similarity with clinical UPEC strains23 also exhibited a high degree of similarity with a clinical BBEC or NMEC counterpart, emphasizing the importance of including a wide representation of relevant ExPEC isolates for comparative genomic purposes. For instance, while wastewater isolates WU1030, WU1036, WU1155, WU1265, and WU1266 each exhibited >96.03% similarity to a clinical UPEC23, the degree of whole-genome similarity between these isolates and the septicemic strain BBEC_265 exceeded 99% (Table 1). Conversely, some wastewater isolates representing potential BBEC or NMEC strains in this study shared greater similarity with clinical UPEC counterparts in previous analyses23. As these findings demonstrate, comparative genomic studies between wastewater and clinical ExPEC strains require a comprehensive pathotype database—and for this reason, we also believe that this study likely underestimated the occurrence of ExPEC strains present in wastewater effluents and chlorinated sewage.

In the present study, presumptive ExPEC strains were initially identified through virulence gene screening, which included sequence type markers for the ST131 lineage. While the use of virulence genes to characterize ExPEC strains is in line with other studies19,20,21,22,49,54,65, no single gene marker clearly defines this pathotype, indicating our initial screen likely missed many potential ExPEC strains present in wastewater. Furthermore, although our PCR screen was optimized for strains in the ST131 pandemic lineage37, and specifically the prominent O25b-ST131 clonal group66, several other emerging ExPEC lineages have since been identified, including ST95, ST73, ST10, and ST12726. Targeting ST131 in our assay, although not to the exclusion of non-ST131 strains, likely led to some bias in the selection process and the subsequent under-representation of other ExPEC sequence types in the final repository. In addition, although we created a relatively large local genome database of clinical BBEC and NMEC strains for comparative genomic purposes (n = 320), there are limitations regarding just how many sequences can be reasonably compared. Thus, although several wastewater strains did not share >96.03% whole genome similarity with a clinical BBEC or NMEC strain in our study, these strains may still represent potential ExPEC. Indeed, several isolates that lacked a clinical match in this study still clustered within ExPEC-relevant sequence types, including ST131, ST73, and ST10 (Supplementary Table 2), and grouped within clinical ExPEC-dominated clusters (Fig. 3). Consequently, we believe that including more clinical strains into our repository would have led to the identification of a greater number of W-ExPEC strains, supporting the idea that ExPEC comprises a significant proportion of the viable E. coli found in treated wastewater effluents.

The prospect that pathogenic E. coli may have evolved resistance against water treatment is concerning, particularly considering that water treatment represents the single greatest intervention for control of infectious diseases in modern society. This is further compounded by the observation that many of these strains are also resistant to antibiotics. Importantly, this finding raises the question—if pathogenic E. coli are evolving resistance to water treatment, could this be occurring with other pathogenic microbes? The wastewater microbiome represents an incredibly diverse microbial community70 upon which the same selection pressures can act. Indeed, several studies suggest a common paradigm for the evolution of water treatment resistance in various extraintestinal pathogenic taxa, including in Legionella70, Acinetobacter71, and Mycobacterium72. Alongside these pathogenic genera, others including Roseomonas, Pseudomonas, Aeromonas, Yersinia, and Escherichia, have been found to be particularly abundant across all stages of wastewater treatment, even following disinfection73. With the growing burden of various extraintestinal diseases worldwide, future research is needed to evaluate the risks associated with the potential waterborne transmission of ExPEC and other pathogens that might be evolving resistance to water treatment.

In the present study, wastewater E. coli strains recovered from chlorinated sewage and finished effluents were found to exhibit an extremely high degree of genetic similarity across the whole, core, and accessory genome to clinical BBEC and NMEC strains. As select wastewater strains were found to be virtually identical to a clinical BBEC or NMEC counterpart, the evidence suggests that clinically relevant BBEC and NMEC strains may readily survive municipal wastewater treatment processes. Taken together with previous findings suggesting that clinically relevant UPEC strains comprise a significant proportion of the surviving wastewater E. coli population following water treatment, ExPEC pathotypes overall may have evolved a differential capacity to resist water treatment disinfection. Considering the massive microbial diversity present in wastewater, these findings raise the prospect that other extraintestinal pathogenic microbes may be following a similar evolutionary path towards water treatment resistance. These findings prompt further research into environmental persistence and prevalence of these pathogens following wastewater treatment, and the public health risks associated with exposure to downstream impacted water environments—raising the concerning possibility that extraintestinal diseases such as urinary tract infections, septicemia, and meningitis could be transmissible through waterborne routes of exposure.

Methods

Recovery of E. coli isolates from chlorinated sewage and treated effluents

To reflect the variability that exists in the disinfection processes utilized across wastewater treatment programs, both chlorine-stressed and water treatment-resistant E. coli isolates were collected for this analysis. Chlorine-stressed isolates were obtained from chlorinated sewage according to the United States Environmental Protection Agency’s (U.S. EPA) Alternate Test Procedure74. Briefly, raw sewage samples obtained from 10 different treatment plants across Alberta, Canada, were treated with 3% sodium hypochlorite with a free chlorine residual of 0.3–0.5 ppm. Chlorine dose and contact time reduced the culturable E. coli population by ~4 log10, as estimated by the Colilert QuantiTray® system (IDEXX Laboratories, Inc.), and verified in parallel samples. Following chlorine treatment, residual reactivity was neutralized with a 10% solution of sodium thiosulfate. The chlorine-treated wastewater samples were then inoculated into either ColiTag® or lauryl trypticase broth/BCG media as outlined by Method 9221.F in Standard Methods for the Examination of Water and Wastewater75. Positive ColiTag® and LB/BCG cultures were then plated onto X-Gluc agar plates and incubated at 44.5 °C for 24 h to selectively grow E. coli. Blue colonies were picked and streaked onto non-selective blood agar plates and incubated at 35 °C for 24 h. All presumptive E. coli isolates were biochemically confirmed as E. coli using a Vitek® 2 Automated Bacterial Identification System (BioMerieux, St. Laurent, Quebec, Canada) according to the manufacturer’s instructions. The resulting library of isolates represented the chlorine-resistant population of E. coli.

Wastewater treatment-resistant E. coli isolates were obtained from partially treated (i.e., secondary-treated) and finished effluents derived from three full-scale municipal wastewater treatment plants (WWTPs) in Calgary, Alberta. The wastewater treatment processes utilized in these treatment plants consist of grit removal, primary clarification, activated sludge, secondary clarification and UV disinfection at low or medium pressure doses of 25–30 mJ/cm² at peak flow, with maximal treatment capacities ranging from 140 to 1020 ML per day. To obtain the wastewater treatment-resistant isolates, samples of wastewater effluent were processed by standard membrane filtration (100 mL) onto X-Gluc agar plates. Blue colonies representing putative E. coli were picked, inoculated into a 96-well plate containing 100 µL of 1X Luria-Bertani (LB) broth and incubated overnight without shaking at 37 °C. A total of 1212 isolates were collected from wastewater effluent matrices, of which a random subset of 261 isolates was confirmed as E. coli using a Vitek® 2 Automated Bacterial Identification System and selected for further analysis.

Analysis of ExPEC-related virulence genes and molecular markers

All selected E. coli isolates were grown in TSB overnight at 37 °C and their genomic DNA (gDNA) extracted from the TSB culture using DNeasy Blood & Tissue kits (QIAGEN, Toronto, Canada) according to the manufacturer’s instructions. A total of 376 presumed chlorine-tolerant isolates and 261 water treatment-resistant isolates were assayed against the uspC-IS30-flhDC marker by PCR to eliminate naturalized wastewater E. coli isolates from the library. The remaining isolates in the library were subsequently screened against the uidA gene to further confirm isolates as E. coli, and all confirmed isolates were then screened against a panel of ExPEC-associated virulence genes and molecular markers, as previously described23. All PCR panel gene targets and primers used in this study are provided in Supplementary Table 7. Purified genomic DNA was quantified after extraction using the Qubit fluorimeter (Thermo Fisher Scientific Inc.), and all PCR reactions were performed on an ABI 2720 thermocycler (Applied Biosystems). The PCR reactions for the uidA76 and uspC–IS30–flhDC16 markers were carried out according to previously described protocols. For the other molecular markers, the reaction mixtures consisted of 20–40 ng of gDNA template, 12.5 µL of 1X GoTaq Hotstart Mastermix (Promega, Madison, WI), and 500 nM of each primer. Cycling conditions varied according to the specific molecular marker. For papC, sfa-foc, iroN and ibeA, cycling conditions were as follows: 95 °C for 2 min, followed by 33 cycles of 30 s at 95 °C, 30 s at 63 °C and 45 s at 72 °C, followed by a 7 min incubation at 72 °C. For fyuA and chuA: 95 °C for 2 min, followed by 33 cycles of 30 s at 95 °C, 30 s at 63 °C and 1 min at 72 °C, followed by a 7 min incubation at 72 °C. For kpsM: 95 °C for 2 min, followed by 35 cycles of 20 s at 95 °C, 20 s at 62 °C and 45 s at 72 °C, followed by a 7 min incubation at 72 °C. For the ST131 marker: 95 °C for 2 min, followed by 35 cycles of 20 s at 95 °C, 20 s at 57 °C and 45 s at 72 °C, followed by a 7 min incubation at 72 °C. For the O25b–ST131 marker: 95 °C for 4 min, followed by 30 cycles of 5 s at 94 °C and 10 s at 65 °C, followed by a 5 min incubation at 72 °C. All PCR products were run on 1.5% agarose gels and photographed on an ImageQuant LAS 4000 (GE Healthcare Life Sciences). PCR screening results for the ExPEC-associated virulence genes were then confirmed through whole-genome sequencing for each wastewater E. coli isolate (see below).

Whole-genome sequencing and assembly of presumptive wastewater ExPEC strains

Chlorine-tolerant and wastewater treatment-resistant E. coli isolates harboring at least 3 ExPEC virulence genes, as well as any isolates positive for the NMEC-associated ibeA gene or ExPEC-associated ST131 genetic marker (regardless of the presence of other virulence genes), were selected for whole-genome sequencing as presumptive wastewater ExPEC (W-ExPEC) strains. Genomic DNA from the presumptive W-ExPEC isolates (n = 86) was sent to Genome Quebec (Montreal, Canada) for sequencing using an Illumina HiSeq X platform (Illumina) with paired-end 150 nucleotide reads. Trimmomatic Version 0.3977 was used to trim the low-quality reads with the following parameters: SLIDINGWINDOW = 4:15, LEADING = 3, TRAILING = 3, MINLEN = 36. De novo genomic assembly was then performed using SPAdes Version 3.11.178 with the ‘--careful’ and ‘-k 21,33,55,77’ options. Any SPAdes-assembled contigs shorter than 1000 bp were excluded from downstream analyses.
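The selection rule and the contig length cut-off described above can be sketched in Python. This is an illustrative sketch only and not the code used in the study: the function names, the representation of an isolate as sets of gene and marker labels, and the in-memory dictionary of contigs are all assumptions made for the example.

```python
# Illustrative sketch; names and data layout are assumptions, not the authors' code.
NMEC_MARKER = "ibeA"
ST131_MARKERS = {"ST131", "O25b-ST131"}


def is_presumptive_w_expec(virulence_genes, markers, min_genes=3):
    """Return True if an isolate meets the screening rule for sequencing.

    An isolate qualifies if it carries at least `min_genes` ExPEC virulence
    genes, or if it is positive for ibeA or an ST131 marker regardless of
    how many other virulence genes it carries.
    """
    genes = set(virulence_genes)
    if NMEC_MARKER in genes:
        return True
    if ST131_MARKERS & set(markers):
        return True
    return len(genes) >= min_genes


def filter_contigs(contigs, min_length=1000):
    """Keep only contigs of at least `min_length` bp.

    `contigs` maps a contig name to its nucleotide sequence.
    """
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_length}
```

For example, an isolate positive only for ibeA would be selected, whereas an isolate carrying only papC and fyuA would not.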

Core genome SNP analysis of W-ExPEC and C-ExPEC isolates

To evaluate the likelihood of our presumptive W-ExPEC isolates being pathogenic, whole genomes were compared against a genomic library of clinical ExPEC (C-ExPEC) strains associated with either blood-borne bacteremia (BBEC) or meningitis (NMEC). First, 320 clinical NMEC and BBEC isolates were downloaded from NCBI databases to construct a local repository of C-ExPEC strains (Supplementary Table 4). Core genome SNP analyses were performed for each of the W-ExPEC genomes (n = 86) against the local repository of 320 C-ExPEC strains using REALPHY v1.1379. Whole-genome assemblies of all W-ExPEC and C-ExPEC isolates (n = 406) were collected into a local folder to be used as input for REALPHY. One of the E. coli assemblies (strain 3G6) in the input REALPHY folder was randomly selected to be the reference sequence, against which the genome sequences of all other isolates were mapped to produce a core genome alignment for SNP assessment. MEGA-X v10.1.080 was then used to determine the number of pairwise SNP differences among all 406 strains in this study.
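As an illustration of what a pairwise SNP count over a core genome alignment involves, the following Python sketch counts differing positions between every pair of aligned sequences. It is not a reimplementation of MEGA-X: the function name is hypothetical, and the choice to skip positions where either sequence has a gap or an ambiguous base (pairwise deletion) is an assumption, since the handling of such sites is not specified above.

```python
from itertools import combinations


def pairwise_snp_differences(alignment, ignore="-N"):
    """Count differing sites for every pair of aligned sequences.

    `alignment` maps a strain name to its aligned sequence; all sequences
    must have the same length. Sites where either sequence carries a
    character listed in `ignore` are skipped (assumed pairwise deletion).
    Keys of the returned dict are name pairs in sorted order.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("all aligned sequences must have the same length")

    counts = {}
    for a, b in combinations(sorted(alignment), 2):
        seq_a, seq_b = alignment[a].upper(), alignment[b].upper()
        counts[(a, b)] = sum(
            1
            for x, y in zip(seq_a, seq_b)
            if x != y and x not in ignore and y not in ignore
        )
    return counts
```

With 406 strains this yields 406 × 405 / 2 = 82,215 pairwise comparisons, which is why a dedicated tool is used in practice.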

Pairwise whole-genome comparisons of W-ExPEC and C-ExPEC isolates

W-ExPEC strains that were identified as sharing a high core genome similarity with at least one clinical BBEC or NMEC counterpart—based on having fewer than 250 differing SNPs in a ~417 kbp core genome backbone—were selected for further comparative genetic analyses (n = 37). In this case, whole-genome approaches were used to evaluate the overall genetic relatedness between W-ExPEC strains and their closest clinical counterpart.
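To put the core genome cut-off in perspective, 250 SNPs across a backbone of roughly 417 kbp corresponds to a divergence of about 0.06% of aligned positions. The short Python sketch below expresses this arithmetic and the selection rule; the constant and function names are hypothetical, and the backbone size is the approximate value quoted above.

```python
# Illustrative sketch; names are assumptions. Values are those quoted in the text.
CORE_GENOME_BP = 417_000  # approximate core genome backbone size (bp)
MAX_SNPS = 250            # strains must differ by fewer SNPs than this


def core_divergence(snps, core_bp=CORE_GENOME_BP):
    """Fraction of core genome positions that differ between two strains."""
    return snps / core_bp


def passes_core_threshold(snps, max_snps=MAX_SNPS):
    """True if a wastewater/clinical pair differs by fewer than `max_snps` SNPs."""
    return snps < max_snps
```

Under these assumptions, a pair differing by 249 SNPs would be retained and a pair differing by exactly 250 SNPs would not.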

First, pairwise whole-genome comparisons were performed between the 37 selected W-ExPEC strains and the local repository of 320 C-ExPEC strains using REALPHY v1.1379 with default parameters. The whole-genome similarity for each pairwise comparison was estimated by REALPHY v1.13 based on the percentage of each C-ExPEC genome that mapped onto each W-ExPEC genome as a reference. A previously established upper median value for whole-genome sequence similarity between highly diverse STEC O157:H7 E. coli isolates23 was set as a threshold for further evaluating the genetic similarity between the W-ExPECs and C-ExPECs. This threshold of 96.03% served as the lower limit for identifying W-ExPEC strains that shared a high degree of genetic similarity, and thus a presumed similar pathogenic potential, with clinical ExPEC strains.
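The thresholding step can be sketched as follows in Python. The sketch assumes that the similarity values have already been computed and stored in a nested dictionary; the function names and data layout are hypothetical, and the inclusive comparison (≥96.03%) is an assumption that follows the selection criterion stated in the phylogenetic analysis section.

```python
# Illustrative sketch; names and data layout are assumptions.
SIMILARITY_THRESHOLD = 96.03  # percent, threshold quoted in the text


def percent_mapped(mapped_bp, genome_bp):
    """Percentage of a genome (in bp) that mapped onto the reference."""
    return 100.0 * mapped_bp / genome_bp


def closest_counterparts(similarity, threshold=SIMILARITY_THRESHOLD):
    """Find the most similar clinical strain for each wastewater strain.

    `similarity` maps each wastewater strain to a dict of
    {clinical strain: percent similarity}. Only wastewater strains whose
    best match meets the threshold are returned, as
    {wastewater strain: (clinical strain, percent similarity)}.
    """
    best = {}
    for wastewater, row in similarity.items():
        if not row:
            continue
        clinical = max(row, key=row.get)
        if row[clinical] >= threshold:
            best[wastewater] = (clinical, row[clinical])
    return best
```

A wastewater strain whose best clinical match falls below the threshold is simply omitted from the result, which mirrors the way such strains are treated as lacking a clinical counterpart rather than as non-pathogenic.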

Second, additional pairwise whole-genome comparisons were performed for each identified W-ExPEC (n = 37) against a local repository of 46 representative intestinal pathogenic E. coli and naturalized wastewater E. coli strains downloaded from NCBI (Supplementary Table 4). The W-ExPEC strains were compared to intestinal pathogenic strains (i.e., enterohaemorrhagic E. coli [EHEC], enteropathogenic E. coli [EPEC], enterotoxigenic E. coli [ETEC], etc.) to rule out the possibility that any significant similarities observed between the wastewater and clinical ExPEC strains could be explained by a shared propensity for general pathogenicity and not true extraintestinal pathogenicity, especially since certain intestinal pathotypes such as ETEC have been recovered from wastewater matrices after primary and secondary treatment81. Naturalized wastewater strains were also included in the analysis to assess their degree of similarity with W-ExPEC strains, controlling for the genetic similarity that might be driven through sharing a similar ecological niche (i.e., sewage/wastewater).

Core genome phylogenetic analysis, phylogrouping, and multilocus sequence typing of W-ExPEC and C-ExPEC isolates

W-ExPEC strains that shared ≥96.03% similarity at the whole genome level with at least one C-ExPEC strain were selected for further analysis. A maximum-likelihood phylogenetic tree was generated using RAxML 8.2.1282 based on the core genome alignments produced by REALPHY v1.13 between W-ExPEC strains (n = 37) and their closest clinical counterparts (n = 38), as well as E. coli strains of known phylogroups28,29,30.

The phylogroups of the W-ExPEC and C-ExPEC strains were predicted with the ClermonTyping method, using the ClermonTyper (version 21.03) webserver83. Multilocus sequence typing (MLST) analyses were also performed on all presumptive W-ExPEC strains and clinical ExPEC strains using mlst 2.19.0 (https://github.com/tseemann/mlst) with the Escherichia coli #1 scheme. All information pertaining to the bacterial strains included in the phylogenetic analysis can be found in Supplementary Table 4. The phylogenetic tree was then visualized and annotated using the R packages ggplot2 version 3.3.384, ape version 5.4.185 and ggtree version 2.2.486.

Pan-genomic analysis, accessory genome clustering tree, and whole genome screening of virulence and antibiotic resistance genes

A pan-genomic analysis of all presumptive W-ExPEC strains (n = 86), clinical NMEC and BBEC strains representing the closest clinical counterparts, reference UPEC strains, naturalized wastewater strains, and laboratory reference strains was performed using Roary 3.13.087. Reference UPEC strains and naturalized wastewater strains were included in the analysis, as previous studies have demonstrated that these subpopulations can be isolated from chlorinated sewage and finished wastewater effluents16,17,23. Laboratory reference E. coli strains (i.e., E. coli K12 MG1655, E. coli K12 W3110, E. coli K12 BW2592, and E. coli ATCC 25922) were included in the analysis to represent a non-pathogenic reference subgroup. All presumptive W-ExPEC isolates originally identified were retained, including those without clinical NMEC or BBEC matches, as they may still represent strains with extraintestinal pathogenic potential that lacked a suitable clinical counterpart in the repository used for this analysis. An accessory genome phylogenetic analysis was then conducted based on the binary presence and absence of accessory genes within the calculated pan-genome, as determined by Roary 3.13.087. The accessory genome phylogenetic tree was then visualized and annotated using the R packages ggplot2 version 3.3.384, ape version 5.4.185, and ggtree version 2.2.486.
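To illustrate how binary presence and absence of accessory genes can be turned into a measure of relatedness, the Python sketch below computes Jaccard distances between strains. It is a simplified illustration and not the procedure implemented by Roary: the function names are hypothetical, the Jaccard metric is an assumption chosen for clarity, and accessory genes are defined here strictly as genes absent from at least one strain.

```python
# Illustrative sketch; metric and names are assumptions, not Roary's method.
def accessory_genes(presence):
    """Return genes found in some but not all strains.

    `presence` maps a strain name to the set of genes it carries.
    """
    gene_sets = list(presence.values())
    core = set.intersection(*gene_sets)
    pan = set.union(*gene_sets)
    return pan - core


def jaccard_distance(a, b):
    """Jaccard distance between two gene sets (0 = identical, 1 = disjoint)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def accessory_distance_matrix(presence):
    """Pairwise Jaccard distances computed on accessory genes only."""
    accessory = accessory_genes(presence)
    strains = sorted(presence)
    return {
        (s, t): jaccard_distance(presence[s] & accessory, presence[t] & accessory)
        for s in strains
        for t in strains
    }
```

A distance matrix of this kind could then be passed to a clustering or tree-building method; the choice of method is left open here.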

The genomes of the selected W-ExPEC strains and their closest clinical counterparts were also screened for the presence of virulence genes and antibiotic resistance genes for comparison using ABRicate version 1.0.1 (https://github.com/tseemann/abricate), with a minimum query coverage of 90% and a minimum percent identity of 80%. For virulence gene identification, W-ExPEC and C-ExPEC genomes were screened against the Virulence Factor Database (VFDB; updated January 12, 2021)88 and Escherichia coli Virulence Factor Database (Ecoli_vf; updated January 12, 2021) (https://github.com/phac-nml/ecoli_vf). For antibiotic resistance gene identification, E. coli genomes were screened against the Comprehensive Antibiotic Resistance Database (CARD; updated January 12, 2021)89. The numbers of virulence and antibiotic resistance genes were then visualized and appended to the accessory genome tree.
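The coverage and identity cut-offs, and the per-genome gene counts appended to the tree, can be sketched in Python as follows. The sketch operates on generic hit records rather than on ABRicate output directly: the dictionary keys ("genome", "gene", "coverage", "identity") and the function names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch; record keys and names are assumptions.
def filter_hits(hits, min_coverage=90.0, min_identity=80.0):
    """Keep hits that meet both the coverage and identity thresholds (percent)."""
    return [
        hit
        for hit in hits
        if hit["coverage"] >= min_coverage and hit["identity"] >= min_identity
    ]


def count_genes_per_genome(hits):
    """Count distinct genes detected in each genome.

    Multiple hits to the same gene within one genome are counted once.
    """
    genes = {}
    for hit in hits:
        genes.setdefault(hit["genome"], set()).add(hit["gene"])
    return {genome: len(found) for genome, found in genes.items()}
```

Counting distinct genes rather than raw hits avoids inflating the totals when a gene is split across contigs; whether this matches the counting used for the figures is not stated above.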