Overview

Shigella has long been known by its clinical manifestation, “bacillary dysentery,” which was recognized even before Kiyoshi Shiga identified the organism as the causative agent during a severe Japanese outbreak [1]. Shigella currently ranks as the second leading cause of diarrhea-associated mortality and is responsible for approximately 212,438 deaths annually, of which 63,713 occur in children younger than 5 years and 74,402 in individuals older than 70 [2]. Although primarily a disease of developing countries, Shigella infection, or shigellosis, remains a public health issue across the globe, with nearly 500,000 cases in the USA annually [3]. Shigella is transmitted through the fecal–oral route, and an infectious dose as low as 10 to 100 cells is enough to cause disease [4]. Shigellosis is a non-systemic, acute enteric infection characterized by destruction of the colonic epithelium, which results in bloody diarrhea, sometimes accompanied by mucus, abdominal pain, and fever [5]. Occasionally, Shigella causes invasive infections such as meningitis, osteomyelitis, splenic abscess, and sepsis, mainly in malnourished and HIV-infected patients [6].

Shigella species are intracellular, Gram-negative, facultatively anaerobic, non-sporulating bacilli belonging to the Enterobacteriaceae family [7]. Despite the close relatedness to Escherichia coli, the genus Shigella and its four species were formally validated in 1954 [8]. Since their discovery, several studies have attempted to classify Shigella spp. accurately within the Enterobacteriaceae family. In 1982, based on 192 morphological, biochemical, and phenotypic properties, Dodd and Jones determined that Shigella spp. fell into a major distinct cluster more closely related to Yersinia and Proteus/Providencia species than to E. coli [9]. This phenotypic classification reinforced the traditional treatment of Shigella as a separate genus. However, with the arrival of the molecular era, many doubts were raised about the precise taxonomic position of Shigella, and the question of its relatedness to E. coli persisted. DNA–DNA hybridization (DDH) experiments revealed 80–90% similarity, so that Shigella spp. and E. coli appeared as “one species genetically” [10]. After that, sequence analysis of eight housekeeping genes grouped Shigella into three major clusters, and a limited number of outliers were found to have evolved independently from multiple non-pathogenic E. coli ancestors [11]. Later analyses of completely sequenced genomes, which included either more housekeeping genes or the whole set of conserved genes (the core genome), confirmed that Shigella spp. are intermixed with E. coli [12, 13]. Currently, there is a consensus that Shigella spp. belong to the E. coli species, but the nomenclature has been kept for historical and medical reasons [14]. However, this consensus is still contested: some researchers claim that the four species of the Shigella genus are not clones of E. coli but members of the Escherichia genus, on the same footing as E. coli [14,15,16].

Although challenging in clinical microbiology laboratories, differentiating Shigella from E. coli can be guided by several distinctive phenotypic features. Indeed, more than 80% of E. coli strains are motile, decarboxylate lysine, ferment many sugars, are indole positive, and produce gas from d-glucose. In contrast, Shigella are non-motile, unable to decarboxylate lysine, do not produce acid from salicin or hydrolyze esculin, ferment few sugars, and do not produce gas from d-glucose, with the exception of Shigella flexneri serotype 6 and Shigella dysenteriae 3. In addition, Shigella sonnei strains can ferment lactose slowly and can be mucate positive [17, 18]. However, the real identification “problem” lies in differentiating Shigella from a group of E. coli variants called “inactive E. coli,” which includes the enteroinvasive E. coli (EIEC) pathovar. EIEC shares several biochemical properties with Shigella, among them lactose negativity, non-motility, inability to decarboxylate lysine (lysine decarboxylase negative), and absence of gas production [19, 20]. Nevertheless, there are exceptions: some EIEC strains, for instance those belonging to serotype O124:H30, are mostly motile, decarboxylate lysine, and can also ferment lactose [21,22,23]. Shigella spp. and EIEC might also be considered a single pathotype for several reasons, including (1) the acquisition of a similar virulence plasmid (pINV) that mediates invasion of host cells; (2) intracellular survival by inactivation of sets of genes as an adaptive mechanism; and (3) the frequent assumption that EIEC is evolving toward a Shigella-like phenotype, which is reflected in the similar patterns of gene expression [17, 18, 24, 25]. Despite causing similar disease symptoms, EIEC is generally less virulent than Shigella, with a higher infectious dose and reduced potential for propagated person-to-person transmission [26,27,28]. Among Shigella, only Shigella dysenteriae type 1 can cause hemolytic uremic syndrome (HUS), whereas EIEC is not known to trigger this syndrome [18, 29]. EIEC can also be differentiated from Shigella by a minimal number of tests, including motility, mucate, salicin fermentation, esculin hydrolysis, the combined positivity of indole production and gas formation from d-glucose, and acetate and Christensen citrate utilization. EIEC isolates may be positive for one or more of these tests, but Shigella are generally negative [19, 30, 31]. Subsequently, a guide that combined biochemical, physiological, and serological features was designed for the daily identification of EIEC, E. coli, and Shigella in diagnostic laboratories [19].
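The minimal test panel described above can be read as a simple screening rule. The following Python sketch is only an illustration of that rule, not a validated diagnostic scheme: the class and function names are hypothetical, and the rule deliberately ignores the exceptions mentioned in the text (for example, mucate-positive S. sonnei or gas-producing S. flexneri serotype 6).

```python
from dataclasses import dataclass


@dataclass
class BiochemicalProfile:
    """Test results for one isolate; True means a positive reaction.

    Field names are illustrative and not taken from any published guide.
    """
    motility: bool = False
    mucate: bool = False
    salicin_fermentation: bool = False
    esculin_hydrolysis: bool = False
    indole: bool = False
    gas_from_glucose: bool = False
    acetate: bool = False
    christensen_citrate: bool = False


def eiec_indicators(profile: BiochemicalProfile) -> list:
    """Return the positive results that point towards EIEC."""
    hits = []
    for name in ("motility", "mucate", "salicin_fermentation",
                 "esculin_hydrolysis", "acetate", "christensen_citrate"):
        if getattr(profile, name):
            hits.append(name)
    # Indole and gas count only as a combined positive result.
    if profile.indole and profile.gas_from_glucose:
        hits.append("indole+gas")
    return hits


def screen(profile: BiochemicalProfile) -> str:
    """Simplified rule: any EIEC indicator flags the isolate as EIEC-like."""
    return "EIEC-like" if eiec_indicators(profile) else "Shigella-like"


print(screen(BiochemicalProfile(motility=True)))
print(screen(BiochemicalProfile(indole=True)))
```

In practice such a rule would only be one input alongside serology and, where available, molecular tests, because of the known exceptions on both sides.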

Despite technological progress, accurately differentiating Shigella spp. from E. coli, especially EIEC, continues to be a significant diagnostic challenge because of their genetic and phenotypic relatedness, as indicated above. The distinction is paramount because shigellosis is a mandatory notifiable disease in most countries, whereas EIEC infection is not [32]. Correct identification will also elucidate the epidemiology of Shigella spp. and their trends in developing antimicrobial resistance, which will inform treatment regimens based on different antimicrobial susceptibility profiles [33]. In addition, differentiating between Shigella spp. allows for a better understanding of each species’ unique epidemiology, such as the prevalence of S. flexneri in low- and middle-income countries versus S. sonnei, which is more dominant in high-income countries [6]. Correct identification at the serotype level is a cornerstone for determining the spatial–temporal distribution of circulating serotypes, understanding the differences in disease burden across countries, tracking the potential emergence of novel serotypes, investigating outbreaks, and critically evaluating implemented policies for vaccine development and disease containment. Here, we discuss both phenotypic and molecular identification techniques used to differentiate Shigella from EIEC and to identify Shigella at the species level (Table 1, Fig. 1). Countries and laboratories can then choose an identification method that suits their diagnostic capabilities. This review also highlights inherent loopholes in the phenotypic serotyping strategy for Shigella and summarizes the proposed molecular serotyping alternatives.

Table 1 Summary of the most frequently used methods for Shigella identification
Fig. 1 Shigella identification methods and strategies. Blue: phenotypic techniques; yellow: molecular techniques. *EIEC can be differentiated from Shigella by a number of tests, which include motility, mucate, salicin fermentation, esculin hydrolysis, the combined positivity of indole production and gas formation from d-glucose, and acetate utilization. #Novel approaches of MALDI-TOF include ClinProTools, referenced library, and short-term lactose incubation

Identification techniques

Phenotypic identification techniques

Biochemical test systems

Commercial biochemical identification systems are based on one of five different technologies or a combination thereof: pH-based reactions, utilization of carbon sources, enzyme-based reactions, visual detection of bacterial growth, and/or detection of volatile or non-volatile fatty acids [47]. Multiple tests are dedicated to Enterobacteriaceae identification. They are categorized into manual systems, such as API 20E, RapiD 20E (bioMérieux, Marcy-l'Étoile, France), RapID ONE, and Micro-ID (Remel, San Diego, California, USA), and automated systems, such as the BD Phoenix 100 ID/AST system NID panel (Becton Dickinson, New Jersey, USA), Vitek 2 (bioMérieux), and MicroScan Neg ID Type 2 (Beckman Coulter, California, USA). In terms of their effectiveness in identifying Shigella, API 20E, which has been widely accepted in clinical microbiology laboratories over the last decades, failed to identify 3 to 10% of Shigella strains [47,48,49], while BD Phoenix misidentified nearly 17% of Shigella isolates as E. coli [36, 50, 51]. As for Vitek 2, it repeatedly misidentified a commensal inactive E. coli as S. sonnei [52]. However, the reliability of these evaluation studies requires further analysis, because the identification systems involved (that is, conventional or commercial biochemical methods) are themselves questionable in their ability to separate E. coli and Shigella [51, 53].

Serotyping

The four Shigella species (subgroups) are divided into serotypes and subserotypes based on their O antigen. S. dysenteriae (subgroup A) has 15 serotypes, S. flexneri (subgroup B) has 18 serotypes, Shigella boydii (subgroup C) has 20 serotypes, and S. sonnei (subgroup D) has a single serotype [4]. Notably, a confirmed identification of Shigella spp. must be based on both serological and biochemical profiles [54]. Traditionally, serotyping has been performed using in-house or commercial antisera to the LPS O antigen, which are divided into polyvalent and monovalent antisera. Polyvalent antisera contain antibodies to multiple Shigella serotypes and can therefore determine Shigella subgroups, while monovalent antisera contain serotype-specific antibodies [54]. Among the most common commercial Shigella serotyping kits are the Wellcolex Color Shigella Kit (Thermo Fisher Scientific Inc., Massachusetts, USA) and Vision Polyvalent Shigella Antisera (ProLab Diagnostics Inc., Ontario, Canada), which provide polyvalent antisera. Other companies provide both polyvalent and monovalent antisera, such as BioRad Laboratories Inc. (California, USA), Deben Diagnostics Ltd. (Ransomes Industrial Estate, UK), and Denka Seiken Co., Ltd. (Tokyo, Japan). Although serotyping has been regarded as the gold standard for Shigella species identification [36], it is laborious, time consuming, and impractical for a large number of samples. Additional issues further lessen its usefulness. First, many intra- and inter-species cross-reactions are observed, and commercial antisera are at best 91% accurate [55]. In one cohort, 28% of S. sonnei isolates were misidentified by conventional serotyping techniques, and additional tests such as PCR analysis of the ipaH and lacY genes or repeated serotyping were needed to resolve the discrepancies [36]. Indeed, inherent similarities between E. coli and Shigella O antigens undermine the reliability of serotyping. Of 34 distinct O antigens identified in Shigella, 21 are identical or very similar to those described in E. coli [18]. EIEC O112ac is similar or identical to S. dysenteriae 2/S. boydii 15/S. boydii 1, EIEC O124 to S. dysenteriae 3/provisional Shigella serovar 3615.53, EIEC O136 to S. dysenteriae 3/S. boydii 1, EIEC O143 to S. boydii 8, EIEC O152 to provisional Shigella serovar 3341:55, EIEC O135 to S. flexneri, and EIEC O164 to S. dysenteriae 3. Second, occasional provisional Shigella serovars, which are biochemically indistinguishable from Shigella spp. but fail to agglutinate with standard commercial antisera, are problematic [56, 57]. This may be due to a morphologic transition from smooth to untypable rough strains without O antigens, accounting for 6 to 10% of annual Shigella cases in the USA [55]. The presence of capsular antigens may also prevent Shigella strains from reacting with the antisera [54]. In addition, the emergence of novel and atypical serotypes able to escape host immune responses can be explained by serotype conversion mediated by either temperate bacteriophages or plasmids carrying serotype-encoding genetic elements [58,59,60]. The diagnostic techniques for these non-serotypeable Shigella are discussed further below. Third, clear connections between biochemical features, serotypes, and phylogenetic relationships are not readily apparent. For example, the phenotypic variability observed within a particular serotype increases as more isolates are tested, while the presence of serotypes that are genetically and serologically related but phylogenetically unrelated, falling into distinct Shigella clades, further undermines the usefulness of serotyping [57, 58].

MALDI-TOF MS

Matrix-assisted laser desorption ionization time of flight mass spectrometry (MALDI-TOF MS) has recently been recognized as a rapid, cost-effective, high-throughput, and reliable microbial identification tool with broader applicability to a large spectrum of microorganisms [37]. Despite this versatility, conventional MALDI-TOF assays using MALDI-TOF Biotyper (Bruker Daltonics, Bremen, Germany), and Vitek 2 MS systems (bioMérieux, Marcy l’Etoile, France), failed to distinguish Shigella spp. from E. coli due to the high degree of similarity between their spectra [36, 61]. However, studies suggesting the use of a specialized automated algorithm (ClinProTools) or customized reference library reflecting the genetic diversity of Shigella and E. coli outperformed routine MALDI-TOF assays by enabling accurate discrimination between E. coli and Shigella with misidentification rates reaching approximately 3% [36, 38]. Recently, an approach merging biochemical methods and the MALDI-TOF assays seems to be interesting. By adding a short-term incubation in a high-lactose fluid medium before MALDI-TOF analysis, Ling et al. identified seven novel differential MS peaks serving as biomarkers to reliably identify these related bacteria with nearly 98% accuracy [62].

Molecular identification techniques

PCR-based identification techniques

In the last decades, several PCR-based identification techniques have been developed to differentiate between Shigella spp., E. coli, and EIEC [39, 40]. However, PCR development has faced challenges in selecting targets that allow accurate differentiation between these organisms, and some PCR assays were unable to separate Shigella from EIEC [63, 64]. PCR identification schemes for Shigella often target plasmid virulence genes such as virA, ial, she, and tuf, which are vulnerable to horizontal gene transfer, potentially leading to false-positive and false-negative results [65]. One of the primary gene targets commonly integrated into PCR schemes is ipaH, a multicopy gene encoding a virulence factor and located on both the chromosome and the large invasion plasmid pINV, which is exclusively found in Shigella and EIEC isolates [65, 66]. Recently, ipaH amplification was used as the first step in two different algorithms proposed for Shigella identification, in order to differentiate Shigella/EIEC (ipaH +) from non-invasive E. coli (ipaH −). In the culture-dependent algorithm, this step was followed by profiling of phenotypic, biochemical, and serological features. In contrast, the molecular algorithm additionally targeted the wzx genes of S. sonnei phase I, S. flexneri serotypes 1–5, S. flexneri 6, and S. dysenteriae serotype 1 [57]. After analysis with whole-genome sequencing (WGS), the culture-dependent algorithm succeeded in identifying 100% of S. dysenteriae, S. sonnei, and non-invasive E. coli isolates, but only 85% of S. flexneri, and 93% of S. boydii and EIEC. While the molecular algorithm correctly identified all targeted species or serotypes, it could not resolve ipaH-positive isolates carrying none of the assessed wzx genes, and binned them into a single group as either EIEC, S. boydii, S. sonnei phase II, or S. dysenteriae serotypes 2–15 [57].
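The decision logic of the molecular algorithm described above can be sketched in a few lines of Python. This is a hedged illustration of the logic as summarized in this review, not the published implementation: the marker keys (such as `wzx_sonnei_phase_I`) and the function name are hypothetical labels, and the sketch assumes the isolate is already known to be either E. coli or Shigella.

```python
from typing import Iterable

# Hypothetical marker keys mapped to the wzx targets named in the text.
WZX_TARGETS = {
    "wzx_sonnei_phase_I": "S. sonnei phase I",
    "wzx_flexneri_1_5": "S. flexneri serotypes 1-5",
    "wzx_flexneri_6": "S. flexneri serotype 6",
    "wzx_dysenteriae_1": "S. dysenteriae serotype 1",
}

# ipaH-positive isolates without any assessed wzx gene end up in one bin.
UNRESOLVED = ("EIEC, S. boydii, S. sonnei phase II, "
              "or S. dysenteriae serotypes 2-15")


def classify(ipah_positive: bool, wzx_hits: Iterable[str] = ()) -> str:
    """Two-step call: ipaH first, then the serotype-specific wzx markers."""
    if not ipah_positive:
        # Assumes the isolate is E. coli or Shigella to begin with.
        return "non-invasive E. coli"
    calls = sorted({WZX_TARGETS[h] for h in wzx_hits if h in WZX_TARGETS})
    if len(calls) == 1:
        return calls[0]
    if not calls:
        return UNRESOLVED
    # More than one wzx marker is unexpected; flag it rather than guess.
    return "inconclusive: " + "; ".join(calls)


print(classify(True, ["wzx_flexneri_6"]))
print(classify(True))
print(classify(False))
```

The second call shows why the molecular algorithm alone cannot separate EIEC from several Shigella groups: all of them fall into the same unresolved bin.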

To differentiate Shigella from EIEC, one PCR scheme amplifies, in addition to ipaH, a lactose permease-encoding gene (lacY) present in E. coli, including EIEC [40]. This scheme enabled the differentiation of the EIEC O121 and O124 groups from Shigella but could not classify the EIEC O164 group consistently. Another scheme delineates Shigella from E. coli, including EIEC, by targeting the β-glucuronidase-encoding gene (uidA), commonly found in E. coli and Shigella spp., and lacY, which is specifically observed in E. coli strains [39]. However, the accuracy of this PCR approach was not as high as expected. While it correctly identified in silico 100% of S. sonnei, it failed to identify 8% of S. flexneri, 14% of S. boydii, 20% of S. dysenteriae, 23% of non-invasive E. coli, and 38% of EIEC isolates [27]. Moreover, the utility of lacY can be questioned because, while S. flexneri and S. boydii lack the lac genes (Y, A, and Z), other Shigella spp. possess some of them: S. dysenteriae has lacA and lacY, and S. sonnei has all the lac genes. However, they cannot ferment lactose due to the lack of permease activity [67]. Furthermore, after 4-h enrichment of the sample in a growth medium, a conventional pentaplex PCR could identify Shigella at the genus level and differentiate between S. flexneri, S. sonnei, and S. dysenteriae by amplifying the specific targets invC, rfc, wbgZ, and rfpB, respectively, with an internal control (ompA) [68]. Notably, most of the targets in this pentaplex PCR are located on mobile elements, making them vulnerable to horizontal gene transfer and thus limiting the usefulness of such a PCR scheme for identification. Interestingly, a newly proposed phylogenomics-based multiplex PCR assay by Sahl et al. was able to identify unknown Shigella isolates and classify them into appropriate phylogenetic clades [69]. However, when the primers were tested on a large, genetically diverse isolate collection, they could not phylogenetically differentiate Shigella [27]. To overcome the issue of targeting plasmid virulence genes, Kim et al. designed primers targeting novel genetic markers, identified through comparative genomics, that can differentiate Shigella from diarrheagenic E. coli, including EIEC, and identify the four Shigella spp. [65]. Additional steps can be added to PCR, such as restriction fragment length polymorphism (PCR–RFLP) [70] or an immunocapturing technology [71], to increase the sensitivity and/or specificity of detection. However, some of these methods are relatively expensive, technically demanding, and require special equipment, which complicates their application as diagnostic or epidemiological tools.

Single locus sequence-based identification techniques

16S rRNA gene sequencing

Although representing 0.1% of the coding part of a microbial genome, 16S rRNA gene sequencing has been recognized as a highly useful tool in bacterial classification and has been widely used to provide genus and species identification for isolates. However, its usefulness is impaired by its low discriminatory power and poor resolution to distinguish between closely related bacteria [43, 72]. The reported 16S rRNA gene sequence similarities between E. coli and Shigella spp. exceed 99%; reaching up to 99.8% with S. flexneri, 99.9% with S. sonnei, and 99.7% with S. boydii [43]. Subsequently, 16S rRNA gene sequencing is not considered a reliable tool for differentiating between E. coli and Shigella spp. because they were intermingled together in the 16S rRNA gene–based phylogenetic tree. Using Sanger sequencing, only 26.7% of the E. coli strains were correctly identified, compared to 33.3% as S. sonnei and 40% as S. dysenteriae [41]. This was achieved using a species finder, a web-based tool for prokaryotic species identification based on the similarity of 16S rRNA gene sequences (https://cge.cbs.dtu.dk/services/SpeciesFinder/) with the known reference sequences available at the center of genomic epidemiology. However, the latter revealed a notably inadequate performance in comparison to KmerFinder (another in silico tool, discussed below) and gyrB sequence analysis because only 74% of non-serotypeable Shigella were reliably identified to the species level [43].

rpoB sequencing

Being a single copy protein-encoding housekeeping gene, rpoB can be more advantageous than 16S rRNA gene in microbial identification. While rpoB is deemed a high-resolution marker able to reveal molecular variation down to the population level, it has an overlapping similarity between closely related isolates such as Shigella and E. coli [73]. The rpoB sequence similarities between E. coli and Shigella spp. exceed 93% [73], reaching up to 99.8% with S. flexneri, 99.4% with S. sonnei, and 99.78% with S. boydii. However, Devanga Ragupathi et al. revealed that rpoB and another housekeeping gene malate–lactate dehydrogenase (mdh) accurately identified Shigella and different E. coli virotypes [67].

gyrB sequencing

Compared with the 16S rRNA gene, gyrB, which encodes the β subunit of DNA gyrase (topoisomerase type II), shows greater evolutionary divergence and can distinguish between closely related species. For E. coli and Shigella spp., the gyrB similarity between E. coli and S. sonnei, S. flexneri, and S. boydii was 98.1, 97.8, and 98%, respectively, lower than the values obtained with 16S rRNA gene analysis, which suggests that gyrB sequence analysis is comparatively more accurate [74]. Many studies have reported that gyrB outperforms 16S rRNA gene sequencing [43, 75, 76]; the identification results of gyrB sequencing were highly congruent with those of the KmerFinder tool, with 100% identification of non-serotypeable isolates at the species level [43].

Whole-genome sequencing

Cumulative data generated from many genetic and intergenic regions could provide a more in-depth resolution of the identity of an isolate. Therefore, decoding bacterial genomes via WGS is an increasingly attractive technology, particularly with its rapidly decreasing cost, and is predicted to replace the conventional microbial diagnostic workflow and become a public health resource for global surveillance [77]. WGS can be followed by multiple analyses to identify, serotype, and classify Shigella spp., and even to understand their pathogenesis. Numerous WGS-based approaches have been assessed for their ability to differentiate Shigella and E. coli, including K-mers, whole-genome single nucleotide polymorphism (SNP), and average nucleotide identity (ANI) [27, 44, 45]. Notably, the common limitation hindering the complete integration of WGS, especially in low-income countries, is the investment required for WGS platforms (equipment, reagents) and the need for experts in bioinformatic analysis.

K-mers-based approaches

K-mers-based species identification tools (such as KmerFinder, an online tool available at https://cge.cbs.dtu.dk/services/KmerFinder/) split the WGS data of an unknown isolate into relatively short oligomers of a defined length k, then compare the resulting k-mer content to a set of k-mers derived from a collection of reference genomes [43, 44, 78]. The similarity between the query and reference sets is expressed as a percentage indicating the proportion of common k-mers. The k-mer-based identification predicted 98.4% of 1982 Shigella and E. coli isolates in agreement with traditional biochemistry and serology schemes. The 25 discrepant results revealed either the superiority of the k-mer approach over the traditional schemes, when non-functional O antigen biosynthesis genes in S. flexneri isolates caused them to be conventionally misidentified as S. boydii, or its inferiority, notably for 10 EIEC isolates misidentified as S. flexneri or S. boydii by the k-mer-derived identification [44].
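The principle described above can be illustrated with a toy Python sketch. It is not the KmerFinder implementation: the function names and the short sequences are made up for illustration, the value of k is arbitrary, and the percentage is computed relative to the query k-mer set, which is an assumption about how "proportion of common k-mers" is defined.

```python
def kmer_set(sequence: str, k: int) -> set:
    """Split a sequence into its set of overlapping k-mers."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_percent(query: str, reference: str, k: int) -> float:
    """Percentage of query k-mers that also occur in the reference."""
    query_kmers = kmer_set(query, k)
    if not query_kmers:
        return 0.0  # query shorter than k
    common = query_kmers & kmer_set(reference, k)
    return 100.0 * len(common) / len(query_kmers)


def best_reference(query: str, references: dict, k: int) -> tuple:
    """Return the (name, score) of the reference sharing most k-mers."""
    scores = {name: shared_kmer_percent(query, seq, k)
              for name, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy sequences, far shorter than real genomes, for illustration only.
toy_references = {"ref_A": "ACGTACGTGA", "ref_B": "TTTTGGGGCC"}
print(best_reference("ACGTACG", toy_references, k=4))
```

Real tools work on whole genomes or raw reads with much longer k-mers and large reference databases, which is why closely related genomes such as EIEC and Shigella can still share most of their k-mers.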

SNP-based approaches

SNP-based approaches capture only informative genetic signatures in both gene-encoding and intergenic regions, thus omitting genetically conserved, uninformative data [79]. SNPs, generally considered stable and reproducible molecular markers, can therefore provide fine-scale strain differentiation, which is essential for outbreak investigation and surveillance of important pathogens such as Shigella spp. [27, 80]. Beyond typing, SNP analysis can also reconstruct the true phylogeny of Shigella spp. and decipher their enigmatic relations with EIEC [27]. In addition, SNPs are valuable markers for developing rapid, accurate, and discriminative diagnostic methods [80, 81]. Based on in silico analysis of eight Shigella genomes, 24 informative SNPs were selected from nine genes (gapA, lpxC, sanA, thrB, yaaH, ybaP, ygaZ, yhbO, and ynhA) and were found useful in identifying Shigella spp. as well as providing some resolving power among individual strains within the same species [81]. When analyzing a comprehensive genomic collection of Shigella and EIEC, Pettengill et al. revealed that their phylogenetic profile does not match their designation as distinct genera [27]. They also identified a panel of 254 SNP markers able to classify EIEC and Shigella isolates into phylogenetic clades from WGS data, rather than classifying them into genus and species [27].

ANI-based approaches

The average nucleotide identity (ANI) between two genomes has been suggested as a valid alternative to the wet-laboratory DDH methods for species delineation. Genomes can be defined as members of the same species if sharing ≥ 95–96% ANI [82]. Various ANI-based approaches and software are currently available to compare the genomes in silico [45]. To reduce the high computational requirements of ANI-based approaches, a novel method known as the whole genome parameter (WGP) was proposed for the delineation of bacterial genomes using four statistical parameters calculated from numerical representations of whole bacterial genomes (phase signal and cumulated phase signal) [45]. However, when these methods were tested for their ability to delineate Shigella spp. from E. coli, the majority including the WGP failed because the tested Shigella isolates yielded high similarity values with E. coli (~ 97.84%) above the delineation species threshold of WGP that is set at 96% [45]. Although these results mirrored the inability of traditional DDH to separate Shigella and E. coli, the usefulness of the ANI approach must be validated on a large diverse sample of E. coli and Shigella spp. rather than a small sample [45, 83, 84]. Notably, the Genome-To-Genome Distance Calculator (GGDC) web tool with the ANI-f1 formula, one of the tested ANI-based delineation tools, showed some power in differentiating E. coli from Shigella spp. and in generating ANI-f1 values under the species delineation threshold (70%) for most of the comparisons between E. coli and Shigella strains [45].
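The threshold logic behind ANI-based delineation can be sketched as follows. This Python example is a deliberate simplification under stated assumptions: it takes fragment pairs that are already aligned and of equal length, whereas real ANI tools find and filter the alignments themselves, and all function names are hypothetical.

```python
def percent_identity(frag_a: str, frag_b: str) -> float:
    """Percent identical positions between two pre-aligned fragments."""
    if len(frag_a) != len(frag_b) or not frag_a:
        raise ValueError("fragments must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(frag_a, frag_b))
    return 100.0 * matches / len(frag_a)


def average_nucleotide_identity(fragment_pairs) -> float:
    """Mean identity over all aligned fragment pairs of two genomes."""
    values = [percent_identity(a, b) for a, b in fragment_pairs]
    if not values:
        raise ValueError("at least one fragment pair is required")
    return sum(values) / len(values)


def same_species(ani: float, threshold: float = 95.0) -> bool:
    """Apply a species threshold; the text cites 95-96% for ANI."""
    return ani >= threshold


# The Shigella versus E. coli similarity quoted above (~97.84%) lies
# above the 96% threshold, so the two are not separated as species.
print(same_species(97.84, threshold=96.0))
```

This makes the failure mode explicit: any method that reduces the comparison to a single similarity value above the species threshold will place Shigella within E. coli.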

Extended MLST schemes

Multilocus sequence typing (MLST) is a sequence-based genotyping technique based on sequencing several housekeeping genes. Three MLST schemes originally developed for E. coli have also been applied to Shigella: the Achtman scheme includes seven housekeeping genes (adk, fumC, gyrB, icd, mdh, recA, and purA), the Pasteur scheme contains eight genes (dinB, icdA, pabB, polB, putP, trpA, trpB, and uidA), and the Whitmann scheme targets 15 genes (arcA, aroE, aspC, clpX, cyaA, dnaG, fadD, grpE, icdA, lysP, mdh, mtlD, mutS, rpoS, uidA) [85,86,87]. Although sequence types (STs) are assigned by MLST schemes regardless of species identity as either E. coli or Shigella, categorizing isolates into STs seems to mirror Shigella classification [44], as the majority of isolates within the same species had closely related STs belonging to the same clonal complexes (CCs). Chattaway et al. suggested the combined use of k-mer and MLST analysis to differentiate E. coli from Shigella [44]. However, some CCs can encompass several species, as in the case of CC288, which includes S. boydii and S. dysenteriae isolates [44].
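How an allelic profile translates into a sequence type can be shown with a minimal Python sketch. The locus names are those of the seven-gene Achtman scheme listed above; everything else is hypothetical, including the allele numbers, the ST labels, and the helper names, and a real analysis would query a curated database instead of a hard-coded table.

```python
ACHTMAN_LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")


def profile_key(alleles: dict) -> tuple:
    """Order the allele numbers by locus so profiles can be compared."""
    missing = [locus for locus in ACHTMAN_LOCI if locus not in alleles]
    if missing:
        raise ValueError("missing loci: " + ", ".join(missing))
    return tuple(alleles[locus] for locus in ACHTMAN_LOCI)


def assign_st(alleles: dict, st_table: dict):
    """Look up the sequence type; None means a profile not in the table."""
    return st_table.get(profile_key(alleles))


def locus_differences(profile_a: tuple, profile_b: tuple) -> int:
    """Number of loci at which two profiles carry different alleles."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


# Made-up table: allele numbers and ST labels are placeholders only.
demo_table = {
    (1, 1, 1, 1, 1, 1, 1): "ST-demo-1",
    (1, 1, 2, 1, 1, 1, 1): "ST-demo-2",
}
isolate = dict(zip(ACHTMAN_LOCI, (1, 1, 2, 1, 1, 1, 1)))
print(assign_st(isolate, demo_table))
print(locus_differences((1, 1, 1, 1, 1, 1, 1), profile_key(isolate)))
```

Profiles that differ at only one or a few loci are the kind of closely related STs that are grouped into the same clonal complex, although the exact grouping rule depends on the scheme and is not specified here.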

Thanks to the advent of WGS, MLST schemes, which usually comprise seven genes, can be extended to encompass more loci distributed over the chromosome (wgMLST, or whole-genome MLST) or conserved loci shared among most isolates of the same species (cgMLST, or core-genome MLST). These extended schemes give deeper genomic insight and higher resolution than conventional MLST, especially for closely related bacteria [46, 79, 88]. For Shigella and E. coli, the same cgMLST (targeting 2513 core genes) and wgMLST (targeting 25,002 coding genes) schemes are available at the publicly accessible Enterobase database (http://enterobase.warwick.ac.uk/species/index/ecoli). BioNumerics proposed another wgMLST scheme based on Enterobase but with modifications (17,350 target genes: 2513 core genes and 14,837 accessory genes) for both Shigella and E. coli [89]. These recent schemes are mostly used for epidemiologic investigations of E. coli and rarely for typing Shigella [90,91,92,93]. Beyond its promise in typing, cgMLST can resolve the discrepancies between the culture-dependent and molecular algorithms proposed by Van den Beld et al., by examining how inconclusive isolates cluster with reference strains in the cgMLST analysis [57]. However, the ability of cgMLST to allocate species requires additional investigation. While cgMLST clustered most Shigella and EIEC genomes according to their species, it formed some mixed-species clusters due to their deviating phenotypic features [94].

Genoserotyping

The issues mentioned above of phenotypic serotyping spurred the development of several molecular techniques allowing the detection and characterization of isolates at the genetic level, regardless of whether the genetic material was expressed. Monitoring disease burden requires fast and high-throughput methods that facilitate identification and surveillance at the serotype level. Generally, molecular serotyping techniques are considered as fast methods generating a deluge of objective information in a relatively short period because of their high-throughput capabilities. Although WGS will complement or replace Shigella’s conventional serotyping soon, it is not ready for routine use in most clinical microbiological laboratories.

In brief, molecular serotyping was first applied to Shigella by Coimbra et al., who proposed a restriction method (rfb-RFLP) in which an amplified region harboring the O-antigen-encoding genes (known as the rfb cluster) is digested with the enzyme MboII to reveal serotype-specific rfb polymorphism [95]. This technique showed a resolution close to that of the traditional serotyping scheme, generating discernible O-antigen patterns for each serotype except for S. boydii 12, which showed two distinct patterns, and S. flexneri serotypes 1–5, X, and Y, which all gave the same indistinguishable pattern [95]. A dynamic software (Molecular serotyping tool) was then developed to enable quick identification at the serotype level by comparing the rfb-RFLP patterns of clinical isolates to those in a database encompassing profiles of 171 previously known Shigella and E. coli [96].

Furthermore, many multiplex PCR schemes have been established for Shigella serotyping as quick and affordable methods, especially for S. flexneri. Sun et al. developed a single-tube multiplex PCR assay with eight sets of primers targeting O-antigen synthesis and modification genes, which allowed the identification of 14 out of 15 serotypes of S. flexneri (all except serotype Xv) with high agreement (97.8%) with traditional slide agglutination methods [97]. This conventional PCR was also upgraded to a real-time version [98]. Evaluation studies showed its full agreement with WGS and its better performance than traditional methods, as discrepancies between phenotypic and genotypic techniques were attributed to the presence of novel genotypes, non-specific cross-reactions, or genetic modifications in O-antigen synthesis or modification genes [99]. In addition, two other multiplex PCR assays could efficiently determine the 19 serotypes of S. flexneri recognized so far, where “PCR A” defined serotype genes and “PCR B” identified serotype 7-specific genes and group antigenic factor genes [100]. To resolve the problems associated with multiplex PCR, particularly the differentiation between similar-sized bands, Li et al. developed a DNA microarray able to simultaneously detect 34 distinct Shigella O-antigen forms with high sensitivity and specificity [101]. However, these methods, which rely on O-antigen-specific biosynthetic genes, must be complemented with biochemical tests for reliable differentiation, because many Shigella serotypes share identical O-antigens with commensal E. coli and because a high level of recombination has been observed among serotype-specific genes, which are mostly encoded on mobile genetic elements [35, 58].

WGS provides new insights into Shigella phylogeny that have never been available before. By performing in silico molecular serotyping based on Sun et al.’s (2011) scheme, Connor et al. revealed that serotype was a weak predictor of the phylogenetic relationships between strains of S. flexneri, where each of the seven identified phylogenetic groups encompassed two or more serotypes [58, 97]. In addition, WGS places the whole genetic repertoire under scrutiny and offers the ability to interrogate many genes simultaneously. Indeed, analyzing a single genetic marker could mislead identification at both species and serotype levels owing to the considerable variability of individual genomic targets [55].
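The logic of marker-based serotype prediction, whether read from PCR bands or from genome data, can be sketched as a lookup from the set of detected O-antigen modification genes to a serotype call. The table below is an illustrative subset and an assumption of this sketch, not a reproduction of the scheme in [97]; it should be checked against the published scheme before any use. The sketch further assumes that the isolate has already been confirmed as S. flexneri with the serotype 1–5/X/Y O-antigen backbone, and the function name is hypothetical.

```python
from typing import Dict, FrozenSet, Iterable

# Illustrative subset of marker combinations (an assumption of this sketch,
# to be verified against the published scheme before any real use).
SEROTYPE_BY_MARKERS: Dict[FrozenSet[str], str] = {
    frozenset(): "Y",
    frozenset({"gtrI"}): "1a",
    frozenset({"gtrI", "oac"}): "1b",
    frozenset({"gtrII"}): "2a",
    frozenset({"gtrII", "gtrX"}): "2b",
    frozenset({"gtrX", "oac"}): "3a",
    frozenset({"oac"}): "3b",
    frozenset({"gtrX"}): "X",
}


def predict_serotype(detected_markers: Iterable[str]) -> str:
    """Map the set of detected modification genes to a serotype call.
    Combinations missing from the table are reported as untypeable rather
    than forced into the nearest serotype."""
    profile = frozenset(detected_markers)
    return SEROTYPE_BY_MARKERS.get(
        profile, "untypeable: unknown marker combination")


print(predict_serotype(["gtrII", "gtrX"]))
print(predict_serotype([]))
print(predict_serotype(["gtrI", "gtrII"]))
```

Reporting unknown combinations explicitly, instead of guessing, leaves room for the novel genotypes and modified O-antigen genes that explain part of the phenotype/genotype discrepancies noted above.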

WGS could maintain backward compatibility with historical data by providing a framework for in silico genome-derived serotyping, along with its ability to identify novel serotypes. For example, after an in-depth examination of 259 Shigella genomes belonging to 53 serotypes, Wu et al. recently developed an automated pipeline, ShigaTyper, able to quickly identify and predict 59 serotypes from Illumina paired-end reads with high accuracy (98.2%) [55]. Likewise, Ventola et al. proposed two novel tools to be implemented in the National Reference of Salmonella and Shigella of Belgium for Shigella surveillance. The first tool is a cost-effective Luminex assay based on a modular multiplex oligonucleotide ligation‐PCR procedure targeting five genetic markers for species identification and 11 serotype markers for S. sonnei and S. flexneri in a single test [82, 102]. The second tool is a WGS-based workflow for automated prediction of Shigella serotypes, focusing on gene functionality [82].

Conclusion and general considerations

Identifying bacterial pathogens at genus, species, and strain levels is indispensable in supporting appropriate diagnosis and treatment, assessing the disease burden, tracking sources, performing traceback investigations, and disclosing changes in the frequency of phylogenetic groups in human/animal disease and environmental niches. Shigella identification and serotyping remain a daunting task, especially in developing countries [75]. Essentially, the difficulty might arise from the taxonomic ambiguities behind the separation of E. coli and Shigella, which stem from their genetic relatedness. With the beginning of the WGS era, it is necessary to reconsider the classification of Shigella/E. coli based on phylogenetic criteria, with or without renaming of genera and species, to better serve the interests of medicine and science. Pending a more refined taxonomic concept for E. coli and Shigella, clinical microbiological laboratories should select the most appropriate identification tests to set Shigella apart from E. coli, more particularly EIEC, weighing the trade-offs between their advantages and disadvantages as discussed earlier in the review. In low-resource settings, clinical symptoms and phenotypic tests can be used to differentiate Shigella from EIEC and even to serotype Shigella. When possible, an algorithm merging both phenotypic and molecular tests might help elucidate an isolate’s real identity. In high-resource settings, WGS could serve as an “all-in-one test” for both identification and serotyping of Shigella spp., and also for disclosing novel genomic markers and validating previously well-established methods on extensive, diverse genomic collections. When considering WGS as a technique, one should pay close attention to the questions at hand and select the most appropriate analytical approaches. 
WGS-based approaches with low resolution and speciation objectives, such as those based on k-mers and ANI, can draw the proper (or real) borders between the different taxonomic entities that we call species. However, these approaches (e.g., ANI) must be verified thoroughly on a representative collection. In contrast, WGS-based approaches with finer resolution (usually denoted as typing approaches), such as those based on SNP, cgMLST, or wgMLST, can go further within the same species and dissect the borders between serotypes, clones, and even isolates. Therefore, the question that must be answered in future studies is at what level of resolution (number of SNPs or different alleles, for example) a typing approach with finer classifications shows clades that share characteristics attributable to Shigella but not to E. coli, with regard, for example, to clinical symptoms, infectious doses, and biochemical features. Undeniably, WGS will become the future gold standard for Shigella surveillance and epidemiologic investigations, particularly with the steadily decreasing cost of sequencing platforms and the growing number of user-friendly bioinformatics tools and pipelines. Meanwhile, appropriate backward compatibility should be maintained to harmonize the data between the different stakeholders and establish firm bridges with historical data.
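The difference between low-resolution and fine-resolution measures can be made concrete with a toy sketch. It assumes two short, already aligned sequences invented for illustration; the k-mer Jaccard similarity stands in for k-mer-based speciation measures and is not an ANI calculation, and the function names and the choice of k = 4 are assumptions of this sketch rather than settings from any cited tool.

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all overlapping substrings of length k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 4) -> float:
    """Shared k-mers divided by all k-mers seen in either sequence
    (a coarse, alignment-free similarity)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 1.0


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned sequences carry different bases.
    Positions with gaps or ambiguous characters are ignored."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a.upper(), b.upper())
               if x != y and x in "ACGT" and y in "ACGT")


# Toy sequences invented for illustration; they differ at a single position.
reference = "ACGTACGTTAGCCGATAGCT"
variant = "ACGTACGTTATCCGATAGCT"

print(round(jaccard_similarity(reference, variant), 2))
print(snp_distance(reference, variant))
```

On real genomes the first measure would be computed over whole assemblies and the second over a core-genome alignment; the thresholds at which either separates Shigella from E. coli remain the open question raised above.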