Introduction

Clean drinking water is a basic human right1; however, access to safe drinking water is not universal. Globally, an estimated 1.8 billion people drink water that is contaminated with the faecal indicator bacteria thermotolerant coliforms or Escherichia coli2 and are thus at risk of diarrhoeal diseases3. Especially at risk are children under 5 years of age, for whom diarrhoea is the fifth leading cause of death4, with unsafe water and unsafe sanitation accounting for 72% and 56% of diarrhoea deaths, respectively4.

The UN’s Sustainable Development Goal (SDG) 6.1 was set to “achieve universal and equitable access to safe and affordable drinking water for all”;5 in order to sustainably address this goal, safe drinking water should be viewed within a “source to sip” framework6 (Fig. 1). Water should be collected from an improved source7 that is accessible, sustainable, and of adequate quality;8 transported using a clean fetching container;9,10 treated consistently and correctly over a sustained period11,12,13,14,15 using a device that has been adequately operated;16,17,18,19,20,21 stored using a clean vessel after treatment;9,10,22 and consumed using a clean cup9,23,24. Taken together, these important components comprise household water treatment and safe storage (HWTS), which can be employed to provide protection against diarrhoeal illness25, potentially resulting in substantial positive health impacts26. Use of HWTS is widespread: an estimated 1.1 billion people employ HWTS practices27, and in contexts where SDG 6.1 has not yet been reached and there is insufficient or non-existent access to a safely managed on-premises water supply, HWTS is key to protecting public health.

Fig. 1: The “source to sip” framework adapted for HWTS.

POUWT methods are one part of effective HWTS. The field performance of POUWT methods, and the factors that affect it, are the focus of the present Perspective.

Although all links in the “source to sip” chain as depicted in Fig. 1 are important to protect consumer health, this Perspective focuses on point-of-use water treatment (POUWT) methods, which are the final—and sometimes only—safety barrier against waterborne disease. There is a wide range of POUWT techniques, most commonly taking the form of chemical disinfection (e.g., hypochlorite), UV disinfection (e.g., solar disinfection) or filtration (e.g., ceramic filtration)28.

“How well do POUWT methods reduce waterborne microbial risks?” This question can be answered using a process called challenge testing (also referred to as microbiological testing or efficacy testing) to evaluate the microbiological performance of POUWT strategies. Challenge testing consists of spiking test water with viruses, bacteria and/or protozoa (or their surrogates) and treating the water to determine the microbiological reduction efficacy (log10 reduction values, LRVs, of pathogens or their surrogates) in a controlled laboratory setting25,29. Although laboratory-based challenge testing is a valuable tool to evaluate the performance of POUWT approaches under controlled and replicable conditions, we posit that such a controlled environment represents a “best-case scenario”—even if test conditions are intended to mimic poor source water quality—and challenge testing potentially does not provide an accurate representation of POUWT performance experienced by the end user.
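
As a minimal numerical illustration (values below are hypothetical, not drawn from any cited study), the LRV is simply the log10 ratio of influent to effluent concentration:

```python
import math

def log_reduction_value(influent_per_100ml: float, effluent_per_100ml: float) -> float:
    """LRV = log10(C_influent / C_effluent), with both concentrations per 100 mL."""
    return math.log10(influent_per_100ml / effluent_per_100ml)

# Hypothetical example: a spike of 10^6 E. coli per 100 mL reduced to
# 10^2 per 100 mL after treatment corresponds to a 4-log reduction.
print(log_reduction_value(1e6, 1e2))  # 4.0
```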

Laboratory-generated POUWT efficacy data are contextualized via health risk assessments using the quantitative microbial risk assessment (QMRA) framework. QMRA can be used to estimate expected health gains from introducing a given POUWT method into a community30,31 or to examine the trade-off between POUWT efficacy and compliance during use11,12,13,14,15. We will show in this Perspective that such QMRA analyses rely on POUWT performance data (LRVs) that have been overestimated in laboratory-based studies; consequently, the conclusions reached by such QMRA analyses could be inaccurate. There is a research gap in gathering more representative (i.e., field-based) data; techniques for such assessments do exist32,33,34,35,36,37 but are not directly applicable to low-resource contexts.
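
To make the role of LRVs in such models concrete, a minimal QMRA sketch is shown below, using an exponential dose-response model and a simple compliance weighting; all parameter values (source concentration, dose-response parameter, compliance fraction) are illustrative placeholders rather than values from the cited studies.

```python
import math

def annual_infection_risk(source_conc_per_L: float, lrv: float, compliance: float = 1.0,
                          litres_per_day: float = 1.0, r: float = 0.01,
                          days: int = 365) -> float:
    """Sketch of a QMRA calculation with an exponential dose-response model.

    source_conc_per_L : pathogen concentration in untreated water (organisms/L)
    lrv               : log10 reduction credited to the POUWT method
    compliance        : fraction of days on which water is actually treated (0-1)
    r                 : illustrative pathogen-specific dose-response parameter
    """
    dose_treated = source_conc_per_L * litres_per_day / (10 ** lrv)
    dose_untreated = source_conc_per_L * litres_per_day
    p_treated = 1.0 - math.exp(-r * dose_treated)       # per-day risk, treated water
    p_untreated = 1.0 - math.exp(-r * dose_untreated)   # per-day risk, untreated water
    survive_year = ((1.0 - p_treated) ** (compliance * days) *
                    (1.0 - p_untreated) ** ((1.0 - compliance) * days))
    return 1.0 - survive_year

# Illustrative trade-off: a 6-LRV method used on 80% of days versus a
# 2-LRV method used every day, for the same hypothetical source water.
print(annual_infection_risk(10.0, lrv=6.0, compliance=0.8))
print(annual_infection_risk(10.0, lrv=2.0, compliance=1.0))
```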

The objective of this Perspective is to explore the data discrepancy between laboratory and field assessments of POUWT methods. We will present the evidence of such a data discrepancy, discuss the resultant public health implications, and propose a strategy to fill this space.

POUWT performance in the laboratory and field

Methods

Our narrative review grew from the impetus to investigate the origin of the LRV comparison estimates (i.e., laboratory-based LRVs, or best-case scenario, versus field-based LRVs, or baseline performance) published by the WHO25,38, which were not systematically derived. The studies we examined for this Perspective were identified in a narrative process via one of two means: (1) by investigating papers cited by the WHO25,38 to construct these LRV comparison estimates; and (2) by searching scholarly databases (e.g., Google Scholar, Web of Science) using search terms similar to those in studies referenced by the WHO25 (e.g., “Ceramic filt*” + “challenge” + “drink* water”). We included only peer-reviewed studies that directly reported laboratory and field LRVs for the same POUWT technology.

The laboratory versus field performance discrepancy

Laboratory challenge tests are a useful tool that can indicate the likely maximum performance of the POUWT method under evaluation. This applies even if laboratory assessments are intended to simulate challenging conditions, for example the use of a high-turbidity, high-organic-content test water25,29,39. Laboratory assessments can be effectively employed to identify water quality- or treatment-related limitations of products and to screen performance across several design conditions40,41, options or products42,43.

In the field, microbiological performance of POUWT strategies is typically assessed by sampling water before and/or after treatment42 (e.g., at the inlet and outlet of a filter), which is well-suited to evaluating compliance with health-based water quality targets. To a more limited extent, field evaluations can examine risk reduction or the potential protection offered to the end user by a given POUWT method, although such studies can be censored by low influent microbe concentrations.

The data in Table 1 show that reported discrepancies between laboratory efficacy and microbiological field performance range between 0.1 LRV44 and 8 LRV17. Aggregate estimates published by the WHO25 of the laboratory versus field performance discrepancy range between 1 and 4 LRV for viruses, bacteria and protozoa. Field studies conducted on ceramic filters exemplify the wide variation (i.e., several LRV) observed in field data, between households and visits20,21, and in laboratory data, between filters and within individual filters over time20. In general, there was a paucity of direct comparisons between laboratory efficacy and field performance, especially with respect to solar disinfection (SODIS), for which no direct comparisons could be found. Comparative evaluations, although imperfect, provide important context-specific information, and more such studies are needed.

Table 1 Reported discrepancies between laboratory efficacy and microbiological field performance, with comparison to aggregate estimates published by the WHO25.

Although not presented in Table 1, some studies noted a decline in other, non-microbiological performance indicators between laboratory and field, such as decreasing ceramic filter flow rates over time17. This was especially noted in the case of high-turbidity source water17 and/or elevated turbidity in filter effluent23,45 (see Supplementary Tables 1–5).

There are general limitations of comparing laboratory and field studies. Field studies typically report on bacterial reduction, excluding virus and protozoan reductions due to limitations in field quantification methods. Field study sampling points varied considerably. “Before treatment” water samples were collected from the water source16,18,23,46,47 (e.g., local tap, borehole or surface water), stored water in the household44 or directly from the top bucket of a filter17,21,48. “After treatment” water samples were collected directly from the bottom bucket of the filter21 (bypassing the spigot), from the filter spigot17,18,44,47,49 (bypassing the drinking cup) or from the drinking vessel16,47, possibly confounding treatment performance with potential re-contamination or re-growth. Variable environmental bacterial concentrations were noted as potentially driving variations in measured POUWT performance23,44,50. There is a relatively high potential for field POUWT performance to be censored or limited by environmental bacterial concentrations20,21, which are typically several orders of magnitude lower than those used in spiked laboratory studies (which are not intended to simulate bacterial concentrations of natural waters), although censored data does occur in laboratory studies17.

The laboratory versus field performance discrepancy, explained

Explanatory factors have been suggested for the observed discrepancy between laboratory and field data (Table 1). With respect to ceramic filters, inconsistent filter performance was sometimes due to varied manufacturing processes18, and cracking or damage was cited as allowing short-circuiting of water through filter elements17,23,51,52. Decreased flow rates or blockages have in some cases been attributed to irreversible fouling, including biofouling17,52. Lack of access to local supply chains for repair or replacement of such damaged filters was noted to hinder performance19,53. Improper user cleaning practices, including backwashing or washing with unclean water17,18,19 or touching hands to the external filter element or clean water receptacle18,20,21, were observed, as were general user practices such as improper retrieval of water from the filter (e.g., dipping hands into the receptacle)21 or using untreated water to rinse the drinking cup19.

With respect to biosand filters, variable or unfavorable filter use conditions were postulated to explain the discrepancy between laboratory and field performance (e.g., frequency of use, treated water volume, residence/standing time of water within the filter and/or receptacle)23,44. For both ceramic and biosand filters, variable or poor source water quality (i.e., microbiological or non-microbiological quality) was cited as hindering microbiological performance23,44.

Water quality was also cited as hindering chlorine disinfection due to the potential for free chlorine consumption to leave a decreased residual for disinfection16,23, particularly in cases where no prior treatment occurred to remove turbidity or organic material before chlorination16. Similarly, natural variations in field water chemistry that were not present in the laboratory were cited for electrolytic disinfection with silver54. Long storage periods or re-contamination of household storage containers can also consume free chlorine residual, leading to a decrease in disinfection and therefore microbiological reduction16. Incorrect or inconsistent chlorine dosage was also noted in the field, particularly in cases where procurement of chlorine is difficult or relatively expensive, leading users to under-dose their water so that supplies last longer16. In the case of electrochlorinator devices, running out of battery charge, breakage or technical problems caused a decline in performance and/or cessation of use23. Variability of human use and unpredictable human factors were cited with respect to the laboratory versus field LRV discrepancy for silver electrolysis disinfection54.

Why does the laboratory versus field performance discrepancy matter?

There are several potential implications of conflating laboratory-demonstrated microbiological efficacy with field-validated performance of POUWT techniques. Literature has been published55 that reaches conclusions and recommendations based solely on laboratory-based data, ignoring factors impacting field performance and therefore potential end-user health protection. Although not applicable to most POUWT manufacturers, achievement of high LRVs in the laboratory could potentially give manufacturers license to imply that their devices confer a high degree of protection to the user, despite the fact that sustained, proper use may be difficult, as has been observed for some devices17,19.

Some reported differences in laboratory versus field data16,17,23,46 (Table 1) exceed the default highly protective performance target set by the WHO25 for bacterial reduction (i.e., differences in excess of 4 LRV). A difference in laboratory versus field performance of 4 LRV is estimated by the WHO25 itself for bacterial indicators with respect to size exclusion approaches (i.e., ceramic filtration, bacterial reduction, Table 1). Such data implies that some POUWT methods found to be highly protective based on laboratory data have the potential to confer zero LRVs (and therefore limited to no protection) in the field. Other differences in reported laboratory versus field data18,20,21,23 are equal to or greater than that needed to “graduate” from protective to highly protective (i.e., ≥2 LRV) under the WHO performance targets25, implying that some techniques found to be highly protective based on laboratory data could meet lower protective or interim status based on field data.
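
To illustrate what such differences mean in practice, the short sketch below classifies a bacterial LRV against the thresholds cited above (≥4 LRV “highly protective”, ≥2 LRV “protective” for bacteria); the thresholds reflect our reading of the WHO performance targets25 and the device values are hypothetical.

```python
def classify_bacterial_lrv(lrv: float) -> str:
    """Classify a bacterial LRV against the WHO-style thresholds cited in the text."""
    if lrv >= 4.0:
        return "highly protective"
    if lrv >= 2.0:
        return "protective"
    return "below protective target"

# Hypothetical device: 6 LRV demonstrated in the laboratory, 1.5 LRV measured in the field.
print(classify_bacterial_lrv(6.0))   # highly protective
print(classify_bacterial_lrv(1.5))   # below protective target
```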

Laboratory-generated performance data (LRVs) of POUWT strategies are used as input data for QMRA studies30,56,57,58, based on which recommendations can be made by public health organizations or local governments regarding method selection or guidance for treatment. If POUWT performance has been overestimated using laboratory-based data, and end-users are seeing decreased performance, then such public health recommendations could be inaccurate or problematic. For example, recommendations may end up favouring a method that has a higher laboratory efficacy but lower field usability and performance15, which could compromise the health protection offered to the end user.

HWTS practitioners are now re-framing the paradigm of POUWT approaches from a “silver bullet” technology, which was based on high LRVs generated via laboratory studies, to one that includes research on sustainability and POUWT approaches that take context into account and reduce the need for behavior change59. Such a shift follows Gartner’s Hype Cycle, from the initial “technology trigger”; the “peak of inflated expectations” (i.e., the silver bullet); “trough of disillusionment” (i.e., observed decline in adherence over time and non-significant health outcomes from randomized controlled field trials); to the “slope of enlightenment” and presently to the “plateau of productivity”, including field evaluations of POUWT devices to ascertain the true LRVs and therefore potential protection conferred to the end-user59.

What can be done to address the laboratory versus field data discrepancy?

During laboratory efficacy testing, spiked water containing anywhere from 10⁵ to 10⁹ organisms per 100 mL has been used to challenge test POUWT methods, allowing LRVs on the order of 5–9 to be calculated25. As noted above, field performance studies are limited by the use of lower environmental levels of microorganisms (i.e., the lack of a high spike). This can lead to censored LRVs characterized by non-detected effluent microorganisms, which was observed in some studies (Table 1). Therefore, challenge water with a higher organism spike has been suggested for field evaluations23,37,60,61.
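
A short numerical sketch (hypothetical values) of this censoring effect: when the effluent is a non-detect, the demonstrable LRV is capped by the influent concentration and the detection limit of the enumeration method.

```python
import math

def max_observable_lrv(influent_per_100ml: float, detection_limit_per_100ml: float = 1.0) -> float:
    """Upper bound on the LRV that a non-detect effluent sample can demonstrate."""
    return math.log10(influent_per_100ml / detection_limit_per_100ml)

# Environmental water at ~10^3 E. coli per 100 mL can demonstrate at most ~3 LRV,
# whereas a 10^7 per 100 mL spike allows reductions of up to ~7 LRV to be resolved.
print(max_observable_lrv(1e3))  # 3.0
print(max_observable_lrv(1e7))  # 7.0
```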

Such spike organism(s) should be safe for human consumption (i.e., “food-safe”) so that they can be used outside a laboratory setting to test POUWT techniques under actual use, satisfying ethical and safety requirements for human study participants. In addition, spike organisms should be easily transportable and culturable using feasible techniques that can be deployed outside the laboratory setting. An appropriate bacterial surrogate (a probiotic health supplement containing non-pathogenic E. coli) has been identified and subjected to preliminary validation in previous work37 via an established surrogate selection framework62. Baker’s yeast (Saccharomyces cerevisiae) has been identified as a possible non-pathogenic surrogate for protozoans32,33, has been applied as a challenge organism to evaluate in situ microbiological performance in non-potable water applications34,36 and has been recommended for further use in other in situ evaluations35. A suitable viral surrogate has not yet been proposed in the published literature; this is a research gap that would be valuable to address, completing the “suite” of food-safe microbiological surrogates.

Using probiotic E. coli and baker’s yeast as food-safe surrogates for bacteria and protozoa, respectively, we propose the concept of “field challenge testing”. Under this concept, POUWT techniques would be challenge tested in situ using food-safe surrogates as a complement to data obtained in the laboratory.

Useful applications of the field challenge test method

Given the great global need for effective HWTS, there is a corresponding need for effective POUWT evaluation protocols to assess microbe reduction by these technologies under conditions that are representative of real-life situations, including user conditions and water quality63. The field challenge test method aims to address this need.

One potential application for field challenge testing would be to ascertain the (non-censored) performance of POUWT strategies under real-use conditions. Field challenge studies would involve sending specifically trained enumerators to visit households and conduct challenge tests using the POUWT method on premises, in a similar fashion to existing water quality data collection techniques currently employed by the WHO/UNICEF Joint Monitoring Programme64,65. POUWT users could be engaged as study participants to use their own POUWT method to treat a volume (e.g., 1 L) of spiked water containing probiotic supplement and/or baker’s yeast as outlined above. Field challenge testing could be combined with other established survey methods, such as water quality testing at points of collection and consumption65, a participant questionnaire66 and/or an HWTS sanitary inspection67. Following testing, enumerators would ensure that microbes are flushed and/or cleaned from the POUWT device with either 70% ethanol or soap and clean water, as appropriate; participants would not drink the spiked test water.

Enumerators would be trained for the express purpose of conducting microbiological challenge tests and proficient in water quality testing methods, including aseptic technique. They would use established field water quality testing methods to process influent and effluent water samples resulting from the field challenge test, such as field membrane filtration to enumerate E. coli64,65 and the SimPlate Yeast and Mold Color Indicator (Y&M-CI) method for the detection and quantification of yeast68, which has been validated against conventional agar plating methods69,70 and is appropriate for low-resource field contexts due to its pre-packaged sterile materials and the lack of requirement for reagent refrigeration68.
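
As a sketch of how such paired field counts could be reduced to an LRV (the function and values are illustrative; actual dilution and reporting procedures would follow the enumeration method used):

```python
import math

def field_lrv(influent_cfu: int, influent_ml: float,
              effluent_cfu: int, effluent_ml: float):
    """Compute an LRV from paired field membrane-filtration counts.

    Counts are converted to CFU per 100 mL. If no colonies are detected in the
    effluent, a conservative value of 1 CFU in the processed volume is substituted
    and the result is flagged as censored (i.e., a lower bound on the true LRV).
    """
    c_in = influent_cfu / influent_ml * 100.0
    censored = effluent_cfu == 0
    c_out = (1 if censored else effluent_cfu) / effluent_ml * 100.0
    return math.log10(c_in / c_out), censored

# Hypothetical result: 120 colonies from a 0.01 mL-equivalent diluted influent
# aliquot (~1.2 x 10^6 CFU per 100 mL) and a non-detect in 100 mL of effluent.
print(field_lrv(influent_cfu=120, influent_ml=0.01,
                effluent_cfu=0, effluent_ml=100.0))  # (~6.1, True) -> report as ">=6.1 LRV"
```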

Field challenge testing would garner more field-relevant data (i.e., under real-use conditions, where it matters most), thereby reflecting the influence of the contexts in which POUWT methods are used, as opposed to idealized laboratory conditions. Risky user behaviors and/or environmental factors, such as cross-contamination during water treatment or low-quality influent water, could be identified via observations or questionnaires, and their impacts on device performance quantified. Results could be compiled to monitor and classify health risk following approaches similar to those used by the WHO71 to integrate sanitary inspection and water quality data. Identification of the riskiest factors could help to guide feedback for the design of POUWT methods and/or instructional campaigns for best practices.

It is now accepted that compliance is essential in QMRA modelling to effectively estimate health gains, which are then used to make public health recommendations11,13,14,72. We propose that site-specific microbiological field performance data, gathered via field challenge testing using food-safe surrogates (i.e., probiotic bacteria73 and/or baker’s yeast32,33,34,35,36), could and should be incorporated into QMRA models in the same way, although a research gap exists regarding food-safe viral surrogates. QMRA models are currently essential in understanding risk and facilitating important public health decisions; adding site-specific challenge test data in combination with compliance data would be highly instructive74.

Field challenge testing would also be useful for technologies for which compromised performance is not solely driven by user error, namely ceramic water filters (CWFs)17,23,51,52. CWFs are typically manufactured in decentralized facilities, where lack of access to a centralized laboratory for microbiological quality control causes a triple burden of logistical complexity, cost and time delays75,76,77,78, while excluding local stakeholders from long-term gains in skills and knowledge79. Ideally, CWFs would be tested on-site or locally79, allowing CWF manufacturers to implement low-cost microbiological quality control, for which there have been recent calls78. Similar microbiological methods would be used as described above, except that challenge testing would take place at established CWF manufacturing facilities, rather than by household survey. This would support filter manufacturers to produce, and therefore consumers to purchase, consistently high-quality technologies, while keeping knowledge and quality control practices local to the community79.

Limitations of the field challenge test method

All methodologies are subject to limitations, and it is important to report limitations for study transparency. As noted above, spike organisms would be rehydrated in local source water using commercially available probiotic supplements or baker’s yeast, which poses several limitations. The probiotic and the yeast are both dry powders; their addition to water will therefore increase turbidity, even though small amounts of powder are used, potentially causing interactions with chlorine (i.e., chlorine demand), UV disinfection (i.e., shielding) or filtration (i.e., clogging)25. Traditional (laboratory-based) challenge testing entails pre-culturing spike organism(s) in a nutrient medium (i.e., non-selective agar or broth) to bring the microorganisms to stationary phase and purifying the mixture (i.e., by centrifugation or by agar washing) before spiking25,29. Aside from minimizing interactions with various treatment mechanisms as described, this process ensures that organisms are at their most robust and resistant to treatment (in particular disinfection), thus providing a conservative LRV estimate25,29. Such pre-culturing is not ethically possible with field challenge testing, where study participants could use their own POUWT devices for testing. This is due to the potential to inadvertently culture a pathogenic microorganism during the pre-culture phase—even if the seed microbe is a food-safe probiotic or baker’s yeast and aseptic technique is employed.

Although validated and widely used, field-based culture methods have limitations, including a somewhat increased potential for sample contamination during processing. This can be mitigated by quality control techniques such as regular (i.e., daily or every 10 tests) negative controls (i.e., processing an additional “blank” test of locally-purchased bottled water, assumed to be free of detectable levels of contamination), as typically used by Multiple Indicator Cluster Surveys (MICS)64,65. In addition to blank tests, in the MICS data collection programmes, data reliability is aided by intensively training enumerators prior to deployment, using standardized and pre-sterilized materials for microbe quantification, and continually monitoring enumerators, typically by a supervisor who is an experienced laboratory technician65. In addition, field enumeration techniques typically require more time to process samples than laboratory techniques, meaning that triplicate samples are difficult and time-consuming to process. The limited capacity for triplicate analysis could be mitigated by employing additional enumerators to conduct field challenge testing or by bringing samples back to a laboratory for analysis, depending on the study location.

Beyond any culture technique-based study limitations, the use of field challenge data as inputs to QMRA modelling is also subject to limitations. The use of non-pathogenic microbes as process indicators for reference pathogens is established for in situ studies or in cases where using a pathogenic spike organism is infeasible80. In cases where non-pathogenic surrogates are used to measure LRVs (i.e., for in situ studies), the non-pathogenic surrogate functions more as a general (process) indicator, demonstrating overall process efficacy80. The choice of indicator-pathogen relationship is an assumption that affects reference pathogen LRV estimates, which can have a relatively high impact on QMRA models81; the extent of this impact should be assessed via sensitivity analysis.
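
A one-way sensitivity sketch (hypothetical values) of this point: because the post-treatment dose in a QMRA scales as ten to the power of minus the assumed pathogen LRV, a ±1 log uncertainty in the indicator-pathogen relationship shifts the estimated dose by a factor of ten.

```python
# One-way sensitivity sketch: vary the assumed pathogen LRV around the
# surrogate-measured LRV and report the relative change in post-treatment dose.
surrogate_lrv = 2.0  # hypothetical field-measured surrogate LRV
for assumed_pathogen_lrv in (surrogate_lrv - 1.0, surrogate_lrv, surrogate_lrv + 1.0):
    relative_dose = 10 ** (surrogate_lrv - assumed_pathogen_lrv)
    print(f"assumed pathogen LRV {assumed_pathogen_lrv:.1f}: "
          f"dose is x{relative_dose:.1f} the surrogate-based estimate")
```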

All studies are subject to limitations; however, even if the techniques are imperfect, that does not mean they are not useful and informative. Field-generated data can fill critical data gaps and augment laboratory-based findings82, allowing a more comprehensive characterization of public health conditions in low-resource contexts. We therefore believe that our proposed method is a useful way to fill the research gap outlined in the present Perspective.

Outlook and summary

In this Perspective, we highlight a discrepancy between the laboratory and field performance of POUWT methods and discuss the resulting overestimation of potential health protection conferred upon the end user if laboratory-based estimates alone are used. We propose field challenge testing as a strategy to address this research gap, using food-safe bacterial and protozoan surrogates. Such a method would generate information that is more representative of the POUWT performance experienced by the end user, which can be influenced by factors including environmental water quality, correct and sustained use of POUWT, access to maintenance and repair, and correct cleaning, water retrieval and maintenance activities (Fig. 1). When such factors are represented in the data, POUWT techniques can be viewed within the larger “source to sip” framework of HWTS, which encompasses the water source (in terms of protection, accessibility and sustainability), POUWT approaches and safe water storage and consumption6. By doing so, sustainable and scalable interventions can be made to reduce exposure to faecal contamination via drinking water and realize health gains, for which there is an urgent need83.