Introduction

In sub-Saharan Africa (SSA) 42% of people lack access to a basic water supply, defined as an improved water source accessible within a 30 min fetching time1. However, even people using a basic water supply may still be drinking contaminated water, as water quality is not considered directly. While improved water sources generally tend to produce water with better microbial quality than unimproved supplies, a systematic review shows that many improved sources are contaminated2. One 2012 study estimated that 52% of the SSA population was exposed to fecally contaminated drinking water2, and another reported that 31% of boreholes/tubewells in SSA exceeded the World Health Organization (WHO) guideline of no detectable fecal indicator bacteria per 100-mL sample3. Ensuring water safety is especially difficult in rural SSA, where many people rely on small, community-managed systems for drinking water. These rural systems tend to have even poorer water quality than urban water systems4,5.

Risk-based preventative management is recommended to protect and improve water safety in all settings. As the WHO suggests, “the most effective means of consistently ensuring the safety of a drinking-water supply is through the use of a comprehensive risk assessment and risk management approach”6. Sanitary inspection is a visual survey of risk factors that may contribute to the likelihood of fecal contamination in water systems, and is considered an effective and low-cost tool for risk assessment7. In small community-managed water systems, sanitary inspections are typically conducted using forms developed by the WHO, or adaptations of these8. The forms are technology-type specific, and each comprises 9–12 yes/no questions representing the presence/absence of sanitary risk factors. The total number of sanitary risk factors is summed to an overall sanitary risk score, which is often used to compare the level of risk between systems.

Several studies have explored the relationship between sanitary inspection and water quality. Although sanitary inspection and water quality are conceptually linked9, some studies found no significant association between the two10,11,12,13. Water supply professionals have questioned the effectiveness of sanitary inspection because of this apparent contradiction13. However, previous assessments of the relationship have common weaknesses14. Literature comparing sanitary inspection and water quality analysis has depended on the untested assumption that sanitary risk score and E. coli concentration of a water source are positively and linearly associated in all cases. This has been assessed by applying well-known statistical tests to compare E. coli concentration (as a continuous, ordinal, or presence/absence measure) to sanitary risk scores. Sanitary risk score, however, is not a comprehensive representation of system risk at a given point in time but is a simplified output of a tool designed to identify observable risk factors and guide corrective action14.

This study explores the nature of the relationship between sanitary inspection and microbial water quality analysis from a new perspective, in order to improve the general understanding of each tool and identify opportunities to enhance the effective application of each. We address the following questions:

  1. What is the nature of the relationship between sanitary risk score and E. coli occurrence in water supplies?

  2. How best can this relationship be quantified and modeled?

  3. Can delivery and monitoring of drinking-water service be improved based on these findings?

We used data collected from 1028 boreholes with handpumps in rural SSA to explore the relationship between sanitary inspection and microbial water quality. The data were analyzed using the statistical models applied in previous studies, and using an alternative statistical model that, we suggest, more accurately represents the causal pathways of water system contamination. We compared these models to evaluate model fit and considered implications with respect to our research questions.

Results

Descriptive statistics

Of 1028 handpumps included in our sample, 805 (78%) had no detectable E. coli in a 100-mL sample. Sanitary risk scores ranged from zero (low risk) to nine (high risk), with an average sanitary risk score of 3.4 (Fig. 1). The sanitary risk scores were normally distributed.

Fig. 1: Frequency of sanitary risk scores at 1028 handpumps with boreholes in sub-Saharan Africa.

The frequency of sanitary risk scores shows that two to seven sanitary risk factors were identified at most handpumps and that no handpump received a risk score of 10, which indicates the highest level of sanitary risk.

The number of handpumps where each sanitary risk factor was present is shown in Fig. 2. A missing fence was the most commonly observed risk factor (78% of handpumps) and the presence of a latrine within 10 m was the least common risk factor (1% of handpumps) (Fig. 2).

Fig. 2: Frequency of individual sanitary risk factors at 1028 handpumps in sub-Saharan Africa.

The number of handpumps at which each risk included on the WHO borehole with handpump sanitary inspection form was present.

Established models

Sanitary inspection results and E. coli occurrence from the 1028 handpumps were compared using previously published statistical models (Table 1). No model showed a significant association between E. coli occurrence and either sanitary risk score or sanitary risk score class at a 95% confidence level. Of the models using a sanitary risk score or sanitary risk class as the independent variable, logistic regression with a binary E. coli presence/absence dependent variable demonstrated the best model fit (AIC = 1079).

Table 1 Results of previous models showing no significant association between sanitary risk score and water quality analysis—AIC values show that logistic regression is the model with the best fit.

Previously published models also used multivariable regressions to assess the relationship between individual sanitary risk factors and E. coli occurrence. Logistic regression comparing a binary presence/absence E. coli dependent variable and individual sanitary risk factors had the best model fit (AIC = 1074) (results in Table 2). Logistic regression revealed a significant association between E. coli presence/absence and several individual sanitary risk factors. The presence of a latrine on higher ground (OR = 0.60, p = 0.046), a broken drainage channel (OR = 1.8, p = 0.005), and cracks in the apron (OR = 0.56, p = 0.002) were significantly associated with E. coli presence at the 95% confidence level (Table 2).

Table 2 Results of logistic regression assessment of individual sanitary risk factors—sanitary risk factors with * are significantly associated with E. coli presence/absence at the 95% confidence level.

Alternative model

Table 3 shows the distribution of the source, carrier, and barrier breakdown risk scenarios. The largest proportion of handpumps (41%, n = 421) had a contamination source, carrier, and barrier breakdown present at the time of data collection. Only 2% (n = 24) of handpumps had no sanitary risk factors.

Table 3 Summary statistics describing possible risk scenarios, with logistic regression results for the alternative model both with and without rainfall incorporated; risk scenarios with * are significantly associated with E. coli presence/absence at the 95% confidence level.

The results of logistic regression comparing the E. coli presence/absence variable and each of the sanitary risk scenarios are shown in Table 3. One scenario was significantly associated with contamination at the 95% confidence level: the scenario in which only the barrier breakdown is present and the source and carrier are absent. Two more scenarios were significantly associated with E. coli occurrence at a 90% confidence level, including the expected scenario in which a source, carrier, and barrier breakdown are all present. The alternative model had an AIC value of 1079 and R² = 0.0086.

The alternative model initially had one contamination carrier (stagnant water within 2 m). When rainfall within the past 2 days was added as another potential carrier, the AIC decreased to 1070 and the R² increased to 0.019. The results of the model in which rainfall is included are shown in Table 3. When rainfall is included, none of the risk scenarios are significantly associated with E. coli occurrence at the 95% confidence level.

Discussion

This study explored the relationship between sanitary inspection and microbial water quality using data from more than 1000 handpumps across 12 countries in SSA. The results validate the use of an alternative model, which reflects the causal pathway of water system contamination, and show that the relationship between sanitary inspection and microbial contamination can be modeled more thoughtfully. When the previous statistical models were applied, they revealed no significant correlation between sanitary risk score and E. coli occurrence. When our alternative model, which accounts for relationships between contamination sources, carriers, and barrier breakdowns, was applied, it demonstrated a weak but significant association between barrier breakdowns and E. coli occurrence. Furthermore, it demonstrates that some sanitary risk factors, especially barrier breakdowns, are more strongly associated with water quality than others.

The alternative model revealed that the risk scenario in which a barrier breakdown is present but the contamination source and carrier are absent was the only scenario significantly associated with E. coli occurrence (OR = 3.6, p = 0.019). This may suggest that barrier breakdown is the best predictor of E. coli occurrence, that important contamination sources and carriers are missing from the current sanitary inspection forms and models, and/or that sources and carriers are more difficult to reliably capture using sanitary inspection tools than barrier breakdowns. The limited role of carrier risk factors in the model is perhaps unsurprising since only one sanitary risk factor (stagnant water within two meters) falls into the carrier category. In order to address this, recent rainfall (previously reported to be an important predictor and carrier of microbial contamination10,13,15) was added to the model. When rainfall was incorporated as a carrier of contamination, the fit of the alternative model improved slightly (AIC: 1079 to 1070, R²: 0.0086 to 0.019), but none of the risk scenarios were significantly associated with E. coli occurrence.

It is not surprising that statistical models comparing sanitary risk score and E. coli occurrence produce no significant association, in line with the results reported by the studies that used the established models12,13,16,17,18. Sanitary inspection was designed as a risk assessment and decision-making tool, and is expected to be only loosely correlated with E. coli occurrence in a given water sample at a given point in time14. This loose correlation is demonstrated by the R² values, which did not exceed 3% in any of the models in this study. We suggest three explanations for the small R² values: (1) a single E. coli measurement does not accurately represent the microbial water quality of a given water system over time; (2) current WHO sanitary inspection forms do not adequately identify all relevant sources, carriers and barrier breakdowns affecting water system contamination; (3) the use of a model that does not weight risk factors distorts associations in which a small number of important factors dominate microbial risk. It is possible that one, two, or all three of these explanations apply in some settings.

A single microbial water sample is unlikely to accurately represent the central tendency of water quality in a water system. E. coli occurrence and concentration are highly variable over space and time. Previous studies have shown that microbes are not homogeneously distributed in water supplies, meaning that two water quality samples could contain very different E. coli concentrations even if collected from the same source at the same time19. Microbial contamination also varies dramatically over short time periods, and is influenced by seasonal weather patterns and recent rainfall20.

The alternative model was designed based on the assumption that a contamination source, carrier, and barrier breakdown would all be necessary for contamination to occur at the water system. However, the results show that the only scenario significantly associated with E. coli occurrence is the one in which a barrier breakdown alone is present. This suggests that the sanitary risk factors in current forms may not adequately capture all relevant contamination sources, carriers, and barrier breakdowns for water systems in the study setting. Environmental, technical, and social factors not included in current sanitary inspection forms may also be important in explaining contamination. Topography15, water system age21, and population density22, for example, have all been identified as determinants of water system contamination, and are not incorporated in the WHO sanitary inspection forms.

Finally, the WHO sanitary inspection forms do not weight risk factors, but some factors may be more influential than others. The presence of cracks in a slab, for example, may be more or less strongly associated with microbial contamination than the absence of a fence, and the relative importance of different factors may even vary by setting, season, and/or construction quality. In many cases, one serious barrier breakdown may dominate all other risk factors in that category. Such limitations might reduce the apparent association between sanitary inspection and microbial water quality. However, all sanitary risk factors would presumably be worth noting and addressing, and thus the current sanitary inspection approach may be considered more suitable as a risk assessment tool than as a proxy for microbial data.

The results of this study have implications for rural water service management, policy, and further research. With regard to management, sanitary inspection should be prioritized over water quality analysis as a means of identifying and prioritizing repairs to water systems. It is especially valuable in settings where water quality testing is infrequent or infeasible. Although sanitary inspection and water quality analysis are recommended as complementary tools, many small rural water systems in SSA do not conduct water quality monitoring with the frequency recommended by WHO23 or others24. Therefore, a single test that fails to detect E. coli may give managers a false sense of security if not accompanied by sanitary inspection data. Sanitary inspection results vary less over time than microbial water quality, require less technical training to perform, and provide actionable information even when carried out infrequently. The operator of even a small water system should be able to conduct a sanitary inspection independently one or more times per year; operators should be trained, or at least invited to take part in routine inspections carried out by governments and other implementers in order to learn. When a sanitary risk factor is identified, action should be taken to repair the system or otherwise mitigate the risk. Corrective actions might include sealing cracks and leaks, repairing or replacing damaged or absent components, or, in extreme cases, rehabilitating damaged systems. Operators should be trained to carry out basic remedial actions, or to access external support for more involved remediation.

Implications for policy include potential modifications to current sanitary inspection forms to enhance their utility for assessing water system vulnerabilities. Additional evidence-based risk carriers and sources may be included in sanitary inspections, or open access data can be downloaded and incorporated after sanitary inspection data collection, as appropriate.

Implications for research include the need to develop larger and richer datasets linking sanitary inspection data, longitudinal meteorological, water quality, and/or health outcome data, and to use these datasets to further validate and improve current sanitary inspection tools and forms. Such activities may include testing the integration of season as a variable or covariate in sanitary inspections, investigating the relevance of additional risk factors (such as well depth, intermittent versus continuous water supply, soil characteristics, community factors, etc.) in sanitary inspections, and/or further exploring the importance of interactions among sanitary risk factors. Exploration of the extent to which the importance of different sanitary risk factors varies across different settings could also inform further refinements.

There are several limitations to this work. Although a sample size of 1028 was sufficient for regression analysis, the dataset aggregates data from 12 geographically diverse countries in SSA. A larger dataset would have enabled disaggregation at the country level, which might have yielded country- and region-specific findings. Models used in this study were relatively simple and did not include water system variables such as well depth, soil characteristics, continuous versus intermittent pressure of piped systems, etc., or control for covariates such as community size, degree of rurality, population density, subsurface geology, the intensity of animal agriculture activities, land use and sanitation variables, and other factors capable of contributing to fecal contamination of water systems; including such variables in future analyses may improve associations between sanitary inspection scores and microbial water quality data.

With respect to the alternative model used in this study, some risk factors were difficult to classify. While the presence of damaged or cracked drainage channels was classified as a barrier factor, this could have been classified as a carrier factor, since it relates to the release of pumped water which can infiltrate the borehole and transport nearby contamination into the water supply.

Our updated alternative model provides a simple conceptual framework for representing and evaluating sanitary risk factors with respect to their likely function in microbial transmission pathways. In the current work, each risk category (contamination source, carrier, barrier breakdown) is classified as either present or absent. A more detailed method that incorporates both these groups of factors and the individual risk factors within them, and/or accounts for the cumulative impact of multiple risk factors in a given category, may enhance the value of these sanitary inspection tools. As noted above, the benefits of any such improvements with respect to diagnostic and predictive power would have to be weighed against any associated loss of simplicity before determining whether their inclusion is appropriate for routine sanitary inspections.

Sanitary inspection and water quality analysis are complementary, valuable tools for characterizing and protecting water safety. Sanitary inspection is especially useful in settings where water quality analysis is not carried out frequently, as it enables water safety risks to be identified and addressed. As service levels in rural SSA increase, we must consider the most effective ways to protect water safety in rural water systems, and the relevance of sanitary inspection as a risk assessment tool is likely to be enhanced. We recommend that water system operators be trained in sanitary inspection and remedial action, and conduct inspections regularly. Furthermore, sanitary inspections may more effectively lead to timely and appropriate action if they are combined with mobile survey tools and/or semi-automated analysis tools to aid rapid response and effective decision-making. As these monitoring and implementation efforts are undertaken, researchers should also seek to continue to collect and compile data on microbial water quality, sanitary risk, and relevant outcomes and to continue to explore the relationship between water system vulnerability and contamination. Such exploration will benefit from using a framework that incorporates evidence on the causal pathways of contamination, as well as social and environmental determinants of contamination, and that facilitates timely and effective response to identified risks to water safety.

Methods

Study population

Data were collected as part of an evaluation of water, sanitation, and hygiene (WaSH) programs implemented by the international non-governmental organization (NGO) World Vision (WV). The data were collected from rural communities in 14 countries—Ethiopia, Ghana, Honduras, India, Kenya, Malawi, Mali, Mozambique, Niger, Rwanda, Tanzania, Uganda, Zambia, and Zimbabwe. For this analysis, Honduras and India were excluded because they are the only countries that do not fall in the sub-Saharan Africa SDG region or the lowest access WaSH country cluster25.

Study design

In-country enumerators and research supervisors were hired in each country to carry out data collection. Research supervisors were trained by staff from The Water Institute at UNC; enumerators were trained by research supervisors.

In each country, a list of rural subnational administrative sampling units (e.g., blocks, districts) was obtained from either the national census bureau or national statistics office. Subnational administrative sampling units were stratified into World Vision and comparison areas, based on whether World Vision worked in the district. For each country, 56 WV and 56 comparison primary sampling units (PSUs) were then randomly selected from the list of subnational administrative sampling units using a probability proportional to size method. If a PSU was estimated to include more than 200 households, it was divided into secondary sampling units (SSUs) of 200 households or fewer, and one SSU was randomly selected to represent that PSU.
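The text does not state which probability-proportional-to-size procedure was applied. As an illustration only, the sketch below assumes systematic PPS selection, a common variant; the function name, unit names, and household counts are hypothetical and do not come from the study data.

```python
import random

def pps_systematic_sample(sizes, n, rng):
    """Select n units with probability proportional to size (systematic PPS).

    sizes: dict mapping unit name -> size measure (e.g., household count).
    Assumption: no unit is larger than the sampling interval, so no unit
    can be selected more than once.
    """
    names = list(sizes)
    total = sum(sizes.values())
    interval = total / n
    start = rng.random() * interval  # random start within the first interval

    selected = []
    idx = 0
    cumulative = sizes[names[0]]
    for i in range(n):
        target = start + i * interval
        # Advance to the unit whose cumulative size range contains the target.
        while target >= cumulative and idx < len(names) - 1:
            idx += 1
            cumulative += sizes[names[idx]]
        selected.append(names[idx])
    return selected

# Hypothetical list of four equally sized administrative units.
example_sizes = {"A": 100, "B": 100, "C": 100, "D": 100}
example_pick = pps_systematic_sample(example_sizes, 2, random.Random(0))
```

With equal sizes every unit has the same chance of selection; larger units occupy a wider share of the cumulative range and are therefore more likely to contain a selection point.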

Once the 112 sampling units were selected, trained enumerators mapped all households and water sources within the community. In each sampling unit, 25 households were randomly selected for a household survey. All unimproved and improved water sources were surveyed. All water source surveys included a sanitary inspection.

Up to five functional, non-surface water sources were randomly selected for water quality analysis in each sampling unit. ‘Functional’ systems were defined as those producing water at the time of data collection. If there were fewer than five water sources meeting these criteria in a given sampling unit, all eligible water sources in the sampling unit underwent water quality analysis.

Only boreholes with handpumps (referred to as “handpumps” from here on) were included in this analysis in order to ensure comparability between water sources.

Sanitary inspection

Sanitary inspections were conducted using an adapted version of the WHO sanitary inspection form for tubewells (boreholes) with handpumps8. The form comprises ten yes/no questions listed in Supplementary Table 1. Each water source is evaluated using the form and receives a sanitary risk score ranging from zero to ten, where zero indicates that none of the evaluated sanitary risk factors are present at the handpump and a ten indicates that all are present. Handpumps are categorized into sanitary risk classes similar to those used in other studies9,26,27: low risk (0–2), intermediate-risk (3–5), high risk (6–8), and very high risk (9–10).
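The scoring and classification described above can be sketched as follows. This is a minimal illustration, assuming each inspection is stored as a list of ten booleans (True meaning the risk factor is present); the function names are hypothetical.

```python
def sanitary_risk_score(answers):
    """Count the risk factors present across the ten yes/no questions."""
    if len(answers) != 10:
        raise ValueError("expected ten yes/no answers")
    return sum(1 for answer in answers if answer)

def sanitary_risk_class(score):
    """Map a 0-10 sanitary risk score to the risk classes used in the text."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score <= 2:
        return "low"
    if score <= 5:
        return "intermediate"
    if score <= 8:
        return "high"
    return "very high"

# A handpump with three risk factors present falls in the intermediate class.
example_answers = [True, True, True] + [False] * 7
example_class = sanitary_risk_class(sanitary_risk_score(example_answers))
```

Because the score is an unweighted count, two handpumps with the same score may have entirely different risk factors present.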

Water quality analysis

Samples from the selected water sources were tested to determine E. coli concentration (most probable number [MPN] in a 100-mL sample) using the Compartment Bag Test (CBT) field test (Aquagenx LLC, Chapel Hill, NC, USA)28. Briefly: samples were collected from the selected handpumps using 100-mL WhirlPak bags and processed immediately according to the manufacturer’s instructions. Processed CBTs were incubated at ambient temperature for 48 h where temperatures were between 25 and 30 °C, or for 24 h where temperatures were above 30 °C. Where ambient temperatures were below 25 °C, thermal incubation at 35–37 °C for 24 h was used (field incubator). Results were recorded as E. coli MPN/100 mL and categorized into an ordinal variable based on WHO health risk categories and a binary “presence” (≥1 E. coli MPN/100 mL) or “absence” (<1 E. coli MPN/100 mL) variable29.
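The incubation rule and the binary presence/absence coding can be expressed compactly. The sketch below is illustrative; the function names and return format are assumptions, and the ordinal WHO risk categories are omitted because their cut-offs are not given in the text.

```python
def incubation_protocol(ambient_temp_c):
    """Return (method, hours) following the temperature rules in the text."""
    if ambient_temp_c < 25:
        # Too cool for ambient incubation: use the field incubator.
        return ("field incubator at 35-37 C", 24)
    if ambient_temp_c <= 30:
        return ("ambient", 48)
    return ("ambient", 24)

def ecoli_present(mpn_per_100ml):
    """Binary coding: presence means at least 1 E. coli MPN per 100 mL."""
    return mpn_per_100ml >= 1

# A sample processed at 27 C is incubated at ambient temperature for 48 h.
example_protocol = incubation_protocol(27)
```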

Field blanks and duplicate field samples were collected from a random 10% of water sources and analyzed similarly. For field blanks, an unopened bottle from a reputable local brand of bottled water was opened at the water source and 100 mL of the bottled water was processed. Duplicate water samples were collected immediately after the collection of the experimental sample. All field blanks and duplicates were processed at the same time and in the same location as the corresponding experimental sample, using identical procedures. All samples (experimental, duplicate, and blank) were labeled with barcodes, reducing the likelihood that enumerators might introduce bias due to knowledge of the sample source when reading CBT results.

Established models for statistical analysis

Statistical analyses were carried out in Stata/SE 13.1 (StataCorp). Summary statistics were calculated to describe the distribution of E. coli MPN/100 mL and sanitary inspection score results.

Statistical methods described in previous literature14 were used to assess the relationship between sanitary inspection and E. coli occurrence:

  a. Logistic regression: used to analyze the relationship between the presence/absence of detectable E. coli in a 100-mL sample (binary) and sanitary risk score (ordinal), and to compare presence/absence of detectable E. coli (binary) with the presence/absence of individual sanitary risk factors (binary)18,27,30,31.

  b. Ordinal logistic regression: used to compare the WHO water quality risk category (ordinal) with both sanitary risk score (ordinal) and the presence/absence of individual sanitary risk factors (binary).

  c. Non-parametric tests: Pearson’s chi-square test was used to determine whether E. coli presence/absence (binary) was independent of sanitary risk score (ordinal)15,32.
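To make the chi-square comparison concrete, the sketch below computes Pearson's chi-square statistic and its degrees of freedom for a contingency table of E. coli presence/absence by sanitary risk class. The analyses in this study were run in Stata; this pure-Python version is only an illustration, the counts are invented, and it assumes every row and column total is nonzero.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts.

    table: list of rows, e.g. one row per E. coli outcome (absent, present)
    and one column per sanitary risk class.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Invented counts: rows are E. coli absent/present, columns are two risk classes.
example_stat, example_dof = pearson_chi_square([[10, 20], [20, 10]])
```

A statistic near zero indicates that the observed counts are close to what independence would predict; the p-value is then obtained from the chi-square distribution with the returned degrees of freedom.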

Alternative model for statistical analysis

The sanitary risk factors included on the WHO borehole with handpump form represent contamination sources, carriers, and barrier breakdowns14, defined respectively as reservoirs of pathogens such as nearby latrines or fertilized fields; vehicles that transport pathogens from sources into the system such as surface runoff; and weaknesses in system infrastructure, such as cracks in the concrete floor or base, that may allow pathogens to enter. Here, we propose an alternative model of the relationship between sanitary inspection and microbial water quality, based on the assumption that contamination events occur if three conditions are met: (1) a source of contamination is present, (2) a carrier facilitates the transport of contaminants to the system, and (3) there is a failure in one or more barriers designed to prevent contamination.

Each sanitary risk factor was categorized as a “source”, “carrier” or “barrier breakdown” (Fig. 3). Other studies have differentiated sanitary risk factors similarly, including Aldana33, who divided sanitary risk factors into “potential sources of fecal contamination; potential routes by which polluting agents gain access to water sources; and factors that can accelerate contamination.”

Fig. 3: Theoretical framework to assess sanitary inspection.

Each risk factor included in the WHO sanitary inspection form for handpumps has been classified as a contamination source, carrier, or barrier breakdown. The framework assumes that at least one contamination source, carrier, and barrier breakdown must all be present to cause water point contamination.

To represent this framework in a statistical model, three binary variables were created to represent the presence of a contamination “source”, “carrier”, and “barrier breakdown”, respectively. For each water system, the binary variable was considered present if any of the sanitary risk factors from the category was present. The binary variable was considered absent if none of the sanitary risk factors in that category were present. For example, the binary variable “source” was considered present if a latrine within 10 meters (m) was present, an uphill latrine was present, another pollution source was present within 10 m of the water source, or any combination of the three; “source” was only considered absent when none of those sanitary risk factors was reported. Logistic regression was used to assess the association between E. coli occurrence and every possible combination of “source”, “carrier”, and “barrier breakdown” variables. We hypothesized that only the scenario where all three variables were present would be associated with E. coli occurrence because there would be a break in the causal pathway of contamination if any was absent. In one version of the model, recent rainfall was added to the carrier category using the response to a survey question: “Has it rained in the past two days?” because rainfall is a well-recognized vector of contamination15,20,34,35.
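The construction of the three binary variables and the resulting risk scenarios can be sketched as follows. The source and carrier factors follow the text; the two barrier breakdown factors shown are only examples mentioned elsewhere in the paper, since the full classification appears in Fig. 3. All variable and function names are hypothetical.

```python
# Partial, illustrative mapping of risk factors to categories.
CATEGORIES = {
    "source": [
        "latrine_within_10m",
        "latrine_uphill",
        "other_pollution_within_10m",
    ],
    "carrier": ["stagnant_water_within_2m"],
    "barrier_breakdown": ["damaged_drainage_channel", "cracked_apron"],
}

def category_flags(inspection, include_rainfall=False):
    """A category is present if any of its risk factors is present."""
    flags = {
        category: any(inspection.get(factor, False) for factor in factors)
        for category, factors in CATEGORIES.items()
    }
    # Optional variant: recent rainfall counts as an additional carrier.
    if include_rainfall and inspection.get("rain_past_2_days", False):
        flags["carrier"] = True
    return flags

def scenario_label(flags):
    """Name one of the eight possible present/absent combinations."""
    present = [c for c in ("source", "carrier", "barrier_breakdown") if flags[c]]
    return " + ".join(present) if present else "none"

# A handpump with a cracked apron only, inspected after recent rain.
example = {"cracked_apron": True, "rain_past_2_days": True}
without_rain = scenario_label(category_flags(example))
with_rain = scenario_label(category_flags(example, include_rainfall=True))
```

Each scenario label can then be entered as a categorical predictor in a logistic regression against E. coli presence/absence, with one scenario serving as the reference level.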

Model comparison

Models were assessed using Akaike’s Information Criterion (AIC)36, which assigns a score to each model based on the estimated divergence from the true model. AIC enables a fair comparison among various models and is useful because model simplicity is rewarded. AIC score is a relative model fit measure where a lower AIC value indicates a better model fit.
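For reference, AIC is conventionally computed as 2k − 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The short sketch below uses invented log-likelihood values, not results from this study, to show how the parameter penalty can favor a simpler model.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Invented values: the larger model fits slightly better but pays for
# two extra parameters, so the simpler model has the lower AIC.
simple_model_aic = aic(log_likelihood=-100.0, n_params=2)
larger_model_aic = aic(log_likelihood=-98.5, n_params=4)
preferred = "simple" if simple_model_aic < larger_model_aic else "larger"
```

Because AIC is a relative measure, only differences between models fitted to the same data are meaningful.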