Elsevier

Water Research

Volume 220, 15 July 2022, 118714
Adaptive soft sensing of river flow prediction for wastewater treatment operation and risk management

https://doi.org/10.1016/j.watres.2022.118714

Highlights

  • Soft sensors were used to predict the receiving river flow so that wastewater discharge operation can be balanced against it.

  • Eleven machine learning methods were compared, each with optimized hyperparameters.

  • Probabilistic predictions minimized overestimations to provide proper risk management.

  • Daily adaptive predictions were evaluated for future flexible wastewater management.

Abstract

Many wastewater utilities have discharge permits tied directly to the receiving river flow, so accurate prediction of the hydraulic throughput is critical to ensure safe operation and environmental protection. Current operation based on empirical knowledge faces many challenges, so in this study we developed and assessed daily-adaptive, probabilistic soft sensor prediction models to forecast the next month's average receiving river flowrate and guide utility operations. Among the 11 machine-learning methods compared, extra trees regression exhibits the desired deterministic prediction accuracy at day 0 (overall accuracy index: 3.9 × 10⁻³ 1/cms², where cms is cubic meters per second), and its accuracy increases steadily over the course of the month (e.g., MAPE and RMSE decrease from 41.46% and 23.31 cms to 3.31% and 2.81 cms, respectively). The overall classification accuracy for three river flow classes is 0.79 at the beginning and increases to about 0.97 over the course of the predicted month. To manage the uncertainty caused by potential false negative classifications, which arise from overestimating the river flow, a probabilistic assessment of the predictions based on the 95% lower prediction interval (PI) is developed; it reduces false negative classification from 17% to nearly zero with a slight sacrifice in overall classification accuracy.

Introduction

Wastewater management is facing systemic challenges from global climate change, extreme weather events, and aging infrastructure. More frequent flash flooding and extended droughts result in highly skewed flow distributions (Lv et al., 2020; Rosenberger et al., 2021), which pose great challenges to the operation of wastewater utilities (Veronesi et al., 2014; Hughes et al., 2021). Recent studies have started to investigate the effects of storm events on wastewater treatment efficiency (Zhu and Anderson, 2017a; Heo et al., 2021) and system resiliency (Zhu and Anderson, 2017b; Zhang et al., 2021), but few groups have attempted to connect the variability of river flow with its effects on wastewater treatment operation. Many utilities have discharge permits tied directly to the receiving river flow, so more accurate prediction of receiving river flow can become critical to treatment operation, both to ensure consistent compliance with environmental regulations and to protect the water environment. Englert et al. (2013) reported that, compared to a typical winter season with high dilution potential (∼35% wastewater), summer droughts with low dilution potential for treated wastewater (∼90%) could result in significant negative effects in downstream freshwater ecosystems, such as macroinvertebrate-mediated leaf mass loss (i.e., during a leaf litter decomposition process) and gammarids' feeding rate (i.e., in situ bioassays to assess non-food-quality related implications).

Taking the Pacific Northwest U.S. as an example, the variability of receiving river flow has gradually increased, especially in the wet weather season each year. Because wastewater utilities often have allowable nutrient discharge limits, both daily maximum and monthly average, tied to the receiving river flow, the toxicity limit becomes more stringent as the river flow decreases. Utilities therefore face a dilemma over how to manage the nutrient-driven inventory and how to deal with the uncertainty in hydraulic throughput and nitrification capacity. If the utility maintains complete nitrification continuously, the extra biomass inventory will create a risk of secondary clarifier overload during wet weather events due to increased throughput. In contrast, if a low inventory is maintained, once the river flow drops, the utility will be at risk of not meeting the permit because the reduced flow triggers a more stringent nutrient discharge limit. Currently, utilities (such as Clean Water Services, or CWS, which was studied in this work) mostly rely on the operators' empirical knowledge to make decisions based on estimated flow in conjunction with weather forecasting. This approach is neither effective (qualitative rather than quantitative) nor efficient (lagged responses to a flow change), so it can be greatly improved.

The need for additional guidance could be addressed by soft sensors based on machine learning (ML) algorithms, as such methods can provide accurate flow prediction and early-warning mechanisms that lead to the development and execution of more effective control strategies. Soft sensing, which can be part of a hybrid (mechanistic and data-driven) modeling architecture (Schneider et al., 2022) or a digital twin strategy (Torfs et al., 2022), offers numerous merits, such as low cost, fast response, and the potential to work in parallel with hard sensors (Fortuna et al., 2007). Soft sensors can be used to forecast both influent wastewater flow and receiving river flow to help decision-makers better estimate the appropriate biomass inventory level in the bioreactors, so that both biomass overloading and discharge limit exceedance can be prevented. Soft sensor studies have demonstrated the effectiveness of wastewater flow forecasting, though not of receiving river flow forecasting. For example, Fernandez et al. (2009) tested fuzzy neural network models for forecasting the next day's average influent wastewater flow based on the day of the week and the most recent day's average flow, and the average prediction errors were below 10%. Zhu and Anderson (2019) developed a MATLAB-based iterated stepwise multiple linear regression (ISMLR) package to predict the next day's influent flow at the Kirie water reclamation plant in Chicago. They compared nine individual or hybrid (i.e., combining more than one statistical or ML method) soft sensor methods and suggested that the hybrid model, based on ISMLR and an artificial neural network (ANN), obtained the best performance (adjusted R² ≈ 0.834, MAPE ≈ 13.3%) using historical and easily acquirable influent data. More recently, Apaydin et al. (2021) adopted singular spectral analysis (SSA), seasonal-trend decomposition using Loess (STL), and ANN to forecast monthly river flow based on streamflow, precipitation, relative humidity, and temperature data, but such studies (Kisi et al., 2019; Apaydin et al., 2021) only provided a one-time, deterministic prediction that cannot be used in daily wastewater operations. They might also have data leakage problems associated with cross-validation or SSA/STL applications, resulting in over-optimistic results. Data leakage management in ML modeling has not been studied frequently by environmental researchers, but the problems can easily arise without rigorous data preprocessing. For example, feature scaling applied before k-fold cross-validation leaks information between the pre-training (k-1 parts of the training dataset) and validation subsets; the scaling should instead be performed inside the cross-validation process, based on only the pre-training subset.
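The scaling example can be made concrete with a minimal sketch. The code below is not the procedure used in this study; it assumes a single synthetic predictor, a one-feature least-squares line as a stand-in model, and contiguous folds, and all function names (`kfold_indices`, `fit_scaler`, `fit_line`, `leakage_safe_cv`) are hypothetical. The point it illustrates is only that the scaling mean and standard deviation are recomputed from the pre-training folds in every split and then applied unchanged to the validation fold.

```python
import random
import statistics


def kfold_indices(n, k):
    """Split range(n) into k contiguous validation folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_scaler(values):
    """Return (mean, std) computed from the given values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against a constant feature
    return mean, std


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with one feature."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def leakage_safe_cv(x, y, k=5):
    """k-fold CV in which scaling statistics come from the pre-training folds only."""
    fold_rmse = []
    for val_idx in kfold_indices(len(x), k):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        # Fit the scaler on the pre-training subset; validation data are not seen here.
        mean, std = fit_scaler([x[i] for i in train_idx])
        train_x = [(x[i] - mean) / std for i in train_idx]
        val_x = [(x[i] - mean) / std for i in val_idx]
        a, b = fit_line(train_x, [y[i] for i in train_idx])
        sq_err = [(a + b * xv - y[i]) ** 2 for xv, i in zip(val_x, val_idx)]
        fold_rmse.append((sum(sq_err) / len(sq_err)) ** 0.5)
    return fold_rmse


# Synthetic data for illustration only (not river flow records).
rng = random.Random(0)
x = [rng.uniform(0, 100) for _ in range(200)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]
fold_rmse = leakage_safe_cv(x, y, k=5)
```

In scikit-learn, the same effect is usually obtained by wrapping the scaler and the estimator in a `Pipeline` and passing the pipeline to the cross-validation routine, so that the scaler is refit within each split.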

In this study, we aimed to bridge the gap between empirical knowledge-based operation and more intelligent, quantifiable decision making, and to help better manage wastewater operation and discharge, by developing a daily-updated prediction model to forecast the next month's average receiving river flowrate. Routine historical data that can be readily obtained by the utility were used, and ML methods were assessed to obtain reliable forecasting results. Furthermore, to avoid the aforementioned operation risk, we developed ML-based probabilistic prediction models to minimize river flow overestimation, as river flow is the determining factor for effluent ammonia nitrogen limits (Table S1). This paper first presents the facility, data collection, and preprocessing, followed by soft sensor development and performance evaluation. We also compared 11 ML methods for deterministic prediction and discussed the adaptive prediction and classification among the best-performing models. Results of probabilistic, adaptive prediction and classification are subsequently presented and discussed.
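To illustrate why a lower prediction bound limits overestimation, the sketch below shifts a point forecast downward by a lower-tail quantile of past residuals and then classifies the result. It is a generic empirical approach under stated assumptions, not the probabilistic method developed in this study: the class boundaries are hypothetical placeholders (the actual flow tiers are given in Table S1), the residuals are made-up numbers, "95% lower PI" is read here as the lower limit of a two-sided 95% interval, and the names `flow_class` and `lower_bound` are introduced only for this example.

```python
import statistics

# Hypothetical class boundaries in cms; the permit's actual tiers are in Table S1.
LOW_FLOW_MAX = 10.0
MID_FLOW_MAX = 30.0


def flow_class(flow_cms):
    """Map a monthly-average flow to one of three classes (0 = lowest flow)."""
    if flow_cms < LOW_FLOW_MAX:
        return 0
    if flow_cms < MID_FLOW_MAX:
        return 1
    return 2


def lower_bound(prediction, residuals, coverage=0.95):
    """Shift a point prediction by the lower-tail quantile of past residuals.

    residuals are observed minus predicted flows from a held-out period.
    The lower limit of a two-sided interval is assumed, i.e. the
    (1 - coverage) / 2 quantile of the residuals.
    """
    alpha = (1.0 - coverage) / 2.0
    cuts = statistics.quantiles(residuals, n=1000, method="inclusive")
    q = cuts[round(alpha * 1000) - 1]  # cut point at probability alpha
    return max(prediction + q, 0.0)    # flow cannot be negative


# Made-up numbers for illustration only.
point_prediction = 31.0
past_residuals = [-5.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 6.0]

conservative = lower_bound(point_prediction, past_residuals)
point_class = flow_class(point_prediction)
conservative_class = flow_class(conservative)
```

With these illustrative inputs, the point forecast sits just above the upper hypothetical boundary while the lower bound falls below it, so the conservative classification selects the lower flow class and therefore the more stringent limit. This is the trade-off described above: fewer overestimated classes at the cost of some overall classification accuracy.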

Wastewater facility, receiving river, and associated data collection

Rock Creek Water Resource Recovery Facility (RC WRRF) is one of the four wastewater treatment facilities operated by Clean Water Services (CWS) in urban Washington County, Oregon, US. The RC WRRF has operated since 1978 and treats an average of about 168,000 m³/day of wastewater, serving more than 300,000 residents. During the wet weather season, the facility has toxicity-based effluent ammonia limits. The allowable ammonia limit, both daily maximum (highest allowable value

Optimal model determination

PCA and ISMLR were applied and compared for their effectiveness in feature selection. Fig. 3 shows the prediction performance (i.e., OAI) of the 11 ML methods based on PCA (Fig. 3(a)) or ISMLR (Fig. 3(b)), followed by a final model selection step among the corresponding best three methods based on PCA (Fig. 3(c)) or ISMLR (Fig. 3(d)). For each method, the box plot shows the range of prediction results across different combinations of hyperparameter values; a higher OAI value stands for a

Conclusion

This work presents a first feasibility study on developing an adaptive, probabilistic soft sensor tool to predict the next month's average receiving river flow, which is critical for guiding WRRF operation and meeting effluent limits (e.g., ammonia nitrogen). Results from this work are summarized as follows:

  • The study assessed two feature selection methods with 11 ML methods, leading to about 5000 ML models with different hyperparameter values tested based on a two-step model selection process.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

J.-J. Zhu and Z.J. Ren acknowledge support from the Princeton University Andlinger Center for Energy and Environment Innovation Grant. The authors are grateful for the support of the Intelligent Water Systems Challenge organized by the Water Research Foundation and the Water Environment Federation.
