Position Paper
Identification of source information for sudden water pollution incidents in rivers and lakes based on variable-fidelity surrogate-DREAM optimization

https://doi.org/10.1016/j.envsoft.2020.104811

Highlights

  • A variable-fidelity surrogate model is used to determine the source of sudden water pollution incidents in rivers and lakes.

  • A new sample point is added during the update of the surrogate model.

  • Using the surrogate model instead of the physical model reduces the computation time by a factor of 200.

Abstract

For sudden water pollution incidents in rivers and lakes, the ability to quickly identify the pollution source is of great importance for providing early accident warning and implementing emergency control measures. Based on Bayesian reasoning, a variable-fidelity surrogate-differential evolution adaptive Metropolis (DREAM) optimization model for the coupled inversion process is established in the posterior space of the pollution source. To verify the effectiveness of the algorithm, this paper takes lake A as the research area and considers a hypothetical water pollution emergency; the pollution source location, release time and released mass of the pollutant suddenly released into the water body are determined with the proposed method. The results show that, while maintaining calculation accuracy, the algorithm speeds up the computation by more than 200 times and effectively improves on the computational efficiency of the traditional method for obtaining the source information of sudden water pollution events.

Introduction

Deliberate or unintentional chemical spills can threaten ecological safety and human health (Jiabiao W et al., 2018). In 2010, 156 environmental emergencies were directly dispatched or handled by the Ministry of Ecology and Environment in China, on average one every two working days (XU Jing et al., 2018), including many serious water pollution emergencies such as the leakage of mining wastewater at Zijinshan, Fujian Province, and the explosion of the Dalian Xingang oil pipeline. In 2011, a phenol tank truck accident on the Hangzhou Xinjing expressway caused a leak, and part of the phenol flowed into the Xin'an River with rainwater, which had a significant impact on the drinking water safety of 550,000 residents in Hangzhou. In 2015, an accident at the Longxi tailings pond in Gansu caused more than 3000 m³ of wastewater to flow into the Xihan River and then into the Jialing River, which seriously polluted more than 300 km of rivers in Sichuan, Shaanxi and Gansu, had an impact lasting more than 30 days, and caused great concern at home and abroad (Wang et al., 2015). Quick and accurate assessments of pollution source information are of great importance for providing accident warnings and achieving emergency control. However, in actual response work, pollution source information is generally unknown in the early stage of an accident and even well after the accident is reported, which makes it almost impossible to perform rapid source control and implement reduction measures in the early accident stage. In addition, many pollutants, including organic matter, heavy metals and bacteria, are difficult to identify with the naked eye and difficult to detect before an accident causes harm. Responding to spills therefore involves the basic scientific tasks of emergency pollution inversion and discharge history reconstruction.
The main way to address these issues is to solve the inversion problem, that is, to determine the location of the pollution source, the pollutant discharge time and the emission intensity from the limited data, such as pollutant concentrations, obtained at fixed monitoring points in a given area. Many researchers have studied these types of problems, including Ghane et al. (2016), Xu and Gomez-Hernandez (2016), Yang et al. (2016), Jiabiao W et al. (2018), and others.

According to their mathematical characteristics, these methods can be divided into analytical methods, regularization methods, deterministic methods based on optimization and stochastic methods based on probability density.

In analytical methods, it is necessary to know the speed of the pollutant diffusion process and the concentration of the pollution source (Alapati and Kabala, 2015). Most of the early inversion studies were based on such methods. For example, Skaggs and Kabala (1995) used the quasi-reversibility (QR) method to identify the emission history of a single point source at a known location. In this method, the ill-posed backward problem is replaced by a well-posed problem whose governing equation is very similar to the dimensionless convection-dispersion equation, and this equation is then solved. The approach provides high computational efficiency, but its accuracy is relatively poor. Regularization techniques turn ill-posed problems into well-posed problems and can be used to directly solve inverse problems. For example, for a pollution source at a known location, Skaggs and Kabala (1994) used the Tikhonov regularization (TR) method to reconstruct the release history of a pollutant from the source with a one-dimensional homogeneous steady flow model. Wei et al. (2010) designed a method that couples optimal perturbation regularization with multipoint source fractional diffusion equations. However, the application of this method comes at the cost of some solution accuracy. The most commonly used methods are singular value decomposition (SVD) and truncated singular value decomposition (TSVD).
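The stabilizing effect of regularization described above can be illustrated with a minimal sketch. It is not the formulation of Skaggs and Kabala (1994); the 2x2 operator `A`, the data `b_noisy` and the regularization weight `lam` are hypothetical values chosen only to make the ill-conditioning visible.

```python
def solve2(M, r):
    """Solve a 2x2 linear system M x = r by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(r[0] * M[1][1] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - M[1][0] * r[0]) / det]


def tikhonov(A, b, lam):
    """Minimise ||A x - b||^2 + lam * ||x||^2 for two unknowns by solving
    the regularized normal equations (A^T A + lam I) x = A^T b."""
    rows = range(len(A))
    AtA = [[sum(A[k][i] * A[k][j] for k in rows) + (lam if i == j else 0.0)
            for j in range(2)] for i in range(2)]
    Atb = [sum(A[k][i] * b[k] for k in rows) for i in range(2)]
    return solve2(AtA, Atb)


# Hypothetical, nearly singular forward operator (assumption for illustration).
A = [[1.0, 1.0], [1.0, 1.0001]]
x_true = [1.0, 1.0]
# Exact data would be [2.0, 2.0001]; a perturbation of 0.001 is added to b[1].
b_noisy = [2.0, 2.0011]

x_naive = solve2(A, b_noisy)            # direct inversion amplifies the noise
x_reg = tikhonov(A, b_noisy, lam=1e-3)  # regularized solution stays stable

print("naive      :", x_naive)
print("regularized:", x_reg)
```

In this toy case the direct solution is thrown far from `x_true` by a data error of only 0.001, whereas the regularized solution stays close to it. The price is a small bias that grows with `lam`, which is the accuracy trade-off mentioned in the text.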

Gradient-based optimization algorithms have been widely used in the optimization of nonlinear models. If the initial value is appropriate, such a method can quickly obtain the optimal solution (Zhang et al., 2013). However, if the nonlinear model is nonconvex, a local solution may not be the global solution and the model parameters may not be accurately identified; because a gradient-based algorithm easily falls into local minima, it is regarded as a local optimization algorithm. Heuristic algorithms, which do not depend on the gradient of the objective function, can solve optimization problems involving nonconvex models and have been widely used in the field of pollution source identification. For example, Parolin et al. (2015) used the Luus-Jaakola algorithm, particle collision algorithm, ant colony optimization algorithm and golden section method to identify the source intensity and location, and the approach was applied to the Macae estuary on the southeast coast of Brazil.

However, because the inversion problem is often ill-posed, when a deterministic method based on optimization is used, observation error or a small error in the model calculation may cause large deviations in the results (Hazart et al., 2014), resulting in source tracing errors. In contrast, stochastic methods reflect the randomness of real-world phenomena through probability distributions, and they are suitable for dealing with uncertain problems. The stochastic methods that were first applied for the identification of pollution sources include multivariate nonlinear regression and the associated maximum likelihood method. At present, the most commonly used methods are based on statistical induction, minimum relative entropy and Bayesian inference. A regression method can be used to solve simple pollution source identification problems, but for complex nonlinear models, regression methods often need to be combined with optimization algorithms. The advantage of statistical induction is that it can be used for uncertainty analysis based on large amounts of data. However, the limited amount of pollutant concentration data obtained during the emergency response process is not sufficient to support inversion studies based on such methods. The advantage of the minimum relative entropy method is that uncertainty analysis can be performed for a given problem, yielding a new problem that can be solved based on the prior distribution. Woodbury et al. (1996) applied the quasi-inverse method and the minimum relative entropy approach to identify the emission history of a single point source at a known location. One of the most popular stochastic methods is the inverse probability density method based on the adjoint equation.
Neupauer and Wilson (1999) first introduced the concept of the inverse probability density function for pollution source identification; in addition, they showed that the inverse location probability function and the travel time probability distribution function are adjoint functions of the forward concentration. Ghane et al. (2016) and Cheng et al. (2010) used this method to identify the time and location of pollution sources in rivers and lakes and verified the excellent performance of the method. Furthermore, the inverse probability method and the linear regression model have been combined to transform the pollution source identification problem into an optimization model, and the differential evolution algorithm was used to identify the pollution source location, emission time and total discharge amount for sudden water pollution events (Jiabiao W et al., 2018). These studies demonstrate the applicability of the algorithm and its good application prospects. However, it is relatively difficult to trace a contaminant under high-dimensional unsteady flow conditions, and the stability of the method requires further study. The widely used Bayesian-MCMC method is another stochastic method based on probability theory. Based on Bayesian inference and the Markov chain Monte Carlo (MCMC) sampling technique, the method transforms point source identification into the posterior estimation of pollution source parameters. The posterior distribution of the parameters is determined from the likelihood function and the prior information on the relevant problem parameters. The resulting probability distribution, rather than a single optimal solution, provides more information regarding the inversion of pollution events than other methods do, and this approach can simultaneously quantify the uncertainty of the tracing results.
Many heuristic algorithms, such as genetic algorithms (GAs; Singh and Datta, 2006; Zhang and Xin, 2017), artificial neural networks (Srivastava and Singh, 2014; Singh and Datta, 2007), and differential evolution algorithms (Yang et al., 2016), are used to accelerate calculations.
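The Bayesian-MCMC idea described above can be sketched with a plain random-walk Metropolis sampler. This is a minimal illustration, not the DREAM algorithm and not the paper's hydrodynamic model: the forward model is the textbook one-dimensional analytical solution for an instantaneous point release, and the flow velocity, dispersion coefficient, cross-section, station position, noise level, prior bounds and proposal step sizes are all assumed values.

```python
import math
import random

U, D, AREA = 0.5, 10.0, 20.0          # assumed velocity (m/s), dispersion (m^2/s), cross-section (m^2)
X_OBS = 1000.0                        # assumed monitoring station position (m)
TIMES = list(range(600, 3001, 200))   # observation times after release (s)
SIGMA = 0.01                          # assumed measurement-noise standard deviation (g/m^3)


def forward(x0, mass):
    """Concentrations at the station for an instantaneous release of `mass` at `x0`."""
    out = []
    for t in TIMES:
        spread = 4.0 * D * t
        out.append(mass / (AREA * math.sqrt(math.pi * spread))
                   * math.exp(-(X_OBS - x0 - U * t) ** 2 / spread))
    return out


def log_posterior(x0, mass, data):
    """Uniform prior on a box plus Gaussian likelihood, up to an additive constant."""
    if not (0.0 <= x0 <= 800.0 and 1000.0 <= mass <= 10000.0):
        return -math.inf
    resid = [c - d for c, d in zip(forward(x0, mass), data)]
    return -0.5 * sum(r * r for r in resid) / SIGMA ** 2


def metropolis(data, n_iter=20000, seed=1):
    """Random-walk Metropolis sampling of (source location, released mass)."""
    rng = random.Random(seed)
    state = (400.0, 5500.0)           # start at the centre of the prior box
    logp = log_posterior(state[0], state[1], data)
    chain = []
    for _ in range(n_iter):
        cand = (state[0] + rng.gauss(0.0, 5.0), state[1] + rng.gauss(0.0, 100.0))
        logp_cand = log_posterior(cand[0], cand[1], data)
        delta = logp_cand - logp
        # Accept uphill moves always, downhill moves with probability exp(delta).
        if delta >= 0 or rng.random() < math.exp(delta):
            state, logp = cand, logp_cand
        chain.append(state)
    return chain


# Synthetic observations from a hypothetical "true" source (x0 = 200 m, mass = 5000 g).
noise = random.Random(0)
data = [c + noise.gauss(0.0, SIGMA) for c in forward(200.0, 5000.0)]

samples = metropolis(data)[5000:]     # discard burn-in
x0_mean = sum(s[0] for s in samples) / len(samples)
mass_mean = sum(s[1] for s in samples) / len(samples)
print("posterior mean location:", x0_mean)
print("posterior mean mass    :", mass_mean)
```

The retained samples approximate the posterior distribution, so their spread gives an uncertainty estimate in addition to a point estimate. Each iteration calls the forward model once, which is why replacing an expensive simulator with a cheap approximation matters so much for this class of methods.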

In the specific inversion process, based on a Bayesian stochastic method, the pollution source information is regarded as a random variable given a known monitoring value; this approach can fully characterize the inherent uncertainty of the model parameters and, to some extent, avoid the decision-making risk associated with distortion of the “optimal” parameters, which is especially important in studies of the tracing of sudden water pollution events. However, due to the complexity of surface water simulation systems, obtaining source information from inversion modelling incurs high computational costs (usually tens of thousands of repeated calls to hydrodynamic water quality models). Considering resource limitations and the urgency of emergency response, the application of surrogate models can greatly improve the speed of inversion. A surrogate model, also known as a proxy model, can approximate the main features of the original model at very low computational cost and capture the relationship between the input and output of the studied system. The function of such a model is similar to that of a simulation (mechanism) model, i.e., for the same input, the output of the surrogate model is very close to the output of the full model. Compared with simulation models, surrogate models have a small calculation load and take less time to run, which directly improves the inversion speed and shortens the emergency response time. In fact, in the past few decades, surrogate models have been widely used in industrial design and groundwater inversion; for example, Zeng et al. (2012) and Zhang et al. (2013) applied sparse grid interpolation in groundwater models to reduce the computational burden. Common surrogate models can be divided into three categories: projection-based surrogate models, simplified models, and data-driven surrogate models (Smith, 2013).

By projecting the governing equations onto a low-dimensional subspace, a projection-based surrogate system can be constructed. Projection-based surrogate systems have two disadvantages: first, the construction of the orthogonal basis vectors is based on the output samples of some models, which makes it difficult to apply this method to parameter inversion; second, the construction of a projection-based surrogate system requires modifying the model source code (Asher MJ et al., 2015). For the former problem, one can choose the best samples, which contain both parameters and time-domain output (Lieberman C et al., 2010). The latter problem is inherent in the principle of projection-based surrogate systems, so it cannot be solved (Jiangjiang Zhang, 2017).

By neglecting secondary processes and considering only the primary processes, a simplified model can be constructed at lower computational cost, which can greatly reduce the computational complexity of model simulation (Piperni, P. et al., 2013). Although these methods are very flexible and have great potential application value, their implementation depends on the specific problem, so they are not universally applicable (Jiangjiang Zhang, 2017).

Since the first two model types include modifications to the system model itself, the implementation of such models is difficult in many cases, especially when the system model is a black box. A data-driven surrogate system does not require modifications to the system model. This approach has few parameters, is easy to implement and is universally applicable; therefore, it is suitable for sudden water pollution events that require an immediate response.
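The black-box character of a data-driven surrogate can be sketched as follows. The example is only illustrative: `expensive_model` is a hypothetical stand-in for a simulator, and a Gaussian radial basis function interpolant with an assumed kernel width is used as the surrogate. It needs nothing but input-output pairs from the original model.

```python
import math


def expensive_model(x):
    """Hypothetical stand-in for a costly black-box simulator."""
    return math.sin(3.0 * x) + 0.5 * x


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_rbf(xs, ys, width):
    """Fit a Gaussian RBF interpolant through the samples (xs, ys)."""
    kernel = lambda a, b: math.exp(-((a - b) / width) ** 2)
    weights = solve_linear([[kernel(xi, xj) for xj in xs] for xi in xs], ys)

    def surrogate(x):
        return sum(w * kernel(x, xi) for w, xi in zip(weights, xs))

    return surrogate


# Nine runs of the "expensive" model are the only information the surrogate uses.
train_x = [0.25 * i for i in range(9)]
train_y = [expensive_model(x) for x in train_x]
surrogate = fit_rbf(train_x, train_y, width=0.5)

# Compare surrogate and original model on a dense grid (verification only).
grid = [0.01 * i for i in range(201)]
max_err = max(abs(surrogate(x) - expensive_model(x)) for x in grid)
node_err = max(abs(surrogate(x) - y) for x, y in zip(train_x, train_y))
print("max error on grid   :", max_err)
print("max error at samples:", node_err)
```

Once fitted, the surrogate can be evaluated many times without calling the original model again. The kernel type, its width and the number of samples are design choices that would need tuning for a real simulator.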

Based on the above analysis, to quickly and accurately obtain pollution source information, this study builds a variable-fidelity surrogate-DREAM-optimized coupled inversion method based on Bayesian inference for the posterior space of the pollution source. In this method, the variable-fidelity model combines high- and low-fidelity models through an additive bridge function, and the samples for the high- and low-fidelity models are generated by Latin hypercube sampling combined with high- and low-precision hydrodynamic water quality numerical calculations, respectively. A sample point update strategy is used to reduce the uncertainty in the surrogate model and improve the inversion: the differential evolution adaptive Metropolis (DREAM) algorithm is coupled with the surrogate optimization process, and the minimized surrogate prediction (MSP) criterion is applied for the adaptive insertion of new samples, so that the accuracy of the surrogate model in the posterior space of the pollution source is high. This approach thus provides an efficient model for calculating the source intensity, emission time and location. To verify the effectiveness of the method, this paper takes lake A as the study area and assumes a sudden water pollution accident to simulate the pollutant tracing process. The calculation results show that the new method effectively improves the calculation efficiency, meets the needs of sudden river pollution source analysis, and can be used for the active identification of river pollution source information and the corresponding emergency response.
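The combination of Latin hypercube sampling with an additive bridge function can be sketched in one dimension. Everything in this example is an assumption made for illustration: `high_fidelity` and `low_fidelity` are hypothetical analytic functions rather than hydrodynamic models, and the bridge is a simple piecewise-linear fit, used here as a stand-in for whatever surrogate the paper fits to the discrepancy.

```python
import math
import random


def latin_hypercube(n, bounds, rng):
    """Draw n points; each dimension is split into n strata with one point per stratum."""
    columns = []
    for lo, hi in bounds:
        strata = list(range(n))
        rng.shuffle(strata)
        columns.append([lo + (s + rng.random()) * (hi - lo) / n for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]


def high_fidelity(x):
    """Hypothetical expensive model."""
    return math.sin(6.0 * x) + 0.5 * x * x


def low_fidelity(x):
    """Hypothetical cheap model: captures the oscillation, misses the trend."""
    return math.sin(6.0 * x)


def fit_additive_bridge(xs, deltas):
    """Piecewise-linear model of delta(x) = y_hf(x) - y_lf(x),
    with linear extrapolation beyond the outermost samples."""
    pts = sorted(zip(xs, deltas))

    def bridge(x):
        i = 0
        while i < len(pts) - 2 and x > pts[i + 1][0]:
            i += 1
        (xa, da), (xb, db) = pts[i], pts[i + 1]
        return da + (db - da) * (x - xa) / (xb - xa)

    return bridge


rng = random.Random(42)
# Six high-fidelity runs at Latin hypercube sites on [0, 1].
hf_sites = [p[0] for p in latin_hypercube(6, [(0.0, 1.0)], rng)]
deltas = [high_fidelity(x) - low_fidelity(x) for x in hf_sites]
bridge = fit_additive_bridge(hf_sites, deltas)


def variable_fidelity(x):
    """Low-fidelity prediction corrected by the additive bridge function."""
    return low_fidelity(x) + bridge(x)


grid = [i / 200 for i in range(201)]
err_lf = max(abs(high_fidelity(x) - low_fidelity(x)) for x in grid)
err_vf = max(abs(high_fidelity(x) - variable_fidelity(x)) for x in grid)
print("max error, low-fidelity only:", err_lf)
print("max error, variable-fidelity:", err_vf)
```

The design choice illustrated here is that the cheap model carries most of the response shape, so only the smoother discrepancy has to be learned from the few expensive runs. An adaptive infill step such as the MSP criterion would add further high-fidelity samples where the surrogate predicts the optimum; that step is not shown.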

This paper is organized as follows. Section 2 presents surrogate modelling based on the variable-fidelity model framework. Section 3 presents the procedure for tracing the pollution source. Section 4 presents the process and results of the case study. Section 5 provides the conclusions.

Section snippets

Surrogate system

A surrogate model is also called a metamodel, simplified model, model simulator or response surface. Such a model can approximate the main features of the original model with a very low computational cost and obtain the relationship between the system input and output. Some commonly used surrogate models include the kriging (KRG), radial basis function (RBF), polynomial response surface (PRS), and support vector regression (SVR) models (Razavi, S. et al., 2012). The Gaussian process (GP) is a

Bayesian reasoning

According to Bayes' theorem, the uncertain parameter $m$ in the above contaminant transport model can be estimated from the measurement data $d$. For simplicity, we rewrite the model in the following compact form: $d = G(m) + \varepsilon_d$, where $G(m)$ is a forward simulation operator and $\varepsilon_d$ is the monitoring error. According to Bayesian reasoning, $p(m \mid d) \propto p(d \mid m)\,p(m)$, where $p(m)$ is the prior probability of the parameter $m$ to be identified and $p(m \mid d)$ is the corresponding posterior probability. Assuming that the observed data noise

Description of the problem

Lake A is Kunming Lake in Xixian New Area, Xi'an, Shaanxi Province, China; it consists of a north lake and a south lake and is still under construction. The total reservoir area is 10.4 square kilometres, with a total reservoir capacity of 46 million cubic metres. The north lake, with an area of 7 square kilometres and a storage capacity of 22 million cubic metres, is the flood diversion area of the Feng River and provides storage when the river floods; thus, this lake is of great cultural

Conclusion

For sudden water pollution incidents in rivers, the ability to quickly identify the pollution source is of great importance for early accident warning and emergency control. Based on Bayesian reasoning, a variable-fidelity surrogate-DREAM optimization model for coupled inversion is established in the posterior space of a pollution source. In this approach, the VFM surrogate includes high- and low-fidelity models fused by an additive bridge function. These high- and low-fidelity models are

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This research was supported by the National Natural Science Foundation of China (Grant No. 51979222 and 91747206), the Natural Science Foundation of Shaanxi Province, China (Grant No. 2019JLM-62) and the Natural Science Basic Research Plan in Shaanxi Province of China (Grant No. 2019JM-284).

We would like to extend special thanks to the editor and reviewers for their insightful advice and comments on the manuscript.

References (41)

  • M.J. Asher et al.

    A review of surrogate models and their application to groundwater modeling

    Water Resour. Res.

    (2015)
  • C.J.F. ter Braak

    A Markov chain Monte Carlo version of the genetic algorithm differential evolution: easy Bayesian computing for real parameter spaces

    Stat. Comput.

    (2006)
  • Q. Duan et al.

    Effective and efficient global optimization for conceptual rainfall-runoff models

    Water Resour. Res.

    (1992)
  • Sébastien Erpicum et al.

    Two-dimensional depth-averaged finite volume model for unsteady turbulent flows

    J. Hydraul. Res.

    (2014)
  • W.K. Hastings

    Monte Carlo sampling methods using Markov chains and their applications

    Biometrika

    (1970)
  • Z.H. Han et al.

    A Variable-Fidelity Modeling Method for Aero-Loads Prediction

    (2008)
  • W. Jiabiao et al.

    New approach for point pollution source identification in rivers based on the backward probability method

    Environ. Pollut.

    (2018)
  • Zhang Jiangjiang

    Bayesian monitoring design and parameter inversion for groundwater contaminant source identification

    (2017)
  • C. Lieberman et al.

    Parameter and state model reduction for large-scale statistical inverse problems

    SIAM Journal on Scientific Computing

    (2010)
  • R.M. Neupauer et al.

    Adjoint method for obtaining backward‐in‐time location and travel time probabilities of a conservative groundwater contaminant

    Water Resour. Res.

    (1999)