An iterative strategy for contaminant source localisation using GLMA optimization and Data Worth on two synthetic 2D Aquifers

https://doi.org/10.1016/j.jconhyd.2019.103554

Highlights

  • A nonlinear optimisation method is developed for contaminant source localisation.

  • An iterative approach based on data worth is constructed to collect the best new measurements.

  • Two heterogeneous synthetic cases are designed to illustrate the approach.

  • The method was developed to be applicable to real case studies with unknown parameters.

  • The method estimates simultaneously the source location and the hydraulic conductivity field.

Abstract

A contaminant source localisation strategy was developed considering an unknown heterogeneous hydraulic conductivity field, an unknown dispersivity and an unknown location of a continuous contaminant source. The Gauss-Levenberg-Marquardt algorithm (GLMA) is combined with a data worth analysis to estimate the unknown parameters and identify the best locations for additional measurements. The data collection strategy is iterative, based on the ability of the additional dataset to decrease the uncertainties on the contaminant source location. Two 2D synthetic models are considered. The method is first illustrated with a simple model; a more complex model is then considered to evaluate the ability of the approach to locate the contaminant source from hydraulic head and concentration data. The approach is parsimonious in terms of model runs and applicable to real cases. The results give a good estimate of the source location and the dispersivity, with an acceptable NRMSE in each case. New observations introduced at each iteration decrease the standard deviation of the source location and improve the NRMSE. The estimated hydraulic conductivity field presents the same features as the original field.

Introduction

The identification of a contaminant source location is a challenge for the management of numerous contaminated sites. Good knowledge of the source location allows a better site remediation strategy and reduces treatment costs. Plume migration is controlled by the properties of the porous medium, which are highly heterogeneous and concentrate mass fluxes in small zones. Studies have shown that 70–80% of the plume mass discharge occurs in 10–20% of the contaminated area (Guilbeault et al., 2005). This small fraction corresponds to the true source zone, which may often be quite small but has to be localised.

During the past 30 years, groundwater flow and pollutant transport models coupled with different inversion methods have been developed for source identification. A number of methods are available in the literature to estimate source localisation in synthetic cases from observations, mostly concentration data. These methods can be classified as follows: nonlinear optimisation approach, geostatistical approach or backward simulation approach (Bagtzoglou and Atmadja, 2005).

Optimisation methods are the most widely used approaches; they localise the source by comparing observed and simulated data after a forward simulation. Gorelick et al. (1983) were the first to use an optimisation method for source localisation. The test was carried out on a 2D homogeneous synthetic case, in steady and transient states. All parameters of the geological medium (hydrodynamics and transport) were known. Observations consisted of the concentration history of the tracer, and observation errors were taken into account. Their method is restrictive and not suitable for real cases because it requires known parameters. In contrast, Wagner (1992) used a nonlinear optimisation method on a synthetic 2D case in steady state, developing an approach that performs parameter estimation and source characterisation simultaneously with nonlinear maximum likelihood estimation. The source was considered continuous and the initial parameters were unknown (hydraulic conductivity field, K; dispersivity, αL; porosity, w; and boundary conditions). Since the K field is parametrised with only two zones of piecewise constancy, the source localisation is facilitated. Mahar and Datta (1997) used a nonlinear optimisation method combining source localisation with the identification of the best locations for additional measurement points (measurements of source fluxes). They considered a synthetic 2D homogeneous case in steady state with known hydrogeological and transport parameters. The same authors worked on a homogeneous 2D case (Mahar and Datta, 2000) to identify the location of a transient source from several concentration observations. Datta et al. (2009) used the same synthetic case to estimate simultaneously the parameters (K field, αL and w) and the release history of several potential sources with a nonlinear program.
The estimated parameters were close to the actual values and proved the robustness of the method in a homogeneous case. Sun et al. (2006) also worked on several 2D homogeneous cases in transient state with a constrained robust least squares estimator combined with an optimisation in order to significantly reduce the computing time. Aral et al. (2001) used a genetic algorithm for nonlinear optimisation, providing a fair improvement in the results. The latter approach is based on a 2D heterogeneous model in steady state (the K field and αL are known) with a transient source. Other optimisation methods (stochastic or ensemble optimisation, for instance) have also been developed and applied to this domain (Singh et al., 2004; Yeh et al., 2007; Yeh et al., 2014; Ayvaz, 2016; Xu and Gómez-Hernández, 2016, 2018).

Unlike the previous papers, which carried out their analyses on synthetic cases, Bashi-Azghadi et al. (2016) worked on a real case with a nonlinear optimisation method that minimises the number of probing wells and the average regret in estimating the polluted area. The site is an oil refinery highly polluted by leakage from several tanks, and the contaminant release is known. To minimise the number of probing wells, the authors used a Monte Carlo analysis to assess uncertainties with a large number of randomly generated scenarios involving several variables, such as the position of the oil tank at the origin of the contamination. The method permits locating the source with a limited number of observations.

During the last 20 years, geostatistical approaches have been used for source identification. Snodgrass and Kitanidis (1997) used a geostatistical method combined with Bayesian theory (Tarantola, 1987) to localise a source between two potential zones in a 1D homogeneous synthetic case with a concentration release. They considered known hydraulic conductivity fields (K fields) and dispersivities (α). Michalak and Kitanidis (2004a) used geostatistical inverse modelling for contaminant source identification at a real site, using observed field data and known hydraulic and transport parameters. This study showed that the contamination history can be estimated with good precision. Gzyl et al. (2014) also worked on a real site with several sources, using a multi-step geostatistical approach to identify the contaminant release history. Butera et al. (2013) developed an analytical method for source localisation based on the geostatistical approach of Snodgrass and Kitanidis (1997).

Other methods, using backward simulation coupled or not with geostatistics, were also developed; these methods consider a known K field, homogeneous or heterogeneous, and a known αL. Bagtzoglou et al. (1992) were among the first to use the inversion of the transport equation to localise a source. In their work, the transport equation was modelled with reversed time and an unchanged dispersivity. Neupauer and Wilson (1999) used the adjoint method in a synthetic case with only one observation. The approach was also tested with several observations to localise sources at a real site in a 1D case (Neupauer and Wilson, 2005). Michalak and Kitanidis (2004b) coupled the adjoint method for backward simulation with a geostatistical approach to identify the historical distribution of a contaminant. Cupola et al. (2015) compared the adjoint method with the approach developed by Butera et al. (2013) for source localisation in a sand tank. The authors established the reliability of both methods and showed that, unlike the approach of Butera et al. (2013), the adjoint method is able to detect only one source.

Recently, Xu and Gómez-Hernández (2018) presented a method to simultaneously identify a source and estimate a K field with a Kalman filter-like approach. The identification was successful, but the uncertainty of the generated K field was significant, and the dispersivity was considered known. This study is original in that the K field is estimated together with the source location, which brings the case closer to real-world problems.

Table 1 summarises the characteristics of the existing approaches and shows that few studies can be applied to real cases. Indeed, the challenge for real-world cases is to deal with an unknown heterogeneous field of hydraulic properties and unknown transport parameters (dispersivities). In such contexts, source identification may become challenging. Numerous studies consider known or homogeneous parameter fields; yet the estimated location may be largely uncertain when inferred with biased parameters. In addition, the implementation of source identification methods that require a large number of model runs is often limited by the computational burden associated with advective/dispersive transport models.

Considering these constraints, this paper proposes an optimisation approach using the GLMA to jointly identify the source and the hydraulic and transport properties. In order to provide an approach applicable to realistic case studies with unknown parameters, we consider here that the hydraulic conductivity, the dispersivity and the source location are unknown and that the contaminant release is constant. To our knowledge, the GLMA has not been used before for contaminant source localisation.
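At its core, the GLMA used in PEST++-style inversion iteratively updates the parameter vector by solving damped normal equations built from a Jacobian of simulated observations. The following is a minimal NumPy illustration on a toy exponential-decay model (the forward model, parameter names and values are hypothetical, not the paper's aquifer models):

```python
import numpy as np

def glma_step(params, forward, observed, weights, lam):
    """One Gauss-Levenberg-Marquardt update: solve the damped normal
    equations (J'QJ + lam*I) dp = J'Q r and return params + dp."""
    base = forward(params)
    residual = observed - base
    # Finite-difference Jacobian of the simulated observations
    eps = 1e-6
    J = np.empty((observed.size, params.size))
    for j in range(params.size):
        p = params.copy()
        p[j] += eps
        J[:, j] = (forward(p) - base) / eps
    JtQ = J.T @ weights
    dp = np.linalg.solve(JtQ @ J + lam * np.eye(params.size), JtQ @ residual)
    return params + dp

# Toy inverse problem: recover (a, b) in c(x) = a * exp(-b * x)
x = np.linspace(0.0, 1.0, 20)
c_obs = 2.0 * np.exp(-3.0 * x)             # synthetic "measurements"
model = lambda p: p[0] * np.exp(-p[1] * x)
p = np.array([1.0, 1.0])                   # initial guess
for _ in range(50):
    p = glma_step(p, model, c_obs, np.eye(x.size), lam=0.1)
```

In PEST++ the damping factor λ is adapted between iterations; here it is held fixed for brevity, which does not bias the result since the update vanishes when the residual reaches zero.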

As the problem includes considerable uncertainty and generally very few measurement points, the second objective is to analyse the collection of complementary field data to better constrain the uncertainty of the unknown parameters. More precisely, the objective is to add new measurement points that decrease the uncertainties about the source location. To place these additional observations, a data worth analysis can be used with available tools such as PREDUNC (Moore et al., 2010) or pyEMU, a Python package developed by White et al. (2016). Because of the cost of drilling, only a limited amount of data can be added, and DW analysis makes it possible to optimise observation strategies at limited cost.
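The linear DW calculation behind tools such as PREDUNC and pyEMU can be sketched with first-order second-moment (FOSM, Schur complement) algebra: the posterior variance of a forecast is computed with and without candidate observations, and the reduction is the worth of those observations. A minimal self-contained NumPy sketch of that algebra (the matrices below are illustrative, not taken from the paper's models):

```python
import numpy as np

def forecast_variance(X, y, Cp, Ce):
    """FOSM (Schur complement) posterior variance of a scalar forecast.
    X  : Jacobian of observations w.r.t. parameters (m x n)
    y  : sensitivity of the forecast to the parameters (n,)
    Cp : prior parameter covariance (n x n)
    Ce : observation-noise covariance (m x m)"""
    XCp = X @ Cp
    gain = XCp.T @ np.linalg.inv(XCp @ X.T + Ce)
    Cp_post = Cp - gain @ XCp
    return float(y @ Cp_post @ y)

def data_worth(X, y, Cp, Ce, X_new, Ce_new):
    """Worth of candidate observations X_new = forecast-variance reduction."""
    m, k = Ce.shape[0], Ce_new.shape[0]
    Ce_aug = np.block([[Ce, np.zeros((m, k))],
                       [np.zeros((k, m)), Ce_new]])
    return (forecast_variance(X, y, Cp, Ce)
            - forecast_variance(np.vstack([X, X_new]), y, Cp, Ce_aug))

# Illustrative 3-parameter system: two existing observations, plus one
# candidate well informing the third, previously unconstrained, parameter
Cp = np.eye(3)
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
Ce = 0.1 * np.eye(2)
y = np.ones(3)                    # forecast sensitive to all parameters
worth = data_worth(X, y, Cp, Ce,
                   np.array([[0.0, 0.0, 1.0]]), np.array([[0.1]]))
```

Because the analysis is linear, candidate observations can be ranked without rerunning the transport model, which is what keeps the approach parsimonious in model runs.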

Data worth (DW) analysis has been studied by several authors in hydrogeology to optimise data collection, increasing the value of the information and thereby further shrinking the uncertainty. This analysis is performed with nonlinear optimisation methods such as the ones developed in Freeze et al. (1992), James and Freeze (1993), James and Gorelick (1994) and Fu and Gómez-Hernández (2009). From the 1970s onwards, several studies conducted DW analysis in the context of hydrogeological problems (e.g. Maddock, 1973; Gates and Kisiel, 1974; Dausman et al., 2010; Hill et al., 2013).

Other authors used DW analysis for hydrogeological and remediation problems, for instance to find the best location for Pump & Treat wells (Tucciarelli and Pinder, 1991; Freeze et al., 1992; James and Gorelick, 1994). Wallis et al. (2014) used DW for a transport problem, evaluating the introduction of multiple observations in a tracer test experiment. Wöhling et al. (2016) extended the multiple-observations approach to two types of data (hydraulic conductivity, K, and head data, H) to decrease the predictive uncertainty of hyporheic flux travel times; they used a genetic algorithm to find an optimal combination of a predefined number of measurement locations, considering 2500 potential new locations and one million combinations of up to 4 measurements, which is not easy to implement in a practical case. Vilhelmsen and Ferré (2017) worked with a single data type, hydraulic conductivity, and extended the DW analysis to select multiple observations reducing the uncertainty of one or several forecasts.
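As an alternative to enumerating millions of combinations, a greedy one-at-a-time selection is a common, far cheaper heuristic (not the method used in the cited studies): at each step, add the single candidate that most increases the worth of the set already chosen. A minimal sketch, with a toy coverage-based score standing in for a real DW metric:

```python
def greedy_select(candidates, k, worth):
    """Pick k candidates one at a time; at each step choose the one that
    maximises worth(already_chosen + [candidate])."""
    chosen, remaining = [], list(candidates)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda c: worth(chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy worth: number of distinct plume cells covered by the selected wells
wells = [frozenset({1, 2, 3}), frozenset({3, 4}), frozenset({5})]
coverage = lambda subset: len(set().union(*subset)) if subset else 0
picked = greedy_select(wells, 2, coverage)
```

For k selections among n candidates this costs O(k·n) worth evaluations instead of the combinatorial number of subsets; it is a heuristic, so it is not guaranteed to find the globally optimal combination.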

In this manuscript, the GLMA is combined with a DW analysis to develop an innovative practical methodology for real-world case studies; to the authors' knowledge, this is a new contribution to the domain. More precisely, an iterative method for source identification is described, based on the GLMA coupled with a linear DW analysis to identify the best locations of new measurements. The method is then applied to two synthetic cases with heterogeneous hydraulic conductivity fields.

This manuscript is organised as follows. The first section, Material and Methods, details the GLMA and the iterative approach used to add new measurements, with a description of the DW analysis. The second and third sections present the construction of the two 2D heterogeneous synthetic cases together with the results of the source localisation. The presented approach is finally summarised and discussed.

Section snippets

Strategy

The global strategy is based on an iterative approach to minimise uncertainties at each phase of the source localisation. The strategy can be applied in real situations, as it requires only a small number of wells in the plume and one or two additional sampling campaigns. Fig. 1 shows a schematic representation of the strategy. One cycle corresponds to one run of the source localisation algorithm (GLMA using PEST++: Parameter ESTimation code) and addition of new observations to provide new
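The cycle described above can be written as a loop (a schematic sketch only: the function names are hypothetical placeholders for the GLMA run, the DW ranking and the field sampling, not the authors' code):

```python
def iterative_localisation(run_inversion, candidates, worth, collect,
                           target_std, max_cycles=5):
    """Skeleton of the iterative strategy: run the inversion, check the
    source-location uncertainty, and while it is too large collect the
    candidate observation with the highest data worth, then repeat."""
    candidates = list(candidates)
    for _ in range(max_cycles):
        estimate, std = run_inversion()          # GLMA (e.g. via PEST++)
        if std <= target_std or not candidates:
            break
        best = max(candidates, key=worth)        # linear DW ranking
        candidates.remove(best)
        collect(best)                            # drill / sample there
    return estimate, std

# Mock usage: each collected observation shrinks the reported uncertainty
obs = []
run_inversion = lambda: (42.0, 3.0 / (1 + len(obs)))
est, std = iterative_localisation(run_inversion, ["w1", "w2", "w3"],
                                  worth=lambda c: 1.0,
                                  collect=obs.append, target_std=1.0)
```

The loop stops either when the source-location standard deviation reaches the target or when the sampling budget (candidate wells) is exhausted.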

Synthetic cases construction

Dimensions of the case A domain are 200 × 50 m, with a mesh size of 1 m along the y-axis and 5 m along the x-axis.

Synthetic case construction

Dimensions of the case B domain are 400 × 100 m, with a mesh size of 1 m along the y-axis and 5 m along the x-axis.

Results for the source localisation

Table 3 summarises the results of the optimisation for cases A and B (Table 4 shows the computing time for each phase). The source localisation in synthetic case A gave the best estimation, with a high correlation between observed and simulated data (Fig. 9). The final value for Ys is 30.7 (29 is the reference value). Indeed, the pilot-point interpolation method used during the modelling phases of case A is identical to the method used for the construction case (same pilot points

Discussion and conclusion

This study introduced the development and implementation of a source localisation strategy on two synthetic cases. The method has been developed to be applicable to a real problem with (i) a limited number of new observations to reduce uncertainties about the source location, (ii) spatial constraints for the source position similar to what happens on a real site, (iii) consideration of a steady-state source (stabilised plume) without historical knowledge of the contaminant and (iv) unknown

Acknowledgments

This work was developed during Elyess Essouayed's PhD and supported by INNOVASOL, Bordeaux INP ENSEGID and "EA 4592 Georessources et Environnement".

References (50)

  • Tamer M. Ayvaz

A hybrid simulation–optimization approach for solving the areal groundwater pollution source identification problems

    J. Hydrol.

    (2016)
  • A. Bagtzoglou et al.

    Mathematical methods for hydrologic inversion: the case of pollution source identification

    Water Pollut.

    (2005)
  • A. Bagtzoglou et al.

    Application of particle methods to reliable identification of groundwater pollution sources

    Water Resour. Manag.

    (1992)
  • S.N. Bashi-Azghadi et al.

    Pollution source identification in groundwater systems: application of Regret theory and Bayesian networks

Iran. J. Sci. Technol. Trans. Civil Eng.

    (2016)
  • I. Butera et al.

    Simultaneous identification of the pollutant release history and the source location in groundwater by means of a geostatistical approach

    Stoch. Env. Res. Risk A.

    (2013)
  • J. Carrera et al.

    Estimation of aquifer parameters under transient and steady state conditions: 2. Uniqueness, stability, and solution algorithms

    Water Resour. Res.

    (1986)
  • F. Cupola et al.

    Laboratory sandbox validation of pollutant source location methods

    Stoch. Env. Res. Risk A.

    (2015)
  • A.M. Dausman et al.

    Quantifying data worth toward reducing predictive uncertainty

    Groundwater

    (2010)
  • J.E. Doherty et al.

    Approaches to highly parameterized inversion: pilot-point theory, guidelines, and research directions

    Scientific Investigations Report

    (2010)
  • R.A. Freeze et al.

    Hydrogeological decision analysis: 4. The concept of data worth and its use in the development of site investigation strategies

    Ground Water

    (1992)
  • J.S. Gates et al.

    Worth of additional data to a digital computer model of a groundwater basin

    Water Resour. Res.

    (1974)
  • L.W. Gelhar et al.

    A critical review of data on field‐scale dispersion in aquifers

Water Resour. Res.

    (1992)
  • S.M. Gorelick et al.

Identifying sources of groundwater pollution: an optimization approach

    Water Resour. Res.

    (1983)
  • S.M. Gorelick et al.

    Aquifer reclamation design: the use of contaminant transport simulation combined with nonlinear programing

    Water Resour. Res.

    (1984)
  • M.A. Guilbeault et al.

    Mass and flux distributions from DNAPL zones in Sandy Aquifers

    Ground Water

    (2005)