Developments and applications of Shapley effects to reliability-oriented sensitivity analysis with correlated inputs

https://doi.org/10.1016/j.envsoft.2021.105115

Highlights

  • Reliability-oriented sensitivity analysis for correlated inputs is addressed.

  • New indices for target sensitivity analysis are proposed: the target Shapley effects.

  • Two estimation methods are discussed (Monte Carlo and given-data).

  • The usefulness of the new indices is showcased on two toy-cases and a flood model.

  • A final application is proposed on a COVID-19 epidemiological model.

Abstract

Reliability-oriented sensitivity analysis methods have been developed for understanding the influence of model inputs relative to events which characterize the failure of a system (e.g., a threshold exceedance of the model output). In this field, the target sensitivity analysis focuses primarily on capturing the influence of the inputs on the occurrence of such a critical event. This paper proposes new target sensitivity indices, based on the Shapley values and called “target Shapley effects”, allowing for interpretable sensitivity measures under dependent inputs. Two algorithms (one based on Monte Carlo sampling, and a given-data algorithm based on a nearest-neighbors procedure) are proposed for the estimation of these target Shapley effects based on the $\ell^2$ norm. Additionally, the behavior of these target Shapley effects is theoretically and empirically studied through various toy-cases. Finally, the application of these new indices in two real-world use-cases (a river flood model and a COVID-19 epidemiological model) is discussed.

Introduction

Nowadays, numerical models are extensively used in all industrial and scientific disciplines to describe physical phenomena (e.g., systems of ordinary differential equations in ecosystem modeling, finite element models in structural mechanics, finite volume schemes in computational fluid dynamics) in order to design, analyze or optimize various processes and systems. These numerical models are often useful from either a scientific standpoint (e.g., by improving the understanding of modeled physical phenomena) or from an engineering standpoint (e.g., to better assist a decision-making process). In addition to this tremendous growth in computational modeling and simulation, the identification and treatment of the multiple sources of uncertainties has become an essential task from the early design stage to the whole system life cycle. As an example, such a task is crucial in the management of complex systems such as those encountered in energy exploration and production (De Rocquigny et al., 2008) and in sustainable resource development (Beven, 2008).

In addition, the emergence of global sensitivity analysis (GSA) of model outputs played a fundamental role in the development and enhancement of these numerical models (see, e.g., Pianosi et al. (2016); Razavi et al. (2021) for recent reviews). Mathematically, if the model inputs (resp. output) are denoted by $X$ (resp. $Y$) and the model is written $G(\cdot)$, such that

$$Y = G(X), \tag{1}$$

GSA aims at understanding the behavior of $Y$ with respect to (w.r.t.) $X = (X_1, \dots, X_d)$, the vector of $d$ inputs. GSA has been extensively used as a versatile tool to achieve various goals: for instance, quantifying the relative importance of inputs regarding their influence on the output (a.k.a. “ranking”), identifying the most influential inputs among a large number of inputs (a.k.a. “screening”) or analyzing the input-output code (i.e., the numerically modeled phenomenon) behavior (Saltelli et al., 2008; Iooss and Lemaître, 2015).

When complex systems are critical or need to be highly safe, numerical models can also be of great help for risk and reliability assessment (Lemaire et al., 2009). Indeed, to track potential failures of a system (which could lead to dramatic environmental, human or financial consequences), numerical models allow a simulation of its behavior far from its nominal one (see, e.g., Richet and Bacchi (2019) in flood hazard assessment). In such a context, analytical or experimental approaches can be inappropriate, too expensive, or too difficult to perform. Based on numerical simulations, the tail behavior of the output distribution can be studied and typical risk measures can be estimated (Rockafellar and Royset, 2015). Among others, the probability that the output $Y$ exceeds a given threshold value $t \in \mathbb{R}$, given by $P(Y > t)$ and often called a failure probability, is widely used in many applications. When $\{Y > t\}$ is a rare event (i.e., associated with a very low failure probability), advanced sampling-based or approximation-based techniques (Morio and Balesdent, 2015) are required to accurately estimate the failure probability. In this very specific context, dedicated sensitivity analysis methods have been developed, especially in the structural reliability community (see, e.g., Wu (1994); Song et al. (2009); Wei et al. (2012)). In such a framework, called reliability-oriented sensitivity analysis (ROSA) (Chabridon, 2018; Perrin and Defaux, 2019), the idea is to provide importance measures dedicated to the problem of rare event estimation.
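As a rough illustration of the failure probability discussed above, the following sketch estimates P(Y > t) by crude Monte Carlo sampling. The names `failure_probability`, `toy_model` and `toy_input`, as well as the toy model itself (a sum of two independent standard Gaussian inputs), are assumptions made for this example and are not taken from the paper.

```python
import random
from math import sqrt


def failure_probability(model, sample_input, t, n, seed=0):
    """Crude Monte Carlo estimate of P(model(X) > t) with its standard error."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # One model run per sample; count the threshold exceedances.
        if model(sample_input(rng)) > t:
            hits += 1
    p_hat = hits / n
    # Standard error of a binomial proportion.
    std_err = sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, std_err


def toy_model(x):
    """Hypothetical model: the sum of the two inputs."""
    return x[0] + x[1]


def toy_input(rng):
    """Hypothetical input distribution: two independent standard Gaussians."""
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


p_hat, std_err = failure_probability(toy_model, toy_input, t=2.0, n=20000)
print(p_hat, std_err)
```

The relative error of this estimator behaves like sqrt((1 - p) / (n p)), so the number of model runs needed for a fixed relative accuracy grows roughly as 1/p. This is one way to see why rare events call for the advanced techniques mentioned above.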

Formally, standard GSA methods mostly focus on quantities of interest (QoI) characterizing the central part of the output distribution (e.g., the variance for Sobol’ indices (Sobol, 1993), the entire distribution for moment-independent indices (Borgonovo, 2007)), while ROSA methods focus on risk measures and their associated practical difficulties (e.g., costly to estimate, inducing a conditioning on the distributions, non-trivial interpretation of the indices). Following Raguet and Marrel (2018), ROSA methods can be categorized regarding the type of study they consider, i.e., according to the following two categories:

  • target sensitivity analysis (TSA) aims at measuring the influence of the inputs (considering their entire input domain) on the occurrence of the failure event. This means considering the following random variable, defined by the indicator function of the failure domain: $\mathbb{1}_{\{G(X) > t\}}$;

  • conditional sensitivity analysis aims at studying the influence of the inputs on the conditional distribution of the output Y|{G(X) > t}, i.e., exclusively within the critical domain. By Eq. (1), a conditioning also appears on the inputs' domain.
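As a short worked step for the TSA case: the indicator of the failure event is a Bernoulli random variable, so its first two moments follow directly from the failure probability. The symbol $p_t$ is introduced here for convenience and is an assumption of this sketch, not notation from the paper.

```latex
\begin{align*}
p_t &= \mathbb{P}\left(G(X) > t\right) = \mathbb{E}\left[\mathbb{1}_{\{G(X) > t\}}\right], \\
\mathbb{V}\left(\mathbb{1}_{\{G(X) > t\}}\right) &= p_t \, (1 - p_t).
\end{align*}
```

A variance-based analysis of the indicator therefore decomposes the quantity $p_t (1 - p_t)$, which is close to $p_t$ itself when the failure event is rare.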

Various indices have been proposed to tackle these two types of studies (see, e.g., Li et al. (2012); Wei et al. (2012); Perrin and Defaux (2019); Marrel and Chabridon (2021)). The present paper is dedicated to ROSA (under the assumption that the QoI is a failure probability) and focuses on a TSA study. However, a new consideration for TSA is addressed in the present work: the possible statistical dependence between the inputs.

Indeed, most of the common GSA methods (and it is similar for the ROSA ones) have been developed under the assumption of independent inputs. As an example, the well-known Sobol’ indices (Sobol, 1993) which rely on the so-called functional analysis of variance (ANOVA) and Hoeffding decomposition (Hoeffding, 1948), can be directly interpreted as shares of the output variance that are due to each input and combination of inputs (called “interactions”) as long as the inputs are independent.
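A small closed-form example may help to see why independence matters. The sketch below assumes a hypothetical model Y = X1 + X2 with standard Gaussian inputs of correlation `rho` (with |rho| < 1); the function name and the model are illustrative choices, not taken from the paper. In this setting E[Y | X1] = (1 + rho) X1 and V(Y) = 2 + 2 rho, so each first-order index equals (1 + rho) / 2.

```python
def first_order_indices(rho):
    """First-order indices V(E[Y|X_i]) / V(Y) for Y = X1 + X2.

    Assumes X1, X2 are standard Gaussian with correlation rho, |rho| < 1.
    """
    var_y = 2.0 + 2.0 * rho          # V(X1 + X2)
    var_cond = (1.0 + rho) ** 2      # V(E[Y | X_i]) = V((1 + rho) X_i)
    index = var_cond / var_y
    return index, index              # both inputs play symmetric roles


for rho in (0.0, 0.8, -0.8):
    s1, s2 = first_order_indices(rho)
    print(f"rho={rho:+.1f}  S1={s1:.2f}  S2={s2:.2f}  sum={s1 + s2:.2f}")
```

With independent inputs the two indices sum to one, as expected for this additive model. With correlated inputs the sum becomes 1 + rho, so the indices can no longer be read as shares of the output variance.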

When the inputs are dependent, the inputs' correlations dramatically alter the interpretation of the Sobol’ indices. To handle this issue, several approaches have been investigated in the literature. For instance, Jacques et al. (2006) proposed to estimate indices for groups of correlated inputs. However, this approach does not allow for a quantification of the influence of individual inputs. Amongst other similar works, Li et al. (2010); Chastaing et al. (2012) proposed to extend the functional ANOVA decomposition to a more general one (e.g., taking the covariance into account). However, the indices obtained for these approaches can be negative, which limits their practical use due to interpretability challenges (i.e., as a share of the output's variance). In addition to this, other works (see, e.g., Xu and Gertner (2008); Mara and Tarantola (2012)) considered a Gram–Schmidt procedure to decorrelate the inputs and proposed to estimate two kinds of contributions for each variable (an uncorrelated one and a correlated one). These developments finally resulted in the proposition of a set of four Sobol’ indices (instead of the two standard ones which are the first-order index and total index in the independent case) which enable the correlation effects to be fully captured in a GSA (Mara et al., 2015). Despite this achievement, this approach remains difficult to implement in practice (see Benoumechiara and Elie-Dit-Cosaque (2019) for extensive studies). Finally, the VARS approach (Do and Razavi, 2020) (allowing a thorough analysis of the inputs-output relationships) can handle input correlation but is out of scope of the present work which only focuses on variance-based sensitivity indices, directly computed from the numerical model.

Recently, another method has been developed by considering another type of indices: the Shapley effects. The initial formulation originates from the “Shapley values” developed in the field of Game Theory (Shapley, 1953; Osborne and Rubinstein, 1994). The underlying idea is to fairly distribute both gains and costs to multiple players working cooperatively. By analogy with the GSA framework, the inputs can be seen as the players while the overall process can be seen as attributing shares of the output variability to the inputs. Considering the variance of the output in a GSA formulation leads to the so-called “Shapley effects” proposed by Owen (2014). In the same vein, Owen and Prieur (2017); Iooss and Prieur (2019); Benoumechiara and Elie-Dit-Cosaque (2019) bridge the gap between Sobol’ indices and Shapley effects while illustrating the usefulness of these new indices to handle correlated inputs in the GSA framework.
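As a reminder of the allocation rule behind these indices, the Shapley value can be written in its standard game-theoretic form. The symbols $c$ (value function defined on subsets of players), $A$ and $\mathrm{Sh}_j$ are notation assumed for this sketch and may differ from the paper's own.

```latex
\begin{equation*}
\mathrm{Sh}_j = \frac{1}{d} \sum_{A \subseteq \{1, \dots, d\} \setminus \{j\}}
\binom{d-1}{|A|}^{-1} \left( c(A \cup \{j\}) - c(A) \right),
\qquad j = 1, \dots, d.
\end{equation*}

% Worked case d = 2:
\begin{align*}
\mathrm{Sh}_1 &= \tfrac{1}{2} \left[ c(\{1\}) - c(\emptyset) \right]
              + \tfrac{1}{2} \left[ c(\{1,2\}) - c(\{2\}) \right], \\
\mathrm{Sh}_1 + \mathrm{Sh}_2 &= c(\{1,2\}) - c(\emptyset).
\end{align*}
```

Taking $c(A) = \mathbb{V}(\mathbb{E}[Y \mid X_A]) / \mathbb{V}(Y)$ gives the Shapley effects in the GSA setting. The second identity above then shows that the effects sum to one, whether or not the inputs are independent.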

Thus, the present work attempts to extend the use of Shapley effects to the ROSA context. Overall, the main objective is to develop a ROSA index which enables TSA to be performed (i.e., capturing the influence of the inputs on a risk measure, typically a failure probability here) under the constraint of dependent inputs. This work relies on recent promising results and numerical tools, both in the field of TSA (Spagnol, 2020) and in the estimation of Shapley effects (Broto et al., 2020).

The outline of this paper is the following. Section 2 is devoted to a pedagogical introduction of the statistical dependence challenges for variance-based sensitivity indices, that can be solved by Shapley effects. Section 3 presents a new formulation of TSA, based on Shapley effects leading to the novel target Shapley effects, while Section 4 develops two algorithms for their estimation. Section 5 provides illustrations on simple toy-cases which give analytical expressions of the target Shapley effects, allowing deeper appreciation of their behavior. Section 6 applies these new sensitivity indices to two use-cases: a simplified model of a river flood and an epidemiological model applied to the COVID-19 pandemic. Finally, Section 7 gives conclusions and research perspectives.

Throughout this paper, the mathematical notation $\mathbb{E}(\cdot)$ (resp. $\mathbb{V}(\cdot)$) will represent the expectation (resp. variance) operator.

Section snippets

Variance-based sensitivity analysis with dependent inputs: the Shapley solution

While devoted to computer experiments, GSA has close connections with multivariate data analysis and statistical learning (Christensen, 1990; Hastie et al., 2002). Indeed, in all these topics, one important issue is often to assign a weight to each variable (the inputs) w.r.t. its impact on other variables (the outputs). Depending on the domain, such a weight can either be called a “sensitivity index” or an “importance measure”. A very convenient way is to base these weights on the ANOVA

A brief overview of reliability-oriented sensitivity analysis

When focusing on complex systems, one often needs to prepare for possible critical events, which potentially have a low occurrence probability but lead to a system failure. Such failures may have dramatic human, environmental and economic consequences, depending on the context. The fields of reliability assessment and risk analysis (Lemaire et al., 2009; Richet and Bacchi, 2019) aim to prevent these failures. Mathematically, a reliability problem focuses on a risk measure computed from the

Estimation methods and practical implementation of target Shapley effects

The estimation of the target Shapley effects (Eq. (26)) can be done in two distinct steps:

  • Step #1: estimation of the conditional elements, i.e., the estimation of $\text{T-S}_A^{\ell^2}$ or $\text{T-E}_A$ for all $A \in \mathcal{P}_d$;

  • Step #2: an aggregation procedure, i.e., a step to compute the $\text{T-Sh}_j$ by plugging the estimates from Step #1 into Eq. (26).

In the following, two estimation methods are proposed: the first one based on a Monte Carlo sampling procedure, and the second one based on a nearest-neighbor approximation
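The two-step structure can be sketched as follows. This is a simplified illustration and not the paper's algorithms: Step #1 uses a crude k-nearest-neighbor smoother on a given sample to approximate V(E[1 | X_A]) / V(1), and Step #2 assumes that Eq. (26), which is not reproduced in this excerpt, has the standard Shapley aggregation form. All names (`knn_closed_index`, `aggregate`), the toy model and the tuning values are assumptions.

```python
import random
from itertools import combinations
from math import comb, sqrt


def knn_closed_index(xs, ind, subset, k):
    """Step #1 (sketch): given-data estimate of V(E[1|X_A]) / V(1)."""
    n = len(xs)
    p = sum(ind) / n
    cond_means = []
    for i in range(n):
        # Rank all points by squared distance to point i in the coordinates
        # of A (point i is its own nearest neighbor).
        order = sorted(
            range(n),
            key=lambda m: sum((xs[m][a] - xs[i][a]) ** 2 for a in subset),
        )
        # Local average of the failure indicator approximates E[1 | X_A].
        cond_means.append(sum(ind[m] for m in order[:k]) / k)
    mean_cm = sum(cond_means) / n
    var_cm = sum((c - mean_cm) ** 2 for c in cond_means) / n
    return var_cm / (p * (1.0 - p))


def aggregate(d, index):
    """Step #2 (sketch): Shapley aggregation of the subset indices."""
    effects = []
    for j in range(d):
        others = [i for i in range(d) if i != j]
        total = 0.0
        for size in range(d):
            weight = 1.0 / (d * comb(d - 1, size))
            for sub in combinations(others, size):
                a = frozenset(sub)
                total += weight * (index[a | {j}] - index[a])
        effects.append(total)
    return effects


# Hypothetical toy case: Y = X1 + X2, correlated standard Gaussian inputs.
rng = random.Random(0)
n, d, k, rho, t = 400, 2, 20, 0.5, 1.0
xs = []
for _ in range(n):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    xs.append((z1, rho * z1 + sqrt(1.0 - rho ** 2) * z2))
ind = [1.0 if x[0] + x[1] > t else 0.0 for x in xs]

# The empty and full subsets are known exactly: 0 and 1.
index = {frozenset(): 0.0, frozenset(range(d)): 1.0}
for size in range(1, d):
    for sub in combinations(range(d), size):
        index[frozenset(sub)] = knn_closed_index(xs, ind, sub, k)

effects = aggregate(d, index)
print(effects, sum(effects))
```

Because the indicator is a deterministic function of all inputs, the index of the full subset is set to one, and the aggregated effects sum to one by construction. The cost of Step #1 grows with the number of subsets (2^d), which is one reason dedicated estimation schemes matter.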

Analytical results using a linear model with Gaussian inputs

To illustrate the behavior of the proposed indices, a first toy-case involving a linear model and multivariate Gaussian inputs is presented. In this setting, analytical results can be derived for the marginal distributions of all subsets of inputs, their conditional distribution, and the distribution of the output given a subset of inputs. Subsequently, analytical formulas can be obtained for both the target Sobol’ indices and the target Shapley effects.
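In this linear Gaussian setting the output is itself Gaussian, so the failure probability has a closed form. The sketch below assumes a model Y = beta0 + beta' X with X following a multivariate Gaussian distribution of mean `mu` and covariance matrix `sigma`; the covariance symbol, the function name and the numerical values are assumptions made for illustration.

```python
from math import erfc, sqrt


def linear_gaussian_failure_probability(beta0, beta, mu, sigma, t):
    """P(Y > t) for Y = beta0 + sum_i beta[i] * X_i with X ~ N(mu, sigma)."""
    d = len(beta)
    mean_y = beta0 + sum(beta[i] * mu[i] for i in range(d))
    # Quadratic form beta' * sigma * beta gives the output variance.
    var_y = sum(
        beta[i] * sigma[i][j] * beta[j] for i in range(d) for j in range(d)
    )
    z = (t - mean_y) / sqrt(var_y)
    # 1 - Phi(z), written with the complementary error function.
    return 0.5 * erfc(z / sqrt(2.0))


identity = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.5], [0.5, 1.0]]
p_indep = linear_gaussian_failure_probability(0.0, [1.0, 1.0], [0.0, 0.0], identity, 2.0)
p_corr = linear_gaussian_failure_probability(0.0, [1.0, 1.0], [0.0, 0.0], correlated, 2.0)
print(p_indep, p_corr)
```

In this example a positive correlation between the inputs inflates the output variance and hence the failure probability for the same threshold, which shows how input dependence can change the quantity being analyzed.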

Let $(\beta_0, \beta) = (\beta_0, \beta_1, \dots, \beta_d) \in \mathbb{R}^{d+1}$, $\mu = (\mu_1, \dots, \mu_d)$

Applications

In this section, two models related to real-world phenomena which include dependent random inputs are studied in the context of TSA.

Conclusions and ways forward

This paper proposes a set of novel indices adapted to target sensitivity analysis while being able to handle correlated inputs. The objective is to quantify the importance of inputs on the occurrence of a critical failure event of the system under study. The proposed indices are based on a cooperative Shapley procedure which aims at allocating the effects of the interaction and correlation equally between all the inputs in the same manner as the Shapley effects in global sensitivity analysis.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We are grateful to the three anonymous referees as well as Jérôme Morio and François Bachoc for their helpful remarks. We also thank Sébastien Da Veiga, Clémentine Prieur and Fabrice Gamboa for interesting discussions and for having provided the dataset on the COVID-19 model. Finally, we would like to thank Victoria Stanford for her help in proofreading this work.

References (71)

  • J. Morio. Extreme quantile estimation with nonparametric adaptive importance sampling. Simulat. Model. Pract. Theor. (2012)
  • J. Nossent et al. Sobol’ sensitivity analysis of a complex environmental model. Environ. Model. Software (2011)
  • F. Pianosi et al. Sensitivity analysis of environmental models: a systematic review with practical workflow. Environ. Model. Software (2016)
  • S. Razavi et al. The future of sensitivity analysis: an essential discipline for systems modelling and policy making. Environ. Model. Software (2021)
  • G. Sarazin et al. Estimation of high-order moment-independent importance measures for Shapley value analysis. Appl. Math. Model. (2020)
  • S. Song et al. Subset simulation for structural reliability sensitivity analysis. Reliab. Eng. Syst. Saf. (2009)
  • P. Wei et al. Efficient sampling methods for global reliability sensitivity analysis. Comput. Phys. Commun. (2012)
  • C. Xu et al. Uncertainty and sensitivity analysis for models with correlated parameters. Reliab. Eng. Syst. Saf. (2008)
  • N. Benoumechiara et al. Shapley effects for sensitivity analysis with dependent inputs: bootstrap and kriging-based algorithms. ESAIM. Proc. Surv. (2019)
  • K. Beven. Environmental Modelling: an Uncertain Future? (2008)
  • A. Brandenburger. Cooperative Game Theory: Characteristic Functions, Allocations, Marginal Contribution (2007)
  • B. Broto et al. Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution. SIAM/ASA J. Uncertain. Quantification (2020)
  • T. Browne et al. Estimate of Quantile-Oriented Sensitivity Indices (2017)
  • V. Chabridon. Reliability-oriented Sensitivity Analysis under Probabilistic Model Uncertainty – Application to Aerospace Systems (2018)
  • V. Chabridon et al. Mechanical Engineering under Uncertainties (2020)
  • G. Chastaing et al. Generalized Hoeffding-Sobol decomposition for dependent variables - application to sensitivity analysis. Electron. J. Stat. (2012)
  • R. Christensen. Linear Models for Multivariate, Time Series and Spatial Data (1990)
  • L. Clouvel. Uncertainty Quantification of the Fast Flux Calculation for a PWR Vessel (2019)
  • L. Cui et al. Moment-independent importance measure of basic random variable and its probability density evolution solution. Sci. China Tech. Sci. (2010)
  • N. Do et al. Correlation effects? A major but often neglected component in sensitivity and uncertainty analysis. Water Resour. Res. (2020)
  • K. Elie-Dit-Cosaque. Développement de mesures d’incertitudes pour le risque de modèle dans des contextes incluant de la dépendance stochastique, Ph.D. thesis. Université Claude Bernard - Lyon (2020)
  • B.E. Feldman. Relative importance and value. SSRN Electron. J. (2005)
  • J.-C. Fort et al. New sensitivity analysis subordinated to a contrast. Commun. Stat. Theor. Methods (2016)
  • J. Fox et al. Generalized collinearity diagnostics. J. Am. Stat. Assoc. (1992)
  • U. Grömping. Relative importance for linear regression in R: the Package relaimpo. J. Stat. Software (2006)