Interfaces with Other Disciplines
Multi-factor dependence modelling with specified marginals and structured association in large-scale project risk assessment

https://doi.org/10.1016/j.ejor.2021.04.043

Highlights

  • Present a multi-factor dependence model for large-scale project risk assessment.

  • Elicit structured association (SA) factors from ubiquitous project plans.

  • Transform SA factors into a mathematically consistent correlation matrix.

  • Evaluate performance against distribution types, skewness, and project sizes.

  • Demonstrate scalability to large-scale projects given partially specified dependence.

Abstract

This paper examines the high-dimensional dependence modelling problem in the context of project risk assessment. As the dimension of uncertain performance units (i.e., itemized costs and activity times) in a project increases, specifying a feasible correlation matrix and eliciting the relevant pair-wise information, either from historical data or with expert judgement, becomes practically unattainable or simply not economical. This paper presents a factor-driven dependence elicitation and modelling framework that scales to large-scale project risks. The multi-factor association model (MFAM) accounts for hierarchical relationships among multiple association factors and provides a closed-form solution for a complete and mathematically consistent correlation matrix. Augmented with the structured association (SA) technique for systematic identification of hierarchical association factors, the MFAM offers the additional flexibility of utilizing the minimal information available in standardized, ubiquitous project plans (e.g., the work breakdown structure, resource allocation, or risk register), while preserving computational efficiency and scalability to high-dimensional project risks. Numerical applications and simulation experiments show that the MFAM, further combined with extended analytics (i.e., parameter calibration and optimization), provides credible risk assessments (with accuracy comparable to full-scale simulation) and enhances the realism of dealing with high-dimensional project risks by utilizing all relevant information.

Introduction

This paper tackles the problem of dependence modelling for large-scale project risk assessment. Dependence modelling constitutes an essential element of risk-adjusted project planning and predictive control, in particular for probabilistic cost estimates (GAO, 2020; Garvey, Book & Covert, 2016), stochastic network schedules (GAO, 2015; Trietsch, Mazmanyan, Gevorgyan & Baker, 2012; van Dorp, 2005), project-end outcome updates (Cho, 2009; Kim, 2015), and predictive performance tracking (Kim & Kwak, 2018). Inter-dependence between project tasks is also one of the driving factors of project complexity along with the project size and the variety of tasks (Baccarini, 1996; Tatikonda & Rosenthal, 2000). Consequently, accounting for the nature and the degree of dependence is a demanding challenge for proper management of modern projects with increasing complexity and structural uncertainty (Mo, Yin & Gao, 2008; Williams, 1999).

The need for quantitative risk assessment as a decision support tool has been well recognized since several seminal papers in capital investments (Hertz, 1964) and operations research (Malcolm, Roseboom, Clark & Fazar, 1959; Van Slyke, 1963). In practice, however, projects often behave in ways that clash with what the best practices and standards prescribe for successful completion on time (Love, Wang, Sing & Tiong, 2013; Schonberger, 1981) and within budget (Flyvbjerg, 2006; Love, Sing, Wang, Edwards & Odeyinka, 2013). Elementary statistics shows that risk-adjusted estimates of total project cost and time, as sums of random variables, tend to exceed the sum of their isolated marginal estimates when positive inter-variable associations exist. Empirical data also suggest that (i) inter-variable correlations are commonly observed and (ii) ignoring correlation leads to systematic underestimation of the real risks (Chau, 1995; Newton, 1992; Skitmore & Ng, 2002). Moreover, as the uncertainty dimension increases, the percent underestimation of the total cost (or time) increases drastically (Garvey et al., 2016, p.322). Consequently, a proper consideration of inter-variable dependence is widely emphasized as a crucial element of contingency setting and project risk assessment in general (GAO, 2015, p.115; GAO, 2020, p.155; NASA, 2013, pp.33–37).
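
To see why the underestimation grows with the dimension, consider the variance of a total under a common pairwise correlation. The following worked example uses illustrative values (d = 100 items with unit standard deviations and a common ρ = 0.3) chosen purely for exposition, not taken from the cited sources:

    Var(X_1 + … + X_d) = Σ_i σ_i² + 2 Σ_{i<j} ρ_ij σ_i σ_j.

    With d = 100, σ_i = 1, and ρ_ij = 0.3 for all pairs:
    Var = 100 + 2 · C(100, 2) · 0.3 = 3,070, so SD ≈ 55.4,
    compared with SD = 10 under the independence assumption.

Treating the items as independent thus understates the spread of the total by more than a factor of five in this stylized case, and the gap widens as d grows.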

In theory, dependence modelling can be straightforward. In a narrow sense, a vector of dependent random variables can be specified as a combination of univariate marginals (X) and the corresponding correlation matrix (Σ_X):

X_Σ ≡ (X ; Σ_X).    (1)

The correlation-driven dependent vector (X_Σ) in Eq. (1) provides a mathematically rigorous representation. However, specifying a feasible correlation matrix is a data-intensive process. In practice, the burden of data collection for correlation specification can be unattainably challenging (Lurie & Goldberg, 1998). In particular, high-dimensional dependence modelling can be overly restrictive, mostly due to three well-known challenges, which can be collectively referred to as the curse of dimensionality. First of all, the number of pairwise correlations required to fully specify a correlation matrix increases quadratically with the number of variables (d(d − 1)/2 coefficients for d variables). Although the general perception of large-scale projects changes over time, projects with thousands or more activities are becoming increasingly common in practice (GAO, 2015, pp.102–104; Safran, 2020). For example, a risk model with 1000 variables requires assessments of C(1000, 2) = 499,500 correlation coefficients. The burden of data collection at this scale, either from historical data or with expert judgment, would be practically unattainable, or simply not economical. Even more challenging, there are also situations where pair-wise correlations are restricted by the selection of marginal distributions (Demirtas & Hedeker, 2011; Lurie & Goldberg, 1998). A more detailed discussion of the curse of dimensionality is presented in Section 2.
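
As a quick illustration of this quadratic growth, the short sketch below counts the pairwise coefficients needed to fully specify a correlation matrix at a few problem sizes; the sizes are arbitrary illustrative values:

    from math import comb

    # Number of distinct pairwise correlation coefficients needed to fully
    # specify a d x d correlation matrix: C(d, 2) = d*(d-1)/2.
    for d in (10, 100, 1000, 5000):
        print(f"d = {d:>5}: {comb(d, 2):>12,} pairwise correlations")

    # d =    10:           45 pairwise correlations
    # d =   100:        4,950 pairwise correlations
    # d =  1000:      499,500 pairwise correlations
    # d =  5000:   12,497,500 pairwise correlations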

A sensible way of dealing with the curse of dimensionality is to reconstruct the problem in a way that reduces the data collection and elicitation burden (Morgan, Henrion & Small, 1992). A decision maker may conveniently avert the dimensionality issues by adopting drastic simplification assumptions, at the cost of sacrificing the flexibility to represent various dependence combinations (Goh & Sim, 2011; Trietsch et al., 2012). In the project control literature, Bayesian networks have been examined as an analytic framework for factor modelling and adaptive project time updating (Cho, 2009; van Dorp, 2020). Cho (2009) presented a single-factor Bayesian model in which all activities in a project are influenced by a single resource factor. van Dorp (2020) also proposed a single-factor dependence model that employs a new family of power distributions, the two-sided power distributions, to represent the mode of a PERT (program evaluation and review technique) distribution. As a robust solution for large-scale risk analysis, however, single-factor approaches can be overly restrictive in that all pairwise correlations in the analysis are calibrated by a single factor. In this regard, a dependence model can be considered more realistic when it offers the flexibility of accounting for the multiple risk factors commonly observed in real project settings.
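
To make the restriction concrete, consider a generic linear one-factor construction (an illustrative stand-in, not the specific models of Cho (2009) or van Dorp (2020)). With all variables standardized and loading on a single common factor,

    X_i = λ_i F + ε_i,   Var(F) = 1,   ε_i independent of F and of each other,
    ⇒ ρ_ij = Corr(X_i, X_j) = λ_i λ_j   for all i ≠ j,

so the entire d × d correlation matrix is determined by only d loadings. This is economical, but it cannot represent, for example, two groups of activities that are each internally correlated yet mutually uncorrelated, since ρ_ij = 0 across the groups would force one group's loadings to zero.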

Methodologically, however, increasing the number of risk factors for dependence specification is constrained by the quantity and quality of the data available for parameter estimation. Whenever available, empirical data from past projects or expert assessments should be used; when a project is predictable, with a plethora of similar projects completed in the past, such data and assessments are typically attainable. At the same time, there are more challenging projects with unique scope, innovative methodologies, and increased complexity in terms of component interfaces and project scale. These projects are, as a rule, less predictable and can hardly be characterized with quantitative data collected from past projects. Consequently, the nature and degree of the risks inevitable in such one-of-a-kind projects cannot be fully quantified using empirical data alone. Here we observe a dilemma, somewhat inevitable in project risk assessment: the less relevant empirical data exist from similar projects, the greater the need for a sensible risk assessment. As a viable alternative, subjective assessments of the pair-wise correlations can be employed. Yet the efficacy of subjective correlation assessment rapidly diminishes as the number of random variables increases, mostly due to the mathematical consistency required for a feasible correlation matrix (see Section 2.1 for details).

These observations indicate that the robustness of a solution to the dependence modelling problem for large-scale project risk analysis can be enhanced with three analytical features: multi-factor capability, applicability under limited empirical data, and dimensional scalability. Accordingly, the objective of this article is to present a multi-factor dependence modelling framework that provides the flexibility to address limited data availability, while preserving the scalability to high-dimensional project risks. To achieve this goal, we investigate a dependent vector that can be fully specified with three input elements:

X_r ≡ (b ; r, Ψ),    (2)

where b = (b_1,…,b_d)^T is a vector of observable random variables whose marginals, f_b = {f(b_i)} (i = 1,…,d), are specified independently prior to accounting for possible dependence; r = (r_1,…,r_K)^T is a vector of association factor (AF) variables that are elicited and specified as proxies for the pairwise dependence between the base variables (b); and Ψ = [ψ_ik] (i = 1,…,d; k = 1,…,K) is a d × K allocation matrix (AM) of binary elements that defines the relationships between b and r.
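
The sketch below illustrates these three input elements on a toy example: base-variable marginals summarized by their first two moments, two association factors, and a binary allocation matrix linking each base variable to the factors it is exposed to. The variable names, the 4 × 2 allocation pattern, and the numbers are assumptions of this sketch, not data from the paper, and the shared-factor flag is only a qualitative precursor to the correlations a factor model would actually compute:

    import numpy as np

    # Base variables b = (b_1, ..., b_d): marginals specified independently,
    # summarized here by illustrative means and standard deviations.
    d = 4
    mean_b = np.array([120.0, 80.0, 200.0, 50.0])   # e.g., itemized costs
    sd_b   = np.array([ 15.0, 10.0,  30.0,  8.0])

    # Association factors r = (r_1, ..., r_K), e.g., a shared crew and weather.
    factors = ["shared_crew", "weather"]
    K = len(factors)

    # Binary d x K allocation matrix Psi: psi[i, k] = 1 if base variable i
    # is exposed to association factor k.
    psi = np.array([[1, 0],
                    [1, 1],
                    [0, 1],
                    [1, 0]])

    # Variables sharing at least one factor are treated as associated;
    # this qualitative pattern is what a factor model turns into correlations.
    shared = (psi @ psi.T) > 0
    np.fill_diagonal(shared, False)
    print("Pairs linked by at least one common association factor:")
    for i in range(d):
        for j in range(i + 1, d):
            if shared[i, j]:
                print(f"  (b_{i + 1}, b_{j + 1})")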

Specifically, we present an analytic framework with two stages: the structured association (SA) and the multi-factor association model (MFAM). First, the SA establishes a hierarchical structure of all relevant AFs identified in a project, providing a qualitative solution to the multi-factor capability and the applicability to limited data required for a robust dependence model. Then, the MFAM transforms the qualitative SA information into a quantitative, mathematically consistent correlation matrix (Σ_r) for a vector of specified marginals. Adopting an analytic (non-simulation), second-moment approach, the MFAM offers a computationally efficient algorithm that is readily scalable to high-dimensional risk modelling and analysis.
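
As an illustration of how a second-moment (method-of-moments) view keeps large problems tractable, the sketch below propagates marginal means and standard deviations through a given correlation matrix to the first two moments of the project total. It uses the generic moment identity for a sum of correlated variables, not the MFAM's specific closed form, and the numbers are invented:

    import numpy as np

    # Marginal first two moments of the performance units (illustrative values).
    mean = np.array([120.0, 80.0, 200.0, 50.0])
    sd   = np.array([ 15.0, 10.0,  30.0,  8.0])

    # A feasible correlation matrix, e.g., one produced by a factor model.
    corr = np.array([[1.0, 0.4, 0.0, 0.4],
                     [0.4, 1.0, 0.3, 0.4],
                     [0.0, 0.3, 1.0, 0.0],
                     [0.4, 0.4, 0.0, 1.0]])

    # Second-moment propagation for the total T = X_1 + ... + X_d:
    #   E[T]   = sum_i mu_i
    #   Var[T] = sum_{i,j} rho_ij * sd_i * sd_j
    total_mean = mean.sum()
    total_sd = np.sqrt(sd @ corr @ sd)
    sd_independent = np.sqrt((sd ** 2).sum())

    print(f"E[T] = {total_mean:.1f}")                                 # 450.0
    print(f"SD[T] with dependence = {total_sd:.1f}")                  # about 41.8
    print(f"SD[T] assuming independence = {sd_independent:.1f}")      # about 35.9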

Note that inter-variable association may arise due to a causal relationship between variables or a common factor that affects two or more variables concurrently (Bolstad, 2007, p.3). The key premise underlying our approach is that, by selecting the AFs wisely, a decision maker can strike a balance between data availability and modelling flexibility, while effectively mitigating the curse of dimensionality. A selection of AFs can be considered wise if the marginal distributions of the factors and the corresponding allocation matrix can be elicited and specified using all relevant information readily available in standard project settings. In this paper, we focus on establishing stochastic association between project performance units (i.e., itemized costs and activity times, hereinafter PUs) using all relevant information accessible in standard project environments, for example, the work breakdown structure (WBS), resource plans, and a risk register (PMI, 2013, p.163). It should be emphasized that project risk information from various sources is often available in unstructured forms (e.g., drawings, organizational plans, and resource plans) (Xing, Zhong, Luo, Li & Wu, 2019). In particular, a risk register is used to identify and track all relevant risks in a project and their attributes relevant to project outcomes (PMI, 2013). The information embedded in such project plans is conspicuously observable and thus objective; yet project plans exist, as a rule, in qualitative formats. In this study, we adopt an ontological approach to transform qualitative association information reflected in project plans into quantitative dependence information expressed as a correlation matrix. Ontology, as a branch of philosophy, offers a flexible perspective on the evolving nature of projects (Morris, 2013, p.236). In analytic settings, ontology provides a framework that represents the knowledge in a domain as a set of concepts and their relationships (Rodger, 2013) and has attracted growing attention in risk studies, for instance in safety (Xing et al., 2019), supply chains (Palmer et al., 2018), and the environment (Scheuer, Haase & Meyer, 2013).

The main contributions of this article can be highlighted in three aspects.

  • The SA-MFAM approach enhances the realism of dependence modelling by offering the flexibility of accounting for multiple risk factors, based on all relevant information readily available in individual projects.

  • The MFAM yields an analytic, closed-form solution for the correlation matrix, which can be further parameterized and calibrated to meet the limited data availability in individual projects.

  • Adopting a factor-driven approach, the MFAM always generates a mathematically consistent matrix, preserving the scalability to high-dimensional risk modelling and analysis.

The rest of this article is organized as follows. The following section outlines the challenges in large-scale dependence modelling and presents the SA technique as a viable solution. Section 3 formulates the MFAM. In Section 4, we carry out a set of credibility tests and evaluate the performance of MFAM against Monte Carlo simulation. Section 5 demonstrates the implications of dependence (or ignoring dependence) in project decision making using three SA-MFAM applications. Conclusions and future research issues are summarized in Section 6.

Section snippets

Large-scale correlation assessment

Uncertainty models, in a generic setting, take the form of a joint distribution with three components: (i) a function of random variables in d dimensions, G(X), X = (X_1,…,X_d)^T; (ii) a set of univariate marginal distributions, f_X = {f(X_i)}, i = 1,…,d; and (iii) a set of dependence parameters, mostly in terms of a correlation matrix (i.e., a symmetric, positive semi-definite matrix with unit diagonal elements), Σ_X = ([ρ_ij] : −1 ≤ ρ_ij ≤ 1; 1 ≤ i, j ≤ d).
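
A set of pairwise correlations elicited one at a time can easily violate these feasibility conditions. The sketch below checks the three requirements just listed (symmetry, unit diagonal, positive semi-definiteness) and shows an invented 3 × 3 elicitation whose individual entries look plausible yet are jointly infeasible:

    import numpy as np

    def is_feasible_correlation(corr, tol=1e-10):
        """Check symmetry, unit diagonal, and positive semi-definiteness."""
        corr = np.asarray(corr, dtype=float)
        symmetric = np.allclose(corr, corr.T, atol=tol)
        unit_diag = np.allclose(np.diag(corr), 1.0, atol=tol)
        psd = np.min(np.linalg.eigvalsh((corr + corr.T) / 2)) >= -tol
        return symmetric and unit_diag and psd

    # Entries that each lie in [-1, 1] but are jointly inconsistent:
    # X1-X2 and X1-X3 strongly positive, yet X2-X3 strongly negative.
    infeasible = [[ 1.0,  0.9,  0.9],
                  [ 0.9,  1.0, -0.9],
                  [ 0.9, -0.9,  1.0]]

    feasible = [[1.0, 0.3, 0.2],
                [0.3, 1.0, 0.4],
                [0.2, 0.4, 1.0]]

    print(is_feasible_correlation(infeasible))   # False: a negative eigenvalue
    print(is_feasible_correlation(feasible))     # True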

In large-scale project risk analyses, the efforts to develop a

Multi-factor association model

The SA-induced dependence between PUs involves, as a rule, multiple factors. This section presents a multi-factor association (MFA) model that transforms the relative association information embedded in a hierarchical set of multiple AFs into a mathematically consistent correlation matrix. Characteristic properties and practical implementation options of the MFA model are also highlighted.

Credibility test

A test project is analyzed to demonstrate the performance (i.e., accuracy, robustness, and computational efficiency) of the MFA analysis compared against Monte Carlo simulation. The project parameters are designed in a way that challenges the primary premise underlying the MFAM (i.e., the MOM approximation). Specifically, the test settings account for three control factors: (i) asymmetry of variables, (ii) effects of base distribution types (PERT-beta vs. triangular distribution), and

Applications

Empirical data from previous projects or general experiences provide valuable, although often incomplete, dependence information relevant to a new project (Ranasinghe, 2000; Touran & Wiser, 1992; Wang & Huang, 2000). It would be sensible then to make full use of all relevant information whenever possible. The MFA model's closed-form correlation matrix provides a robust solution to the problem of utilizing incomplete dependence information for coherent risk analysis. This section presents three

Conclusions

Proper dependence consideration is crucial for realistic project risk assessment and making informed decisions under uncertainty. We present an analytic framework that combines a systematic way of accounting for multiple risk factors (the SA) and a quantitative dependence model based on the second moment approach (the MFAM). The SA-MFAM offers an analytic, closed-form, and computationally tractable alternative to the correlation-driven approaches for dependence modelling. Specifically, the

References (56)

  • van Dorp, J.R. (2005). Statistical dependence through common risk factors: With applications in uncertainty analysis. European Journal of Operational Research.

  • van Dorp, J.R. (2020). A dependent project evaluation and review technique: A Bayesian network approach. European Journal of Operational Research.

  • Wang, C.-H., et al. (2000). A new approach to calculating project cost variance. International Journal of Project Management.

  • Werner, C., et al. (2017). Expert judgement for dependence in probabilistic modelling: A systematic literature review and future research directions. European Journal of Operational Research.

  • Williams, T.M. (1999). The need for new paradigms for complex projects. International Journal of Project Management.

  • Xing, X., et al. (2019). Ontology for safety risk identification in metro construction. Computers in Industry.

  • Bedford, T., et al. (2002). Vines: A new graphical model for dependent random variables. Annals of Statistics.

  • Bolstad, W.M. (2007). Introduction to Bayesian statistics.

  • Cario, M.C., et al. (1997). Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix.

  • Chau, K.W. (1995). Monte Carlo simulation of construction costs using subjective data. Construction Management & Economics.

  • Clemen, R.T., et al. (2000). Assessing dependence: Some experimental results. Management Science.

  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences.

  • Demirtas, H., et al. (2011). A practical way for computing approximate lower and upper correlation bounds. The American Statistician.

  • Flyvbjerg, B. (2006). From Nobel prize to project management: Getting risks right. Project Management Journal.

  • GAO (2015). Schedule assessment guide: Best practices for project schedules. Applied Research and Methods.

  • GAO (2020). Cost estimating and assessment guide: Best practices for developing and managing capital program costs. Applied Research and Methods.

  • Garvey, P.R., et al. (2016). Probability methods for cost uncertainty analysis: A systems engineering perspective.

  • Ghosh, S., et al. (2003). Behavior of the NORTA method for correlated random vector generation as the dimension increases. ACM Transactions on Modeling and Computer Simulation (TOMACS).