Analysis of accidents through combination of CAST and TRACEr techniques: A case study

https://doi.org/10.1016/j.jlp.2021.104639

Highlights

  • Systems-based accident analysis and human error identification techniques complemented each other.

  • Systemic factors related to the human errors arose long before the studied accident.

  • The combined methodology yielded better-targeted and more effective safety recommendations.

Abstract

This article presents a methodology for accident analysis developed from the combination of CAST (Causal Analysis based on System Theory), derived from the STAMP (Systems-Theoretic Accident Model and Processes) accident causation model, with the human error analysis/investigation tool TRACEr (Technique for the Retrospective and Predictive Analysis of Cognitive Errors). The combination was proposed to structure the stage of CAST in which the unsafe control actions, or human errors, that occurred in the course of the accidental events are analyzed; for this stage, the TRACEr technique proved extremely useful. Applying the proposed combination to an accident that occurred in an oil production unit operating off the coast of southeastern Brazil yielded a deeper understanding of the psychological phenomena that preceded one of the unsafe control actions, as well as the identification of the performance shaping factors that contributed to its occurrence. These results demonstrate the conceptual cohesion and mutual complementarity of the associated techniques, which allowed a comprehensive understanding of the accidental event and, consequently, the elaboration of safety recommendations better matched to the findings of the analysis and better specified to the human factors aspects involved.

Introduction

There have been efforts in the literature to establish a relationship between the Systems-Theoretic Accident Model and Processes (STAMP) and human factors-based tools. Karanikas (2018) assessed the extent to which the theoretical basis of STAMP encompasses contemporary human factors (HF) and established the points of convergence of STAMP with various human factors tools (e.g., SHERPA, TRACEr, and HET). This study concluded that STAMP encompasses human factors at an appropriate level and treats modern sociotechnical systems more holistically than the classical approaches to human performance reliability do. It also pointed out that the human factors discussed in STAMP could be enriched with other topics from the reviewed literature, which could be cataloged as system variables and/or factors that contribute to an accident. According to Salmon et al. (2012), STAMP has the advantage of including the context in which decisions are made, as well as a category for failures in the mental model. This feature allows the context to be considered when identifying and describing control failures and helps clarify why mistakes or inappropriate decisions were made.

In this study, we propose a standard methodology to analyze unsafe control actions committed by human controllers during an accident and its aftermath. To verify the proposal's validity, we applied it to the accident that occurred on 11 February 2015, off the coast of southeastern Brazil, at the floating production, storage and offloading unit Cidade de São Mateus (FPSO CDSM).

An outcome of its application is the correct identification, characterization, and attribution of the errors that occurred, as well as of the systemic origins of (a) the actions or decisions taken during the accidental event or over the life cycle of the installation, which defined objective aspects of the operating environment and, at a given moment, combined with disturbances or abnormalities not adequately addressed by the system; (b) failures of the controller's mental model, that is, deficiencies in how the human beings assigned control of a system, or part of it, understand the components of that system, its operation and use, and form their expectations about how the system and its components will behave (Wickens et al., 2014); and (c) performance shaping factors (PSFs), defined as conditions that can compromise the ability to perform appropriately, increasing the likelihood that errors will be produced (Edmonds et al., 2016).

Causal models of accidents are the basis for the investigation and analysis of accidents, to prevent their recurrence and to verify whether systems are suitable for use from the viewpoint of operational safety (Leveson, 2004). The selection of investigation methods must be guided by the objectives and scope of the analysis. If, in a given event, failures are observed throughout the sociotechnical system, a systemic method should be chosen; if, instead, the objective is to evaluate the decision-making process of the human operators involved in the event, a cognitive task analysis approach is appropriate (Salmon et al., 2011).

In STAMP, a systemic model of accident causality, safety is expressed as a control problem, based on the interactions between the components of the system and on the constraints imposed on the behavior of each component and on their interactions. In complex systems, accidents often result from interactions between components that individually worked correctly (Leveson, 2011). Accidents are therefore analyzed in terms of insufficient control, owing to the absence or inadequacy of safety constraints that should have been imposed on the design and operation of the system (Düzgün and Leveson, 2018).

Several studies have assessed the empirical validity and reliability of the STAMP model in accident analysis by comparing the results of its application with those of earlier or non-systemic methods. Filho et al. (2017) compared four studies of the same accident carried out by different analysts, two using STAMP and two using Accimap. Their findings indicated that a more structured method such as STAMP can help produce a more reliable accident analysis, as the technique is especially effective in producing a wider range of recommendations at various levels of the system. Another study, conducted by Zhou and Yan (2018), compared the results obtained by applying the method used by the Rail Accident Investigation Branch (RAIB) and CAST (Causal Analysis based on System Theory) to an accident that occurred on the London Underground. They concluded that CAST provided a global understanding of the organization, its management, and its operational hierarchical control structure; allowed the various causal factors, including the human errors at the time of the accident, to be related to one another; and provided a comprehensive perspective for ensuring the safety of the system.

To apply STAMP principles in accident analysis, it was necessary to develop a new technique that facilitated comprehending the entire accident process and identifying the most important systemic causal factors involved (Leveson, 2011). This new technique, called CAST, makes it possible to examine the entire arrangement of the sociotechnical system, identify weaknesses in the safety control structure, and, finally, propose changes capable of eliminating the causal factors found, including the systemic ones.

The identification of the causal factors of an accident begins with the examination of each basic component of the control loop, as shown in Fig. 1, and a determination of how its incorrect operation can contribute to inadequate control. Three general categories of inadequate control are proposed: (I) the operation of the controller, (II) the behavior of the actuators and controlled processes, and (III) communication and coordination between controllers and decision-makers. Flawed control actions by a controller can be generated by (1) wrong or missing control inputs/external information, (2) inadequate control algorithms, and (3) inconsistent, incomplete, or incorrect process models. Where human beings are involved in the control structure, the context and behavioral modeling mechanisms also play an important role in causality and must therefore be studied (Leveson, 2011).
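
As a concrete reading of this classification, the sketch below encodes the three categories of inadequate control and the three sources of flawed controller actions as simple Python types. This is our illustrative assumption, not an artifact of CAST itself; all names, and the example tagging of the unsafe control action examined later in this article (UCA.2), are hypothetical.

```python
# Minimal, illustrative encoding of the CAST causal-factor categories above.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class InadequateControl(Enum):
    """The three general categories of inadequate control."""
    CONTROLLER_OPERATION = auto()            # (I) operation of the controller
    ACTUATOR_OR_PROCESS_BEHAVIOR = auto()    # (II) actuators and controlled processes
    COORDINATION_AND_COMMUNICATION = auto()  # (III) controllers and decision-makers


class ControllerFlaw(Enum):
    """The three sources of flawed control actions by a controller."""
    WRONG_OR_MISSING_INPUT = auto()  # (1) control inputs / external information
    INADEQUATE_ALGORITHM = auto()    # (2) procedures, for human controllers
    FLAWED_PROCESS_MODEL = auto()    # (3) mental model, for human controllers


@dataclass
class CausalFactor:
    description: str
    category: InadequateControl
    flaw: Optional[ControllerFlaw] = None  # only set for controller-level factors
    human_controller: bool = False         # if True, context/behavior must be studied


# Hypothetical example: tagging the unsafe control action examined later (UCA.2)
# so that its human-controller aspects are handed off to TRACEr for analysis.
uca2 = CausalFactor(
    description="Decision to send teams into an enclosure with an explosive atmosphere",
    category=InadequateControl.CONTROLLER_OPERATION,
    flaw=ControllerFlaw.FLAWED_PROCESS_MODEL,
    human_controller=True,
)
```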

Wrong or missing inputs refer to situations in which control actions that are essential for the safe behavior of a controller, and that must be provided by the levels above it, are not provided or are provided incorrectly (Leveson, 2011). In addition, missing or incorrect information not associated with control actions can also affect the operation of the controller.

Process models, in the case of human controllers, are called mental models and relate to one of the ways in which information is organized in our long-term memory. According to Wickens et al. (2014), this organization occurs around central concepts or topics, called schemas; a piece of equipment or a system is an example. Schemas of dynamic systems are commonly called mental models and include our understanding of the components of the system, how it works, and how to use it. A mental model is important because from it we form our expectations about how the equipment or system will behave. Mental models can be personal or shared by many people, and they vary in their degree of completeness and correctness. Incorrect models may have been conceived wrongly and passed to the controller through training. A process model is incomplete if it does not define the appropriate behavior of the controller for all possible process states or for disturbances, including component failures that are not treated or are treated incorrectly (Leveson, 2011).
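
As a toy illustration of this incompleteness point (our construction, not the paper's), the sketch below treats a process model as a mapping from process states to control actions; any state for which no action is defined marks the model as incomplete. All state and action names are hypothetical.

```python
# A toy process model: states mapped to control actions. The model below is
# deliberately incomplete for one disturbance, mirroring the text above.
PROCESS_STATES = {"normal", "high_pressure", "sensor_fault"}

process_model = {
    "normal": "maintain_setpoints",
    "high_pressure": "open_relief_valve",
    # "sensor_fault" is missing: the model defines no behavior for that state
}


def undefined_states(states: set, model: dict) -> set:
    """Return the process states for which the model defines no behavior."""
    return states - model.keys()


print(undefined_states(PROCESS_STATES, process_model))  # -> {'sensor_fault'}
```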

Control algorithms, in the case of human controllers, are the procedures they use. They are influenced by initial training, the procedures provided, process feedback, and experimentation over time (Leveson, 2011). Procedures can be unsafe because they were designed improperly; they can become unsafe because the process changes and makes them inadequate; and they can drift through natural adaptations, within the well-known dichotomy between prescribed and actual work.

The steps for applying the CAST technique are as follows (Leveson, 2011); a structured sketch of this workflow, as code, is given after the list.

1. Establish the chain of events close to the loss.

2. Identify the system and the system-level hazard involved in the loss.

3. Identify the safety constraints and system requirements associated with that hazard.

4. Document the safety control structure.

5. Analyze the loss at the level of the physical process.

6. Analyze the higher levels of the safety control structure: understand how and why each successively higher level allowed or contributed to the inadequate control at the level under analysis; the flawed control decisions/actions in terms of the information available to the decision-maker; the necessary information that was not available; the behavior modeling mechanisms; the value structures underlying the decision; and the failures in the process models of those who made the decisions, and why those failures existed.

7. Examine the global coordination and communication contributors to the loss.

8. Determine the dynamics and changes in the safety control system and structure related to the loss, and any weakening of the safety control structure over time.

9. Generate recommendations.
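
The sketch below (our illustrative assumption, not part of Leveson's CAST specification; the class and method names are hypothetical) expresses these nine steps as an ordered checklist on which an analyst can record findings step by step.

```python
# An ordered checklist for the nine CAST steps, with per-step findings.
from dataclasses import dataclass, field

CAST_STEPS = (
    "Establish the chain of events close to the loss",
    "Identify the system and the system-level hazard involved in the loss",
    "Identify the safety constraints and requirements for that hazard",
    "Document the safety control structure",
    "Analyze the loss at the level of the physical process",
    "Analyze the higher levels of the safety control structure",
    "Examine global coordination and communication contributors to the loss",
    "Determine dynamics, changes, and weakening of the safety control structure",
    "Generate recommendations",
)


@dataclass
class CastAnalysis:
    # findings[step_number] -> list of findings recorded for that step
    findings: dict = field(default_factory=dict)

    def record(self, step: int, finding: str) -> None:
        if not 1 <= step <= len(CAST_STEPS):
            raise ValueError(f"CAST defines steps 1..{len(CAST_STEPS)}")
        self.findings.setdefault(step, []).append(finding)

    def report(self) -> str:
        lines = []
        for number, title in enumerate(CAST_STEPS, start=1):
            lines.append(f"{number}. {title}")
            for finding in self.findings.get(number, []):
                lines.append(f"   - {finding}")
        return "\n".join(lines)


# Hypothetical usage, echoing how this article hands a UCA off to TRACEr.
analysis = CastAnalysis()
analysis.record(6, "UCA.2 occurred at the controller level; analyzed with TRACEr")
print(analysis.report())
```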

When taken by human beings, unsafe, inappropriate, or flawed control decisions or actions can be characterized as human failures. Control, in the case of human controllers, involves the accomplishment of several tasks prescribed by components at higher levels; it is thus influenced by the training provided, the established procedures, and the other conditions under which those tasks are performed.

The concept of “human error” has been the subject of a large number of academic works. Hollnagel (1998) points out that, although the term “error” has a relatively simple meaning in everyday life, a precise technical definition has proved extremely difficult to establish, owing to differences in the premises or starting points employed by researchers; hence, no consensus has been reached on what constitutes the defining qualities of an “error”. An additional difficulty, pointed out by Woods et al. (1994, as cited in Hollnagel, 1998), is that “a human error is the post hoc attribution of a cause to an observed result, in which the cause refers to an action or performance characteristic”. Dekker (2014), in turn, argues that complex systems are not inherently safe and that the people who work in them must create safety by adapting, under pressure and uncertainty, through trade-offs between safety and other objectives. For Dekker, human error is therefore not the cause of a failure but an effect or symptom of a deeper problem, systematically linked to the characteristics of the tools, tasks, and operational environment in which human beings operate; consequently, human error should not be the conclusion of an investigation but its starting point. Nonetheless, the occurrence of human failures during the execution of safety critical tasks (SCTs) has usually been pointed out as a contributing factor in serious accident investigations, such as Piper Alpha, Chernobyl, and Texas City (Energy Institute, 2011).

The scope of “human error”, however, has broadened considerably and continuously over the past decades, moving from the consideration of individual errors made by operators to the understanding that it should also include actions and decisions taken during earlier phases of the sociotechnical system's life cycle, as well as managerial and organizational choices made at its higher levels. In support of this, Taylor (2016) points out that in nearly all investigated accidents there is a network of causes and influences that is quite complex, and it is almost impossible to attribute an error exclusively to an operator or a maintenance technician; many errors are, in fact, contributed by plant designers and managers. Reason (2009) supports this view, stating that any classification of errors restricted to individual information processing provides only a partial picture of the possible varieties of erroneous behavior; another level of analysis is required, considering that human beings mostly do not plan and execute their actions in isolation but within a regulated social environment. In this sense, HSE (2007) points out that the consequences of human failures can be immediate or delayed. This time lapse between wrong actions/decisions and their consequences is the basis of a classification that divides them into “active failures” and “latent failures”. Active failures have immediate consequences for health and safety and typically originate with “sharp end” workers. Latent failures originate with people, such as designers, decision-makers, and managers, whose tasks are spatially and temporally distant from operational activities. In general, latent failures occur in health and safety management systems owing to, for example, inadequate design of facilities and/or equipment, ineffective training and communications, insufficient supervision, and unclear roles and responsibilities (Health and Safety Executive, 2007).

If human failure analysis is to be included in an accident investigation, and the identified critical factors and causes involve some type of faulty human behavior for which there is sufficient information to specify it adequately, then this behavior can and must be analyzed with a validated tool (Edmonds et al., 2016). According to these authors, human error analysis has two requirements: (1) a taxonomy, that is, a structured way of dividing and classifying human failures; and (2) methods for the subsequent analysis of the identified human failures, so that the reasons for the observed behaviors can be understood and appropriate solutions identified.

TRACEr (Technique for the Retrospective and Predictive Analysis of Cognitive Errors) was designed to be used predictively, for human error identification (HEI), and retrospectively, for the analysis of accidents; it can thus be applied at any stage of a system's life cycle (Shorrock, 2002). Its development was based on two models of information processing, also known as cognitive frameworks (Shorrock and Kirwan, 2002): (i) the Wickens (1992) framework and (ii) the simple model of cognition (SMoC) developed by Hollnagel and Cacciabue (1991). Fig. 2, below, shows the categories of the TRACEr taxonomies and their relationships.

Although TRACEr was originally developed for use in air traffic control (ATC) systems, several modifications were subsequently developed to extend its application to other domains. Among these, Theophilus et al. (2017) mention TRACEr-lite, a simplification of the original technique; TRACEr-RAV, a derivation of TRACEr for the analysis of railway accidents; and TRACEr-MAR, an application of TRACEr in the maritime context. Based on the validity of the technique in different industries, these authors adapted it for the oil and gas industry as TRACEr-OGI, on the premise that human error remains one of the major factors in accidents in this industry. TRACEr-OGI was used to support the application of the proposed methodology to the case studied in the present work.

Eleven taxonomies were established in the TRACEr-OGI adaptation, as shown in Table 1, divided into three major groups: the first group consists of the taxonomies that describe the context in which the error occurs; the second deals with the error-producing mechanisms internal to the operator; and the third describes the barriers that allow, or would allow, the detection and recovery of the error. The structure explicitly maps the relationships between these groups to clarify the classification. Taken together, the classifications under each taxonomy form a detailed picture of the event.

Once the point at which an error occurred has been determined, the most effective course of action to combat the error in the future can be directed at the mechanisms that allowed the error to occur, as pointed out by Edmonds et al. (2016).

The cognitive domains in TRACEr include the following (Shorrock and Kirwan, 2002); a data-model sketch combining these domains with the taxonomy groups above follows the list:

  • Perception: errors in detection, visual search, and hearing.

  • Memory: forgetting (or mistakenly remembering) short-term or long-term information, previous actions, or planned actions.

  • Judgment, planning, and decision-making: errors in predicting, evaluating, making decisions, and planning.

  • Action execution: actions or speech performed differently than planned.
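
The sketch below is a minimal data model (our assumption, not the TRACEr-OGI specification) combining the three taxonomy groups with the four cognitive domains, applied to a single error observation. All names, and the example notes for the error examined later in this article (UCA.2), are hypothetical.

```python
# Minimal data model for classifying one error under the TRACEr groups/domains.
from dataclasses import dataclass
from enum import Enum, auto


class TaxonomyGroup(Enum):
    CONTEXT = auto()              # group 1: the context in which the error occurs
    OPERATOR_MECHANISMS = auto()  # group 2: error-producing mechanisms in the operator
    RECOVERY_BARRIERS = auto()    # group 3: barriers for detection and recovery


class CognitiveDomain(Enum):
    PERCEPTION = auto()                  # detection, visual search, hearing
    MEMORY = auto()                      # forgetting or mistakenly remembering
    JUDGMENT_PLANNING_DECISION = auto()  # predicting, evaluating, deciding, planning
    ACTION_EXECUTION = auto()            # actions/speech performed as not planned


@dataclass
class ErrorClassification:
    """One classified error, with notes under each taxonomy group."""
    description: str
    domain: CognitiveDomain
    context_notes: str    # TaxonomyGroup.CONTEXT
    mechanism_notes: str  # TaxonomyGroup.OPERATOR_MECHANISMS
    recovery_notes: str   # TaxonomyGroup.RECOVERY_BARRIERS


# Hypothetical classification of the error examined later in this article (UCA.2).
uca2 = ErrorClassification(
    description="Decision to send teams into an enclosure with an explosive atmosphere",
    domain=CognitiveDomain.JUDGMENT_PLANNING_DECISION,
    context_notes="Emergency response with degrading safety margins",
    mechanism_notes="Risk mis-evaluation; a 'mistake' in Reason's (2009) terms",
    recovery_notes="No barrier detected and recovered the error in time",
)
```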

Section snippets

Rationale of the proposition

To associate the CAST and TRACEr tools, it was necessary to relate their conceptual bases and determine the feasibility and gains of the combination. When an unsafe control action (UCA) occurs under the responsibility of a human controller, then, to properly understand such an occurrence, the more structured the method of analysis (based on a cognitive model and validated associated taxonomies), the more effective the recommendations process will be, and the more applicable and better addressed

Application of the methodology to the FPSO CDSM accident

To evaluate the applicability of the proposed methodology in practical scenarios, it was applied to an actual accident. In general, the objective was to identify what could correctly be classified as a “human error”, apply the methodology to clarify its causes, and associate those causes with the systemic contexts in which they occurred. In this work, only the data pertaining to one of the identified UCAs (henceforth referred to as UCA.2) are presented.

Link between the reconstructed reasoning and human factor concepts, establishing clear connections between the concept and the data (Step 2, sub-step 2.4 of the proposed method)

  • Typology of error

The human error identified, the decision to send the teams into an enclosure in which there was an explosive atmosphere, is characterized, according to the classification of Reason (2009), as a mistake. It was made during the evaluation of a risk; the efforts made to confront that risk were not effective in reversing the degradation of the system's safety margins.

Analyzing the error through the scheme proposed by Rasmussen (1983), although there were applicable rules

Discussion

The application of the proposed combination provided recommendations that are (a) directly related to the PSFs that contributed to the failures in the cognitive process that resulted in the error, or specific to the systemic factors that influenced its occurrence; (b) addressed to the components of the hierarchical control structure holding direct responsibility for their implementation, as expected of any safety recommendation resulting from accident investigations or analyses; (c)

Conclusions

The conceptual cohesion and mutual complementarity of the STAMP causal accident model and the human error analysis/investigation tool TRACEr were verified.

The applied methodology also benefited the analysis by standardizing a step of the CAST procedure that, although established in its fundamentals, did not previously have a standardized and replicable procedure for its execution.

In particular, this procedural standardization allowed a deeper understanding of the aspects related to

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

All persons who have made substantial contributions to the work reported in the manuscript (e.g., technical help, writing and editing assistance, general support), but who do not meet the criteria for authorship, are named in the Acknowledgements and have given us their written permission to be named. If we have not included an Acknowledgements, then that indicates that we have not received substantial contributions from non-authors.

References (32)

  • S. Dekker

    The Field Guide to Understanding Human Error

    (2014)
  • S. Dekker

    The Field Guide to Human Error Investigations

    (2002)
  • J. Edmonds et al.

    Human Factors in the Chemical and Process Industries: Making it Work in Practice

    (2016)
  • Energy Institute

    Guidance on Human Factors Safety Critical Task Analysis

    (2011)
  • A. Filho et al.

    Four studies, two methods, one accident—another look at the reliability and validity of Accimap and STAMP for systemic accident analysis

    Proc. Eur. Saf. Reliab. Conf.

    (2017)
  • Health and Safety Executive

    Reducing Error and Influencing Behaviour

    (2007)