Full length article
Automated text classification of near-misses from safety reports: An improved deep learning approach

https://doi.org/10.1016/j.aei.2020.101060

Abstract

Examining past near-miss reports can provide information about how to mitigate and control hazards that materialise on construction sites. Yet analysing near-miss reports is a time-consuming and labour-intensive process. Automatic text classification using machine learning and ontology-based approaches can be used to mine reports of this nature, but such approaches tend to suffer from weak generalisation, which can adversely affect classification performance. To address this limitation and improve classification accuracy, we develop an improved deep learning-based approach that automatically classifies near-miss information contained within safety reports using Bidirectional Encoder Representations from Transformers (BERT). Our approach is designed to pre-train deep bidirectional representations by jointly extracting context features in all layers. We validate the effectiveness and feasibility of the approach using a database of near-miss reports from actual construction projects, which was used to train and test our model. The results demonstrate that our approach can accurately classify near-misses and outperforms prevailing state-of-the-art automatic text classification approaches. Understanding the nature of near-misses can provide site managers with the ability to identify work-areas and instances where an accident is likely to occur.

Introduction

Construction is a dangerous industry, and the danger is exacerbated by its temporary nature and innate complexity [36], [33], [24], [38]. In the United States, for example, 4674 workplace fatalities were recorded in 2017, of which 971 (20.7%) occurred in construction. Moreover, many of these fatalities were attributable to being struck by an object, electrocution, and being caught in or between plant and machinery [30]. In China, a staggering 3843 fatal injuries were recorded on construction sites in 2017 [7]. However, the precursors of these fatalities, namely accidents and near misses, are not being systematically identified during construction [52].

A near miss has been defined as an unplanned event that has the potential to cause, but does not result in, personal injury, environmental or equipment damage, or an interruption to regular operation [32]. Heinrich [15] observed that approximately 91% of accidents produced no injury, while 9% produced minor injuries and less than 1% major injuries. Heinrich [15] also hypothesised that a multitude of near misses is a prerequisite for a workplace injury or fatality. Thus, being able to anticipate the likely precursors of a potential accident would enable site managers to put in place measures that ensure people's safety on site.
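
Heinrich's percentages follow from the ratio usually associated with his work: 1 major injury to 29 minor injuries to 300 no-injury accidents, per 330 similar accidents. That ratio is not stated in this section, so it is supplied here as background; the short calculation below only checks that it reproduces the quoted figures.

```python
# Heinrich's commonly quoted ratio per 330 similar accidents
# (background assumption, not taken from this paper's data):
# 300 with no injury, 29 with minor injuries, 1 with a major injury.
HEINRICH_RATIO = {"no injury": 300, "minor injury": 29, "major injury": 1}


def ratio_to_percentages(ratio):
    """Convert raw counts into each outcome's percentage share of the total."""
    total = sum(ratio.values())
    return {outcome: 100 * count / total for outcome, count in ratio.items()}


shares = ratio_to_percentages(HEINRICH_RATIO)
for outcome, pct in shares.items():
    print(f"{outcome}: {pct:.1f}%")
# no injury: 90.9%
# minor injury: 8.8%
# major injury: 0.3%
```

The rounded shares (about 91%, 9% and under 1%) match the figures cited above.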

Safety reports are an extremely valuable source of information that site managers can use to learn about the conditions and events that contributed to accidents, and thereby to facilitate interventions that ensure positive safety outcomes [13], [22]. Within a safety report, near-misses are typically documented as unstructured or semi-structured free text, which contains information such as a description of the event and its time and location. Analysing near-miss data is labour-intensive and time-consuming, and deriving meaningful insights from it requires an understanding of safety. With considerable headway being made in the field of artificial intelligence, free-text data can now be processed, organised and handled automatically.
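
A semi-structured entry of this kind can be thought of as a few labelled fields plus free narrative. The sketch below assumes a hypothetical "Field: value" layout; the field names, the `NearMissReport` class, the `parse_report` helper and the sample report are all invented for illustration and do not reflect the actual format of the Wuhan Metro reports.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical field labels; real reports may use different ones.
KNOWN_FIELDS = {"description", "time", "location"}


@dataclass
class NearMissReport:
    """Hypothetical container for the fields a near-miss entry often carries."""
    description: str
    time: Optional[str] = None
    location: Optional[str] = None


def parse_report(text: str) -> NearMissReport:
    """Parse an assumed 'Field: value' layout.

    Lines without a recognised field label are treated as extra narrative
    and appended to the description.
    """
    fields, narrative = {}, []
    for line in text.strip().splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip().lower() in KNOWN_FIELDS:
            fields[key.strip().lower()] = value.strip()
        elif line.strip():
            narrative.append(line.strip())
    description = " ".join(filter(None, [fields.get("description", ""), *narrative]))
    return NearMissReport(description, fields.get("time"), fields.get("location"))


# Invented example report, for illustration only.
raw = """Time: 2018-05-14 09:30
Location: Station 3, shaft entrance
Description: A steel bar slipped from the hoist and landed near two workers.
Nobody was hurt."""
report = parse_report(raw)
print(report.time, "|", report.location)
print(report.description)
```

Splitting on the first colon keeps values such as "09:30" intact, while unlabelled lines are folded into the description rather than discarded.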

Under the auspices of safety management, numerous text classification approaches have been developed to acquire a better understanding of accident causation [5]. However, such studies have tended to focus on machine learning-based approaches that consist of two stages: (1) the manual extraction of text features; and (2) the input of these features into a classifier. Goh et al. [13], for example, evaluated six machine learning approaches for classifying construction accident reports and found that a support vector machine (SVM) provided the most accurate and reliable results compared with Linear Regression, Random Forest, k-Nearest Neighbour, Decision Tree, and Naïve Bayes. Despite the success of hand-crafted features, they are prone to weak generalisation, which profoundly affects classification accuracy [2], [23], [21]. In contrast, ontology-based approaches to text classification exploit the semantic features of free text [49], but suitable ontologies are often unavailable for specific domains or for different languages [1].
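
To make the two-stage pipeline concrete, the sketch below pairs a hand-crafted bag-of-words feature extractor (stage 1) with multinomial Naïve Bayes (stage 2), one of the six classifiers Goh et al. compared. It is a toy illustration, not Goh et al.'s implementation; the training sentences and the two category labels are invented.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(text):
    """Stage 1: hand-crafted features, here lower-cased word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


class NaiveBayes:
    """Stage 2: multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = extract_features(text)
            self.word_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = extract_features(text)
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n_label / n_docs)  # log prior
            for word, count in feats.items():
                if word not in self.vocab:
                    continue  # words unseen in training carry no evidence
                p = (self.word_counts[label][word] + 1) / (total + len(self.vocab))
                score += count * math.log(p)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training sentences and labels, for illustration only.
texts = [
    "worker nearly fell from scaffold edge",
    "worker slipped at unguarded excavation edge",
    "steel bar dropped from crane near workers",
    "hammer dropped from height narrowly missed worker",
]
labels = ["fall from height", "fall from height", "struck by object", "struck by object"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("brick dropped from crane"))  # struck by object
```

Because the features are fixed word counts, a word such as "brick" that never appeared in training simply drops out of the decision, which is one simple illustration of why hand-crafted features can generalise poorly.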

Our research aims to improve the accuracy of text classification and enable generalisations about the nature of near-miss reports to be made. Knowledge of the near-misses that arise during construction can provide site managers with a much-needed understanding of their various guises and frequencies, and hence with an ability to identify work-areas and instances where an accident is likely to occur. Against this backdrop, we use deep learning and Bidirectional Encoder Representations from Transformers (BERT) to develop a robust model for the automatic text classification of near-misses. Essentially, BERT consists of a multi-layer bidirectional Transformer encoder, which gives it a powerful ability to classify texts. BERT has been shown not only to attain state-of-the-art accuracy for text classification but also to generalise across different datasets [53].
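
For sequence-level tasks, the original BERT paper feeds the final hidden vector of the special [CLS] token into a single classification layer followed by a softmax, and fine-tunes that layer together with the encoder. The NumPy sketch below shows only that head at inference time: a random vector stands in for the encoder output, the weights are untrained, and the number of near-miss categories (K = 4) is an arbitrary assumption rather than the label set used in this study.

```python
import numpy as np


def classify_from_cls(cls_vector, weights, bias):
    """Turn the final-layer [CLS] vector into class probabilities.

    cls_vector: shape [H]; weights: shape [K, H]; bias: shape [K],
    where K is the number of near-miss categories.
    """
    logits = weights @ cls_vector + bias
    logits = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()  # softmax over the K categories


rng = np.random.default_rng(0)
H, K = 768, 4  # 768 is BERT-Base's hidden size; K = 4 is an arbitrary example
cls_vector = rng.standard_normal(H)  # stand-in for a real encoder output
weights = rng.standard_normal((K, H)) * 0.02  # untrained, small random weights
bias = np.zeros(K)

probs = classify_from_cls(cls_vector, weights, bias)
predicted_category = int(np.argmax(probs))
print(probs.round(3), predicted_category)
```

During fine-tuning, these head weights and all encoder parameters would be updated jointly with a cross-entropy loss on labelled reports.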

Our paper commences by reviewing past studies on text classification in construction and examining developments in deep learning-based text classification in computer science. We then develop our deep learning-based automated text classification model, test it, and evaluate its performance.


Text classification in construction

A wide range of digital techniques has been adopted in construction projects to improve collaboration, coordination, and the exchange of information between organizations [8], [3], [39], [14]. Varying data formats (e.g., images, video, and text) are collected, stored and exchanged in construction and are used to support planning, control, and decision-making [8]. However, a considerable portion of these data is stored in unstructured or semi-structured text formats [40], [8], [3], [39], [14].

Source of data

Huazhong University of Science and Technology has been collaborating with Wuhan Metro Group Co., Ltd (China), which is in the process of extending its metro-rail network. Consequently, the researchers were provided with safety reports from several construction sites involved in extending Wuhan's existing metro-rail network. The data were recorded by engineers with extensive experience in documenting safety incidents, and the incidents were categorised in accordance with the Quality and Safety

Architecture of BERT

BERT's model architecture is a multi-layer bidirectional Transformer encoder [53]. In the original Transformer, the encoder maps an input sequence of symbol representations x = (x1, …, xn) to a sequence of continuous representations z = (z1, …, zn), from which the decoder generates an output sequence y = (y1, …, ym) of symbols one element at a time; BERT retains only the encoder stack. The structure of the encoder and decoder is presented in Fig. 2. The key elements of the Transformer are briefly introduced in the following section.
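
The snippet breaks off before the Transformer's key elements are introduced. As background, the central operation inside each encoder layer is scaled dot-product self-attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where the queries Q, keys K and values V are linear projections of the layer input. The single-head NumPy sketch below is illustrative only: it omits multi-head splitting, residual connections, layer normalisation and the position-wise feed-forward sublayer, and all dimensions and weights are arbitrary.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over one sequence.

    x: [n, d_model] token representations; w_q, w_k: [d_model, d_k];
    w_v: [d_model, d_v]. No mask is applied, so every position attends to
    tokens on both its left and its right (bidirectional context).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # [n, n] scaled similarities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights              # [n, d_v] outputs, [n, n] weights


rng = np.random.default_rng(42)
n, d_model, d_k = 5, 16, 8  # arbitrary illustrative sizes
x = rng.standard_normal((n, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))
z, attn = self_attention(x, w_q, w_k, w_v)
print(z.shape, attn.shape)
```

A left-to-right language model would instead set the scores above the diagonal to minus infinity before the softmax; leaving that mask out is what lets every token draw on context from both sides.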

Experiments and results

The BERT-based text classification approach was run on a server with an Intel(R) Xeon(R) E5-2670 CPU, an NVIDIA(R) GeForce GTX 1080 GPU with 8 GB of video memory, and 64 GB of RAM. The approach was implemented in Python using the TensorFlow 1.8.0 deep learning framework.
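
This snippet reports the computing environment but not the evaluation measures themselves. Per-class precision, recall and F1, together with their macro average, are a common way to compare text classifiers when some incident categories are much rarer than others; whether the authors used exactly these measures is an assumption here. The helper name, category names and predictions in the sketch are invented.

```python
from collections import Counter


def precision_recall_f1(y_true, y_pred):
    """Per-class (precision, recall, F1) plus the macro-averaged F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but wrongly
            fn[truth] += 1  # missed the true class
    scores = {}
    for label in labels:
        p_den, r_den = tp[label] + fp[label], tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)
    return scores, macro_f1


# Invented gold labels and predictions, for illustration only.
y_true = ["fall", "fall", "struck", "struck", "electrical"]
y_pred = ["fall", "struck", "struck", "struck", "fall"]
scores, macro_f1 = precision_recall_f1(y_true, y_pred)
print(scores)
print(round(macro_f1, 3))  # 0.433
```

Macro averaging gives the never-predicted "electrical" class the same weight as the others, so its F1 of zero pulls the average down even though overall accuracy is 3/5.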

Discussion

We have developed a novel approach for classifying text contained within safety reports. To our knowledge, few studies have used deep learning to classify safety incidents, particularly near-misses. The results of the classification process can provide managers with insights into the context and nuances of near-misses. Acquiring contextual knowledge of near-misses enables managers to respond quickly to situations that may result in

Conclusion

We have developed a deep learning-based text classification approach that automatically classifies near-miss data contained within safety reports. The BERT-based approach relies on bidirectional pre-training of language representations, which eliminates the need for heavily engineered, task-specific architectures. Our experimental results demonstrate that the approach can accurately and automatically classify near-miss data contained within safety reports.


Declaration of Competing Interest

The authors declare that they have no conflicts of interest with respect to this work, and no commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Acknowledgments

The authors would like to acknowledge the financial support provided by the National Natural Science Foundation of China (Grant Nos. 71732001, 51678265 and 51978302) and the China Scholarship Council. The authors would also like to thank the Editor and the four anonymous reviewers for their constructive and insightful comments, which helped improve this manuscript.

References (53)

  • H. Li et al., Proactive behaviour-based safety management for construction safety improvement, Saf. Sci. (2015).
  • H.R. Marucci-Wellman et al., Classifying injury narratives of large administrative databases for surveillance—A practical approach combining machine learning ensembles and human review, Accid. Anal. Prev. (2017).
  • M.M. Mirończuk et al., A recent overview of the state-of-the-art elements of text classification, Expert Syst. Appl. (2018).
  • A. Qazi et al., Project complexity and risk management (ProCRiM): Towards modelling project complexity driven risk paths in construction projects, Int. J. Project Manage. (2016).
  • G.M. Waehrer et al., Costs of occupational injuries in construction in the United States, Accid. Anal. Prev. (2007).
  • R.A. Stein et al., An analysis of hierarchical text classification using word embeddings, Inf. Sci. (2019).
  • F. Zhang et al., Construction site accident analysis using text mining and natural language processing techniques, Autom. Constr. (2019).
  • B. Zhong et al., Convolutional neural networks: deep learning-based classification of building quality problems, Adv. Eng. Inform. (2019).
  • C. Zhou et al., Characterizing time series of near-miss accidents in metro construction via complex network theory, Saf. Sci. (2017).
  • Y. Bengio et al., Representation learning: A review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell. (2013).
  • J. Cheng et al., Long short-term memory-networks for machine reading.
  • China State Administration of Work Safety (Work Safety Summary in 2017)....
  • C. Caldas et al., Automated classification of construction project documents, J. Comput. Civil Eng. (2002).
  • X. Fu et al., Semi-supervised aspect-level sentiment classification model based on variational autoencoder, Knowl.-Based Syst. (2019).
  • M.Y. Goh et al., Construction accident narrative classification: An evaluation of text mining techniques, Accid. Anal. Prev. (2017).
  • H.W. Heinrich, Industrial Accident Prevention (1959).