Full length article
Automated text classification of near-misses from safety reports: An improved deep learning approach
Introduction
Construction is a dangerous industry, and its hazards are exacerbated by its temporary nature and innate complexity [36], [33], [24], [38]. In the United States, for example, a total of 4674 workplace fatalities were recorded in 2017, with 971 (20.7%) occurring in construction. Moreover, most of these fatalities were attributable to being struck by an object, electrocution, and being caught in or between plant and machinery [30]. In China, a staggering 3843 fatal injuries were recorded on construction sites in 2017 [7]. However, the precursors of these fatalities, accidents and near misses are not being systematically identified during construction [52].
A near miss has been defined as an unplanned event that has the potential to cause, but does not result in, personal injury, environmental or equipment damage, or interruption to regular operation [32]. Heinrich [15] observed that approximately 91% of accidents produced no injuries, while 9% produced minor injuries and less than 1% major injuries. Heinrich [15] also hypothesised that a multitude of near misses is a prerequisite for a workplace injury or fatality. Thus, being able to anticipate the likely precursors of a potential accident would provide site managers with the ability to put in place measures to ensure people's safety on site.
Safety reports are an extremely valuable source of information that can be used by site managers to learn about the conditions and events that have contributed to the occurrence of accidents and therefore facilitate interventions to ensure positive safety outcomes [13], [22]. Within a safety report, near-misses are typically documented in an unstructured or semi-structured free-text data format, which contains information such as the description of the event, its time, and location. The analysis of near-miss data can be labour intensive and time-consuming, and it requires an understanding of safety to be able to derive meaningful insights. With considerable headway being made in the field of artificial intelligence, we can now automatically process, organize and handle free-text data.
Under the auspices of safety management, numerous text classification-based approaches have been developed to acquire a better understanding of accident causation [5]. However, such studies have tended to focus on developing machine learning-based approaches that consist of: (1) the manual extraction of text features; and (2) inputting these features into a classifier. Goh et al. [13], for example, evaluated six machine learning approaches for classifying accident reports and found that the support vector machine (SVM) provided more accurate and reliable results than the Linear Regression, Random Forest, k-nearest Neighbor, Decision Tree, and Naïve Bayes approaches. Despite the success of hand-crafted features, they are prone to weak generalization, which profoundly affects classification accuracy [2], [23], [21]. In contrast to machine learning, ontology-based approaches have been used for text classification as they utilise the semantic features of free-text [49], but ontologies are unavailable for specific domains or different languages [1].
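The two-step pipeline described above (manual feature extraction followed by a classifier) can be sketched as follows. This is a minimal illustration only, not the method of any cited study: it assumes word counts as the hand-crafted features and uses Naïve Bayes, one of the six approaches listed, as a stand-in classifier because it needs nothing beyond the Python standard library. The example reports, the labels, and the names `tokenize` and `NaiveBayes` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Step 1 (hand-crafted features): lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


class NaiveBayes:
    """Step 2: a multinomial Naive Bayes classifier with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        features = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(n_docs / total_docs)
            for word, freq in features.items():
                if word in self.vocab:
                    score += freq * math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy reports; not taken from the study's data.
train_texts = [
    "worker nearly struck by falling brick from scaffold",
    "steel pipe fell from crane and missed worker",
    "exposed live cable found near wet floor",
    "damaged electrical cable sparking at switch box",
]
train_labels = ["struck-by", "struck-by", "electrical", "electrical"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("loose cable sparking near generator")
print(prediction)  # prints "electrical"
```

Words absent from the training vocabulary ("loose", "generator") are simply ignored, which illustrates the weak generalization of hand-crafted features: a report phrased with unseen vocabulary gives the classifier little to work with.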
Our research aims to improve the accuracy of text classification and enable generalisations about the nature of near-miss reports to be made. Knowledge of the nature of the near-misses that arise during construction can provide site managers with a much-needed understanding of their various guises and frequencies. As a result, site managers can identify work areas and instances where an accident is likely to occur. Against this backdrop, we utilise deep learning and Bidirectional Encoder Representations from Transformers (BERT) to develop a robust automatic text classification model of near-misses. Essentially, BERT is built on a multi-layer bidirectional transformer encoder, which gives it a powerful ability to classify texts. BERT has been demonstrated not only to attain the highest level of accuracy for text classification but also to be generalisable across different databases [53].
Our paper commences by reviewing past studies on text classification in construction and examines the developments of deep learning-based text classification in computer science. Then, we develop our deep learning automated text classification model, which is subsequently tested, and its performance evaluated.
Text classification in construction
A wide range of digital techniques has been adopted in construction projects to improve collaboration, coordination, and the exchange of information between organizations [8], [3], [39], [14]. Varying data formats are collected, stored and exchanged in construction (e.g., images, video, and text), which are used to support planning, control, and decision making [8]. However, a considerable portion of this data is stored in an unstructured or semi-structured text format [40], [8], [3], [39], [14]
Source of data
Huazhong University of Science and Technology has been collaborating with the Wuhan Metro Group Co., Ltd (China), which is in the process of extending its metro-rail network. Consequently, researchers were provided with safety reports from several construction sites involved in extending Wuhan’s existing metro-rail network. The data were recorded by engineers who have extensive experience in documenting safety incidents, which were categorised in accordance with the Quality and Safety
Architecture of BERT
BERT's model architecture is a multi-layer bidirectional transformer encoder [53], built on the encoder-decoder design of the original transformer. In that design, the encoder maps the input sequence of symbol representations x = (x1,…, xn) to a sequence of continuous representations z = (z1,…, zn), and the decoder then generates an output sequence y = (y1,…, yn) of symbols one element at a time. The structures of the encoder and decoder are presented in Fig. 2. The key elements of the transformer are briefly introduced in the following section.
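The mapping from x = (x1,…, xn) to z = (z1,…, zn) described above can be illustrated with scaled dot-product self-attention, the core operation of a transformer encoder layer. The sketch below is a simplified illustration rather than the model used in this study: it assumes identity projections (queries, keys and values are the inputs themselves), a single attention head, and a hypothetical three-token input with two-dimensional embeddings. A trained model learns separate projection matrices and stacks many such layers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Map x = (x1,...,xn) to z = (z1,...,zn) with scaled dot-product attention.

    Assumption: queries, keys and values are the inputs themselves (identity
    projections); a real transformer layer learns separate projection matrices
    and uses several attention heads.
    """
    d_k = len(x[0])
    z = []
    for query in x:
        # Similarity of this token to every token, scaled by sqrt(d_k).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in x
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all input vectors.
        z.append(
            [sum(w * value[j] for w, value in zip(weights, x)) for j in range(d_k)]
        )
    return z


# Hypothetical 3-token input with 2-dimensional embeddings.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
z = self_attention(x)
for vector in z:
    print([round(v, 3) for v in vector])
```

Because every output vector attends to all positions at once, each z_i draws on context to both its left and its right, which is the sense in which the encoder is bidirectional.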
Experiments and results
The developed BERT-based text classification approach was run on a server with an Intel(R) Xeon(R) E5-2670 CPU, an NVIDIA(R) GeForce GTX 1080 GPU with 8 GB of video memory, and 64 GB of RAM. The approach was implemented in the Python programming language using the TensorFlow 1.8.0 deep learning framework.
Discussion
We have developed a novel approach for classifying text contained within safety reports. To our knowledge, there has been a paucity of studies that have utilised deep learning to classify safety incidents, particularly near-misses. The results of the classification process can provide managers with insights into the context and nuances of near-misses. Being able to acquire context knowledge of near-misses provides managers with the ability to respond quickly to situations that may result in
Conclusion
We have developed a deep learning-based text classification approach to classify near-miss data contained within safety reports automatically. The developed BERT approach was designed in the form of bi-directional pre-training for language representation, which eliminates the need for heavily-engineered task-specific architectures. The experimental results we have produced demonstrate that our approach can accurately and automatically classify near-miss data contained within safety reports.
Declaration of Competing Interest
The authors declare that they have no conflicts of interest in relation to this work. We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
Acknowledgments
The authors would like to acknowledge the financial support provided by the National Natural Science Foundation of China (Grant No. 71732001, No. 51678265, No. 51978302) and the China Scholarship Council. The authors would also like to thank the Editor and the four anonymous reviewers for their constructive and insightful comments, which helped improve this manuscript.
References (53)
- et al. Semantic text classification: A survey of past and recent advances. Inf. Process. Manage. (2018)
- et al. Big data in the construction industry: A review of present status, opportunities, and future trends. Adv. Eng. Inf. (2016)
- et al. Using ontology-based text classification to assist Job Hazard Analysis. Adv. Eng. Inf. (2014)
- et al. Analyses of systems theory for construction accidents prevention with specific reference to OSHA accidents reports. Int. J. Project Manage. (2013)
- et al. An approach to the use of word embeddings in an opinion classification task. Exp. Syst. Appl. (2016)
- et al. Computer vision for behaviour-based safety in construction: A review and future directions. Adv. Eng. Inf. (2020)
- et al. Computer vision applications in construction safety assurance. Autom. Constr. (2020)
- et al. A Big-Data-based platform of workers' behaviour: Observations from the field. Accid. Anal. Prevent. (2016)
- et al. A deep hybrid learning model to detect unsafe behavior: integrating convolution neural networks and long short-term memory. Autom. Constr. (2018)
- et al. Putting into practice error management theory: Unlearning and learning to manage action errors in construction. Appl. Ergon. (2018)