1 Introduction

Natural hazard-triggered technological accidents are known as Natechs (Showalter and Myers 1992; Cruz et al. 2006; Cruz and Okada 2008). In this study, we are interested in Natechs that involve facility/equipment damage with the subsequent release of hazardous materials (hazmat). Natechs can cause huge economic losses (Girgin and Krausmann 2016; Krausmann and Salzano 2017) as well as long-term effects on human health and the environment (Krausmann and Cruz 2013). Natechs are generally considered hazmat release accidents that occur from the impact of a natural hazard on vulnerable industrial infrastructure (for example, storage tanks, fixed facilities, oil drill platforms). As a result, Natechs are more complex, and can have more severe consequences, than the triggering natural hazard alone. They therefore pose a major challenge to risk managers.

In this regard, learning from previous Natechs is crucial for risk analysis, assessment, and reduction. Over the past decades, researchers have studied Natech risk assessment methodologies and the vulnerability of storage tanks (Antonioni et al. 2007; Campedel et al. 2008; Antonioni et al. 2009, 2015; Khakzad and Van Gelder 2017, 2018; Khakzad et al. 2018). Meanwhile, many studies have attempted to understand the incidence and characteristics of these events by historically analyzing Natechs (Krausmann and Mushtaq 2008; Cruz and Krausmann 2009; Cozzani et al. 2010; Krausmann and Cruz 2013; Girgin and Krausmann 2014, 2016; Kumasaki et al. 2017; Shah et al. 2018). Some of the above studies have used several databases that contain records of chemical release accidents (including Natechs) to support research concerning Natech-related issues, such as the National Response Center (NRC) database of the United States (United States Coast Guard 2017), the eNatech database of the Joint Research Centre of the European Commission (European Commission 2019a), the Failure and ACcidents Technical information System (FACTS) (TNO Industrial and External Safety 2019), the Research and Information on Accidents database (ARIA) (Bureau for Analysis of Industrial Risk and Pollution 2019), and the Major Accident Reporting System (eMARS) database (European Commission 2019b), among others. Among these databases, eNatech is the sole database specifically dedicated to Natechs. However, because eNatech contains only a few records (just over 60) and its collection of accidents is not systematic, it is not very useful for analyzing Natech incidence over a territory and over time. It is therefore necessary to use other available and larger databases for this purpose. However, with larger databases come greater challenges.

Previous studies have retrieved Natechs from the “Description” field in the available databases. To do this, researchers have to filter appropriate data according to the research purpose. For example, in a study by Krausmann et al. (2011), the authors first extracted records regarding certain types of equipment (for example, storage tanks, pipelines) and their hazmat release cause, and then manually reviewed the incident description field of each record to obtain more details about the incident causes. In their analysis the authors reviewed several databases and identified just over 1000 Natech events in total, a set that already required considerable manual review. The study of Krausmann et al. (2011) provided valuable insight concerning the extraction of Natech cases from databases, but also highlighted that this task is highly time-consuming, particularly if a larger set of incidents needs to be reviewed. This is the case when we attempt to retrieve a complete set of Natechs from a database holding hundreds of thousands of records, such as the complete NRC database.

The NRC database is operated by the United States Coast Guard, which receives and catalogues all citizens’ reports of potential chemical release incidents. Since its establishment in 1974 (Clow 1980), the NRC database has evolved over the years, receiving approximately 25,000 to 35,000 hazmat release reports per year. The database contains records from 1990 onwards and is open and available to the public. These reports, which can be downloaded from the NRC website, are released as separate files that contain all hazmat release reports recorded per year. During the period 1990 to 2017, the total number of reports exceeded 820,000. According to Girgin and Krausmann (2014), there are over 110,000 records from 1983 to 1989. Compared with other available databases, the NRC database contains by far the largest number of reports about chemical release incidents. Due to this large amount of data, it is almost impossible to retrieve Natechs by checking the descriptions one by one. Moreover, the NRC database grows by almost 29,000 records on average per year, which is close to the total size of, for example, the FACTS database, half the total number of records in the ARIA database, and almost 29-fold the total number of records in the eMARS database. The number of potential chemical release incident and accident reports added to all available databases each year is also increasing. As these databases expand, a fast and efficient method to retrieve Natechs from large databases for accident analysis will be increasingly useful.

Improving Natech identification in the NRC database is also challenging because the database has many inherent problems (Girgin and Krausmann 2014). First, there are many different types of reports catalogued in the NRC database, such as: (1) reports of planned hazmat release incidents; (2) accident reports without hazmat release; (3) accident reports with hazmat release unrelated to natural phenomena; and (4) accident reports with hazmat release related to natural phenomena (Natech events). Moreover, there may be several reports on the same incident, as well as incomplete or limited details about some incidents. This is in part due to the fact that any citizen can report hazmat release cases to the NRC call center, which results in many records having incomplete information. Given this lack of knowledge and information, these redundant, complex, and confusing reports pose a major challenge to Natech identification. In addition, changes to the reporting criteria and to the data collection and entry forms have resulted in differences in the database itself for different periods (Cruz and Okada 2008; Krausmann and Salzano 2017). For example, since 2003 the NRC database has included “hurricane” as one of the incident causes in the field “Incident cause.”

Another problem concerns the uncertainty regarding the natural hazard cause, as the general option “natural phenomena” can also be indicated. In such cases we can deduce that a Natech was caused by a natural phenomenon, but not which particular type of natural hazard. In order to identify these accidents, we would need to review each and every accident description, which, if done manually, would be a monumental task.

One way to identify Natechs is keyword search. A keyword extraction method was employed to extract Natechs from the NRC database in the study by Sengul et al. (2012). Nonetheless, due to the problems introduced by language expression, extracting Natechs from the NRC database by checking the description of each incident remains time-consuming. For example, using “snow” as a keyword to extract snow-triggered accidents (or “incidents” as reported in the NRC database) from the NRC database, we obtain results such as those shown in Table 1. All the records in Table 1 were identified using this keyword, yet none of them were actually caused by snow. One option to avoid this type of problem is to check every record manually, although this is practically infeasible for large databases. Consequently, an accurate and efficient automated method of retrieving Natechs from the NRC database is needed.

Table 1 Example records in the National Response Center (NRC) database
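The false-positive pitfall described above is easy to reproduce. The sketch below uses invented example descriptions (they are not actual NRC records) to show how a naive substring search on “snow” returns matches that have nothing to do with snow-triggered releases:

```python
# Sketch of a naive keyword search and its false positives.
# The example descriptions below are invented for illustration only;
# they are NOT actual NRC records.
def keyword_hits(records, keyword):
    """Return records whose description contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [r for r in records if kw in r["description"].lower()]

records = [
    {"id": 1, "description": "Tank overflow; caller mentioned snow removal chemicals on site."},
    {"id": 2, "description": "Release at the Snowden facility due to valve failure."},
    {"id": 3, "description": "Pipe rupture under heavy snow load on the roof."},
]

hits = keyword_hits(records, "snow")
# All three records match the keyword, but only record 3 describes a
# snow-triggered incident; records 1 and 2 are false positives.
```

Without manually reading each matched description, records 1 and 2 would be wrongly counted as snow-triggered Natechs, which is exactly the bias the paper describes.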

For the purpose of extracting Natechs and identifying the related natural phenomena from the NRC database, we introduce machine learning theory into this study. A new Natech retrieval framework, the Semi-Intelligent Natech Identification Framework (SINIF), is proposed based on a keyword extraction method and machine learning theory. A total of 826,078 chemical release reports stored in the NRC database from 1990 to 2017 were analyzed using the proposed SINIF. In this particular study, we focused on all incident reports involving chemical releases from fixed industrial facilities, storage tanks, processing equipment, on- and offshore platforms, and pipelines. Furthermore, we included in the analysis reports involving releases from mobile sources, such as tanker trucks, vessels, and trains.

In the following section we provide a brief introduction to machine learning, and why we selected it and how it is adapted for the purpose of this study. We then explain in detail the design and architecture of the SINIF. In Sect. 4, we introduce the methodology including data collection, processing, and analysis workflow. The results and discussion are elaborated in Sect. 5, while the key conclusions of this study are presented in Sect. 6.

2 What is Machine Learning?

Machine learning refers to a series of algorithms and statistical models that help researchers understand complex data, especially big data. They are typically developed and implemented on computer systems to perform specific tasks effectively based on pattern recognition and statistical inference (Bishop 2006). Notable applications of machine learning include image classification (Lu and Weng 2007; Sudharshan et al. 2019), regression (Huang et al. 2011; Bashar and Mahmud 2019), and face detection and recognition (Campadelli et al. 2004; Ranjan et al. 2019). Furthermore, another application area of machine learning is Natural Language Processing (NLP). Natural Language Processing attempts to make computers understand the natural language of human beings in order to tackle and complete useful tasks (Chowdhury 2003), such as document classification (Manevitz and Yousef 2001; Rubin et al. 2012), email spam filtering (Blanzieri and Bryl 2008; Diale et al. 2019), and so forth. Machine learning, in particular, has a remarkable performance record on solving problems of text multi-classification (Le and Mikolov 2014; Sboev et al. 2016; Sotthisopha and Vateekul 2018). In brief, text multi-classification methods attempt to classify series of long text or sentences into more than two categories depending on the specific purpose, such as content recognition and emotion recognition.

According to Murphy (2012), machine learning is generally divided into two principal types: supervised learning and unsupervised learning. Both approaches require the models to be initially trained so that the computer system “learns” or “understands” the characteristics of the input data, after which the trained models can be used to evaluate the target dataset. The difference is that supervised learning requires the training data to contain predetermined labels, while unsupervised learning has no such requirement. Therefore, supervised learning is better suited to tasks characterized by an expected and clearly defined output, such as classification or regression problems. On the other hand, unsupervised learning is more suitable for problems that do not have any a priori output, such as knowledge discovery. Conceptualizing our research aim in the context of machine learning, we essentially attempt to tag the NRC incident reports with labels that indicate either “NotNatech” or the type of natural hazard that may have caused the chemical release. These labels are then used to evaluate whether an incident fits the description of a Natech and, if so, what the associated triggering natural phenomenon is. Considering the above characteristics of the two learning types, a supervised deep learning method can be employed to achieve this and retrieve Natechs from the NRC database.

The incident reports recorded in the NRC database have a field named “Description of incident” that records the incident description from the reporter. The description is organized as a long text that includes information about the cause of the incident, the weather conditions when the incident happened, the affected equipment, whether the report duplicates a previous record, when the incident happened, and so forth. Based on this, it is logical to assume that records that can potentially be classified as Natech-related incidents will probably have similar descriptions. For example, if there are several incident reports that can potentially be classified as hurricane-triggered Natech accidents, then certain defined keywords, such as “hurricane/typhoon,” “heavy rain,” “high speed,” “strong wind,” or the name of the hurricane/typhoon, should appear in their respective descriptions. This rationale explains why the keyword extraction method can be used to retrieve Natech-related records from the NRC database. However, such a method requires the selected keywords to appear in the description of the analyzed incident report. If the description does not contain the selected keywords, the report will not be identified as a Natech, unless otherwise verified (such as in the studies of Sengul et al. (2012) or Girgin and Krausmann (2014), in which the authors confirmed whether a report was related to a Natech accident based on an analysis of multi-source incident databases). Conversely, when a deep learning method is applied to analyze the incident descriptions, each description is transformed into a word bag, that is, a multiset of the words it contains. These transformed word bags are then analyzed by the selected algorithms, and their content similarity is determined. This analysis allows us to identify whether the descriptions belong to one class or another, and consequently represents the final step in the Natech identification and classification stage.
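As a rough illustration of the word-bag idea, the following sketch turns each description into a bag of word counts and scores content similarity with cosine similarity over those counts. This is only a minimal stand-in for what the trained networks learn: the descriptions are hypothetical, not actual NRC records, and cosine similarity is our illustrative choice, not the paper's method.

```python
# Minimal word-bag sketch: each description becomes a multiset (bag) of its
# words; similarity between bags is scored here with cosine similarity.
import math
import re
from collections import Counter

def to_word_bag(description):
    """Lowercase, tokenize on letter runs, and count word occurrences."""
    return Counter(re.findall(r"[a-z]+", description.lower()))

def cosine_similarity(bag_a, bag_b):
    dot = sum(bag_a[w] * bag_b[w] for w in bag_a)
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical descriptions, not actual NRC records:
d1 = "Heavy rain and strong wind from the hurricane damaged the storage tank."
d2 = "Strong wind and heavy rain caused a release from the storage tank."
d3 = "Operator error during transfer caused a diesel spill."

sim_weather = cosine_similarity(to_word_bag(d1), to_word_bag(d2))
sim_unrelated = cosine_similarity(to_word_bag(d1), to_word_bag(d3))
# Descriptions of similar (weather-related) causes score higher than
# unrelated ones, which is the basis for grouping them into one class.
```

Two weather-related descriptions share many words and score high, while an unrelated operator-error description scores near zero, so similarity of word bags can separate the classes.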

Following this rationale, the process of extracting Natechs from the NRC database can be conceptualized in two steps: first, classify the incident descriptions into different categories depending on the triggering cause; second, filter out and extract the records whose triggering cause is related to natural phenomena. Thus, the research purpose of this study, namely extracting Natechs from the NRC database, can be considered a multi-classification problem. As mentioned above, machine learning algorithms based on supervised learning approaches are well suited to this kind of problem. In order to decide on an appropriate machine learning algorithm for this case, we selected two commonly employed supervised learning algorithms, the Long Short-Term Memory (LSTM) and the Convolutional Neural Network (CNN), to develop the SINIF.

The LSTM was first proposed in 1997 (Hochreiter and Schmidhuber 1997) and is a kind of recurrent neural network (RNN). It was originally developed to address the difficulty earlier RNNs faced in capturing long-term temporal correlations in the input data. At the time of this study, the LSTM had been widely employed in the areas of speech recognition (Fernández et al. 2007), handwriting recognition (Graves and Schmidhuber 2009), and text classification (Shih et al. 2018).

The CNN was first proposed by Fukushima (1980), who called it the Neocognitron. A CNN always has one or more convolutional layers and is most commonly applied to visual imagery. With the advancement of machine learning over the years, the application scope of the CNN has become much broader, coping competently with an array of complex tasks ranging from image recognition (Cireşan et al. 2012) to video analysis and human action recognition (Ji et al. 2013) and text classification (Shin et al. 2018).

3 Design of the Semi-Intelligent Natech Identification Framework (SINIF)

In order to extract Natechs by using machine learning, we designed the SINIF (see Fig. 1). The main idea of the SINIF is that the input data (incident descriptions) will be analyzed and classified independently by both the keyword extraction method and a network implemented with a machine learning algorithm. In more detail, every record of the input data will be classified into a category according to the triggering cause determined by each of the two methods. In the final step, the category assigned by the keyword extraction method will be compared with the category assigned by the network for each and every record. If the two automatically generated results do not match for a record, that record will be manually checked and identified by the researchers. Following this idea, five procedures are designed in the SINIF, as follows:

Fig. 1
figure 1

Structure of the Semi-Intelligent Natech Identification Framework (SINIF)

  1. Keyword extraction analysis: The input data will be analyzed by the keyword extraction method in this step. According to the keywords identified by the researchers, the input data will be classified into different categories.

  2. Network training: The network will be built by implementing the machine learning algorithm selected by the researchers, and will be trained on a set of sample data generated from the initial input dataset.

  3. Network analysis: The trained network will be used to analyze the input data.

  4. Data comparison: A scan will be conducted to check whether the cause identified by the keyword extraction method matches the cause identified by the network. If the two automatically identified causes do not match, the researchers need to inspect the accident description and identify the triggering cause based on their own knowledge and understanding.

  5. Output results: The Natech reports will be separated from the other hazmat release incident reports on the basis of whether the triggering cause is related to a natural phenomenon, and if so, they will be further categorized according to the type of natural hazard.
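The comparison step (procedure 4) can be sketched in a few lines: records whose keyword label and network label agree are accepted automatically, while the rest are routed to manual review. The record IDs and labels below are hypothetical:

```python
# Sketch of SINIF procedure 4 (data comparison): agreement between the
# keyword label and the network label decides whether a record is accepted
# automatically or flagged for manual inspection. Records are hypothetical.
def compare_labels(records):
    certain, uncertain = [], []
    for rec in records:
        if rec["keyword_label"] == rec["network_label"]:
            certain.append(rec)
        else:
            uncertain.append(rec)  # needs manual inspection by researchers
    return certain, uncertain

records = [
    {"id": "A1", "keyword_label": "Hurricane", "network_label": "Hurricane"},
    {"id": "A2", "keyword_label": "NotNatech", "network_label": "Rain"},
    {"id": "A3", "keyword_label": "Storm", "network_label": "Storm"},
]

certain, uncertain = compare_labels(records)
# certain -> A1, A3 (labels match); uncertain -> A2 (manual check needed)
```

Only the disagreeing records require human effort, which is what makes the framework “semi-intelligent” rather than fully manual or fully automatic.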

The network itself can be based on any of the machine learning algorithms available. As mentioned in Sect. 2, we used the LSTM and CNN to develop the SINIF. Researchers can choose a suitable machine learning algorithm to build their own SINIF according to their research purpose. More details about the workflow of the SINIF are explained in Sect. 4.1.

4 Methodology

The methodology for this study includes the review of the relevant literature and data collection from the NRC database. All NRC data from 1990 to 2017 were downloaded as Excel files directly from the NRC website (United States Coast Guard 2017). Each file contains several sheets with official information (such as report time, call type, responsible company), incident information (such as description, time, location, cause, whether it was a planned event), hazmat information (whether there was a hazmat release, type of hazmat, amount released, equipment type), weather conditions, and other information.

In this section, we explain the data processing method, the workflow of the SINIF, and the indices that are used to measure the performance of machine learning algorithms.

4.1 Data Processing and Analysis

The NRC data were analyzed according to the following steps, the summary of which can be considered as the workflow of SINIF:

  1. The fields “SEQNOS,” “Description of incident,” and “Incident cause” of all 826,078 reports recorded between 1990 and 2017 in the NRC database were gathered into the total sample set, named U.

  2. A preliminary inspection of the sample set U revealed that when the NRC call center received reports related to a previously recorded incident, a copy of the original report’s “SEQNOS” ID was also registered in the “Description of incident” field of the new report. Based on this finding, we eliminated 3641 identified duplicate reports from the sample set U by examining the “Description of incident” and “SEQNOS” fields for multiple listings of the same “SEQNOS” ID, merging any repeated or fragmented descriptions of identified duplicates into one per incident.

  3. The field “Description of incident” in U was then analyzed using the keyword extraction method, with the aim of identifying and labeling the accident causes based on the same natural phenomenon classification used by Sengul et al. (2012) (that is, Wind, Weather, Unknown, Tornado, Storm, Rain, Lightning, Hurricane, Flood, Earthquake, Cold). Moreover, we included an additional category named “NotNatech” to differentiate reports that are not Natech reports. After labeling all reports in set U using the selected keywords, we examined the content of the reports categorized as related to natural phenomena to assess whether each was a planned event and whether there was a hazmat release, in order to ensure those reports indeed referred to Natechs. This process resulted in 32,348 records being identified as Natech reports and assigned the corresponding keyword label, and over 790,000 reports being labeled as “NotNatech.”

  4. Based on the results of step 3, we manually checked the “Description of incident” of around 15,000 reports tagged with keywords related to natural phenomena and around 8000 reports labeled as “NotNatech” to ensure they had been labeled correctly. These two groups of reports were then merged into a new sample set, named A, with 24,060 records.

  5. The LSTM and CNN algorithms were separately employed as the kernel networks to analyze set A.

  6. The sample set A was separated into a training set T1 (80% of A) and a testing set T2 (20% of A), so that the networks built in step 5 could be trained on T1 and tested on T2 accordingly. The text data in T1 were then transformed into sequences of word indices to match the input requirements for network training. A crucial point for the comparison of the two algorithms was to ensure that they were trained under the same conditions; we therefore set the same values for the critical parameters of the two networks, for example, a learning rate of 0.01 and 30 epochs.

  7. The accuracy of the trained networks was then assessed and verified, and the network with the highest accuracy was selected to analyze the original sample set U.

  8. Finally, the results of steps 3 and 7 were compared for each record to determine whether the labels matched. Where they differed, the field “Description of incident” of the specific record was manually inspected and the triggering cause identified according to the researchers’ interpretation.
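Steps 2 and 6 of the workflow can be sketched compactly: duplicates are detected by looking for another report's "SEQNOS" ID inside the "Description of incident" field, and the labeled sample is split 80/20 into training and testing sets. The records below are hypothetical, and a six-digit "SEQNOS" pattern is assumed purely for illustration; the actual ID format in the NRC export may differ:

```python
# Sketch of duplicate detection (step 2) and the 80/20 split (step 6).
# Records are hypothetical; the six-digit SEQNOS pattern is an assumption
# made for illustration only.
import random
import re

def find_duplicates(reports):
    """Return SEQNOS IDs of reports whose description references another report's ID."""
    known_ids = {r["SEQNOS"] for r in reports}
    duplicates = set()
    for r in reports:
        for ref in re.findall(r"\b\d{6}\b", r["description"]):
            if ref in known_ids and ref != r["SEQNOS"]:
                duplicates.add(r["SEQNOS"])
    return duplicates

def train_test_split(sample, train_fraction=0.8, seed=42):
    """Shuffle deterministically and split into T1 (training) and T2 (testing)."""
    shuffled = sample[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

reports = [
    {"SEQNOS": "100001", "description": "Tank release during hurricane."},
    {"SEQNOS": "100002", "description": "Update to report 100001, same event."},
    {"SEQNOS": "100003", "description": "Valve failure, diesel spill."},
]

dupes = find_duplicates(reports)                     # the follow-up report
sample_a = [r for r in reports if r["SEQNOS"] not in dupes]
t1, t2 = train_test_split(sample_a, train_fraction=0.8)
```

In the actual study the duplicate descriptions were merged into the original report rather than simply dropped; the sketch only shows how the cross-references can be located automatically.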

4.2 Performance Measurement Indices

For the purpose of measuring the performance of the selected machine learning algorithms, we present several indicative measurements to verify the networks (Özgür et al. 2005; Sokolova and Lapalme 2009; Sboev et al. 2016). First, we separated T2 into four subgroups for each individual class Xi, as shown below, and then calculated the performance indices according to the size of each subgroup for each class Xi.

True positive (TP): the item was labeled as class Xi in T2, and was also labeled as Xi by the network.
False positive (FP): the item was not labeled as class Xi in T2, but was labeled as Xi by the network.
True negative (TN): the item was not labeled as class Xi in T2, and was also not labeled as Xi by the network.
False negative (FN): the item was labeled as class Xi in T2, but was not labeled as Xi by the network.

According to the above subgroups, we introduced three main performance indices to evaluate the accuracy of the selected deep learning algorithms. The Precision (\(P_{i}\)), calculated by Eq. 1, was used to evaluate the class agreement of the data labels with the labels given by the network for each individual class Xi.

$$P_{i} = \frac{{{\text{TP}}_{i} }}{{{\text{TP}}_{i} + {\text{FP}}_{i} }}$$
(1)

The Recall (\(R_{i}\)) in Eq. 2 indicates the effectiveness of the selected algorithm in identifying the given labels for each individual class Xi.

$$R_{i} = \frac{{{\text{TP}}_{i} }}{{{\text{TP}}_{i} + {\text{FN}}_{i} }}$$
(2)

A harmonic average (\(F_{\beta = 1}^{i}\)) of Precision and Recall was used to assess the efficiency of the selected algorithm for class Xi.

$$F_{\beta = 1}^{i} = \frac{{\left( {1 + \beta^{2} } \right)P_{i} R_{i} }}{{\beta^{2} \cdot P_{i} + R_{i} }},\quad \beta = 1$$
(3)

The Macro-average value (\(\bar{F}\)), which is the average of \(F_{\beta = 1}^{i}\) over all classes, was calculated by Eq. 4 to indicate the overall effectiveness of the selected algorithm; the Accuracy (Acc), based on Eq. 5, was used to evaluate the overall accuracy of the algorithm.

$$\bar{F} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} F_{1}^{i} }}{n}$$
(4)
$${\text{Acc}} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {{\text{TP}}_{i} + {\text{TN}}_{i} } \right)}}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {{\text{TP}}_{i} + {\text{FP}}_{i} + {\text{FN}}_{i} + {\text{TN}}_{i} } \right)}}$$
(5)
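The indices of Eqs. 1-5 can be computed directly from the per-class TP/FP/FN/TN counts, as the short sketch below shows. The counts used here are illustrative only and do not come from Table 2:

```python
# Eqs. 1-5 in pure Python. The per-class counts are illustrative,
# not taken from the study's Table 2.
def precision(tp, fp):                      # Eq. 1
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):                         # Eq. 2
    return tp / (tp + fn) if tp + fn else 0.0

def f1(p, r):
    # Eq. 3 with beta = 1 reduces to the harmonic mean of P and R.
    return 2 * p * r / (p + r) if p + r else 0.0

# counts[class] = (TP, FP, FN, TN); illustrative values
counts = {
    "Hurricane": (80, 10, 20, 890),
    "NotNatech": (850, 30, 20, 100),
}

f_scores = []
tp_tn = total = 0
for tp, fp, fn, tn in counts.values():
    p, r = precision(tp, fp), recall(tp, fn)
    f_scores.append(f1(p, r))
    tp_tn += tp + tn
    total += tp + fp + fn + tn

macro_f = sum(f_scores) / len(f_scores)     # Eq. 4
accuracy = tp_tn / total                    # Eq. 5
```

For these illustrative counts the accuracy is 0.96 and the macro-average F value is about 0.91, showing how a strong per-class F score in a large class can dominate the overall figures.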

5 Results and Discussion

In this section, we first focus on the total number of Natechs and the associated natural hazard cause for each Natech report in the NRC database, based on data generated through the keyword extraction method. We then assess which of the two machine learning algorithms, the LSTM or the CNN, is more suitable for implementing the SINIF to retrieve Natech information from the NRC database. Finally, we discuss the quality of the SINIF classification results for retrieving Natech data from the NRC database.

5.1 Keyword Extraction Results

According to the results of the keyword extraction method, the total number of Natechs reported to the NRC from 1990 to 2017 is 32,348, which represents 3.93% of all incidents reported to the NRC during that period and corresponds to 2.22% to 7.39% of the total incidents in each year. As shown in Fig. 2a, hurricanes produced the largest number of Natechs (22.73% in total), while storms triggered 20.60%, and an additional 38.71% were attributable to rain, wind, flood, and other undetermined weather hazards. Figure 2b shows the frequency of Natechs caused by various natural phenomena according to the results of the keyword extraction method. The total number of Natechs increased notably after 2005, due to the change in the number of hurricane-related Natechs. Another contributing factor is that hurricanes began to be registered as a separate triggering hazard in the NRC database from 2003. Furthermore, it is apparent from Fig. 2a that using the keyword extraction method to identify the triggering natural hazard of Natechs has some issues, because on average 130 Natechs per year were labeled as “Unknown.”

Fig. 2
figure 2

Proportion of Natech reports associated with various natural phenomena (a). Annual number of Natech reports associated with various natural phenomena according to the keyword extraction results (b). Note: the black line is the average number of Natech reports

Although the results showed rates similar to those reported by Sengul et al. (2012) for the proportion of Natechs among all NRC incident reports during the study period, the distribution of triggering natural hazards differs considerably. These differences can probably be attributed to two aspects. First, the reports analyzed in this study cover the period from 1990 to 2017 instead of the period from 1990 to 2008 used in the original study of Sengul et al. (2012). Second, as mentioned in the introduction, owing to the variability of language expression, a large bias arises if we analyze the NRC database using only the keyword retrieval method, without checking the incident descriptions or consulting other incident report databases. As a result, we can confirm that it is quite difficult to achieve the research target of this study by analyzing the data solely with the keyword extraction method.

5.2 Efficiency Analysis of Machine Learning Algorithms

In this study, we introduced machine learning to retrieve Natechs and identify their triggering natural hazards. We analyzed the training set and the testing set by implementing two basic machine learning algorithms, the LSTM and the CNN, with the purpose of checking which algorithm is more suitable for achieving the research target. We then plotted heat maps of the confusion matrix for each algorithm, as shown in Fig. 3.

Fig. 3
figure 3

Confusion matrix of classification results of the Long Short-Term Memory (LSTM) (a). Confusion matrix of classification results of the convolutional neural network (CNN) (b)

Each cell in the confusion matrix represents the number of records the network misclassified into the classes of their respective columns, when they should have been placed into the classes indicated by their rows. Thus, the confusion matrix describes the distribution of the predicted classes in respect to the classes that have been identified by the keyword extraction method.
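Such a confusion matrix can be assembled by counting (reference, predicted) label pairs, with rows given by the keyword-derived reference classes and columns by the network's predictions. The label pairs below are hypothetical:

```python
# Assembling a confusion matrix as described above: rows are the reference
# (keyword-derived) classes, columns are the classes predicted by the
# network. The label pairs are hypothetical.
from collections import defaultdict

def confusion_matrix(pairs):
    """pairs: iterable of (reference_label, predicted_label) tuples."""
    matrix = defaultdict(lambda: defaultdict(int))
    for ref, pred in pairs:
        matrix[ref][pred] += 1
    return matrix

pairs = [
    ("Hurricane", "Hurricane"),
    ("Hurricane", "Storm"),      # ambiguous weather wording misleads the net
    ("Storm", "Storm"),
    ("NotNatech", "NotNatech"),
]

cm = confusion_matrix(pairs)
# cm["Hurricane"]["Storm"] == 1: one hurricane record misclassified as storm
```

Off-diagonal cells such as `cm["Hurricane"]["Storm"]` are exactly the weather-related confusions discussed in the next paragraph.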

Through the analysis of the confusion matrices, it becomes apparent that it is not easy to extract Natechs and identify their natural hazard causes by using the machine learning method alone. Figure 3 shows that both the LSTM and the CNN are capable of identifying the probable triggering natural hazards for most of the Natech records. However, due to the ambiguous descriptions of different natural hazards, there were certain biases in the Natechs triggered by hurricanes, storms, rain, or other weather-related causes. These findings probably stem from the fact that, for an untrained observer, the above natural hazards are induced by seemingly similar meteorological phenomena. Therefore, the vocabulary and expressions used to describe such accidents are often confusingly similar among Natechs. For the same reason, some accidents that were not triggered by natural hazards were easily mistaken for Natechs by the networks. These ambiguities pose another set of challenges to this research endeavor, making it very difficult to accomplish through the use of machine learning or keyword extraction alone. In this context, and following the discussion in the introduction, the development and implementation of the SINIF can be considered a viable method to retrieve Natechs and identify their natural hazard causes.

As explained in Sect. 3, we set out to find a suitable machine learning algorithm as the kernel network upon which the SINIF would be implemented. We calculated the indices explained in Sect. 4.2 in order to assess which machine learning algorithm is more suitable for the main task of this study. Table 2 presents the results of these performance indices for each algorithm. As the table shows, the LSTM and the CNN scored differently depending on the index and category. In terms of precision, the LSTM could retrieve “NotNatech” reports and identify the natural hazard cause more accurately than the CNN for Natechs triggered by flood, hurricane, lightning, rain, storm, tornado, weather, or wind. On the other hand, in terms of recall, the CNN could retrieve more “NotNatech” reports than the LSTM. As a comprehensive index of precision and recall, the harmonic average (\(F_{\beta = 1}^{i}\)) shows that the LSTM is superior to the CNN, with a difference ranging from 0.03 up to 0.26 in favor of the LSTM across all categories (Table 2). Furthermore, we calculated the macro-average value and accuracy for both algorithms. The results show that the LSTM managed to retrieve 94.28% of the Natechs from the testing set, while the CNN retrieved only 85.64%. A larger macro-average value (0.8979, against just 0.7497 for its competitor) likewise indicates that the LSTM was the more accurate algorithm. The latter two indices emphatically show that the LSTM performed better in general and is thus more suitable than the CNN for achieving the research target of this study. Moreover, the accuracy values of the LSTM and the CNN show that machine learning theory, and the proposed SINIF in particular, is more than capable of retrieving Natechs and identifying the associated natural hazards.

Table 2 Performance indices for each potential natural hazard cause determined by the machine learning algorithms

There are two key reasons explaining the effectiveness of the LSTM over the CNN in extracting Natechs from the NRC database. First, the NRC data were organized as time series data in this study, and, as LeCun and Bengio (1997) noted, CNNs are not as competent as recurrent neural networks (RNNs) at analyzing time series data, while the LSTM is an advanced type of RNN. Second, the aim of this study was framed as a text multi-classification problem, and, as LeCun et al. (2015) suggested, the LSTM algorithm is better suited to solving text-learning problems.

5.3 Semi-Intelligent Natech Identification Framework (SINIF) Results

The arguments supporting why the LSTM is more suitable than the CNN for the identification framework used to extract Natechs from the NRC database were presented above. After developing the SINIF and preparing the input dataset by removing duplicates, the remaining NRC records between 1990 and 2017 (822,437 reports in total) were examined by the trained LSTM network. Each record was then classified either according to the triggering natural hazard, receiving a corresponding label, or as "NotNatech" if the hazmat release was not caused by a natural hazard. The classification results of the LSTM network were then compared with those of the keyword extraction method. A total of 791,085 records received the same tag, either "NotNatech" or the corresponding natural hazard label, from both methods. Of these, 25,467 records were classified as Natech reports and grouped into the "certain Natech reports" set \(R_{\text{C}}\). In contrast, the other 31,352 records did not obtain matching labels from the two methods and thus formed the "uncertain Natech reports" set \(R_{\text{U}}\). After manually rechecking the fields "Description of incident" and "Incident cause" for each record of \(R_{\text{U}}\), an extra 7444 accidents were identified and added to the final set of Natech reports \(R_{\text{N}}\).
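The agreement check that separates \(R_{\text{C}}\) from \(R_{\text{U}}\) amounts to a per-report label comparison between the two methods. A minimal sketch in Python, with hypothetical data structures (the study does not describe its actual implementation):

```python
def split_reports(lstm_labels, keyword_labels):
    """Partition report ids by label agreement between the two methods.

    Both arguments map a report id to a label string such as
    "hurricane", "flood", or "NotNatech".
    Returns (R_C, R_U): ids where both methods agree on a Natech label,
    and ids where the two methods disagree (to be rechecked manually).
    """
    certain, uncertain = [], []
    for rid, lstm_label in lstm_labels.items():
        if lstm_label == keyword_labels.get(rid):
            if lstm_label != "NotNatech":
                certain.append(rid)   # matching Natech label -> R_C
        else:
            uncertain.append(rid)     # conflicting labels -> R_U
    return certain, uncertain

# Reports where both methods agree on "NotNatech" are simply discarded.
```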

Although we attempted to reduce duplicate data, and succeeded to a certain extent in step 2 following the method described in Sect. 4.1, other types of duplicate reports may still exist within \(R_{\text{N}}\).

The NRC call center receives and catalogues hazmat release reports made by citizens across the United States. Due to limited information and multiple report sources, the NRC call center cannot always ascertain whether an incoming report relates to an already registered incident. For the purpose of reducing this kind of duplicate report, we had to assume that all the information on the chemical release incidents recorded in the NRC database is correct, with the remaining doubt being whether separate reports actually relate to a single incident. On this basis, checking the location, address, incident date-time, and incident type offered a way to sort and remove such duplicates in this study. The idea behind this additional confirmatory step was to transform the address or location information into specific geographic coordinates, and then manually check the coordinates, incident type, incident description, and other fields of reports that occurred at around the same time to ensure that they do not refer to the same Natech event.

Thus, in the first step we collected the location information of every accident mentioned in each report of the \(R_{\text{N}}\) set. For some reports in \(R_{\text{N}}\) (6014 reports), latitude and longitude had already been registered. For the remaining 26,897 reports in \(R_{\text{N}}\), however, latitude and longitude were missing; only the address, street, county, and state information had been collected. To fill in the missing location information of those reports, we reorganized the address, street, county, and state information of each report into a specific format according to the requirements of the Places API.Footnote 1 This service is provided by the Google Cloud Platform to help developers obtain geographic coordinates from address information. By querying the Places API service, the missing location information of the reports in \(R_{\text{N}}\) was updated.
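As an illustration of this step, a request to the Places API "Find Place from Text" endpoint can be assembled as below. The endpoint and parameter names follow Google's public documentation, but the comma-joined address format is our assumption (the study does not specify the exact format), and the network call itself is omitted since it requires a valid API key:

```python
import urllib.parse

PLACES_ENDPOINT = (
    "https://maps.googleapis.com/maps/api/place/findplacefromtext/json"
)

def build_geocode_url(address, street, county, state, api_key):
    """Join the NRC address fields into one query string and build the
    Find Place request URL; the response's geometry field carries the
    latitude/longitude. Empty fields are skipped."""
    query = ", ".join(p for p in (address, street, county, state) if p)
    params = {
        "input": query,
        "inputtype": "textquery",
        "fields": "geometry",   # we only need the coordinates
        "key": api_key,
    }
    return PLACES_ENDPOINT + "?" + urllib.parse.urlencode(params)
```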

Second, the reports in \(R_{\text{N}}\) were grouped based on the incident date-time. A total of 3734 reports fell into 1461 date-time groups containing multiple reports potentially related to the same Natech event. The incident description, incident type, and released materials of each of these 3734 reports were therefore manually checked to determine whether it was a duplicate. The remaining 29,177 reports of \(R_{\text{N}}\) were each considered to refer to a distinct Natech event because of the differences in their date-times, even in cases where they may have been triggered by a single natural hazard phenomenon with a wide area of effect (such as a hurricane or a mudslide). To illustrate: 20 incidents appear to have happened on 16 September 2004 at 12:00, at the same location (38.19 N, 86.11 W). Moreover, all of these reports had quite similar descriptions and the same type of fixed facility. On this evidence, these 20 reports were considered duplicate records of a single incident and merged into one record. Following the above steps, 70 reports were identified as duplicate records, 84.29% of which were related to hurricanes and the rest to mudslides, cold, and storms. Remarkably, this additional confirmatory step revealed that it is indeed quite probable for the NRC call center to receive multiple reports of a single incident in the case of natural hazards with a wide area of effect.
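The date-time grouping and coordinate check described above can be sketched as follows; the field names and the coordinate tolerance are illustrative assumptions, and in the study the final duplicate decision was made manually rather than automatically:

```python
from collections import defaultdict

def candidate_duplicate_groups(reports, coord_tol=0.01):
    """Group report ids that share an incident date-time and near-identical
    coordinates; such groups are candidates for a manual duplicate check.

    `reports` is a list of dicts with keys 'id', 'datetime', 'lat', 'lon'.
    """
    by_time = defaultdict(list)
    for r in reports:
        by_time[r["datetime"]].append(r)

    groups = []
    for same_time in by_time.values():
        if len(same_time) < 2:
            continue  # a unique date-time cannot be a duplicate candidate
        by_place = defaultdict(list)
        for r in same_time:
            # rounded coordinates as a cheap proximity proxy
            key = (round(r["lat"] / coord_tol), round(r["lon"] / coord_tol))
            by_place[key].append(r["id"])
        groups.extend(g for g in by_place.values() if len(g) > 1)
    return groups
```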

In summary, a total of 32,841 Natech reports was retrieved using the SINIF (examples are shown in Table 3). On average, 1173 Natech reports per year were submitted to the NRC database during the period from 1990 to 2017. These Natech reports comprise 3.98% of all incident reports in the NRC database, and between 2.26% and 7.27% of the total reports each year. Figure 4a shows that hurricanes were mentioned in the largest number of Natech reports (24.42%), while 19.27% and 18.29% of the Natech reports included references to "rain" and "storm" respectively. An additional 36.26% was attributable to wind, flood, cold, tornado, and other weather-related events. Figure 4b shows that the number of Natech reports followed an upward trend from 1990 to 2017. With the exception of 2016, the number of Natech reports exceeded the average value after 2005, 2008, and 2012, coinciding with a higher count of hurricane-induced Natech reports. The number of hurricane-triggered Natech reports increased sharply and fluctuated drastically after 2003, whereas the frequencies of Natech reports caused by rain, storm, flood, and wind remained relatively stable over the same time frame.

Table 3 Examples of National Response Center (NRC) reports identified by the Semi-Intelligent Natech Identification Framework (SINIF)
Fig. 4

Number of Natech reports with various natural phenomena (a). Number of Natech reports associated with various natural phenomena according to SINIF results (b). Note: the black line is the average number of Natech reports

During the comparison of the SINIF results with those from the keyword extraction method alone, Natech reports involving snow, heat, tide, rough seas, and mudslides were also retrieved. Such records comprised 3.01% of the total Natech reports in the NRC database during the study period. This finding shows that the accuracy of the keyword extraction method is directly affected by the selection of keywords. If the selected keywords do not explicitly include terms that describe the potential natural hazard types, the process is highly likely to miss reports of Natechs triggered by natural hazards not defined in advance. In other words, the system has no information about what to look for initially, and so fails to identify the triggering natural hazards. However, thanks to the comparison function embedded in the SINIF's design, the framework is able to identify potential natural hazards that would otherwise be omitted by the keyword extraction method alone. By complementarily employing a deep learning method through the neural networks embedded in its structure, the SINIF attempts to identify the underlying patterns in the descriptions of the records in order to determine the triggering natural hazard, instead of seeking predefined terms in the text. The SINIF therefore holds an advantage over the sole application of the keyword extraction method in this regard, as it offers an added identification and verification step.

Another point to note is the distribution of Natech reports with regard to various natural phenomena and climatic changes. Generally, the distribution patterns generated by the SINIF seem more reasonable than those from the keyword extraction method, in the sense that they align with certain observed environmental tendencies. Several researchers have noted a dramatic rise in the intensity, size, duration, and frequency of hurricanes around the Atlantic Ocean since the early 1980s, which could be attributed to climate change (Emanuel 2007; Knutson et al. 2010; Torn and Snyder 2012; Landsea and Franklin 2013). It is not unreasonable to assume, consequently, that hurricanes and hurricane-related natural hazards, such as storms, rain, and floods, are becoming increasingly likely to trigger Natechs, and that this tendency will continue in the future. Figure 4b suggests that the number of Natech reports related to hurricanes has grown remarkably since 2003, with notable spikes in 2005 when Rita, Wilma, and Katrina struck, in 2008 with Ike, in 2012 with Sandy, and in 2017 when Harvey, Maria, and Irma occurred. Indeed, hurricane-related Natech reports increased 16-fold since 2003, underscoring this alarming trend. In addition, the terms most commonly used in the incident descriptions were plotted as a word cloud (Fig. 5) in order to evaluate their frequency throughout the NRC records. As expected, hurricane, storm, and rain were almost always mentioned in the descriptions; they are indeed the most frequently reported triggering causes, as evidenced by Fig. 5. This means that the majority of Natech reports throughout the period under investigation were probably induced by such weather-related causes. The fact that the SINIF demonstrated satisfactory competency in handling these practical issues and bringing such findings to light further supports it as a more suitable and accurate tool than the sole application of the keyword extraction method for retrieving Natech reports and identifying their associated natural hazards.
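The term frequencies that drive the word sizes in Fig. 5 boil down to a stop-word-filtered token count over the incident descriptions. A minimal sketch (the tokenization and the stop-word list are our illustrative assumptions, not the study's exact preprocessing):

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "to", "and", "in", "was", "due"}  # illustrative

def term_frequencies(descriptions, top_n=20):
    """Count the most frequent terms across incident descriptions;
    in a word cloud, each term is drawn with a size proportional
    to this count."""
    counts = Counter()
    for text in descriptions:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)
```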

Fig. 5

Word cloud map. Note: term size increases in proportion to frequency of use

6 Conclusion

This study developed a Semi-Intelligent Natech Identification Framework (SINIF) to retrieve Natech-related reports from the NRC database and identify the associated triggering natural hazards. The incident reports recorded in the NRC database between 1990 and 2017 were analyzed with the keyword extraction method and with the SINIF separately. The comparison of the results indicates that the SINIF is the more suitable analytical technique for achieving the research purpose accurately and efficiently, namely the extraction of Natechs from a large database (the NRC database in this study). First, compared with using the keyword extraction method alone, keyword selection plays a smaller role by default in determining the SINIF's extraction results, since it is combined with an additional deep learning network. Second, the SINIF's results are more reasonable than those from the keyword extraction method alone. Third, the SINIF is capable of analyzing large databases with comparatively higher accuracy than keyword extraction.

According to the results of the SINIF, 3.98% of the hazmat release accidents reported between 1990 and 2017 were identified as Natech-related. Furthermore, according to the retrieved NRC reports, the vast majority of Natech reports (98.24%) were related to meteorological phenomena, with hurricanes (24.42%), heavy rains (19.27%), and storms (18.29%) detected as the main causes. As evidenced by the findings of this study, the observed rise in the overall number of Natech reports since 2004 can be primarily attributed to the increased occurrence of hurricane-triggered Natechs. The frequency and severity of extreme weather-related events, such as hurricanes, might rise due to climate change, which in turn suggests that Natech events could become more likely in the future. However, given the uncertainty of Natech events, we should be cautious about attributing the increasing trend in the number of Natech reports or events to climate change. Nevertheless, the findings of this study point to an interesting research direction: whether, and through which mechanisms, climate change affects the incidence of Natech events.

As with any data extraction method, the human factor poses significant limitations to the SINIF. Its framework is, by design, heavily dependent on the keyword extraction method. The fact that this technique relies so strongly on the researchers' initial selection of search terms, as demonstrated in this study as well, introduces human error into the machine learning framework. Even the network analysis in steps 4 through 7 and the comparison in step 8 cannot completely eliminate such effects. According to the workflow of the SINIF, in order to retrieve the target records from large databases, researchers must input "reasonable" search keywords (terms that they estimate adequately describe the target incidents) into the SINIF. However, no matter how carefully the researchers select keywords, the risk remains that the selected keywords do not cover all potential Natech reports, unless the researchers manually check millions of chemical release records within the large database. Furthermore, if the selected keywords are not included in the actual incident descriptions within the dataset, such records may not be added to the training data in the following step, because the SINIF has no way of assessing the keywords themselves. This may even result in a number of Natech reports within the database not being correctly identified as such. Another shortcoming of the SINIF is related to the selection of the machine learning algorithm implemented as the kernel network. As demonstrated by our analysis, different algorithms can produce starkly different results in classifying the descriptions of accidents. Therefore, we recommend that researchers who want to apply the SINIF assess the respective advantages and disadvantages of each candidate kernel algorithm beforehand. In our case, the LSTM proved more suitable than the CNN for building the SINIF to retrieve Natech reports from the NRC database.
The last weakness of the SINIF is its high requirement for technical expertise in computer science. Users should have a basic understanding of the machine learning algorithm they select in order to modify the parameters of the kernel network accordingly and ensure the accuracy of their results. In other words, the SINIF is not yet particularly user-friendly.

Despite the aforementioned disadvantages of the SINIF, it also offers significant benefits for researchers dealing with technological accidents. This endeavor set out to develop a machine learning framework for the extraction of Natech-related reports from the NRC database and the identification of their triggering natural hazard causes. An important asset of the SINIF, however, is its versatility. The framework can naturally be applied to analyze a wide array of databases, such as FACTS or eMARS, although minor modifications may be necessary to account for the different field names used in each database. Apart from employing the SINIF to accomplish the same task with different initial datasets, the framework can easily be used by researchers to focus on specific categories of Natechs, or even to examine other aspects of technological accidents, depending on the information available in the dataset. In addition, because of its high compatibility, users can implement the SINIF on top of any existing machine learning algorithm to improve the accuracy of the extraction results.