Identifying causality and contributory factors of pipeline incidents by employing natural language processing and text mining techniques,Process Safety and Environmental Protection

Process Safety and Environmental Protection (IF 6.9), Pub Date: 2021-05-31, DOI: 10.1016/j.psep.2021.05.036
Guanyang Liu, Mason Boyd, Mengxi Yu, S. Zohra Halim, Noor Quddus

The key to learning from past incidents is to identify their underlying causes and contributory factors. A large body of incident narrative text has accumulated over the years and can be a good learning source if properly utilized. However, the volume and unstructured nature of this text impede the extraction of insights into recurring incident patterns. This research applies natural language processing (NLP) and text mining techniques to this resource in order to understand the contributing factors and causation behind incidents, using the pipeline industry as an illustrative example. The analysis draws on 3587 incident narratives from the ‘comment’ section of the incident database of the Pipeline and Hazardous Materials Safety Administration (PHMSA). Two text analytics methods, K-means clustering and co-occurrence networks, are employed to infer latent causality of incidents. The results demonstrate that both methods can identify contributing factors under specific failure types. The co-occurrence network approach has an advantage in extracting dependencies among the contributory factors, whereas K-means clustering can only indicate general correlations. The workflow proposed in this paper offers a new perspective on identifying contributing factors and their causal dependencies from incident text data, with promising applications in risk analysis and accident modeling.
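As a rough illustration of the co-occurrence idea mentioned in the abstract, the sketch below counts how often pairs of terms appear together in the same narrative and keeps frequent pairs as network edges. It is not the authors' workflow: the three narratives are invented for illustration (they are not PHMSA records), and the stop-word list, the function names, and the `min_count` threshold are all assumptions. Raw co-occurrence counts are undirected, so on their own they suggest association rather than a causal direction.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy narratives, invented for illustration; not PHMSA records.
NARRATIVES = [
    "corrosion caused a leak in the pipe",
    "excavation damage caused a leak",
    "corrosion on the pipe led to a rupture",
]

# Minimal stop-word list (an assumption; real analyses use larger lists).
STOP_WORDS = {"a", "the", "in", "on", "to", "led", "caused"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def cooccurrence_counts(docs):
    """Count the number of documents in which each unordered term pair co-occurs."""
    counts = Counter()
    for doc in docs:
        # Deduplicate and sort so each pair has one canonical ordering.
        terms = sorted(set(tokenize(doc)))
        counts.update(combinations(terms, 2))
    return counts


def network_edges(counts, min_count=2):
    """Keep pairs seen in at least min_count documents as weighted edges."""
    return {pair: n for pair, n in counts.items() if n >= min_count}


counts = cooccurrence_counts(NARRATIVES)
edges = network_edges(counts)
print(edges)
```

On a real corpus, the edge weights would typically be normalized (for example by term frequency) before being read as dependencies, since very common terms co-occur with almost everything.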


Updated: 2021-06-02
