Context-aware sequence labeling for condition information extraction from historical bridge inspection reports, Advanced Engineering Informatics

Context-aware sequence labeling for condition information extraction from historical bridge inspection reports
Advanced Engineering Informatics (IF 8.0), Pub Date: 2021-06-23, DOI: 10.1016/j.aei.2021.101333
Tianshu Li, Mohamad Alipour, Devin K. Harris

Effective upkeep of aging infrastructure systems with limited funding and resources calls for efficient bridge management systems. Although data-driven models have been extensively studied in the last decade for extracting knowledge from past experience to guide future maintenance decision making, their performance and usefulness have been limited by the level of detail and accuracy of currently available bridge condition databases. This paper leverages an untapped resource for bridge condition data and proposes a new method to extract condition information from it at a high level of detail. To that end, a natural language processing approach was developed to formalize structural condition knowledge by formulating a sequence labeling task and modeling inspection narratives as a combination of words representing defects, their severity, and their location, while accounting for the context of each word. The proposed framework employs a deep-learning-based approach and incorporates context-aware components, including a bidirectional Long Short-Term Memory (LSTM) neural network architecture and a Conditional Random Field (CRF) classifier, to account for the context of words when assigning labels. A dependency-based word embedding model was also used to represent the raw text while incorporating both semantic and contextual information. The sequence labeling model was trained using bridge inspection reports collected from the Virginia Department of Transportation bridge inspection database and achieved an F1 score of 94.12% during testing. The proposed model also demonstrated improvements compared with baseline sequence labeling models, and was further used to demonstrate the capability of detecting condition changes with respect to previous inspection records. Results of this study show that the proposed method can be used to extract and create a condition information database that can further assist in developing data-driven bridge management and condition forecasting models, as well as automated bridge inspection systems.
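
To make the sequence labeling formulation concrete, the following is a minimal sketch of how per-word labels could be grouped into defect, severity, and location records once a tagger has produced them. It is not the authors' implementation: the BIO tagging scheme, the label names (`SEVERITY`, `DEFECT`, `LOCATION`), the function name `extract_spans`, and the example sentence are all assumptions made for illustration, and the tags are written by hand rather than predicted by a BiLSTM-CRF model.

```python
from typing import List, Tuple

# Hypothetical tag set (an assumption, not taken from the paper):
# "B-X" starts a span of type X, "I-X" continues it, "O" is any other word.


def extract_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) spans, in sentence order."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, str]] = []
    current_label = None   # label of the span being built, if any
    current_words: List[str] = []

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == current_label
        if not continues:
            # Close the open span (if any) before deciding what this tag starts.
            if current_label is not None:
                spans.append((current_label, " ".join(current_words)))
            current_label, current_words = None, []
            # A stray "I-X" without a matching open span is treated as a new span.
            if prefix in ("B", "I") and label:
                current_label = label
        if current_label is not None:
            current_words.append(token)

    # Flush a span that runs to the end of the sentence.
    if current_label is not None:
        spans.append((current_label, " ".join(current_words)))
    return spans


# Illustrative inspection sentence and hand-written tags (both hypothetical).
tokens = ["Moderate", "spalling", "on", "the", "east", "abutment"]
tags = ["B-SEVERITY", "B-DEFECT", "O", "O", "B-LOCATION", "I-LOCATION"]

for label, text in extract_spans(tokens, tags):
    print(f"{label}: {text}")
```

In a pipeline like the one the abstract describes, the `tags` list would come from the trained model, and records extracted from consecutive inspection reports could then be compared to flag condition changes.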

Updated: 2021-06-24
