“FabNER”: information extraction from manufacturing process science domain literature using named entity recognition, Journal of Intelligent Manufacturing


Journal of Intelligent Manufacturing (IF 8.3), Pub Date: 2021-06-24, DOI: 10.1007/s10845-021-01807-x
Aman Kumar, Binil Starly

The number of manufacturing science articles published digitally in scientific journals and on the broader web has increased exponentially every year since the 1990s. For a novice engineer or an experienced researcher to assimilate this knowledge and answer both basic and complex queries, the knowledge contained in the published material must be synthesized at scale. Algorithmic approaches based on machine learning, and specifically Natural Language Processing (NLP), are lacking for domain-specific areas such as manufacturing. One significant challenge in analyzing manufacturing vocabulary is the lack of a named entity recognition (NER) model that lets algorithms classify the words of a manufacturing corpus into manufacturing semantic categories. This work presents a supervised machine learning approach that categorizes unstructured text from more than 500,000 manufacturing science abstracts and labels it under various manufacturing topic categories. A neural network model that combines a bidirectional long short-term memory network with a conditional random field (BiLSTM + CRF) is trained to extract information from manufacturing science abstracts. Our classifier achieves an overall F1-score of 88%, which is close to state-of-the-art performance. Two use cases demonstrate the value of the developed NER model as a Technical Language Processing (TLP) workflow on manufacturing science documents. The long-term goal is to extract knowledge about the connections and relationships between key manufacturing concepts and entities, found in millions of manufacturing documents, into a structured labeled-property graph that allows programmatic query and retrieval.
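
As an illustration of how an F1-score for an NER tagger can be computed, here is a minimal Python sketch. It rests on assumptions that the abstract does not confirm: that tags follow the BIO scheme, that scoring is done at the entity level (exact span and label match), and that the label names `PROC` and `MATE` and the example sentence are hypothetical. It is not the authors' evaluation code.

```python
from typing import List, Set, Tuple

# (label, start index, end index exclusive); a hypothetical span representation
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed tagging scheme)."""
    spans: Set[Span] = set()
    label, start = None, 0
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.add((label, start, i))
            label = None
        # A "B-" tag opens a span; a stray "I-" tag is leniently treated as an opener.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Entity-level F1: a predicted span counts only on an exact span and label match."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentence with made-up labels (PROC = process, MATE = material).
tokens = ["Selective", "laser", "melting", "of", "Ti-6Al-4V", "powder"]
gold = ["B-PROC", "I-PROC", "I-PROC", "O", "B-MATE", "O"]
pred = ["B-PROC", "I-PROC", "O", "O", "B-MATE", "O"]
score = entity_f1(gold, pred)
```

In this example the predicted process span stops one token early, so only the material span matches exactly; precision and recall are both 1/2 and the F1-score is 0.5. A token-level F1 would give a different number for the same predictions, which is why the scoring convention matters when comparing reported results.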


Updated: 2021-06-24
