Improving Software Bug-Specific Named Entity Recognition with Deep Neural Network, Journal of Systems and Software


Journal of Systems and Software (IF 3.7), Pub Date: 2020-07-01, DOI: 10.1016/j.jss.2020.110572
Cheng Zhou, Bin Li, Xiaobing Sun

Abstract Bug repositories hold a large volume of bug data that contains rich information. Existing studies on bug data mining rely mainly on information retrieval (IR) technology to search for relevant historical bug reports. These studies largely treat each bug report as a closed unit, ignoring the semantic and structural information within it. Named-entity recognition (NER) is an important task in information extraction (IE). With NER, fine-grained factual information can be extracted comprehensively and organized into structured data, which offers a new way to improve the accessibility of bug information. However, bug NER differs from general NER tasks. Bug reports are free-form text in a mixed-language environment studded with code, abbreviations, and software-specific vocabulary. In this paper, we propose DBNER, a deep neural network approach for bug-specific entity recognition that uses a bidirectional long short-term memory (LSTM) network with a Conditional Random Fields (CRF) decoding model. DBNER extracts multiple features from the massive bug data and uses an attention mechanism to improve the consistency of entity tags within bug reports. Experimental results show that the F1-score averages 91.19%. In cross-project experiments, DBNER's F1-score averages 84%.
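To make the CRF decoding step mentioned in the abstract more concrete, here is a minimal sketch of Viterbi decoding over per-token tag scores. It is not the authors' implementation: the function name `viterbi_decode`, the tag names (`O`, `B-ENT`, `I-ENT`), and all scores are hypothetical values chosen for illustration. In a BiLSTM-CRF model the emission scores would normally come from the recurrent network and the transition scores would be learned.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the tag sequence with the highest total emission + transition score."""
    if not emissions:
        return []
    # best[t][tag]: score of the best path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t - 1][tag]: previous tag on that best path
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            # Missing transition entries are treated as a neutral score of 0.0
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores for the tokens "crash", "in", "NullPointerException"
tags = ["O", "B-ENT", "I-ENT"]
emissions = [
    {"O": 2.0, "B-ENT": 0.5, "I-ENT": 0.1},
    {"O": 2.0, "B-ENT": 0.1, "I-ENT": 0.2},
    {"O": 0.5, "B-ENT": 1.0, "I-ENT": 1.2},
]
# Heavily penalize the invalid BIO move from O directly to I-ENT
transitions = {("O", "I-ENT"): -100.0}

print(viterbi_decode(emissions, transitions, tags))  # ['O', 'O', 'B-ENT']
```

Picking the highest-scoring tag for each token independently would label the last token `I-ENT`, which is not a valid continuation of `O`. Decoding the whole sequence with transition scores yields `B-ENT` instead, which illustrates why a CRF layer is commonly placed on top of per-token scores.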


Updated: 2020-07-01
