Predicting software defect type using concept-based classification, Empirical Software Engineering

Predicting software defect type using concept-based classification
Empirical Software Engineering (IF 3.5), Pub Date: 2020-02-12, DOI: 10.1007/s10664-019-09779-6
Sangameshwar Patil, B. Ravindran

Automatically predicting the type of a software defect from its description can significantly speed up and improve the software defect management process. A major challenge for current supervised-learning-based approaches to this task is the need for labeled training data. Creating such data is an expensive and effort-intensive task requiring domain-specific expertise. In this paper, we propose to circumvent this problem by carrying out concept-based classification (CBC) of software defect reports with the help of the Explicit Semantic Analysis (ESA) framework. We first create concept-based representations of a software defect report and of the defect types in the software defect classification scheme by projecting their textual descriptions into a concept space spanned by Wikipedia articles. Then, we compute the “semantic” similarity between these concept-based representations and assign the software defect type that has the highest similarity with the defect report. The proposed approach achieves accuracy comparable to the state-of-the-art semi-supervised and active learning approach for this task without requiring labeled training data. Additional advantages of the CBC approach are: (i) unlike the state of the art, it does not need the source code used to fix a software defect, and (ii) it does not suffer from the class-imbalance problem faced by the supervised learning paradigm.
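
The two steps described in the abstract (project texts into a concept space, then pick the defect type with the highest similarity) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a four-entry toy concept set in place of Wikipedia, a simple smoothed TF-IDF weighting, cosine similarity as the similarity measure, and hypothetical defect-type labels and descriptions (`Memory`, `Timing`, `GUI`) invented purely for illustration.

```python
import math
import re
from collections import Counter

# Toy stand-in for Wikipedia (assumption): concept title -> article text.
CONCEPTS = {
    "Memory management": "memory allocation pointer heap leak buffer free null",
    "User interface": "button screen window display layout click menu font",
    "Concurrency": "thread lock race deadlock synchronization timing mutex",
    "Algorithm": "algorithm loop computation logic calculation sort result wrong",
}

# Hypothetical defect types with short textual descriptions (not from the paper).
DEFECT_TYPES = {
    "Memory": "defect in memory allocation such as a leak or null pointer",
    "Timing": "defect in thread synchronization such as a race or deadlock",
    "GUI": "defect in the screen layout or a button on a window",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(concepts):
    """Build an inverted index: term -> {concept: TF-IDF weight}."""
    term_counts = {name: Counter(tokenize(body)) for name, body in concepts.items()}
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    n = len(concepts)
    index = {}
    for name, counts in term_counts.items():
        for term, count in counts.items():
            idf = math.log(1 + n / doc_freq[term])  # smoothed IDF (assumption)
            index.setdefault(term, {})[name] = count * idf
    return index


def to_concept_vector(text, index):
    """Project a text into concept space by summing its terms' concept weights."""
    vec = Counter()
    for term, count in Counter(tokenize(text)).items():
        for concept, weight in index.get(term, {}).items():
            vec[concept] += count * weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(report, defect_types, index):
    """Return the most similar defect type and the score for every type."""
    report_vec = to_concept_vector(report, index)
    scores = {
        label: cosine(report_vec, to_concept_vector(description, index))
        for label, description in defect_types.items()
    }
    return max(scores, key=scores.get), scores


INDEX = build_index(CONCEPTS)
REPORT = "Application crashes with a null pointer after the heap buffer is freed"
label, scores = classify(REPORT, DEFECT_TYPES, INDEX)
print(label, scores)
```

Note that nothing in the sketch is trained on labeled defect reports: the only inputs are the concept articles and the textual descriptions of the defect types, which mirrors the label-free property the abstract claims for CBC. A realistic setup would replace `CONCEPTS` with an index built from a Wikipedia dump.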

Updated: 2020-02-12