Deep Semantic Feature Learning for Software Defect Prediction, IEEE Transactions on Software Engineering


IEEE Transactions on Software Engineering (IF 7.4), Pub Date: 2020-12-01, DOI: 10.1109/tse.2018.2877612
Song Wang, Taiyue Liu, Jaechang Nam, Lin Tan

Software defect prediction, which predicts defective code regions, can assist developers in finding bugs and prioritizing their testing efforts. Traditional defect prediction features often fail to capture the semantic differences between different programs. This degrades the performance of the prediction models built on these traditional features. Thus, the capability to capture the semantics in programs is required to build accurate prediction models. To bridge the gap between semantics and defect prediction features, we propose leveraging a powerful representation-learning algorithm, deep learning, to learn the semantic representations of programs automatically from source code files and code changes. Specifically, we leverage a deep belief network (DBN) to automatically learn semantic features using token vectors extracted from the programs’ abstract syntax trees (AST) (for file-level defect prediction models) and source code changes (for change-level defect prediction models). We examine the effectiveness of our approach on two file-level defect prediction tasks (i.e., file-level within-project defect prediction and file-level cross-project defect prediction) and two change-level defect prediction tasks (i.e., change-level within-project defect prediction and change-level cross-project defect prediction). Our experimental results indicate that the DBN-based semantic features can significantly improve the examined defect prediction tasks. Specifically, the improvements of semantic features against existing traditional features (in F1) range from 2.1 to 41.9 percentage points for file-level within-project defect prediction, from 1.5 to 13.4 percentage points for file-level cross-project defect prediction, from 1.0 to 8.6 percentage points for change-level within-project defect prediction, and from 0.6 to 9.9 percentage points for change-level cross-project defect prediction.
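The token-vector step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper works on the ASTs of its subject programs, whereas this sketch uses Python's standard `ast` module for convenience, and the choice of node types, the helper names (`extract_tokens`, `build_vocab`, `to_vector`), and the zero-padding scheme are all assumptions made for illustration. The resulting fixed-length integer vectors are the kind of input a model such as a DBN could then learn features from.

```python
import ast
from typing import Dict, List

# Control-flow node types kept as tokens (an assumption for this sketch).
CONTROL_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.Raise, ast.Return)


def extract_tokens(source: str) -> List[str]:
    """Walk the AST depth-first (pre-order) and collect a token sequence."""
    tokens: List[str] = []

    def visit(node: ast.AST) -> None:
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            tokens.append(node.name)            # declaration names
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                tokens.append(func.id)          # plain call: foo()
            elif isinstance(func, ast.Attribute):
                tokens.append(func.attr)        # method call: obj.foo()
        elif isinstance(node, CONTROL_NODES):
            tokens.append(type(node).__name__)  # e.g. "If", "For"
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def build_vocab(token_lists: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct token an integer id starting at 1 (0 is reserved)."""
    vocab: Dict[str, int] = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


def to_vector(tokens: List[str], vocab: Dict[str, int], length: int) -> List[int]:
    """Map tokens to ids, truncate to `length`, and right-pad with 0.

    Tokens missing from the vocabulary also map to 0 in this sketch.
    """
    ids = [vocab.get(token, 0) for token in tokens][:length]
    return ids + [0] * (length - len(ids))


if __name__ == "__main__":
    example = (
        "def f(xs):\n"
        "    for x in xs:\n"
        "        if x:\n"
        "            print(x)\n"
        "    return len(xs)\n"
    )
    toks = extract_tokens(example)
    print(toks, to_vector(toks, build_vocab([toks]), 8))
```

Padding every file to a common length keeps the input dimension fixed, which a network with a fixed-size input layer requires. In practice the vocabulary would be built over the whole training project rather than over a single file.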


Updated: 2020-12-01
