Defect Prediction With Semantics and Context Features of Codes Based on Graph Representation Learning, IEEE Transactions on Reliability

IEEE Transactions on Reliability (IF 5.9), Pub Date: 2020-12-10, DOI: 10.1109/tr.2020.3040191
Jiaxi Xu, Fei Wang, Jun Ai

To optimize the process of software testing and to improve software quality and reliability, many attempts have been made to develop more effective methods for predicting software defects. Previous work on defect prediction has relied on machine learning combined with artificial (manually designed) software metrics. Unfortunately, such metrics cannot represent the syntactic, semantic, and contextual features of defective modules. In this article, therefore, we propose a practical approach for identifying software defect patterns by combining semantic and contextual information through abstract syntax tree (AST) representation learning. Graph neural networks are also leveraged to capture the latent defect information of defective subtrees, which are pruned based on fix-inducing changes. To validate the proposed approach, we define mining rules based on the GitHub workflow and collect 6052 defects from 307 projects. The experiments indicate that the proposed approach outperforms the state-of-the-art approach and five traditional machine learning baselines. An ablation study shows that information about code concepts leads to a significant increase in accuracy.
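The abstract names two mechanical steps, pruning AST subtrees by a fix-inducing change and encoding those subtrees with a graph neural network, without giving their details. The following is a minimal Python sketch of that pipeline under stated assumptions, not the authors' implementation. Python's standard `ast` module stands in for the parser (the target language used in the paper is not stated in this abstract). The function-level overlap rule in `changed_functions` is a hypothetical pruning criterion. `mean_message_pass` is a single untrained mean-aggregation step rather than a learned GNN layer. All function names and the sample source are illustrative.

```python
import ast
from typing import Dict, List, Set, Tuple


def changed_functions(source: str, changed_lines: Set[int]) -> List[ast.FunctionDef]:
    """Keep only function subtrees whose line span overlaps a changed line.

    Hypothetical pruning rule: a stand-in for pruning by a fix-inducing change.
    """
    kept: List[ast.FunctionDef] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            end = node.end_lineno or node.lineno
            if any(node.lineno <= ln <= end for ln in changed_lines):
                kept.append(node)
    return kept


def to_graph(root: ast.AST) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Flatten an AST subtree into node-type labels and parent->child edges.

    Indices are assigned per visit (not by object identity) because CPython
    may share context nodes such as ast.Load between parents.
    """
    labels: List[str] = []
    edges: List[Tuple[int, int]] = []
    stack: List[Tuple[ast.AST, int]] = [(root, -1)]
    while stack:
        node, parent = stack.pop()
        idx = len(labels)
        labels.append(type(node).__name__)
        if parent >= 0:
            edges.append((parent, idx))
        for child in ast.iter_child_nodes(node):
            if not isinstance(child, ast.expr_context):  # skip Load/Store/Del noise
                stack.append((child, idx))
    return labels, edges


def mean_message_pass(
    labels: List[str], edges: List[Tuple[int, int]], vocab: Dict[str, int]
) -> List[List[float]]:
    """One untrained propagation step: each node averages the one-hot
    features of itself and its neighbours (edges treated as undirected)."""
    dim = len(vocab)
    h = [[1.0 if vocab[lab] == j else 0.0 for j in range(dim)] for lab in labels]
    neighbours = [[i] for i in range(len(labels))]  # self-loops
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    return [
        [sum(h[k][j] for k in nb) / len(nb) for j in range(dim)]
        for nb in neighbours
    ]


def readout(h: List[List[float]]) -> List[float]:
    """Mean-pool node vectors into one vector for the whole subtree."""
    return [sum(row[j] for row in h) / len(h) for j in range(len(h[0]))]


SOURCE = """
def add(a, b):
    return a + b

def div(a, b):
    return a / b
"""

# Pretend line 6 (the division) was touched by a fix-inducing change.
funcs = changed_functions(SOURCE, changed_lines={6})
labels, edges = to_graph(funcs[0])
vocab: Dict[str, int] = {lab: i for i, lab in enumerate(sorted(set(labels)))}
vec = readout(mean_message_pass(labels, edges, vocab))
print(funcs[0].name, len(labels), [round(x, 3) for x in vec])
```

Only `div` survives pruning, and its subtree becomes a small graph whose pooled vector summarizes the node types around the changed line. A trained GNN would instead apply learned weight matrices and nonlinearities in each layer and stack several layers. Those are omitted here to keep the sketch dependency-free.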
