Automated defect identification via path analysis-based features with transfer learning, Journal of Systems and Software



Automated defect identification via path analysis-based features with transfer learning
Journal of Systems and Software (IF 3.7), Pub Date: 2020-08-01, DOI: 10.1016/j.jss.2020.110585
Yuwei Zhang, Dahai Jin, Ying Xing, Yunzhan Gong

Abstract: Recently, artificial intelligence techniques have been widely applied to specialized tasks in software engineering, such as code generation, defect identification, and bug repair. Despite the widespread use of static analysis tools for automatically detecting potential software defects, developers consider the large number of reported alarms and the high cost of manual inspection to be key barriers to using them in practice. To automate defect identification, researchers use machine learning algorithms with a set of hand-engineered features to build classification models that label alarms as actionable or unactionable. However, traditional features often fail to represent the deep syntactic structure of alarms. To bridge the gap between programs’ syntactic structure and defect identification features, this paper first extracts a set of novel fine-grained, variable-level features, called path-variable characteristics, by applying path analysis techniques during feature extraction. We then propose a two-stage transfer learning approach based on these features, called feature ranking-matching based transfer learning, to improve the performance of cross-project defect identification. Our experimental results on eight open-source projects show that the proposed variable-level features are promising and can yield significant improvements in both within-project and cross-project defect identification.
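To make the within-project versus cross-project distinction concrete, here is a minimal sketch of the evaluation setup. It is an illustration under stated assumptions, not the paper's method: the nearest-centroid classifier stands in for whatever learner the authors use, the two-dimensional feature vectors are made-up toy data rather than path-variable characteristics, and the names `fit_centroids`, `predict`, and `accuracy` are hypothetical. The feature ranking-matching transfer step is not reproduced, since the abstract does not specify it.

```python
# Hypothetical sketch: classify static-analysis alarms as actionable (1) or
# unactionable (0) and compare within-project with cross-project accuracy.
# All data below is invented toy data, not taken from the paper.
from statistics import mean


def fit_centroids(X, y):
    """Return {label: centroid} computed from feature rows X and labels y."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Column-wise mean of all rows that carry this label.
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the label whose centroid is closest (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def accuracy(centroids, X, y):
    """Fraction of rows in X whose predicted label matches y."""
    hits = sum(predict(centroids, x) == lab for x, lab in zip(X, y))
    return hits / len(y)


# Toy alarms from two projects; each row is an assumed feature vector.
project_a_X = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
project_a_y = [1, 1, 0, 0]
project_b_X = [[0.7, 0.9], [0.3, 0.2]]
project_b_y = [1, 0]

model = fit_centroids(project_a_X, project_a_y)

# Within-project: train and score on project A (scored on the training
# rows only to keep the sketch short; a real study would hold data out).
within = accuracy(model, project_a_X, project_a_y)

# Cross-project: train on project A, score on project B with no adaptation.
cross = accuracy(model, project_b_X, project_b_y)
```

In practice the cross-project score usually suffers when the two projects' feature distributions differ, which is the gap a transfer learning step is meant to narrow.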


Updated: 2020-08-01
