PathPair2Vec: An AST path pair-based code representation method for defect prediction
Journal of Computer Languages (IF 1.7), Pub Date: 2020-05-28, DOI: 10.1016/j.cola.2020.100979
Ke Shi, Yang Lu, Jingfei Chang, Zhen Wei

Software project defect prediction (SDP) estimates the probability that parts of a software project contain bugs from their features, which helps allocate testing effort. Existing SDP methods fall into two categories: those based on traditional handcrafted features and those based on automatically learned abstract features, especially features learned by deep learning. Current research indicates that automatically learned deep features can achieve better performance than handcrafted features.

Code2vec (Alon et al. 2019) is one of the leading source code representation models; it uses deep learning to learn representations of code automatically. Inspired by code2vec, this paper proposes a new AST path pair-based source code representation method (PathPair2Vec) and applies it to software project defect prediction. We first introduce the concept of a short path, which describes each terminal node together with its control logic. Then, we design a new sequence encoding method that encodes the different parts of the terminal node and its control logic. Finally, we describe the semantic information of the code with pairs of short paths and fuse them through an attention mechanism. Experiments on the PROMISE dataset show that our method improves the F1 score by 17.88% over the state-of-the-art SDP method, and that the AST path pair-based representation better identifies the defect features of source code.
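The abstract does not give exact definitions, so the following is only a rough sketch of the three steps (short paths, separate encoding of the terminal and control-logic parts, attention fusion over path pairs) under several assumptions. It parses Python with the standard `ast` module (Python 3.8+), whereas the PROMISE projects are Java. It defines a short path, hypothetically, as a terminal token plus the types of the control-flow nodes enclosing it. It replaces learned embeddings with deterministic hash-based pseudo-embeddings and uses a fixed attention vector. All names (`CONTROL_NODES`, `short_paths`, `encode_path`, `attention_fuse`, `pathpair_vector`) are illustrative and are not the authors' code.

```python
import ast
import hashlib
import itertools
import math

# Hypothetical set of "control logic" node types kept on a short path.
CONTROL_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With, ast.FunctionDef)
DIM = 8  # hypothetical embedding size per part


def short_paths(source):
    """Return (terminal token, enclosing control node types, innermost first)."""
    tree = ast.parse(source)
    parents = {}
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            parents[child] = node
    paths = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            token = node.id
        elif isinstance(node, ast.Constant):
            token = repr(node.value)
        else:
            continue  # only identifiers and literals count as terminals here
        controls = []
        cur = parents.get(node)
        while cur is not None:
            if isinstance(cur, CONTROL_NODES):
                controls.append(type(cur).__name__)
            cur = parents.get(cur)
        paths.append((token, tuple(controls)))
    return paths


def embed(text):
    """Deterministic pseudo-embedding; stands in for a learned lookup table."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 - 0.5 for b in digest[:DIM]]


def encode_path(path):
    """Encode the terminal part and the control-logic part separately, then concatenate."""
    token, controls = path
    token_vec = embed("tok:" + token)
    if controls:
        # Position-tagged hashes make the control part order-sensitive.
        ctrl = [embed(f"ctrl{i}:{name}") for i, name in enumerate(controls)]
        ctrl_vec = [sum(col) / len(ctrl) for col in zip(*ctrl)]
    else:
        ctrl_vec = [0.0] * DIM
    return token_vec + ctrl_vec


def attention_fuse(vectors, attn):
    """Softmax over dot products with an attention vector, then a weighted sum."""
    scores = [sum(v * a for v, a in zip(vec, attn)) for vec in vectors]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [sum(w * vec[k] for w, vec in zip(weights, vectors))
             for k in range(len(vectors[0]))]
    return fused, weights


def pathpair_vector(source, max_pairs=50):
    """Build pair vectors from every two short paths and fuse them into one code vector."""
    encoded = [encode_path(p) for p in short_paths(source)]
    if len(encoded) < 2:
        raise ValueError("need at least two terminal nodes to form a pair")
    pairs = [a + b for a, b in itertools.combinations(encoded, 2)][:max_pairs]
    # Each pair has 4 * DIM entries; a trained model would learn this vector.
    attn = [x for i in range(4) for x in embed(f"attn{i}")]
    return attention_fuse(pairs, attn)


example = """
def safe_div(a, b):
    if b == 0:
        return None
    return a / b
"""
vec, weights = pathpair_vector(example)
print(len(vec), round(sum(weights), 6))
```

In a trained model the embeddings, the sequence encoder, and the attention vector would be learned jointly with the defect classifier; the hashing here only keeps the sketch dependency-free and deterministic.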

Updated: 2020-05-28