MPT‐embedding: An unsupervised representation learning of code for software defect prediction


Journal of Software: Evolution and Process (IF 1.7), Pub Date: 2020-12-15, DOI: 10.1002/smr.2330
Ke Shi¹,², Yang Lu¹,³, Guangliang Liu¹, Zhenchun Wei¹,³, Jingfei Chang¹


Software project defect prediction can help developers allocate debugging resources. Existing software defect prediction models are usually based on machine learning methods, especially deep learning. Deep learning‐based methods tend to build end‐to‐end models that directly use source code‐based abstract syntax trees (ASTs) as input. They do not pay enough attention to the front‐end data representation. In this paper, we propose a new framework to represent source code called multiperspective tree embedding (MPT‐embedding), which is an unsupervised representation learning method. MPT‐embedding parses the nodes of ASTs from multiple perspectives and encodes the structural information of a tree into a vector sequence. Experiments on both cross‐project defect prediction (CPDP) and within‐project defect prediction (WPDP) show that, on average, MPT‐embedding provides improvements over the state‐of‐the‐art method.


Updated: 2020-12-15
