Pkg2Vec: Hierarchical package embedding for code authorship attribution
Future Generation Computer Systems (IF 6.2), Pub Date: 2020-10-23, DOI: 10.1016/j.future.2020.10.020
Roni Mateless, Oren Tsur, Robert Moskovitch

Authorship attribution of software is the task of identifying the author of a given piece of code. Code attribution is important in multiple scenarios, ranging from software plagiarism to cybersecurity. In this paper, we introduce authorship attribution of software packages, which better reflects real-world scenarios in which code is organized in packages and written by teams. We present a novel approach for software package authorship attribution called Pkg2Vec, based on a hierarchical deep neural network (DNN) architecture that corresponds to the hierarchical nature of software (code) packages. The hierarchical neural network model consists of a token-level encoder and an attention mechanism for a function-level encoder, which together produce a package embedding. Beyond the package embedding, we use keywords and API calls as resilient features, which reflect the programmer’s intention and style. Pkg2Vec is evaluated on a large dataset of public packages and compared to a number of other state-of-the-art source code authorship attribution algorithms. We find that Pkg2Vec significantly outperforms other approaches, achieving a 13% improvement in accuracy.
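
The hierarchy described in the abstract (tokens are encoded into function vectors, and function vectors are combined by attention into a package embedding) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the deterministic pseudo-random token vectors, the mean pooling used as the token-level encoder, the dot-product attention, and every name below (`embed_token`, `encode_function`, `embed_package`, the `<context>` vector) are assumptions chosen only to show the data flow. A real model would learn these components with a neural network.

```python
import math
import random


def embed_token(token, dim=8):
    """Hypothetical stand-in for a learned token embedding:
    a deterministic pseudo-random vector derived from the token string."""
    rng = random.Random(token)  # string seeds are deterministic in Python 3
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def encode_function(tokens, dim=8):
    """Token-level encoder, simplified here to mean pooling (an assumption)."""
    vectors = [embed_token(t, dim) for t in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def embed_package(functions, context, dim=8):
    """Function-level attention: score each function vector against a
    context vector, then return the attention-weighted sum and the weights."""
    func_vecs = [encode_function(tokens, dim) for tokens in functions]
    scores = [sum(a * b for a, b in zip(vec, context)) for vec in func_vecs]
    weights = softmax(scores)
    package_vec = [
        sum(w * vec[i] for w, vec in zip(weights, func_vecs))
        for i in range(dim)
    ]
    return package_vec, weights


# A toy "package" made of two already-tokenized functions (illustrative only).
package = [
    ["def", "read", "(", "path", ")", ":", "return", "open", "(", "path", ")"],
    ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"],
]
context = embed_token("<context>")  # hypothetical attention context vector
package_vec, weights = embed_package(package, context)
```

Here `package_vec` is a single fixed-length vector for the whole package, and `weights` shows how much each function contributed to it. In an attribution setting, a vector of this kind would be passed to a classifier over candidate authors.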




Updated: 2020-10-30