On using distributed representations of source code for the detection of C security vulnerabilities
arXiv - CS - Programming Languages. Pub Date: 2021-06-01, DOI: arxiv-2106.01367
David Coimbra, Sofia Reis, Rui Abreu, Corina Păsăreanu, Hakan Erdogmus

This paper presents an evaluation of the code representation model Code2vec when trained on the task of detecting security vulnerabilities in C source code. We leverage the open-source library astminer to extract path-contexts from the abstract syntax trees of a corpus of labeled C functions. Code2vec is trained on the resulting path-contexts with the task of classifying a function as vulnerable or non-vulnerable. Using the CodeXGLUE benchmark, we show that the accuracy of Code2vec for this task is comparable to simple transformer-based methods such as pre-trained RoBERTa, and outperforms more naive NLP-based methods. We achieved an accuracy of 61.43% while maintaining low computational requirements relative to larger models.
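To make the notion of a path-context concrete, the following is a minimal sketch of how leaf-to-leaf paths can be enumerated over a syntax tree. It is not astminer's API and makes no claim about the authors' settings: the tuple-based tree `TOY_AST`, the helpers `leaf_paths` and `path_contexts`, the `max_length` cutoff, and the ASCII separators (`^` for upward steps, `_` for downward steps) are all assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical hand-built AST for the C statement `len = strlen(buf)`.
# Internal node: (node_type, [children]); leaf: (node_type, token_string).
TOY_AST = ("Assign", [
    ("Name", "len"),
    ("Call", [
        ("Name", "strlen"),
        ("Name", "buf"),
    ]),
])


def leaf_paths(node, index=0, prefix=()):
    """Yield (token, root_to_leaf) pairs; each step is (child_index, node_type)."""
    node_type, payload = node
    here = prefix + ((index, node_type),)
    if isinstance(payload, str):
        yield payload, here
    else:
        for i, child in enumerate(payload):
            yield from leaf_paths(child, i, here)


def path_contexts(ast, max_length=8):
    """Return (start_token, path, end_token) triples for every pair of leaves.

    The path climbs from the first leaf to the lowest common ancestor and
    descends to the second leaf. Pairs whose path has more than `max_length`
    nodes are skipped (an illustrative cutoff, not a value from the paper).
    """
    contexts = []
    for (tok_a, path_a), (tok_b, path_b) in combinations(leaf_paths(ast), 2):
        # Count the shared root-side steps; the last shared step is the
        # lowest common ancestor of the two leaves.
        common = 0
        while path_a[common] == path_b[common]:
            common += 1
        up = [t for _, t in reversed(path_a[common:])]
        top = path_a[common - 1][1]
        down = [t for _, t in path_b[common:]]
        if len(up) + 1 + len(down) > max_length:
            continue
        path = "^".join(up + [top]) + "_" + "_".join(down)
        contexts.append((tok_a, path, tok_b))
    return contexts


if __name__ == "__main__":
    for context in path_contexts(TOY_AST):
        print(context)
```

On the toy tree this prints three triples, for example `('strlen', 'Name^Call_Name', 'buf')`. In the pipeline the abstract describes, triples of this kind are what the classifier consumes in place of raw source text.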

Updated: 2021-06-04