Learning to map source code to software vulnerability using code-as-a-graph
arXiv - CS - Programming Languages. Pub Date: 2020-06-15, DOI: arxiv-2006.08614
Sahil Suneja, Yunhui Zheng, Yufan Zhuang, Jim Laredo, Alessandro Morari

We explore the applicability of Graph Neural Networks to learning the nuances of source code from a security perspective. Specifically, we ask whether signatures of vulnerabilities in source code can be learned from its graph representation, expressed as relationships between nodes and edges. We create a pipeline, which we call AI4VA, that first encodes a source code sample into a Code Property Graph. The extracted graph is then vectorized in a way that preserves its semantic information. A Gated Graph Neural Network is then trained on a collection of such graphs to automatically extract templates that differentiate the graph of a vulnerable sample from that of a healthy one. Our model outperforms static analyzers, classical machine learning models, and CNN- and RNN-based deep learning models on two of the three datasets we experiment with. We thus show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear-sequence encoding approaches. (Submitted Oct 2019, Paper #28, ICST)
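The pipeline in the abstract (source code, then a Code Property Graph, then vectorized node features, then a Gated Graph Neural Network that labels the graph vulnerable or healthy) can be sketched in plain Python. This is a minimal, untrained toy, not AI4VA's implementation: the node-type vocabulary, the one-hot encoding, the hand-built graph, and the names `TinyGGNN`, `one_hot`, `NODE_TYPES`, and `EDGE_TYPES` are hypothetical choices for illustration. It follows the usual GGNN update (one message matrix per edge type, then a GRU-style gated node update) but uses a plain sum readout with a logistic output instead of a learned gated readout.

```python
import math
import random

# Hypothetical vocabularies for illustration; the abstract does not list AI4VA's features.
NODE_TYPES = ["CallExpression", "Identifier", "Literal", "ArrayIndexing", "Return"]
# A Code Property Graph merges syntax (AST), control-flow (CFG) and program-dependence (PDG) edges.
EDGE_TYPES = ["AST", "CFG", "PDG"]


def one_hot(label, vocab):
    """Vectorize a node by its type (a simple stand-in for a semantic encoding)."""
    return [1.0 if v == label else 0.0 for v in vocab]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(*vecs):
    """Element-wise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vecs)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyGGNN:
    """Toy GGNN: typed message passing, GRU-style node update, sum readout, logistic output."""

    def __init__(self, dim, edge_types, steps=3, seed=0):
        rng = random.Random(seed)
        self.dim, self.steps = dim, steps
        # One message-transform matrix per edge type.
        self.A = {t: rand_matrix(dim, dim, rng) for t in edge_types}
        # GRU gate weights (biases omitted for brevity).
        self.Wz, self.Uz = rand_matrix(dim, dim, rng), rand_matrix(dim, dim, rng)
        self.Wr, self.Ur = rand_matrix(dim, dim, rng), rand_matrix(dim, dim, rng)
        self.Wh, self.Uh = rand_matrix(dim, dim, rng), rand_matrix(dim, dim, rng)
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(dim)]

    def gru(self, m, h):
        """Update node state h given its aggregated incoming message m."""
        z = [sigmoid(v) for v in add(matvec(self.Wz, m), matvec(self.Uz, h))]  # update gate
        r = [sigmoid(v) for v in add(matvec(self.Wr, m), matvec(self.Ur, h))]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        h_new = [math.tanh(v) for v in add(matvec(self.Wh, m), matvec(self.Uh, rh))]
        return [(1 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, h_new)]

    def forward(self, node_feats, edges):
        """Return a score in (0, 1), read here as P(vulnerable)."""
        h = [list(f) for f in node_feats]
        for _ in range(self.steps):
            msgs = [[0.0] * self.dim for _ in h]
            for src, dst, etype in edges:  # each directed edge sends a type-specific message
                msgs[dst] = add(msgs[dst], matvec(self.A[etype], h[src]))
            h = [self.gru(m, hv) for m, hv in zip(msgs, h)]
        graph_vec = add(*h)  # sum readout over all node states
        return sigmoid(sum(w * g for w, g in zip(self.w_out, graph_vec)))


# Hand-built toy graph for a call such as strcpy(buf, src); not output from a real CPG tool.
nodes = [one_hot(t, NODE_TYPES) for t in ["CallExpression", "Identifier", "Identifier"]]
edges = [(0, 1, "AST"), (0, 2, "AST"), (2, 1, "PDG")]  # PDG edge: data flows from src into buf

model = TinyGGNN(dim=len(NODE_TYPES), edge_types=EDGE_TYPES)
score = model.forward(nodes, edges)
print(f"P(vulnerable) = {score:.3f}")  # weights are untrained, so the value is arbitrary
```

Training would fit these weights so that vulnerable and healthy graphs receive different scores, for example by minimizing binary cross-entropy over labeled samples; it is omitted here to keep the sketch dependency-free.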

Updated: 2020-06-17