Learning to map source code to software vulnerability using code-as-a-graph
arXiv - CS - Programming Languages, Pub Date: 2020-06-15, DOI: arxiv-2006.08614
Sahil Suneja, Yunhui Zheng, Yufan Zhuang, Jim Laredo, Alessandro Morari
We explore the applicability of Graph Neural Networks in learning the nuances
of source code from a security perspective. Specifically, we ask whether signatures of
vulnerabilities in source code can be learned from its graph representation, in
terms of relationships between nodes and edges. We create a pipeline we call
AI4VA, which first encodes a source code sample into a Code Property Graph. The
extracted graph is then vectorized in a manner that preserves its semantic
information. A Gated Graph Neural Network is then trained using several such
graphs to automatically extract templates differentiating the graph of a
vulnerable sample from a healthy one. Our model outperforms static analyzers,
classic machine learning, as well as CNN and RNN-based deep learning models on
two of the three datasets we experiment with. We thus show that a code-as-graph
encoding is more meaningful for vulnerability detection than existing
code-as-photo and linear sequence encoding approaches. (Submitted Oct 2019,
Paper #28, ICST)
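
The pipeline described above (typed code graph, node vectorization, gated message passing, graph-level readout) can be illustrated with a toy sketch. This is not the AI4VA implementation: the node vocabulary, the edge-type names, the fixed scalar weights in `EDGE_WEIGHTS`, and the simplified elementwise gate are all assumptions chosen to keep the example dependency-free. A Gated Graph Neural Network would instead learn per-edge-type weight matrices and use a full GRU update.

```python
import math

# Hypothetical node vocabulary; a real Code Property Graph has many more types.
NODE_TYPES = ["Identifier", "CallExpression", "Condition"]

# Assumed fixed scalars standing in for learned per-edge-type weight matrices.
EDGE_WEIGHTS = {"AST": 1.0, "CFG": 0.5, "PDG": 0.25}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def vectorize_nodes(labels):
    """One-hot encode each node's type label."""
    return [[1.0 if t == label else 0.0 for t in NODE_TYPES] for label in labels]


def propagate(states, edges):
    """One simplified gated message-passing round.

    edges is a list of (source, target, edge_type) triples.
    """
    dim = len(states[0])
    messages = [[0.0] * dim for _ in states]
    # Each node accumulates the weighted states of its incoming neighbors.
    for src, dst, kind in edges:
        w = EDGE_WEIGHTS[kind]
        for k in range(dim):
            messages[dst][k] += w * states[src][k]
    new_states = []
    for h, m in zip(states, messages):
        # The gate sigmoid(m) blends the old state with the candidate tanh(m).
        new_states.append([
            (1.0 - sigmoid(mk)) * hk + sigmoid(mk) * math.tanh(mk)
            for hk, mk in zip(h, m)
        ])
    return new_states


def readout(states):
    """Sum node states into a single graph-level vector."""
    return [sum(column) for column in zip(*states)]


# Toy graph: three nodes linked by one AST edge and one CFG edge.
labels = ["Identifier", "CallExpression", "Condition"]
edges = [(0, 1, "AST"), (1, 2, "CFG")]
states = vectorize_nodes(labels)
for _ in range(2):
    states = propagate(states, edges)
graph_vector = readout(states)
print(graph_vector)
```

In a trained model, a classifier on top of `graph_vector` would output the vulnerable-versus-healthy prediction. Here the vector only shows how information from neighboring nodes mixes across propagation rounds.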
Updated: 2020-06-17