An Automatic Source Code Vulnerability Detection Approach Based on KELM
Security and Communication Networks Pub Date : 2021-06-16 , DOI: 10.1155/2021/5566423
Gaigai Tang 1, 2 , Lin Yang 2 , Shuangyin Ren 2 , Lianxiao Meng 1, 2 , Feng Yang 2 , Huiqiang Wang 1

Traditional vulnerability detection mostly relies on rules or source-code similarity built on manually defined vulnerability features. In practice, these rules or features are difficult to define accurately, usually cost considerable expert labor, and perform weakly in real applications. To mitigate this issue, researchers have introduced neural networks to extract features automatically and make vulnerability detection more intelligent. The Bidirectional Long Short-Term Memory (Bi-LSTM) network has proved successful for software vulnerability detection. However, because of its complex context processing and iterative training mechanism, Bi-LSTM incurs a heavy training cost. To improve training efficiency, we propose to use the Extreme Learning Machine (ELM). The training process of ELM is noniterative, so the network converges quickly. Because ELM's simple network structure usually yields weak precision, we introduce the kernel method. In the preprocessing stage of this framework, we adopt doc2vec for vector representation and multilevel symbolization for program symbolization. Experimental results show that the doc2vec vector representation brings faster training and better generalization than word2vec, that ELM converges much more quickly than Bi-LSTM, and that the kernel method effectively improves the precision of ELM while preserving training efficiency.
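
The abstract describes a pipeline of program symbolization, doc2vec embedding, and a kernel ELM (KELM) classifier, but gives no implementation details. Below is a minimal sketch assuming the standard kernel ELM formulation (regularized least squares in kernel form) with an RBF kernel and gensim's Doc2Vec for embeddings; the class name KELM, the hyperparameters C and gamma, and the toy corpus are illustrative assumptions, not taken from the paper.

# Minimal sketch: doc2vec embedding of symbolized code + kernel ELM classifier.
# Assumptions (not from the paper): gensim Doc2Vec for embeddings, an RBF kernel,
# and the standard KELM closed-form solution beta = (Omega + I/C)^(-1) T.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

def rbf_kernel(A, B, gamma):
    # Pairwise RBF kernel: exp(-gamma * ||a - b||^2).
    d2 = (A * A).sum(1)[:, None] + (B * B).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * d2)

class KELM:
    """Kernel Extreme Learning Machine: noniterative, closed-form training."""
    def __init__(self, C=1.0, gamma=0.1):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X_ = X
        n_classes = int(y.max()) + 1
        T = np.where(y[:, None] == np.arange(n_classes), 1.0, -1.0)  # +/-1 targets
        Omega = rbf_kernel(X, X, self.gamma)                         # kernel matrix
        self.beta_ = np.linalg.solve(Omega + np.eye(len(X)) / self.C, T)
        return self

    def predict(self, X):
        return np.argmax(rbf_kernel(X, self.X_, self.gamma) @ self.beta_, axis=1)

# Toy usage: tokens of symbolized code slices (e.g. VAR1/FUN1 renamings) -> vectors.
corpus = [["FUN1", "(", "VAR1", ",", "VAR2", ")", ";"],
          ["strcpy", "(", "VAR1", ",", "VAR2", ")", ";"]]
labels = np.array([0, 1])  # 0 = non-vulnerable, 1 = vulnerable (illustrative)
tagged = [TaggedDocument(words=t, tags=[i]) for i, t in enumerate(corpus)]
d2v = Doc2Vec(tagged, vector_size=32, window=3, min_count=1, epochs=50)
X = np.array([d2v.infer_vector(t) for t in corpus])
print(KELM(C=10.0, gamma=0.5).fit(X, labels).predict(X))

The single linear solve in fit replaces the iterative gradient updates of Bi-LSTM training, which is where the training-time advantage claimed in the abstract comes from.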

Updated: 2021-06-17