SeqTrans: Automatic Vulnerability Fix Via Sequence to Sequence Learning, IEEE Transactions on Software Engineering



IEEE Transactions on Software Engineering (IF 6.5), Pub Date: 2022-03-07, DOI: 10.1109/tse.2022.3156637
Jianlei Chi ₁ , Yu Qu ₂ , Ting Liu ₁ , Qinghua Zheng ₁ , Heng Yin ₂


Software vulnerabilities are now being reported at an unprecedented rate, driven by recent advances in automated vulnerability hunting tools. However, fixing vulnerabilities still depends mainly on programmers' manual effort: developers need a deep understanding of each vulnerability and must fix it while affecting the system's functionality as little as possible. In this paper, building on advances in Neural Machine Translation (NMT), we propose a novel approach called SeqTrans that exploits historical vulnerability fixes to provide suggestions and automatically fix source code. To capture the contextual information around the vulnerable code, we leverage data-flow dependencies to construct code sequences and feed them into a state-of-the-art Transformer model. A fine-tuning strategy is introduced to overcome the small-sample-size problem. We evaluate SeqTrans on a dataset of 1,282 commits that fix 624 CVEs in 205 Java projects. The results show that SeqTrans outperforms the latest techniques, achieving an accuracy of 23.3% for statement-level fixes and 25.3% for CVE-level fixes. We also analyze the results in depth and observe that the NMT model performs especially well on certain kinds of vulnerabilities, such as CWE-287 (Improper Authentication) and CWE-863 (Incorrect Authorization).
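To make the idea of "data-flow dependencies to construct code sequences" more concrete, the following is a minimal, hypothetical sketch and not SeqTrans's actual implementation. It assumes straight-line code (no branches or loops) where each statement is annotated by hand with the variables it defines and uses. The names `Stmt`, `data_flow_context`, `build_sequence`, and the `<sep>` token are illustrative assumptions. Starting from a vulnerable statement, the sketch walks backwards and keeps only the earlier statements whose definitions reach the target through def-use chains. It then joins them into one token sequence that a sequence-to-sequence model could consume.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Stmt:
    """Hypothetical statement record: source text plus defined/used variable names."""
    text: str
    defs: set[str] = field(default_factory=set)
    uses: set[str] = field(default_factory=set)


def data_flow_context(stmts: list[Stmt], target: int) -> list[int]:
    """Indices of earlier statements the target depends on via def-use chains.

    Assumes straight-line code, so the most recent definition of a variable
    is the one that reaches a later use.
    """
    needed = set(stmts[target].uses)
    picked = []
    for i in range(target - 1, -1, -1):
        s = stmts[i]
        if s.defs & needed:
            picked.append(i)
            needed -= s.defs   # these variables are now resolved by statement i
            needed |= s.uses   # ...but statement i's own inputs must be traced too
    return sorted(picked)


def build_sequence(stmts: list[Stmt], target: int, sep: str = " <sep> ") -> str:
    """Concatenate the data-flow context and the target statement into one sequence."""
    parts = [stmts[i].text for i in data_flow_context(stmts, target)]
    parts.append(stmts[target].text)
    return sep.join(parts)


# Illustrative Java-like snippet; statement 3 is treated as the vulnerable line.
EXAMPLE = [
    Stmt('String user = req.getParameter("u");', {"user"}, {"req"}),
    Stmt("int n = 0;", {"n"}, set()),
    Stmt('String q = "SELECT * FROM t WHERE u=" + user;', {"q"}, {"user"}),
    Stmt("stmt.execute(q);", set(), {"stmt", "q"}),
]

if __name__ == "__main__":
    print(data_flow_context(EXAMPLE, 3))  # [0, 2]
    print(build_sequence(EXAMPLE, 3))
```

Here the unrelated statement `int n = 0;` is dropped, while the two statements that feed `q` are kept. This matches the motivation in the abstract: the model sees the relevant context around the vulnerable line rather than every surrounding statement.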


Updated: 2022-03-07
