Neural Transfer Learning for Repairing Security Vulnerabilities in C Code, IEEE Transactions on Software Engineering


IEEE Transactions on Software Engineering (IF 7.4), Pub Date: 2022-02-01, DOI: 10.1109/tse.2022.3147265
Zimin Chen, Steve Kommrusch, Martin Monperrus


In this paper, we address the problem of automatic repair of software vulnerabilities with deep learning. The major problem with data-driven vulnerability repair is that the few existing datasets of known confirmed vulnerabilities consist of only a few thousand examples. However, training a deep learning model often requires hundreds of thousands of examples. In this work, we leverage the intuition that the bug fixing task and the vulnerability fixing task are related and that the knowledge learned from bug fixes can be transferred to fixing vulnerabilities. In the machine learning community, this technique is called transfer learning. In this paper, we propose an approach for repairing security vulnerabilities named VRepair which is based on transfer learning. VRepair is first trained on a large bug fix corpus and is then tuned on a vulnerability fix dataset, which is an order of magnitude smaller. In our experiments, we show that a model trained only on a bug fix corpus can already fix some vulnerabilities. Then, we demonstrate that transfer learning improves the ability to repair vulnerable C functions. We also show that the transfer learning model performs better than a model trained with a denoising task and fine-tuned on the vulnerability fixing task. To sum up, this paper shows that transfer learning works well for repairing security vulnerabilities in C compared to learning on a small dataset.
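The two-phase schedule described in the abstract (train on a large related corpus, then tune on a much smaller target dataset) can be illustrated with a deliberately tiny sketch. This is not VRepair or its neural model: the linear regressor, the synthetic `SOURCE` and `TARGET` datasets, and the names `train`, `transfer`, and `from_scratch` are all assumptions made purely for illustration. Only the training schedule mirrors the abstract.

```python
"""Toy pretrain-then-fine-tune sketch (hypothetical; not VRepair's model)."""


def mse(params, data):
    """Mean squared error of the linear model y = w*x + b on `data`."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(params, data, lr=0.1, epochs=20):
    """Full-batch gradient descent on y = w*x + b, starting from `params`."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


# Assumed stand-in for the large bug-fix corpus: 200 points on y = 2x + 1.
SOURCE = [(i / 199, 2.0 * (i / 199) + 1.0) for i in range(200)]
# Assumed stand-in for the small vulnerability-fix dataset:
# 10 points on the related function y = 2x + 1.5.
TARGET = [(i / 9, 2.0 * (i / 9) + 1.5) for i in range(10)]


def transfer(source, target, pre_epochs=500, tune_epochs=20):
    """Phase 1: train on the large source task. Phase 2: tune on the target."""
    pretrained = train((0.0, 0.0), source, epochs=pre_epochs)
    return train(pretrained, target, epochs=tune_epochs)


def from_scratch(target, epochs=20):
    """Baseline: train on the small target dataset only."""
    return train((0.0, 0.0), target, epochs=epochs)


if __name__ == "__main__":
    tuned = transfer(SOURCE, TARGET)
    scratch = from_scratch(TARGET)
    print("target loss, transfer learning:", mse(tuned, TARGET))
    print("target loss, target data only: ", mse(scratch, TARGET))
```

In this toy setup the tuned model starts the second phase close to the target solution, so for the same number of tuning steps it should reach a lower target loss than the baseline trained on the small dataset alone. The sketch says nothing about the size of the effect reported in the paper.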


Updated: 2022-02-01
