Format-aware learn&fuzz: deep test data generation for efficient fuzzing
Neural Computing and Applications (IF 6) Pub Date: 2020-06-13, DOI: 10.1007/s00521-020-05039-7
Morteza Zakeri Nasrabadi, Saeed Parsa, Akram Kalaee

Appropriate test data are a crucial factor in successful fuzz testing. Most real-world applications, however, accept inputs with a complex structure, in which data are surrounded by meta-data and processed in several stages comprising parsing and rendering (execution). The complex structure of such input files makes it difficult to generate efficient test data automatically. The success of deep learning on complex tasks, particularly generative tasks, has motivated us to exploit it for test data generation for complicated formats such as PDF files. In this respect, a neural language model (NLM) based on deep recurrent neural networks (RNNs) is used to learn the structure of complex inputs. To target both the parsing and rendering steps of the software under test (SUT), our approach generates new test data while distinguishing between data and meta-data, which significantly improves input fuzzing. To assess the proposed approach, we have developed a modular file format fuzzer, IUST-DeepFuzz. Our experimental results demonstrate the relatively high coverage of MuPDF code achieved by IUST-DeepFuzz in comparison with state-of-the-art tools such as learn&fuzz, AFL, Augmented-AFL, and random fuzzing. In summary, our experiments with many deep learning models reveal that the simpler the deep learning model used to generate test data, the higher the code coverage of the software under test.
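To make the general idea concrete, the following minimal sketch shows how a character-level RNN language model can be trained on PDF-object-like text and then sampled to produce new test-data fragments. It is not the authors' IUST-DeepFuzz implementation: the toy corpus, window length, layer sizes, and the sampling temperature are all illustrative assumptions, and the paper's distinction between data and meta-data is not modeled here.

```python
import numpy as np
from tensorflow.keras import layers, models

SEQ_LEN = 40  # length of each training window (illustrative choice)
# Toy corpus of PDF-object-like text; a real setup would train on a large
# corpus of objects extracted from actual PDF files.
CORPUS = "1 0 obj << /Type /Page /Parent 2 0 R >> endobj\n" * 200

# Character vocabulary and integer encoding.
chars = sorted(set(CORPUS))
char2idx = {c: i for i, c in enumerate(chars)}
idx2char = np.array(chars)
encoded = np.array([char2idx[c] for c in CORPUS])

# Build (input window, next character) training pairs.
X = np.stack([encoded[i:i + SEQ_LEN] for i in range(len(encoded) - SEQ_LEN)])
y = encoded[SEQ_LEN:]

# A small LSTM language model: embedding -> LSTM -> softmax over characters.
model = models.Sequential([
    layers.Embedding(len(chars), 32),
    layers.LSTM(128),
    layers.Dense(len(chars), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, batch_size=128, epochs=3, verbose=0)

def sample(seed: str, length: int = 200, temperature: float = 0.8) -> str:
    """Generate a new test-data fragment by sampling from the trained model."""
    out = list(seed)
    for _ in range(length):
        window = [char2idx.get(c, 0) for c in out[-SEQ_LEN:]]
        window = [0] * (SEQ_LEN - len(window)) + window  # left-pad short seeds
        probs = model.predict(np.array([window]), verbose=0)[0].astype(np.float64)
        probs = np.exp(np.log(probs + 1e-9) / temperature)  # temperature scaling
        probs /= probs.sum()
        out.append(str(idx2char[np.random.choice(len(chars), p=probs)]))
    return "".join(out)

print(sample("1 0 obj << "))
```

Sampled outputs such as these would then be embedded into otherwise well-formed host files and fed to the SUT, so that both the parser and the renderer are exercised.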




Updated: 2020-06-13