Evaluating and comparing memory error vulnerability detectors
Information and Software Technology (IF 3.8), Pub Date: 2021-05-07, DOI: 10.1016/j.infsof.2021.106614
Yu Nong, Haipeng Cai, Pengfei Ye, Li Li, Feng Chen

Context:

Memory error vulnerabilities have had serious consequences, and several well-known, open-source detectors for them exist, built on static and/or dynamic code analysis. Yet these detectors have not been assessed with rigorous, quantitative measures of accuracy and efficiency that are not limited to specific application domains.

Objective:

Our study aims to assess and explain the strengths and weaknesses of state-of-the-art memory error vulnerability detectors based on static and/or dynamic code analysis, so as to inform tool selection by practitioners and future design of better detectors by researchers and tool developers.

Method:

We empirically evaluated and compared five state-of-the-art memory error vulnerability detectors against two benchmark datasets of 520 and 474 C/C++ programs, respectively. We conducted case studies to gain in-depth explanations of successes and failures of individual tools.

Results:

While generally fast, these detectors showed widely varying accuracy across vulnerability categories and only moderate overall accuracy. Complex code structures (e.g., deep loops and recursion) and complex data structures (e.g., deeply embedded linked lists) appeared to be common, major barriers. Hybrid analysis did not always outperform purely static or purely dynamic analysis for memory error vulnerability detection. The evaluation results also differed noticeably between the two datasets. Our case studies further explained the performance variations among these detectors and yielded additional actionable insights and recommendations for improvement.

Conclusion:

There was no single most effective tool among the five studied. For future research, integrating different techniques is a promising direction, yet simply combining different classes of code analysis (e.g., static and dynamic) may not be. For practitioners choosing the right tools, making tradeoffs (e.g., between precision and recall) might be inevitable.
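
To make the precision/recall tradeoff concrete, the following is a minimal sketch using the standard definitions of precision, recall, and F1. It is not the study's evaluation code: the detector names (`tool_a`, `tool_b`) and all counts are hypothetical, invented only for illustration, and the abstract does not state which exact measures the authors computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Outcome counts for one detector on one benchmark (hypothetical values)."""
    tp: int  # vulnerable programs the detector flagged
    fp: int  # non-vulnerable programs the detector flagged
    fn: int  # vulnerable programs the detector missed


def precision(c: Counts) -> float:
    """Fraction of reported vulnerabilities that are real."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: Counts) -> float:
    """Fraction of real vulnerabilities that were reported."""
    actual = c.tp + c.fn
    return c.tp / actual if actual else 0.0


def f1(c: Counts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Made-up counts, NOT results from the paper.
detectors = {
    "tool_a": Counts(tp=40, fp=10, fn=60),  # few false alarms, many misses
    "tool_b": Counts(tp=80, fp=80, fn=20),  # few misses, many false alarms
}

if __name__ == "__main__":
    for name, c in detectors.items():
        print(f"{name}: precision={precision(c):.2f} "
              f"recall={recall(c):.2f} f1={f1(c):.2f}")
```

In this invented scenario neither tool dominates: `tool_a` is more precise while `tool_b` has higher recall, which is the kind of choice a practitioner would have to weigh against the cost of false alarms versus missed vulnerabilities.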
