Large-Scale Empirical Studies on Effort-Aware Security Vulnerability Prediction Methods, IEEE Transactions on Reliability

Large-Scale Empirical Studies on Effort-Aware Security Vulnerability Prediction Methods
IEEE Transactions on Reliability (IF 5.9) Pub Date: 2020-03-01, DOI: 10.1109/tr.2019.2924932
Xiang Chen, Yingquan Zhao, Zhanqi Cui, Guozhu Meng, Yang Liu, Zan Wang

Security vulnerability prediction (SVP) can identify potentially vulnerable modules in advance and thereby help developers allocate most of their test resources to these modules. To evaluate the performance of different SVP methods, we should take the cost of security audits and code inspection into account and therefore consider effort-aware performance measures (such as $ACC$ and $P_{\rm opt}$). However, to the best of our knowledge, the effectiveness of different SVP methods has not been thoroughly investigated in terms of effort-aware performance measures. In this article, we consider 48 different SVP methods, of which 36 are supervised methods and 12 are unsupervised methods. For the supervised methods, we consider 34 software-metric-based methods and two text-mining-based methods. For the software-metric-based methods, in addition to a large number of classification methods, we also consider four state-of-the-art methods (i.e., EALR, OneWay, CBS, and MULTI) proposed in recent effort-aware just-in-time defect prediction studies. For the text-mining-based methods, we consider the bag-of-words model and the term frequency-inverse document frequency (TF-IDF) model. For the unsupervised methods, all the modules are ranked in ascending order based on a specific metric. Since 12 software metrics are considered when measuring the extracted modules, there are 12 different unsupervised methods. To the best of our knowledge, more than 40 of these SVP methods have not been considered in previous SVP studies. In our large-scale empirical studies, we use three real open-source web applications written in PHP as benchmarks. These three web applications include 3466 modules and 223 vulnerabilities in total. We evaluate these SVP methods in both the within-project SVP scenario and the cross-project SVP scenario.
Empirical results show that two unsupervised methods [i.e., lines of code (LOC) and Halstead's volume (HV)] and four recently proposed state-of-the-art supervised methods (i.e., MULTI, OneWay, CBS, and EALR) can achieve better performance than the other methods in terms of effort-aware performance measures. We then analyze the reasons why these six methods achieve better performance. For example, when using 20% of the total effort, we find that these six methods always require more modules to be inspected, especially the unsupervised methods LOC and HV. Finally, from the viewpoint of practical vulnerability localization, we find that all the unsupervised methods and the OneWay method produce many false alarms before finding the first vulnerable module. This may affect developers' confidence and tolerance, so supervised methods (especially MULTI and the text-mining-based methods) are preferred.
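The effort-aware evaluation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that inspection effort is approximated by a module's LOC, that $ACC$ is the fraction of vulnerable modules found within 20% of the total effort, and that false alarms are counted as the clean modules ranked ahead of the first vulnerable one. The `Module` class, all function names, and the toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    loc: int          # lines of code, used here as a proxy for inspection effort (assumption)
    vulnerable: bool  # ground-truth label


def rank_ascending_by_loc(modules):
    """Unsupervised ranking: smallest modules are inspected first."""
    return sorted(modules, key=lambda m: m.loc)


def inspect_within_budget(ranked, effort_fraction=0.2):
    """Walk the ranking until the next module would exceed the effort budget."""
    budget = effort_fraction * sum(m.loc for m in ranked)
    inspected, spent = [], 0
    for m in ranked:
        if spent + m.loc > budget:
            break  # assumption: stop at the first module that does not fit
        spent += m.loc
        inspected.append(m)
    return inspected


def acc(ranked, effort_fraction=0.2):
    """Share of all vulnerable modules found within the effort budget."""
    total_vulnerable = sum(m.vulnerable for m in ranked)
    if total_vulnerable == 0:
        return 0.0
    found = sum(m.vulnerable for m in inspect_within_budget(ranked, effort_fraction))
    return found / total_vulnerable


def initial_false_alarms(ranked):
    """Number of clean modules ranked before the first vulnerable module."""
    count = 0
    for m in ranked:
        if m.vulnerable:
            return count
        count += 1
    return count


# Toy data, invented purely for illustration.
modules = [
    Module("a.php", 10, False),
    Module("b.php", 20, True),
    Module("c.php", 30, False),
    Module("d.php", 40, True),
    Module("e.php", 450, True),
]

small_first = rank_ascending_by_loc(modules)
large_first = list(reversed(small_first))

print(len(inspect_within_budget(small_first)), acc(small_first))
print(len(inspect_within_budget(large_first)), acc(large_first))
print(initial_false_alarms(small_first))
```

On this toy data, ranking small modules first fits four modules into the 20% budget, whereas ranking large modules first fits none. This mirrors the abstract's observation that the better-performing methods inspect more modules for the same effort. The ascending ranking also places one clean module ahead of the first vulnerable one, which is the kind of initial false alarm the abstract warns about.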

Updated: 2020-03-01
