An Automatic Software Vulnerability Classification Framework Using Term Frequency-Inverse Gravity Moment and Feature Selection, Journal of Systems and Software

Journal of Systems and Software (IF 3.7), Pub Date: 2020-09-01, DOI: 10.1016/j.jss.2020.110616
Jinfu Chen, Patrick Kwaku Kudjo, Solomon Mensah, Selasie Aformaley Brown, George Akorfu

Abstract: Vulnerability classification is an important activity in software development and software quality maintenance. A typical vulnerability classification model involves three stages: term selection, in which the relevant terms are identified via feature selection; term weighting, in which the document weights for the selected terms are computed; and classifier learning. The term frequency-inverse document frequency (TF-IDF) model is the most widely used term-weighting metric for vulnerability classification. However, several issues hinder the effectiveness of the TF-IDF model for document classification. To address this problem, we propose and evaluate a general framework for vulnerability severity classification using the term frequency-inverse gravity moment (TF-IGM). Specifically, we extensively compare TF-IGM, TF-IDF, and information gain feature selection using five machine learning algorithms on ten vulnerable software applications containing a total of 27,248 security vulnerabilities. The experimental results show that (i) the TF-IGM model is a promising term-weighting metric for vulnerability classification compared to the classical term-weighting metric, (ii) the effectiveness of feature selection on vulnerability classification varies significantly across the studied datasets, and (iii) feature selection improves vulnerability classification.
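
The abstract names TF-IGM without restating its formula. As a rough illustration only, the sketch below assumes the definition commonly given for TF-IGM in the text classification literature: a term's class-specific document frequencies are sorted in descending order, the inverse gravity moment is the largest frequency divided by the rank-weighted sum of all of them, and the final weight is tf × (1 + λ × igm). The function names, the toy token lists, and the default `lam=7.0` are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def igm(class_freqs):
    """Inverse gravity moment of one term.

    class_freqs holds the term's document frequency in each class.
    The frequencies are sorted in descending order and weighted by rank (1, 2, ...).
    """
    ranked = sorted(class_freqs, reverse=True)
    moment = sum(f * r for r, f in enumerate(ranked, start=1))
    return ranked[0] / moment if moment else 0.0


def tf_igm_weights(docs, labels, lam=7.0):
    """Return one {term: weight} dict per document, with weight = tf * (1 + lam * igm).

    docs is a list of token lists; labels gives the class of each document.
    lam is an adjustable coefficient (the value 7.0 is an assumed default).
    """
    classes = sorted(set(labels))
    # Class-specific document frequency: in how many documents of each class a term occurs.
    df = {c: Counter() for c in classes}
    for tokens, label in zip(docs, labels):
        df[label].update(set(tokens))
    vocab = {t for tokens in docs for t in tokens}
    factor = {t: 1.0 + lam * igm([df[c][t] for c in classes]) for t in vocab}
    return [
        {t: tf * factor[t] for t, tf in Counter(tokens).items()}
        for tokens in docs
    ]


# Hypothetical vulnerability descriptions, already tokenized, with severity labels.
docs = [
    ["buffer", "overflow", "remote"],
    ["remote", "crash"],
    ["overflow", "buffer", "buffer"],
]
labels = ["high", "low", "high"]
weights = tf_igm_weights(docs, labels)
```

In this toy input, "buffer" occurs only in the "high" class, so it receives a larger global factor than "remote", which is spread evenly across both classes. That concentration across classes is what the IGM factor is meant to reward, whereas IDF looks only at how many documents contain the term.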

Updated: 2020-09-01
