An Automatic Software Vulnerability Classification Framework Using Term Frequency-Inverse Gravity Moment and Feature Selection
Journal of Systems and Software (IF 3.7), Pub Date: 2020-09-01, DOI: 10.1016/j.jss.2020.110616
Jinfu Chen, Patrick Kwaku Kudjo, Solomon Mensah, Selasie Aformaley Brown, George Akorfu

Abstract: Vulnerability classification is an important activity in software development and software quality maintenance. A typical vulnerability classification model involves a term selection stage, in which the relevant terms are identified via feature selection; a term-weighting stage, in which the document weights for the selected terms are computed; and a classifier learning stage. The term frequency-inverse document frequency (TF-IDF) model is the most widely used term-weighting metric for vulnerability classification. However, several issues hinder the effectiveness of the TF-IDF model for document classification. To address these issues, we propose and evaluate a general framework for vulnerability severity classification using the term frequency-inverse gravity moment (TF-IGM). Specifically, we extensively compare term frequency-inverse gravity moment, term frequency-inverse document frequency, and information gain feature selection using five machine learning algorithms on ten vulnerable software applications containing a total of 27,248 security vulnerabilities. The experimental results show that: (i) the TF-IGM model is a promising term-weighting metric for vulnerability classification compared to the classical term-weighting metric, (ii) the effectiveness of feature selection on vulnerability classification varies significantly across the studied datasets, and (iii) feature selection improves vulnerability classification.
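
The abstract names TF-IGM but does not give its formula. As a rough illustration only, the sketch below assumes the commonly cited definition of the inverse gravity moment: for a term whose class-specific frequencies, sorted in descending order, are f_1 >= f_2 >= ... >= f_m, igm = f_1 / sum(f_r * r), and the term weight is tf * (1 + lambda * igm). The function names, the use of class-specific document frequency for f_r, and the default lambda = 7 are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def igm(class_freqs):
    """Inverse gravity moment of one term, given its frequency in each class.

    Frequencies are ranked in descending order; a term concentrated in a
    single class gets a value of 1.0, a term spread evenly gets a lower value.
    """
    ranked = sorted(class_freqs, reverse=True)
    if not ranked or ranked[0] == 0:
        return 0.0
    moment = sum(f * r for r, f in enumerate(ranked, start=1))
    return ranked[0] / moment


def tf_igm_weights(docs, labels, lam=7.0):
    """Compute TF-IGM weights for tokenized documents.

    docs   : list of token lists (e.g. tokenized vulnerability descriptions)
    labels : class label of each document (e.g. a severity level)
    lam    : assumed scaling coefficient for the IGM factor
    Returns one {term: weight} dict per document.
    """
    classes = sorted(set(labels))
    # Class-specific document frequency: in how many documents of each
    # class the term occurs at least once (an assumption of this sketch).
    df = {c: Counter() for c in classes}
    for tokens, label in zip(docs, labels):
        df[label].update(set(tokens))

    vocab = set().union(*(df[c].keys() for c in classes))
    igm_of = {t: igm([df[c][t] for c in classes]) for t in vocab}

    weighted = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term frequency within the document
        weighted.append({t: n * (1.0 + lam * igm_of[t]) for t, n in tf.items()})
    return weighted


if __name__ == "__main__":
    # Toy data, invented for illustration; not drawn from the studied datasets.
    docs = [["overflow", "remote"], ["overflow", "local"], ["xss", "remote"]]
    labels = ["high", "high", "low"]
    for doc_weights in tf_igm_weights(docs, labels):
        print(doc_weights)
```

In this toy input, "overflow" appears only in one class and so receives a larger weight than "remote", which appears in both classes. That class-discriminating behaviour is what distinguishes IGM from IDF, which ignores class labels.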

Updated: 2020-09-01