Generating Informative CVE Description From ExploitDB Posts by Extractive Summarization, arXiv - CS - Information Retrieval


arXiv - CS - Information Retrieval. Pub Date: 2021-01-05, DOI: arxiv-2101.01431
Jiamou Sun, Zhenchang Xing, Hao Guo, Deheng Ye, Xiaohong Li, Xiwei Xu, Liming Zhu

ExploitDB is an important public website that contributes a large number of vulnerabilities to the official CVE database. Over 60% of these vulnerabilities carry high or critical security risk. Unfortunately, over 73% of exploits appear publicly earlier than the corresponding CVEs, and about 40% of exploits do not have CVEs at all. To assist in documenting CVEs for ExploitDB posts, we propose an open information method to extract 9 key vulnerability aspects (vulnerable product/version/component, vulnerability type, vendor, attacker type, root cause, attack vector and impact) from the verbose and noisy ExploitDB posts. The aspects extracted from an ExploitDB post are then composed into a CVE description according to the suggested CVE description templates; such a description is required when requesting a new CVE. Through evaluation on 13,017 manually labeled sentences and statistical sampling of 3,456 extracted aspects, we confirm the high accuracy of our extraction method. Compared with 27,230 reference CVE descriptions, our composed CVE descriptions achieve a high ROUGE-L score (0.38), a longest-common-subsequence-based metric for evaluating text summarization methods.
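The ROUGE-L metric cited in the abstract is built on the longest common subsequence (LCS) between a generated text and a reference text. The following is a minimal sketch of a sentence-level ROUGE-L F-score, not the authors' evaluation code. The whitespace tokenization, the lowercasing, and the default `beta = 1.0` are assumptions made for illustration; the abstract does not state which settings were used.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)  # DP row for the previous token of `a`
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-score between two strings.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    generated = "SQL injection in login page"
    reference = "SQL injection in the login page"
    print(round(rouge_l(generated, reference), 3))
```

Because the LCS only requires tokens to appear in the same order, not contiguously, a composed description can score well even when the reference CVE description interleaves extra words between the shared aspects.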


Updated: 2021-01-06
