Are Public Intrusion Datasets Fit for Purpose? Characterising the State of the Art in Intrusion Event Datasets
Computers & Security (IF 4.8), Pub Date: 2020-12-01, DOI: 10.1016/j.cose.2020.102022
A. Kenyon, L. Deka, D. Elizondo

Abstract: In recent years cybersecurity attacks have caused major disruption and information loss for online organisations, with high-profile incidents in the news. One of the key challenges in advancing the state of the art in intrusion detection is the lack of representative datasets. These datasets typically contain millions of time-ordered events (e.g. network packet traces, flow summaries, log entries), which are subsequently analysed to identify abnormal behavior and specific attacks (Duffield et al., April). Generating realistic datasets has historically required expensive networked assets, specialised traffic generators, and considerable design preparation. Even with advances in virtualisation it remains challenging to create and maintain a representative environment. Major improvements are needed in the design, quality and availability of datasets to assist researchers in developing advanced detection techniques. With the emergence of new technology paradigms, such as intelligent transport and autonomous vehicles, it is also likely that new classes of threat will emerge (Kenyon, 2018). Given the rate of change in threat behavior (Ugarte-Pedrero et al., 2019), datasets quickly become obsolete, and some of the most widely cited datasets date back over two decades. Older datasets have limited value: they are often heavily filtered and anonymised, with unrealistic event distributions and opaque design methodology. The relative scarcity of Intrusion Detection System (IDS) datasets is compounded by the lack of a central registry and by inconsistent information on provenance. Researchers may also find it hard to locate datasets or understand their relative merits. In addition, many datasets rely on simulation, originating from academic or government institutions. The publication process itself often creates conflicts, with the need to de-identify sensitive information in order to meet regulations such as the General Data Protection Regulation (GDPR) (Regulation, 2016).
A final issue for researchers is the lack of standardised metrics with which to compare dataset quality. In this paper we attempt to classify the most widely used public intrusion datasets, providing references to archives and associated literature. We illustrate their relative utility and scope, highlighting the threat composition, formats, special features, and associated limitations. We identify best practice in dataset design, and describe potential pitfalls of designing anomaly detection techniques based on data that may be either inappropriate or compromised due to unrealistic threat coverage. The contributions made in this paper are expected to facilitate continuous research and development for effectively combating the constantly evolving cyber threat landscape.
CCS CONCEPTS: Intrusion Detection; Intrusion Prevention; Anomaly Detection; Network Flow; Smart Cities

Updated: 2020-12-01