Network intrusion detection with a novel hierarchy of distances between embeddings of hash IP addresses
Knowledge-Based Systems (IF 8.8), Pub Date: 2021-02-25, DOI: 10.1016/j.knosys.2021.106887
Manuel Lopez-Martin, Belen Carro, Juan Ignacio Arribas, Antonio Sanchez-Esguevillas

Including high-dimensional categorical predictors in a machine learning model is a major challenge. This applies in particular to the IP and Port addresses of network connections when they are used as predictors (features) in machine learning models. These features are particularly important for network intrusion detection, as many attacks exploit information about IP/Port addresses. The sparsity and high dimensionality of these features make them difficult to include in the models, so in many cases they are discarded despite the useful information they carry. This work proposes replacing the original network addresses with new features based on a set of distances defined between different components of the source and destination IP and Port addresses. These distances incorporate information on the probability of co-occurrence of source and destination addresses. The distances are calculated using a dense, low-dimensional vector representation (embedding) of the different network address components. The embeddings are obtained with a neural network that requires few computational resources, plus an additional hash function that collapses the extremely large range of IP and Port values, making the model implementation feasible. A self-supervised learning framework under a hierarchical model is used to train the encoding network.

The novel features can be used to predict future co-occurrence of source and destination network addresses, and, when applied as features in a supervised model, they significantly increase the prediction performance of most classifiers for the detection of network intrusions. We demonstrate this prediction improvement on two modern network intrusion datasets: CICIDS2017 and CICDDoS2019.
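
The feature construction described in the abstract (hash the address components into a small range, look up an embedding per component, then take distances between source and destination embeddings) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the bucket count `NUM_BUCKETS`, the embedding size `EMBED_DIM`, the use of SHA-256 as the hash, the split of an IPv4 address into cumulative octet prefixes, and the Euclidean distance. In particular, the embedding table below is filled with random vectors as a stand-in; in the paper the embeddings are learned by a neural network under a self-supervised framework.

```python
import hashlib
import math
import random
from typing import List

NUM_BUCKETS = 1024  # hypothetical size of the hashed id range
EMBED_DIM = 8       # hypothetical embedding dimension


def hash_bucket(value: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Collapse an arbitrary address component into a fixed range of ids."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


# Stand-in for a trained embedding table: one random vector per bucket.
# (Assumption: a real system would learn these vectors, not sample them.)
_rng = random.Random(0)
EMBEDDINGS = [
    [_rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
    for _ in range(NUM_BUCKETS)
]


def embed(value: str) -> List[float]:
    """Look up the dense vector for a hashed address component."""
    return EMBEDDINGS[hash_bucket(value)]


def euclidean(u: List[float], v: List[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def ip_prefixes(ip: str) -> List[str]:
    """Split an IPv4 string into cumulative prefixes, e.g. 'a', 'a.b', ..."""
    octets = ip.split(".")
    return [".".join(octets[:i]) for i in range(1, len(octets) + 1)]


def distance_features(src_ip: str, src_port: int,
                      dst_ip: str, dst_port: int) -> List[float]:
    """One distance per IP prefix level, plus one for the port pair."""
    feats = [
        euclidean(embed("ip:" + s), embed("ip:" + d))
        for s, d in zip(ip_prefixes(src_ip), ip_prefixes(dst_ip))
    ]
    feats.append(euclidean(embed(f"port:{src_port}"), embed(f"port:{dst_port}")))
    return feats


# Example connection: the resulting list would replace the raw addresses
# as numeric input features for a downstream classifier.
features = distance_features("192.168.1.10", 51234, "10.0.0.5", 443)
```

With IPv4 addresses this sketch yields five numeric features per connection (four prefix levels plus the port pair). Because of hashing, distinct components can collide in the same bucket and share an embedding, which is the price paid for keeping the table small.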




Updated: 2021-03-03