


Graph neural networks and cross-protocol analysis for detecting malicious IP addresses
Complex & Intelligent Systems (IF 5.8), Pub Date: 2022-09-14, DOI: 10.1007/s40747-022-00838-y
Yonghong Huang, Joanna Negrete, John Wagener, Celeste Fralick, Armando Rodriguez, Eric Peterson, Adam Wosotowsky


Internet Protocol (IP) addresses are the foundation of the Internet, connecting people, servers, Internet of Things devices, and services across the globe. Knowing what is connecting to what, and where connections are initiated, is crucial to accurately assessing a company’s or individual’s security posture. IP reputation assessment can be quite complex because of the numerous services that may be hosted on a single IP address. For example, an IP might serve millions of websites for millions of different companies, as web hosting companies often do, or it could belong to a large email system sending and receiving emails for millions of independent entities. The heterogeneous nature of an IP address typically makes it challenging to interpret the security risk. To make matters worse, adversaries understand this complexity and leverage the ambiguous nature of IP reputation to further exploit unsuspecting Internet users or Internet-connected devices. In addition, traditional techniques like dirty-listing cannot react quickly enough to changes in the security climate, nor can they scale well enough to detect new exploits that may appear and disappear within minutes. In this paper, we introduce the use of cross-protocol analysis and graph neural networks (GNNs) in semi-supervised learning to address the speed and scalability of assessing IP reputation. In the cross-protocol supervised approach, we combine features from the web, email, and domain name system (DNS) protocols to identify those that are most useful for discriminating between suspicious and benign IPs. In our second experiment, we incorporate the most discriminative features into the graph as node features. We use GNNs to pass messages from node to node, propagating each node’s signal to its neighbors while also letting the originating node be influenced by its neighbors in turn. Thanks to the relational graph structure, we can train the algorithm in a semi-supervised fashion using only a small portion of labeled data. Our dataset represents real-world data: it is sparse, and only a small percentage of its IPs carry verified clean or suspicious labels, but the IPs are connected. The experimental results demonstrate that the system can achieve 85.28% accuracy in detecting malicious IP addresses at scale with only 5% of the data labeled.
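The message-passing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the node names, the edges, and the three-value feature vectors (standing in for one web, one email, and one DNS feature) are all hypothetical, and the learned weights and nonlinearities of a real GNN layer are deliberately omitted. It shows only how one round of mean aggregation lets a node's features reach its direct neighbors, and how a second round carries them one hop further.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy graph: nodes are IPs; an edge means two IPs are related
# in some observed way (the abstract does not define the edge semantics).
edges: List[Tuple[str, str]] = [("ip_a", "ip_b"), ("ip_b", "ip_c"), ("ip_c", "ip_d")]

# Hypothetical node features, e.g. one value each from web, email, and DNS data.
features: Dict[str, List[float]] = {
    "ip_a": [1.0, 1.0, 1.0],  # a node with a strong (suspicious) signal
    "ip_b": [0.0, 0.0, 0.0],
    "ip_c": [0.0, 0.0, 0.0],
    "ip_d": [0.0, 0.0, 0.0],
}


def build_neighbors(edge_list: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from an edge list."""
    nbrs: Dict[str, Set[str]] = {}
    for u, v in edge_list:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def message_pass(
    feats: Dict[str, List[float]], nbrs: Dict[str, Set[str]]
) -> Dict[str, List[float]]:
    """One round of message passing with mean aggregation.

    Each node's new vector is the element-wise mean of its own vector and
    its neighbors' vectors. No learned weights or activation are applied.
    """
    updated: Dict[str, List[float]] = {}
    for node, own in feats.items():
        group = [own] + [feats[n] for n in sorted(nbrs.get(node, ()))]
        updated[node] = [sum(col) / len(group) for col in zip(*group)]
    return updated


nbrs = build_neighbors(edges)
h1 = message_pass(features, nbrs)  # signal reaches direct neighbors of ip_a
h2 = message_pass(h1, nbrs)        # signal reaches nodes two hops away
```

After one round, `ip_b` (adjacent to `ip_a`) has picked up part of the signal while `ip_c` has not; after two rounds, `ip_c` has as well. In a semi-supervised setting, a training loss would be computed only on the small labeled subset of nodes, while propagation of this kind runs over the whole graph.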


Updated: 2022-09-15
