Preprocessing Method for Encrypted Traffic Based on Semisupervised Clustering
Security and Communication Networks Pub Date : 2020-07-27 , DOI: 10.1155/2020/8824659
Rongfeng Zheng 1 , Jiayong Liu 2 , Weina Niu 3 , Liang Liu 2 , Kai Li 2 , Shan Liao 2

The recent explosive growth in network traffic has increased the processing pressure on network intrusion detection systems. In addition, reliable methods are lacking for preprocessing the network traffic generated by benign applications, that is, applications that do not steal users’ data from their devices. To alleviate these problems, this study analyzed the differences between the benign traffic produced by benign applications and the malicious traffic produced by malware. To fully express these differences, this study proposed a new set of statistical features for training a clustering model. Furthermore, to mine the communication channels generated by benign applications in batches, a semisupervised clustering method was adopted. Using a small number of labeled samples, our method aggregated historical network traffic into two types of clusters: those that contained labeled malicious samples and those that did not. A cluster that did not contain labeled malicious samples was regarded as a benign traffic cluster. Four clustering algorithms were compared experimentally, and density-based spatial clustering of applications with noise (DBSCAN) was selected to mine benign communication channels. We also compared our method with two other methods, and the results demonstrated that the benign channels mined through our method were more reliable. Finally, using our method, 1,811 benign Transport Layer Security (TLS) channels were mined from 18,357 TLS communication channels. The flows carried by these benign channels accounted for 65.37% of all network flows, and no malicious flow was included in our results, which demonstrates the effectiveness of our method.
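The cluster-labeling rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The two-dimensional feature vectors, the `eps` and `min_pts` values, and the function names `dbscan` and `benign_clusters` are all assumptions made for illustration, since the abstract does not list the actual statistical features or parameters. The small pure-Python DBSCAN is used only to keep the example self-contained; a library implementation would normally be preferred. Treating noise points as not benign is also an assumption.

```python
from math import dist

NOISE = -1  # cluster id used for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one cluster id per point, NOISE for outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # previously noise, now a border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nb)
    return labels


def benign_clusters(cluster_ids, malicious_idx):
    """Clusters containing no labeled malicious sample are taken as benign.

    Assumption: noise points are never treated as benign.
    """
    tainted = {cluster_ids[i] for i in malicious_idx}
    return {c for c in cluster_ids if c != NOISE and c not in tainted}


# Hypothetical per-channel feature vectors (two made-up features each).
channels = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # one dense group
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),  # another dense group
    (10.0, 0.0),                                     # isolated channel
]
labeled_malicious = {4}  # index of the single labeled malicious sample

ids = dbscan(channels, eps=0.5, min_pts=3)
benign = benign_clusters(ids, labeled_malicious)
benign_channels = [i for i, c in enumerate(ids) if c in benign]
print(ids, benign, benign_channels)
```

In this toy data the group that contains the labeled malicious sample is excluded as a whole, the isolated channel is left as noise, and only the remaining group is reported as benign.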

Updated: 2020-07-27