Clustering of unknown protocol messages based on format comparison
Computer Networks (IF 4.4), Pub Date: 2020-06-18, DOI: 10.1016/j.comnet.2020.107296
Fanghui Sun, Shen Wang, Chunrui Zhang, Hongli Zhang

Protocol Reverse Engineering (PRE), a means of detecting and analysing unknown or proprietary protocols, has developed rapidly in recent years. Within this field, format-oriented message clustering is a fundamental step for differentiating unknown protocol messages. This paper addresses format-oriented message clustering for unknown protocols, including messages from proprietary or non-cooperative network environments whose specifications are unknown. Using the basic rules of ABNF, we define Token Format Distance (TFD) and Message Format Distance (MFD) to represent the format similarity of tokens and of messages, and we compute them with Jaccard distance and an optimized sequence alignment algorithm (MFD measurement). A distance matrix built from MFD is then fed to the DBSCAN algorithm, which clusters the unknown protocol messages into classes with different formats. For this step we design an unsupervised clustering strategy in which the Silhouette Coefficient and the Dunn Index guide the selection of DBSCAN parameters. In experiments on two datasets, the V-measure (the harmonic mean of homogeneity and completeness) of the resulting clusters is above 0.91 on both, with Fowlkes-Mallows index (FMI) and coverage no less than 0.97. In boxplot analyses, the interquartile range (IQR) of the V-measure is below 0.1 and that of the FMI is below 0.03, which indicates that the method is both valid and stable. Comprehensive analyses and comparisons on these indices also show considerable advantages of our method over previous work.
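The pipeline in the abstract (pairwise distances, then a distance matrix, then DBSCAN) can be illustrated with a minimal, standard-library-only sketch. It is not the paper's method: the TFD and MFD definitions and the alignment step are not given here, so plain Jaccard distance over whitespace-separated tokens stands in for MFD. The function names, the sample messages, and the `eps` and `min_pts` values are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| for two sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(messages):
    """Build a symmetric matrix of pairwise Jaccard distances."""
    # Assumption: a token is a whitespace-separated word. The paper's
    # ABNF-based tokens and its MFD measure are not reproduced here.
    token_sets = [set(m.split()) for m in messages]
    n = len(token_sets)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = jaccard_distance(token_sets[i], token_sets[j])
    return dist


def dbscan(dist, eps, min_pts):
    """Textbook DBSCAN on a precomputed distance matrix; -1 marks noise."""
    n = len(dist)
    labels = [None] * n          # None means "not visited yet"
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbours = [q for q in range(n) if dist[p][q] <= eps]
        if len(neighbours) < min_pts:   # the count includes p itself
            labels[p] = -1
            continue
        cluster += 1
        labels[p] = cluster
        queue = [q for q in neighbours if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == -1:
                labels[q] = cluster     # former noise becomes a border point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbours = [r for r in range(n) if dist[q][r] <= eps]
            if len(q_neighbours) >= min_pts:   # q is a core point: expand
                queue.extend(q_neighbours)
    return labels


# Hypothetical sample messages: two formats plus one outlier.
messages = [
    "GET /index HTTP/1.1",
    "GET /about HTTP/1.1",
    "GET /home HTTP/1.1",
    "220 smtp ready",
    "220 mail ready",
    "220 relay ready",
    "xyzzy",
]
labels = dbscan(distance_matrix(messages), eps=0.6, min_pts=2)
print(labels)
```

In this sketch `eps` and `min_pts` are fixed by hand. The abstract describes selecting the DBSCAN parameters without supervision by scoring candidate clusterings with the Silhouette Coefficient and the Dunn Index, which would replace the hard-coded values.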




Updated: 2020-06-18