HSTF-Model: an HTTP-based Trojan Detection Model via the Hierarchical Spatio-Temporal Features of Traffics
Computers & Security (IF 5.6), Pub Date: 2020-09-01, DOI: 10.1016/j.cose.2020.101923
Jiang Xie, Shuhao Li, Xiaochun Yun, Yongzheng Zhang, Peng Chang

Abstract HTTP-based Trojans are extremely threatening and are difficult to detect effectively because of their concealment and confusion. Previous detection methods usually generalize poorly because they rely on outdated datasets and manual feature extraction; as a result, they perform well on their own private datasets but perform poorly, or even fail, in real network environments. In this paper, we propose an HTTP-based Trojan detection model via the Hierarchical Spatio-Temporal Features of traffics (HSTF-Model), based on a formalized description of the spatio-temporal behavior of traffic at both the packet level and the flow level. In this model, we employ a Convolutional Neural Network (CNN) to extract spatial information and Long Short-Term Memory (LSTM) to extract temporal information. In addition, we present a dataset consisting of Benign and Trojan HTTP Traffic (BTHT-2018). Experimental results show that our model achieves high accuracy (an F1 of 98.62% ~ 99.81% and an FPR of 0.34% ~ 0.02% on BTHT-2018). More importantly, our model has a large advantage over related methods in generalization ability: HSTF-Model trained on BTHT-2018 reaches an F1 of 93.51% on the public dataset ISCX-2012, which is 20+% better than the best of the related machine learning methods.
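
For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how F1 and FPR are conventionally computed from confusion-matrix counts, assuming Trojan traffic is treated as the positive class. The function name `f1_and_fpr` and the example counts are hypothetical illustrations; they are not taken from the paper or from BTHT-2018.

```python
def f1_and_fpr(tp, fp, tn, fn):
    """Return (F1, FPR) from confusion-matrix counts.

    Assumption: the positive class is Trojan traffic, so
    tp = Trojan flows flagged as Trojan, fp = benign flows flagged as Trojan,
    tn = benign flows passed as benign, fn = Trojan flows missed.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    # FPR is the share of benign traffic wrongly flagged as Trojan.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return f1, fpr


# Hypothetical counts chosen only to illustrate the arithmetic.
f1, fpr = f1_and_fpr(tp=90, fp=10, tn=990, fn=10)
print(f"F1={f1:.4f}, FPR={fpr:.4f}")
```

A low FPR matters in this setting because benign HTTP traffic typically far outnumbers Trojan traffic, so even a small false-positive rate can produce many false alarms.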

Updated: 2020-09-01