Categorizing Malware via A Word2Vec-based Temporal Convolutional Network Scheme
Journal of Cloud Computing (IF 3.7), Pub Date: 2020-09-23, DOI: 10.1186/s13677-020-00200-y
Jiankun Sun, Xiong Luo, Honghao Gao, Weiping Wang, Yang Gao, Xi Yang

As the edge computing paradigm has gained great popularity in recent years, some technical challenges remain that must be addressed to guarantee smart device security in Internet of Things (IoT) environments. Smart devices now routinely transmit individual users' data across the IoT for various purposes, and malware that steals or damages these data can cause losses and poses a serious threat to users. To improve malware detection performance on IoT smart devices, this article conducts a malware categorization analysis on the dataset from the Kaggle Microsoft Malware Classification Challenge (BIG 2015). Motivated by the temporal convolutional network (TCN) structure, we propose a malware categorization scheme built mainly on a pre-trained Word2Vec model. The popular one-hot encoding converts the input names from malicious files into high-dimensional vectors, because each name occupies one dimension of the one-hot vector space. The Word2Vec pre-training strategy instead yields more compact vectors with fewer dimensions, which leads to fewer parameters and a stronger malware feature representation. Moreover, compared with long short-term memory (LSTM), TCN demonstrates better performance in sequence modeling tasks, with longer effective memory and faster training. Experimental comparisons on this malware dataset show better categorization performance with less memory usage and training time. In particular, compared with the state-of-the-art Word2Vec-based LSTM approach, our scheme achieves approximately 1.3% higher prediction accuracy on this malware categorization task. The same comparison shows that our scheme uses about 90 thousand fewer parameters and cuts model training time by more than 1 hour.
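
The two ideas in the abstract, compact input vectors and convolutional sequence modeling, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The vocabulary size, embedding width, hidden width, kernel, and dilation schedule below are hypothetical values chosen only for illustration, and the receptive-field formula assumes one causal convolution per level, whereas published TCN designs often stack two per residual block.

```python
from typing import List, Sequence


def first_layer_weights(input_dim: int, hidden_units: int) -> int:
    """Weights in a dense layer that reads one input vector (bias excluded)."""
    return input_dim * hidden_units


def causal_dilated_conv(
    x: Sequence[float], kernel: Sequence[float], dilation: int
) -> List[float]:
    """1-D causal convolution: out[t] uses only x[t], x[t-d], x[t-2d], ...

    Positions before the start of the sequence count as zeros (left padding),
    so no output ever depends on a future time step.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            j = t - i * dilation  # look back i * dilation steps
            if j >= 0:
                acc += w * x[j]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Time steps visible to one output, assuming one conv layer per dilation."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    vocab_size = 1000   # hypothetical number of distinct names (one-hot width)
    embed_dim = 64      # hypothetical Word2Vec vector width
    hidden_units = 128  # hypothetical width of the first layer

    print(first_layer_weights(vocab_size, hidden_units))  # 128000
    print(first_layer_weights(embed_dim, hidden_units))   # 8192

    # Each output adds the current value and the value two steps earlier.
    print(causal_dilated_conv([1, 2, 3, 4, 5], [1, 1], dilation=2))
    # [1.0, 2.0, 4.0, 6.0, 8.0]

    # Doubling the dilation per level grows the visible history quickly.
    print(receptive_field(kernel_size=3, dilations=[1, 2, 4, 8]))  # 31
```

Under these assumed sizes, the first layer needs far fewer weights when it reads a dense vector instead of a one-hot vector, and four dilated levels already cover 31 time steps. Because every position of a convolution can be computed independently, training parallelizes across the sequence, unlike the step-by-step recurrence of an LSTM.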

Updated: 2020-09-23
