A network intrusion detection method based on semantic re-encoding and deep learning

https://doi.org/10.1016/j.jnca.2020.102688

Abstract

In recent years, as human activity in cyberspace has increased, intrusion events such as network penetration, detection and attack have become more frequent and better hidden. Traditional rule-based intrusion detection methods are not sufficient to deal with increasingly complex intrusion traffic, while intrusion detection systems based on classical machine learning still generalize poorly and have a high false alarm rate. To address this problem, we observe that normal and intrusion network traffic differ clearly in several semantic dimensions, even though intrusion traffic is becoming more and more covert. We therefore propose a new intrusion detection method, named SRDLM, based on semantic re-encoding and deep learning. SRDLM re-encodes the semantics of network traffic to increase its distinguishability, and uses deep learning to enhance the generalization ability of the algorithm, thus effectively improving accuracy and robustness. The accuracy of the SRDLM algorithm for detecting Web character injection attacks is over 99%. On the NSL-KDD data set, average performance improves by more than 8% compared with traditional machine learning methods.

Introduction

With the development of information technology, people now enjoy the convenience of networks. At the same time, the number and scale of security threats are growing rapidly, causing great damage to network resources and leaks of private data. The methods and features of network intrusion are constantly changing and developing, so intrusion detection remains an important research issue.

Intrusion detection technology has been continuously studied by researchers (Moustafa et al., 2019; Bhuyan et al., 2013; Jaiganesh et al., 2013; Aburomman and Reaz, 2017; Kabir et al., 2018). In general, intrusion detection can be treated as a classification problem: incoming network traffic is classified as either normal or attack. Existing intrusion detection models mainly combine existing machine learning methods with intrusion detection data sets. The data set is treated as a generic large data set and fed directly into existing machine learning models to train the intrusion detection classifier. Current learning methods can be broadly classified into three types: traditional machine learning based methods, deep learning based methods, and hybrid methods.

Traditional machine learning methods include support vector machines (SVM), k-nearest neighbors (kNN), decision trees, and so on. As collected data sets grow larger, deep learning based approaches are gaining attention because they can learn the computational process in depth and may generalize better. Such methods include deep belief networks (DBN), convolutional neural networks (CNN), recurrent neural networks (RNN), autoencoders, and so on. To further improve recognition accuracy, researchers have also combined several classification methods into hybrid classifiers. A large number of experiments have shown that hybrid techniques achieve better detection performance on specific data sets: with a suitable choice of classifiers and merging method, they can reach higher precision and detection rates than a single method.
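One simple way to merge base classifiers into a hybrid detector is majority voting. The sketch below only illustrates that idea and is not the scheme of any cited work; the three rule-based classifiers, their feature names (`duration`, `failed_logins`, `src_bytes`) and their thresholds are hypothetical stand-ins for trained models such as an SVM, a kNN and a decision tree.

```python
from collections import Counter
from typing import Callable, Dict, List

Record = Dict[str, float]
Classifier = Callable[[Record], str]  # returns "normal" or "attack"


# Toy base classifiers. Feature names and thresholds are hypothetical;
# in practice each would be a separately trained model.
def by_duration(r: Record) -> str:
    return "attack" if r["duration"] > 100 else "normal"


def by_failed_logins(r: Record) -> str:
    return "attack" if r["failed_logins"] >= 3 else "normal"


def by_bytes(r: Record) -> str:
    return "attack" if r["src_bytes"] > 10_000 else "normal"


def majority_vote(classifiers: List[Classifier], record: Record) -> str:
    """Return the label predicted by most base classifiers."""
    votes = Counter(clf(record) for clf in classifiers)
    return votes.most_common(1)[0][0]


# An odd number of voters avoids ties between the two labels.
ensemble = [by_duration, by_failed_logins, by_bytes]
suspicious = {"duration": 250.0, "failed_logins": 5.0, "src_bytes": 120.0}
benign = {"duration": 2.0, "failed_logins": 0.0, "src_bytes": 300.0}

print(majority_vote(ensemble, suspicious))
print(majority_vote(ensemble, benign))
```

In this toy run the byte-count rule misses the suspicious record, but the other two rules outvote it, which is the kind of error compensation a hybrid classifier relies on.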

Through continuous effort, researchers are now able to design high-accuracy detectors for fixed intrusion data sets. However, because network intrusion traffic changes continuously, high accuracy on fixed data sets cannot guarantee excellent detection performance on dynamic traffic. Our work analyzes the detectability of dynamic intrusion traffic and proposes an effective intrusion detection algorithm based on semantic re-encoding and deep learning. Semantic re-encoding re-expresses the semantic space of intrusion traffic in order to increase the distinguishability of abnormal traffic. On top of semantic re-encoding, deep learning is used to enhance the generalization ability of the intrusion detection model. The main contributions of this work are as follows:

  1. We find that normal and attack network traffic often differ significantly in narrative semantics. Based on this, we design a semantic re-encoding method for intrusion network flows, which effectively increases the distinguishability of abnormal network traffic.

  2. We design a deep learning-based detection model for intrusion traffic, which enhances the generalization capabilities of intrusion detection models.

Experimental results show that our approach achieves competitive performance.
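To make the idea behind the first contribution concrete, the following is a minimal sketch of one possible re-encoding, assuming a simple smoothed log-odds score per token. This excerpt does not give the actual SRDLM encoding, so the tokenizer, the scoring rule, the function names (`tokenize`, `build_codebook`, `reencode`) and the toy HTTP requests are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(payload: str) -> List[str]:
    """Split a payload into lowercase word tokens and single punctuation marks."""
    return re.findall(r"[a-z0-9_]+|[^\sa-z0-9_]", payload.lower())


def build_codebook(normal: List[str], attack: List[str]) -> Dict[str, float]:
    """Map each token to a smoothed log-odds score (assumed scoring rule).

    Scores above 0 lean towards attack traffic, scores below 0 towards normal.
    """
    n_counts = Counter(t for p in normal for t in tokenize(p))
    a_counts = Counter(t for p in attack for t in tokenize(p))
    vocab = set(n_counts) | set(a_counts)
    # Add-one smoothing so tokens seen in only one class stay finite.
    n_total = sum(n_counts.values()) + len(vocab)
    a_total = sum(a_counts.values()) + len(vocab)
    return {
        t: math.log((a_counts[t] + 1) / a_total)
        - math.log((n_counts[t] + 1) / n_total)
        for t in vocab
    }


def reencode(payload: str, codebook: Dict[str, float]) -> List[float]:
    """Replace each token by its score; unseen tokens get a neutral 0.0."""
    return [codebook.get(t, 0.0) for t in tokenize(payload)]


# Toy corpora, invented for this example.
normal_traffic = ["GET /index.html?id=42", "GET /search?q=network+security"]
attack_traffic = [
    "GET /search?q=<script>alert(1)</script>",
    "GET /index.html?id=<script>steal()</script>",
]

codebook = build_codebook(normal_traffic, attack_traffic)
benign_score = sum(reencode("GET /index.html?id=7", codebook))
attack_score = sum(reencode("GET /page?x=<script>alert(2)</script>", codebook))
print(benign_score, attack_score)
```

After re-encoding, tokens that are typical of injected scripts carry positive scores while ordinary request tokens carry negative ones, so the two kinds of request separate more clearly than they do as raw strings. A downstream model would consume the re-encoded sequence rather than the summed score used here for a quick check.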

The rest of this paper is organized as follows: Section 2 introduces related work. Section 3 describes our proposed method in detail. Section 4 presents the experimental results. Finally, the conclusion is presented in Section 5.

Related work

Previously, many researchers applied purely traditional classifiers to intrusion detection, such as Naïve Bayes, SVM, decision trees, kNN and so on (Dhanabal and Shantharajah, 2015; Deshmukh et al., 2015; Heba et al., 2010; Naoum and Al-Sultani, 2012). These methods achieved a great deal and laid a solid foundation for later research.

Much research has been conducted since the emergence of deep learning. Researchers have done a great deal of work on the preprocessing of

Problem formulation

As people's activities in cyberspace become more frequent, network intrusion traffic changes continuously and dynamically, so detection models designed for fixed data sets are often unsatisfactory. More importantly, dynamically changing intrusion traffic has a large number of hidden, bursty features that are discontinuous, whereas current mainstream deep learning models are better at characterizing continuously changing data features. Then how to improve the

Experimental results and analysis

The experiments use two data sets. One, named Hduxss_data1.0, was collected specifically for web attacks at the Hangdian Security Lab and contains both normal and abnormal HTTP streams. The other is NSL-KDD, which is regarded as a benchmark evaluation data set in the field of intrusion detection. The experiments were run with PyTorch 1.0 on a computer with a 2080 Ti GPU and 32 GB of memory, running Ubuntu 18.04.
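Detectors evaluated on data sets like these are commonly compared by accuracy, detection rate and false alarm rate. The sketch below computes these three quantities for binary labels using their standard definitions; this excerpt does not list which metrics the paper reports, and the function name `detection_metrics` and the toy labels are assumptions for illustration.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Binary intrusion detection metrics, with 1 = attack and 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_alarm_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Toy labels, invented for this example: 3 attacks and 5 normal flows.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

Reporting the false alarm rate next to accuracy matters because a detector can reach high accuracy on imbalanced traffic while still raising many alerts on normal flows.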

Conclusions

This paper proposes SRDLM, an intrusion detection method based on semantic re-encoding and deep learning. The SRDLM algorithm has advantages in anomaly detection for network traffic with a huge semantic coding space and negligible word order. However, for network traffic whose features have already been extracted, semantic re-encoding brings only limited improvement in detection performance. Semantic re-encoding technology can be combined with deep learning technology to achieve

CRediT authorship contribution statement

Zhendong Wu: Methodology, Writing - original draft, Writing - review & editing, Software, Resources. Jingjing Wang: Software, Writing - original draft. Liqin Hu: Investigation, Formal analysis. Zhang Zhang: Validation. Han Wu: Validation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This research is supported by the National Natural Science Foundation of China (No. 61772162), the Key Projects of the NSFC Joint Fund of China (No. U1866209), the National Natural Science Foundation of China (No. 61602144), and the National Key R&D Program of China (No. 2018YFB0804102).

Zhendong Wu received the M.S. and Ph.D. degrees in Computer Science and Technology from Zhejiang University, Hangzhou, China. He is currently an Associate Professor with the School of Cyberspace, Hangzhou Dianzi University, Hangzhou, China. His current research interests include biometrics, biological cryptography, machine intelligence and natural language research.

References

  • K. He et al. Deep residual learning for image recognition

  • F.E. Heba et al. Principle components analysis and support vector machine based intrusion detection system

  • C.-M. Hsu et al. Using long-short-term memory based convolutional neural networks for network intrusion detection

  • B. Ingre et al. Performance analysis of NSL-KDD dataset using ANN

Jingjing Wang is currently pursuing a master's degree in Information Security at Hangzhou Dianzi University, Hangzhou, China. Her research interests include data mining, deep learning and intrusion detection.

Liqin Hu received the Ph.D. degree in mathematics from the Nanjing University of Aeronautics and Astronautics, Nanjing, China. She is a lecturer at the School of Cyberspace Security at Hangzhou Dianzi University. Her research interests include cryptography and coding theory.

Zhang Zhang is currently pursuing a master's degree in the School of Systems Science at Beijing Normal University, Beijing, China. His research interests include complex systems and machine learning techniques.

Han Wu is currently pursuing a master's degree in Cyberspace Security at Hangzhou Dianzi University, Hangzhou, China. His research interests include computer vision, deep learning and data mining.
