Machine learning for network application security: Empirical evaluation and optimization

https://doi.org/10.1016/j.compeleceng.2021.107052

Highlights

  • Providing a comprehensive empirical evaluation of ML algorithms.

  • Applying optimization methods to evaluate performance with respect to networking attacks.

  • Providing accuracy results with and without optimization with different ML algorithms.

Abstract

Machine learning (ML) has demonstrated great potential to revolutionize the networking field. In this paper, we present a large-scale empirical study to evaluate the effectiveness of state-of-the-art ML algorithms for network application security. In our experiments, six classical ML algorithms and three neural network algorithms are evaluated over three networking datasets: KDDCup 99, NSL-KDD, and ADFA IDS 2017. The non-optimized and optimized versions of each algorithm are compared. Furthermore, various training-to-testing ratios are tested to find the setting under which each algorithm performs best. The results reveal that optimizing ML algorithms can improve performance in detecting networking attacks. In particular, the Decision Tree proved to be the most accurate and fastest of the classical ML algorithms, while the Recurrent Neural Network achieved the best performance among the neural network algorithms.

Introduction

The Internet has progressed greatly due to the advancement of technologies, industries, and smart applications [1]. As a result, both consumers and industries rely on computer networks and networking technologies for information delivery and massive online demands in the world of cyber–human intelligence. As networks become ubiquitous, their security and Quality of Experience (QoE) matter more than ever [2]. For industries, efficient and reliable networks that offer secure services are now a necessity [3].

Unfortunately, networks often struggle with constraints due to increased traffic demand and higher computational requirements. Furthermore, detecting and preventing network attacks is becoming increasingly difficult, yet very important, due to sophisticated cybercriminals and the complexity of network communications [4]. Each network has its own properties, structures, and performance requirements, which change constantly. As such, developing powerful techniques and architectures that handle complex situations across networking use cases, particularly those that involve security, is difficult [5].

ML-based techniques have been employed in the networking field, but the role(s) that ML can play in the networking domain have not been well clarified. Previous networking studies relied on handcrafted statistical techniques that identified desired patterns in datasets solely from known port numbers, which proved relatively ineffective [6]. The applicability of ML to networking remains a relevant and challenging topic because the livelihoods of businesses and consumers depend on the proper operation of the underlying networks. Some companies report that they lack sufficient personnel to handle network attacks. Others, particularly small businesses, do not allocate any budget to network security, which leaves them with outdated systems that are more likely to be targeted. Furthermore, attacks such as DDoS (Distributed Denial of Service) remain a threat to a majority of service providers, since they can represent up to 25% of a country’s total internet traffic. Monitoring network traffic is more crucial than ever to protect consumer data against malicious actors. For companies of all sizes, it is beneficial and affordable to adopt solutions that use the unique capability of ML to obtain better optimization [4] and to combat security issues in intrusion detection and network monitoring [7]. The growing complexity and scale of computer and mobile networks, combined with ML-based methods, could motivate researchers and practitioners to invest more in smart solutions for handling network security [8].
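To make the weakness of port-only identification concrete, the following minimal sketch labels flows from their destination port alone. It is an illustration, not a method from the paper: the lookup table holds a few well-known port assignments, while the function names and the four toy flows are invented for this example.

```python
# Small lookup table of well-known destination ports (standard assignments).
WELL_KNOWN_PORTS = {22: "ssh", 25: "smtp", 53: "dns", 80: "http", 443: "https"}


def classify_by_port(dst_port):
    """Label a flow using only its destination port."""
    return WELL_KNOWN_PORTS.get(dst_port, "unknown")


# Toy flows as (destination port, true application); invented for illustration.
flows = [
    (80, "http"),
    (443, "https"),
    (2222, "ssh"),    # SSH moved to a non-standard port
    (80, "tunnel"),   # non-HTTP traffic tunneled over port 80
]


def port_accuracy(labeled_flows):
    """Fraction of flows whose port-based label matches the true application."""
    correct = sum(1 for port, truth in labeled_flows if classify_by_port(port) == truth)
    return correct / len(labeled_flows)


if __name__ == "__main__":
    for port, truth in flows:
        print(f"port {port}: predicted {classify_by_port(port)}, actual {truth}")
    print(f"accuracy: {port_accuracy(flows):.2f}")
```

The two mislabeled flows show the general problem: a service on an unexpected port goes unrecognized, and unrelated traffic on a familiar port is accepted at face value.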

This paper has two main goals. The first is to provide a comprehensive overview of the machine learning algorithms used in the networking domain, covering their inner workings, advantages, and the areas of networking to which they apply. The overview also describes various kinds of networking attacks and how they work. The second and foremost goal is to provide a large-scale empirical evaluation of nine ML algorithms, with and without optimization techniques, to give a more solid basis for handling various networking scenarios and to help newcomers become better equipped with knowledge of oncoming network threats. To this end, six classical ML algorithms and three neural network algorithms are evaluated over three networking datasets: KDDCup 99 [9], NSL-KDD [10], and ADFA IDS 2017 [11]. Specifically, the paper seeks to answer the following main questions:

  • Which of the ML algorithms perform best in terms of accuracy when distinguishing between normal networks and networks with abnormalities (security threats)?

  • How does optimization of the ML algorithms influence their performance?

  • How do different optimization methods perform with specific ML algorithms?

Section 2 reviews related work. Section 3 discusses the network protocols. Section 4 gives an overview of different kinds of network attacks. Section 5 describes the empirical study, including the experiment environment, tools, datasets, ML techniques, and optimization methods used. Section 6 presents the results. Finally, Section 7 gives concluding remarks and future work.

Section snippets

Related works

Several works have addressed this topic: some discuss the benefits of incorporating ML into the networking domain, and others discuss how particular ML algorithms can be applied to specific networking problems.

M. Wang’s work [4] attempts to discuss the workflow, advances, and opportunities for Machine Learning and Networking. He argues that applying Machine Learning for Networking can help solve old network questions/problems that still exist and stimulate newer network

Overview of network protocols

Many types of network protocols exist to communicate and exchange information. This is crucial information to consider because network attacks usually target certain aspects of the underlying protocols, such as the network layers, hardware, and others. Some protocols that we still use are more prone to networking attacks than others. A protocol is a collection of rules outlining the communication between computers on a network. Essentially, these rules are guidelines that regulate aspects of a

Different kinds of network security attacks

Networking attacks have changed over time. Malicious code and malware for networking attacks go as far back as the 1960s. In the 1970s, large systems at universities became targets of attacks and pranks involving trojans. It was not until 1976 that the Data Encryption Standard was first approved. In the 1980s, attacks gained more notoriety among the public, with the first case of a DDoS attack occurring in November 1988. Some if not all kinds of

Empirical evaluation

This section discusses the tools, environment, datasets, ML techniques, and optimization methods used in the experimental study.
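As a rough illustration of the kind of experiment this section describes, the sketch below sweeps several training ratios and reports test accuracy for each. It is not the authors' code: the synthetic two-feature records, the nearest-centroid classifier, the ratio values, and all function names are assumptions chosen so that the example runs with only the Python standard library.

```python
import random


def make_synthetic_flows(n, seed=0):
    """Generate toy two-feature records labeled 0 (normal) or 1 (attack)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 2.0
        features = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((features, label))
    return data


def split(data, train_ratio, seed=0):
    """Shuffle a copy of the data and split it by the given training ratio."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def fit_centroids(train):
    """Compute the mean feature vector of each class (nearest-centroid model)."""
    sums, counts = {}, {}
    for (x, y), label in train:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {c: (sums[c][0] / counts[c], sums[c][1] / counts[c]) for c in sums}


def predict(centroids, features):
    """Return the label of the closest class centroid."""
    x, y = features
    return min(
        centroids,
        key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
    )


def accuracy(centroids, test):
    """Fraction of test records that are classified correctly."""
    correct = sum(1 for features, label in test if predict(centroids, features) == label)
    return correct / len(test)


def sweep_ratios(data, ratios=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Train and test once per training ratio; the ratios are hypothetical."""
    results = {}
    for ratio in ratios:
        train, test = split(data, ratio)
        results[ratio] = accuracy(fit_centroids(train), test)
    return results


if __name__ == "__main__":
    for ratio, acc in sweep_ratios(make_synthetic_flows(1000)).items():
        print(f"train ratio {ratio:.1f}: accuracy {acc:.3f}")
```

In a real study, the synthetic generator would be replaced by a loader for a dataset such as NSL-KDD, and each ratio would typically be repeated with several random seeds before comparing algorithms.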

Results and discussion

All nine machine learning algorithms discussed in Section 5.3 were assessed on the three datasets. The study first evaluated the nine algorithms without any optimizations. Optimizations were then added to the nine algorithms to see whether any notable changes occurred. The results for the six classic ML methods are presented in Section 6.1, and the results for the three neural network methods are presented in Section 6.2.
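The two-stage protocol (evaluate first without optimization, then with it) can be sketched as follows. This excerpt does not say which optimization methods the study applied, so the example substitutes a simple stand-in: a grid search over the neighbor count of a k-nearest-neighbors classifier, scored on a validation split. The synthetic data, the candidate values, and every function name are assumptions made for illustration.

```python
import random
from collections import Counter


def make_records(n, seed=1):
    """Generate toy two-feature records labeled 0 (normal) or 1 (attack)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 1.5
        records.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return records


def knn_predict(train, features, k):
    """Majority vote among the k training records closest to the query."""
    x, y = features
    nearest = sorted(
        train, key=lambda rec: (rec[0][0] - x) ** 2 + (rec[0][1] - y) ** 2
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def knn_accuracy(train, test, k):
    """Fraction of test records that k-NN classifies correctly."""
    hits = sum(1 for features, label in test if knn_predict(train, features, k) == label)
    return hits / len(test)


def tune_k(train, candidates=(1, 3, 5, 7, 9, 15), val_ratio=0.25):
    """Grid search: pick the k with the best accuracy on a held-out validation part."""
    cut = int(len(train) * (1 - val_ratio))
    fit_part, val_part = train[:cut], train[cut:]
    return max(candidates, key=lambda k: knn_accuracy(fit_part, val_part, k))


def compare(n=400, train_ratio=0.75, default_k=1):
    """Evaluate an untuned model, then a tuned one, on the same test split."""
    records = make_records(n)  # already in random order, so no shuffle is needed
    cut = int(n * train_ratio)
    train, test = records[:cut], records[cut:]
    best_k = tune_k(train)
    return {
        "default_k": default_k,
        "default_accuracy": knn_accuracy(train, test, default_k),
        "tuned_k": best_k,
        "tuned_accuracy": knn_accuracy(train, test, best_k),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value}")
```

The tuned setting is chosen without touching the test records, so the comparison between the two accuracies stays fair. Tuning does not guarantee a higher test score on every split.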

Conclusions

The study indicated that applying optimization to ML methods has a positive impact on performance across all three datasets. In particular, the ADFA-IDS-2017 dataset showed the greatest change. Among the classic ML methods, the Decision Tree algorithm performed best across all three datasets, both with and without optimization. K-Means clustering performed worst, even with optimizations applied. Among the neural networks, the RNN performed best across all three datasets.

CRediT authorship contribution statement

Mohammed Aledhari: Conceptualization, Methodology, Validation, Data curation, Writing - original draft, Supervision. Rehma Razzak: Investigation, Methodology, Validation, Data curation, Writing - original draft. Reza M. Parizi: Methodology, Validation, Formal analysis, Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Mohammed Aledhari is an Assistant Professor of Computer Science and the director of Smart and Autonomous Systems Center (SASC) at Kennesaw State University, GA, USA. His research interests include Data Science, Machine Learning, Computer Vision for Medical Imaging and Autonomous Systems, and Big Data Problems in Computational Biology and Bioinformatics.

References (31)

  • Zhao J. et al. Transfer learning for detecting unknown network attacks. EURASIP J Inform Secur (2019)
  • Sharma D. et al. A network science-based k-means++ clustering method for power systems network equivalence. Comput Soc Netw (2019)
  • D’Alconzo A. et al. A survey on big data for network traffic monitoring and analysis. IEEE Trans Netw Serv Manag (2019)
  • Natalino C. et al. Experimental study of machine-learning-based detection and identification of physical-layer attacks in optical networks. J Lightwave Technol (2019)
  • Bhutani G. Application of machine-learning based prediction techniques in wireless networks. Int J Commun Netw Syst Sci (2014)
  • Wang M. et al. Machine learning for networking: Workflow, advances and opportunities. IEEE Netw (2017)
  • Negandhi P. et al. Intrusion detection system using random forest on the NSL-KDD dataset
  • Sultana N. et al. Survey on SDN based network intrusion detection system using machine learning approaches. Peer-to-Peer Netw Appl (2019)
  • Iqbal M.F. et al. Efficient prediction of network traffic for real-time applications. J Comput Netw Commun (2019)
  • Boutaba R. et al. A comprehensive survey on machine learning for networking: evolution, applications and research opportunities. J Internet Serv Appl (2018)
  • Tavallaee M. et al. A detailed analysis of the KDD CUP 99 data set
  • Dhanabal L. et al. A study on NSL-KDD dataset for intrusion detection system based on classification algorithms. Int J Adv Res Comput Commun Eng (2015)
  • Chen H. et al. Azsecure-data.org: Intelligence and security informatics data sets (2017)
  • Usama M. et al. Unsupervised machine learning for networking: Techniques, applications and research challenges. IEEE Access (2019)
  • Fang W. et al. Application of intrusion detection technology in network safety based on machine learning. Saf Sci (2020)
    Rehma Razzak is a Graduate Research Assistant in Computer Science at Kennesaw State University, GA, USA, where she is currently working toward the M.S. degree in computer science. Her current research interests include Machine Learning, Game Development, and Autism Spectrum Disorder (ASD).

    Reza M. Parizi is the director of Decentralized Science Lab (dSL) at Kennesaw State University, GA, USA. He is a senior member of IEEE, IEEE Blockchain Community, and ACM. His research interests are R&D in decentralized AI, blockchain systems, smart contracts, and emerging issues in the practice of secure software-run world applications.

    This paper is for CAEE special section VSI-aicps. Reviews processed and recommended for publication to the Editor-in-Chief by Guest Editor Dr. Ali Dehghantanha.
