TC 11 Briefing Papers
A new WAF architecture with machine learning for resource-efficient use

https://doi.org/10.1016/j.cose.2021.102290

Abstract

Web Application Firewalls (WAFs) penalize every request by adding latency, whether the request is malicious or not. Several studies have reported the benefits of using Machine Learning (ML) to extract new rules for detecting malware and malicious web requests. However, a comparison of the models' metrics against their use of computational resources has yet to be made. This work presents a distributed WAF architecture that uses ML classifiers as one of its components. Instead of an enforcement point that analyzes the complete HTTP protocol for violations, this architecture uses a trained classifier to detect them. The first part of this work verifies the viability of using classifiers based on metrics such as accuracy and recall. We analyze two datasets and compare their use. The second part compares the prediction processing time of the ML models with the processing time of a rule-based engine. The classifiers used in this paper had a processing time about 18 times shorter than that of a rule-based engine. We also show that a classifier can find errors in the labels of a dataset generated by a rule-based WAF. We present samples and experimental code to show the difference between the approaches.

Section snippets

Introduction and previous work

Web Application Firewalls (WAFs) are used in the industry to protect web applications from vulnerability exploits and other security attacks. In the examples shown in the literature, WAFs can work in a heuristic way and through rules, often using a blocklist approach.
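To make the blocklist approach concrete, the following is a minimal sketch of rule-based matching over a request URI. The rule names and regular expressions are hypothetical and far simpler than the rule sets shipped with production WAFs; the sketch also assumes the URI has already been URL-decoded.

```python
import re

# Hypothetical blocklist: each entry is (rule name, compiled pattern).
# These patterns are illustrative only, not taken from any real rule set.
BLOCKLIST = [
    ("sql_injection", re.compile(r"(?i)\bunion\b.+\bselect\b|'\s*or\s+1\s*=\s*1")),
    ("path_traversal", re.compile(r"\.\./")),
    ("xss", re.compile(r"(?i)<\s*script")),
]


def match_rules(uri: str) -> list[str]:
    """Return the names of all blocklist rules that match the URI."""
    return [name for name, pattern in BLOCKLIST if pattern.search(uri)]


def is_blocked(uri: str) -> bool:
    """A request is blocked as soon as at least one rule matches."""
    return bool(match_rules(uri))
```

Every request pays the cost of running all patterns, including the benign ones, which is the overhead this work sets out to reduce.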

The aspect that we will evaluate in this work is related to the architecture used to implement WAFs. We can predict that the implementation of a WAF requires an increase in the computational capacity available for the applications

Classifiers

In this work, we will use classifiers to speed up the verification of malicious activities in a rules engine for WAFs. Part of the work is to evaluate some classifiers and check the balance provided in detection and speed.
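One possible reading of this idea is a gate in which a cheap classifier screens each URI and only suspicious requests reach the slower rules engine. The sketch below assumes that division of work, which this excerpt does not spell out. `toy_score` and `toy_rules_engine` are hypothetical stand-ins for a trained classifier and for an engine such as ModSecurity, and the threshold of 0.5 is arbitrary.

```python
from dataclasses import dataclass
from typing import Callable

SUSPICIOUS_CHARS = set("'\"<>;()")


def toy_score(uri: str) -> float:
    """Stand-in for a trained classifier: a crude score in [0, 1]."""
    if not uri:
        return 0.0
    hits = sum(ch in SUSPICIOUS_CHARS for ch in uri)
    return min(1.0, 10 * hits / len(uri))


def toy_rules_engine(uri: str) -> bool:
    """Stand-in for a full rules engine: True means a rule matched."""
    lowered = uri.lower()
    return "<script" in lowered or "' or " in lowered


@dataclass
class ClassifierGate:
    score: Callable[[str], float]
    rules_engine: Callable[[str], bool]
    threshold: float = 0.5       # assumed cut-off, not from the paper
    rules_engine_calls: int = 0  # counts how often the slow path runs

    def inspect(self, uri: str) -> str:
        # Fast path: a low score lets the request through without rule checks.
        if self.score(uri) < self.threshold:
            return "allow"
        # Slow path: only suspicious requests pay for the rules engine.
        self.rules_engine_calls += 1
        return "block" if self.rules_engine(uri) else "allow"
```

The counter `rules_engine_calls` makes the resource trade-off visible: the fewer requests that cross the threshold, the less often the expensive engine runs, at the risk of missing attacks that the classifier scores too low.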

Thus, we seek lower-complexity classifiers that could make the most efficient use of resources. We also limited the choice to three classifiers to compare their detection and performance rates. We evaluate the following: Linear Support Vector Classification (LSVM),

Environment

The performance evaluation of the classifiers ran on a Google Cloud virtual machine (n1-highmem-2) with 2 Intel(R) Xeon(R) vCPUs @ 2.20GHz (Family 6 Model 79), each vCPU with a 56,320 KB cache. The system had 13,333,540 KB of RAM and ran the Ubuntu 18.04.3 LTS operating system.

The compiler used was GCC version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04), with optimization level 2 and optimization for the Broadwell architecture. We measure the execution time using the standard C++
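As a rough illustration of this kind of measurement, the sketch below times two functions by CPU time and reports their ratio. It is a Python analogue, not the authors' C++ harness: `time.process_time` is used here because, like `std::clock`, it reports processor time rather than wall-clock time. The helper names and the best-of-N strategy are assumptions made for the example.

```python
import time
from typing import Callable, Iterable


def cpu_time_per_call(func: Callable[[str], object],
                      inputs: Iterable[str],
                      repeats: int = 5) -> float:
    """Average CPU seconds per call, taking the best of several passes."""
    items = list(inputs)
    if not items:
        raise ValueError("inputs must not be empty")
    best = float("inf")
    for _ in range(repeats):
        start = time.process_time()
        for item in items:
            func(item)
        best = min(best, time.process_time() - start)
    return best / len(items)


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """How many times faster the fast path is than the slow path."""
    return slow_seconds / fast_seconds
```

With per-call times measured for a rules engine and for a classifier on the same inputs, `speedup(rules_time, classifier_time)` gives the kind of ratio the abstract reports as about 18.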

Conclusions

This work analyzed the use of classifiers to obtain a performance gain when using rules engines such as ModSecurity. At request time, we evaluate only the headers provided, with a particular interest in the URI.
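To show what evaluating the URI might look like in practice, here is a minimal sketch that turns a URI into token counts a linear classifier could consume. The excerpt does not describe the actual feature extraction, so the decoding step, the token pattern, and the function names are all assumptions.

```python
import re
from collections import Counter
from urllib.parse import unquote

# Assumed tokenization: runs of word characters, plus each punctuation
# character as its own token. Whitespace is dropped.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9_]+|[^A-Za-z0-9_\s]")


def tokenize_uri(uri: str) -> list[str]:
    """Percent-decode and lowercase the URI, then split it into tokens."""
    decoded = unquote(uri).lower()
    return TOKEN_PATTERN.findall(decoded)


def bag_of_tokens(uri: str) -> Counter:
    """Count token occurrences; a vocabulary would map these to a vector."""
    return Counter(tokenize_uri(uri))
```

Keeping punctuation as separate tokens preserves characters such as quotes and angle brackets, which often carry the signal that distinguishes an exploit attempt from a normal request.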

This work shows that machine learning can bring practical gains that go beyond the increase in accuracy (Section 2.3.3), including better performance in detecting the exploitation of vulnerabilities. The architecture proposed in Section 1.4 is one of the ways to provide this efficiency increase in the

CRediT authorship contribution statement

Manoel Domingues Junior: Conceptualization, Methodology, Software, Validation, Investigation, Resources, Data curation, Writing - original draft, Visualization. Nelson F.F. Ebecken: Methodology, Validation, Formal analysis, Writing - review & editing, Supervision, Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors would like to thank Globo Comunicações e Participações for the resources used (Private Dataset), in addition to the availability of time to complete this work. Additionally, the authors recognize Globo’s Information Security team for reviewing the points covered and additional considerations.



Nelson Francisco Favilla Ebecken earned a Ph.D. in Civil Engineering from the Federal University of Rio de Janeiro in 1977. He is currently a Full Professor at the Federal University of Rio de Janeiro. He has published 135 articles in professional journals, 361 papers in conference proceedings, and 19 books. He has supervised 135 dissertations and 138 doctoral theses in the areas of Computer Science, Civil Engineering, and Information Science. He has received 8 awards or honors. He works in interdisciplinary areas of Engineering and Petroleum Engineering, with an emphasis on Computer Systems. In his professional activities, he has interacted with 161 collaborators in the co-authorship of scientific papers. In his Lattes curriculum, the most frequent terms in the context of scientific, technological, and artistic-cultural production are Data Mining, Structures, Offshore Structures, Risk Analysis, Neural Networks, Nonlinear Analysis, Large Scale Computation, Computational Methods, Finite Element Method, and Structural Analysis.

Manoel Domingues Junior earned a B.Sc. in Electronic and Computer Engineering from the Federal University of Rio de Janeiro (2019). He is currently the Principal Security Engineer at Globo Comunicação e Participações SA, a substitute professor at the Federal University of Rio de Janeiro, and the founder of MFS Security Engineering. He is a master's student in the Civil Engineering Program at the Federal University of Rio de Janeiro, working on the use of machine learning techniques to solve transversal engineering and computing problems for IoT and smart cities. He has experience in Computer Science, with an emphasis on Information Security, working mainly on the following topics: information security, access control, and systems architecture.
