Model of Pseudo-Random Sequences Generated by Encryption and Compression Algorithms

Programming and Computer Software

Abstract

Classification of high-entropy data sources is one of the key problems in information security. Many methods exist for classifying encrypted and compressed sequences; however, most of them rely on digital signatures or service information found in the headers of the containers used to store or transfer data. This paper analyzes the state of research on the classification of encrypted and compressed data and develops a model of encrypted and compressed sequences. Our experiments demonstrate the high accuracy of the proposed approach, from which we conclude that it improves on the methods for classifying encrypted and compressed data considered in our study. The approach can be implemented in data leak prevention systems or corporate email systems to analyze attachments sent outside the controlled perimeter of a government agency or enterprise.

Purpose of the research – to develop a model of pseudo-random sequences generated by data encryption and compression algorithms that most accurately reflects the statistical properties of these sequences.

Methods of the research – statistical data analysis, mathematical statistics, and machine learning.

Results of the research – Studies aimed at classifying encrypted and compressed sequences in the field of information security are analyzed. A model of pseudo-random sequences generated by encryption and compression algorithms is developed that takes into account their statistical features: the distribution of bytes and the distribution of subsequences of limited length, which constitute a new probability space. The choice of the statistical features used in the model is justified. Experiments are carried out to determine the hyperparameters of the classifier on a dataset generated from encrypted and compressed files with their headers excluded. The constraint of the model, namely, the length of the pseudo-random sequences (approximately 600 Kb), is defined. Experiments are conducted to determine the effect of the statistical features used in the model on classification accuracy. The proposed approach classifies encrypted and compressed data with an accuracy of 0.97.
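The two statistical features named above, the distribution of bytes and the distribution of subsequences of limited length, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the features precisely, so the use of overlapping byte-level subsequences, the default subsequence length of 2, the `top` parameter, and all function names are assumptions made only for illustration. Random bytes from `os.urandom` stand in for ciphertext, and `zlib` stands in for a compression algorithm.

```python
from collections import Counter
import os
import zlib


def byte_distribution(data: bytes) -> list:
    """Relative frequency of each of the 256 possible byte values."""
    if not data:
        raise ValueError("data must be non-empty")
    counts = Counter(data)  # iterating over bytes yields integers 0..255
    return [counts[value] / len(data) for value in range(256)]


def subsequence_distribution(data: bytes, length: int = 2) -> dict:
    """Relative frequency of every overlapping subsequence of a fixed length.

    Assumption: subsequences are taken at byte granularity; the paper may
    define them differently (for example, over bits).
    """
    windows = len(data) - length + 1
    if length < 1 or windows < 1:
        raise ValueError("length must be between 1 and len(data)")
    counts = Counter(data[i:i + length] for i in range(windows))
    return {subseq: n / windows for subseq, n in counts.items()}


def feature_vector(data: bytes, length: int = 2, top: int = 16) -> list:
    """256 byte frequencies followed by the `top` largest subsequence
    frequencies (zero-padded), as one hypothetical fixed-size feature vector."""
    freqs = sorted(subsequence_distribution(data, length).values(), reverse=True)
    head = freqs[:top]
    head += [0.0] * (top - len(head))
    return byte_distribution(data) + head


if __name__ == "__main__":
    # Stand-in for encrypted data (assumption): uniformly random bytes.
    random_like = os.urandom(600_000)
    # Stand-in for compressed data (assumption): zlib output for simple text.
    plain = b" ".join(str(i).encode() for i in range(200_000))
    compressed = zlib.compress(plain)

    for name, sample in (("random", random_like), ("compressed", compressed)):
        features = feature_vector(sample)
        print(name, len(sample), "bytes, max byte frequency:", max(features[:256]))
```

A vector of this kind could then be passed to any standard classifier; the choice of classifier and its hyperparameters is outside the scope of this sketch.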

Notes

  1. https://www.openssl.org. Accessed February 8, 2021.

  2. https://www.win-rar.com. Accessed February 8, 2021.

  3. https://www.7-zip.org. Accessed February 8, 2021.

  4. A statistical test suite for random and pseudorandom number generators for cryptographic applications, National Institute of Standards and Technology (NIST), 2010. https://csrc.nist.gov/publications/detail/sp/800-22/rev-1a/final. Accessed February 8, 2021.

Funding

This work was supported by the Ministry of Education and Science of the Russian Federation, project no. 18/2020.

Author information

Correspondence to A. V. Kozachok or A. A. Spirin.

Additional information

Translated by Yu. Kornienko

About this article

Cite this article

Kozachok, A.V., Spirin, A.A. Model of Pseudo-Random Sequences Generated by Encryption and Compression Algorithms. Program Comput Soft 47, 249–260 (2021). https://doi.org/10.1134/S0361768821040058

  • DOI: https://doi.org/10.1134/S0361768821040058
