Model of Pseudo-Random Sequences Generated by Encryption and Compression Algorithms

Kozachok, A. V.; Spirin, A. A.

doi:10.1134/S0361768821040058

Published: 30 July 2021

Volume 47, pages 249–260, (2021)

Programming and Computer Software

A. V. Kozachok¹ &
A. A. Spirin¹

Abstract

Classification of high-entropy data sources is one of the key problems in the field of information security. Currently, there are many methods for classification of encrypted and compressed sequences; however, they mostly rely on digital signatures or service information found in the headers of the containers used to store or transfer data. This paper analyzes the state of research in the field of classification of encrypted and compressed data and develops a model of encrypted and compressed sequences. Our experiments demonstrate high accuracy for the proposed approach, from which we conclude that it improves on the methods for classifying encrypted and compressed data considered in our study. The approach can be implemented in data leak prevention systems or corporate email systems to analyze attachments sent outside the controlled perimeter of a government agency or enterprise.

Purpose of the research – to develop a model of pseudo-random sequences generated by data encryption and compression algorithms that most accurately reflects the statistical properties of these sequences.

Methods of the research – statistical data analysis, mathematical statistics, and machine learning.

Results of the research – An analysis of studies on the classification of encrypted and compressed sequences in the field of information security is carried out. A model of pseudo-random sequences generated by encryption and compression algorithms is developed taking into account their statistical features: the distribution of bytes and the distribution of subsequences of limited length, which constitute a new probability space. The choice of the statistical features used in the pseudo-random sequence model is justified. Experiments for determining the hyperparameters of the classifier are carried out on a dataset generated from encrypted and compressed files with their headers excluded. The constraints of the pseudo-random sequence model, namely, the length of pseudo-random sequences (approximately 600 Kb), are defined. Experiments for determining the effect of the statistical features used in the model on classification accuracy are conducted. The proposed approach allows encrypted and compressed data to be classified with an accuracy of 0.97.
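The two statistical features named above, the distribution of bytes and the distribution of subsequences of limited length, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say whether subsequences are measured in bits or bytes, nor what length is used, so the byte-level window `k`, the `top` cutoff, and all function names below are assumptions made for illustration only.

```python
from collections import Counter


def byte_distribution(data: bytes) -> list[float]:
    """Relative frequency of each of the 256 possible byte values."""
    if not data:
        raise ValueError("data must be non-empty")
    counts = Counter(data)  # iterating over bytes yields ints in 0..255
    return [counts[value] / len(data) for value in range(256)]


def subsequence_distribution(data: bytes, k: int = 2) -> dict[bytes, float]:
    """Relative frequency of overlapping k-byte subsequences.

    Assumption: the window is measured in bytes; the abstract does not
    state the unit or the length of the subsequences.
    """
    if k < 1 or len(data) < k:
        raise ValueError("need 1 <= k <= len(data)")
    windows = len(data) - k + 1
    counts = Counter(data[i:i + k] for i in range(windows))
    return {subseq: n / windows for subseq, n in counts.items()}


def feature_vector(data: bytes, k: int = 2, top: int = 8) -> list[float]:
    """Hypothetical feature layout: 256 byte frequencies followed by the
    `top` largest k-byte subsequence frequencies, zero-padded if fewer exist."""
    top_freqs = sorted(subsequence_distribution(data, k).values(), reverse=True)[:top]
    top_freqs += [0.0] * (top - len(top_freqs))
    return byte_distribution(data) + top_freqs
```

A vector built this way from a header-stripped file fragment could be passed to a standard classifier. Which classifier the authors used, and how they encode the subsequence distribution, is not stated in the abstract.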

Notes

https://www.openssl.org. Accessed February 8, 2021.
https://www.win-rar.com. Accessed February 8, 2021.
https://www.7-zip.org. Accessed February 8, 2021.
A statistical test suite for random and pseudorandom number generators for cryptographic applications, National Institute of Standards and Technology (NIST), 2010. https://csrc.nist.gov/publications/detail/sp/800-22/rev-1a/final. Accessed February 8, 2021.

Funding

This work was supported by the Ministry of Education and Science of the Russian Federation, project no. 18/2020.

Author information

Authors and Affiliations

Academy of Federal Guard Service of the Russian Federation, ul. Priborostroitel’naya 35, Orel, Russia
A. V. Kozachok & A. A. Spirin

Corresponding authors

Correspondence to A. V. Kozachok or A. A. Spirin.

Additional information

Translated by Yu. Kornienko

About this article

Cite this article

Kozachok, A.V., Spirin, A.A. Model of Pseudo-Random Sequences Generated by Encryption and Compression Algorithms. Program Comput Soft 47, 249–260 (2021). https://doi.org/10.1134/S0361768821040058

Received: 03 March 2021
Revised: 16 March 2021
Accepted: 17 March 2021
Published: 30 July 2021
Issue Date: July 2021
DOI: https://doi.org/10.1134/S0361768821040058
