Abstract
Outlier detection methods are now used extensively, particularly in systems for detecting insider intrusions, in medicine, and in systems for detecting extremism in public political discussions on forums and social media. This work considers a fuzzy outlier detection method based on elliptic clustering in a higher-dimensional feature space, using the Mahalanobis metric to calculate the distance between each object and the center of the cluster. A procedure developed by the authors is used to find the optimum values of the metaparameters of this algorithm. Both individual events and complete sessions of user activity are classified, the latter with an algorithm based on Welch’s t-statistic. The proposed procedures perform well on two important problems in the stream analysis of complex data structures: authenticating users by their keystroke dynamics, and detecting extremist information in web text messages.
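The two statistical building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it omits the fuzzy clustering, the higher-dimensional feature mapping, and the metaparameter selection, and the function names `mahalanobis_2d` and `welch_t` are hypothetical. It only shows how a Mahalanobis distance to a cluster center and a Welch's t-statistic between two samples of scores are computed, assuming two-dimensional points and a known covariance matrix.

```python
import math
from statistics import mean, variance


def mahalanobis_2d(point, center, cov):
    """Mahalanobis distance of a 2-D point from a center (illustrative helper).

    cov is a 2x2 covariance matrix given as ((a, b), (c, d)).
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - center[0], point[1] - center[1]
    # Quadratic form diff^T * inv(cov) * diff,
    # where inv(cov) = [[d, -b], [-c, a]] / det.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)


def welch_t(xs, ys):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom.

    xs and ys are samples (for example, per-event outlier scores) whose
    variances are not assumed to be equal.
    """
    vx = variance(xs) / len(xs)  # squared standard error of mean(xs)
    vy = variance(ys) / len(ys)  # squared standard error of mean(ys)
    t = (mean(xs) - mean(ys)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(xs) - 1) + vy ** 2 / (len(ys) - 1))
    return t, df
```

One possible use, stated here as an assumption rather than as the procedure from the paper, is to score each event of a session by its distance to the cluster center and then compare the session's scores with a reference sample of scores from the legitimate user via `welch_t`; a large absolute t-value would suggest that the session as a whole is anomalous.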
Funding
This work was supported by the Russian Foundation for Basic Research, project no. 16-29-09555.
Additional information
Russian Text © The Author(s), 2019, published in Vestnik Moskovskogo Universiteta, Seriya 15: Vychislitel’naya Matematika i Kibernetika, 2019, No. 3, pp. 17–28.
About this article
Cite this article
Kazachuk, M.A., Petrovskiy, M.I., Mashechkin, I.V. et al. Outlier Detection in Complex Structured Event Streams. Moscow Univ. Comput. Math. Cybern. 43, 101–111 (2019). https://doi.org/10.3103/S0278641919030038
DOI: https://doi.org/10.3103/S0278641919030038