Log-Based Anomaly Detection with the Improved K-Nearest Neighbor
International Journal of Software Engineering and Knowledge Engineering (IF 0.9), Pub Date: 2020-03-23, DOI: 10.1142/s0218194020500114
Bingming Wang, Shi Ying, Guoli Cheng, Rui Wang, Zhe Yang, Bo Dong

Logs play an important role in the maintenance of large-scale systems. The number of logs that indicate normal behavior (normal logs) differs greatly from the number that indicate anomalies (abnormal logs), and the two types of logs differ in certain respects. Using the K-Nearest Neighbor (KNN) algorithm, a high-accuracy outlier detection method, to identify faults automatically is an effective way to detect anomalies from logs. However, logs are large in scale and their samples are highly imbalanced, which affects the results of the KNN algorithm in log-based anomaly detection. We therefore propose a method based on an improved KNN algorithm, which uses the existing mean-shift clustering algorithm to efficiently select the training set from massive logs. We then assign different weights to samples at different distances, which reduces the negative effect of the unbalanced distribution of log samples on the accuracy of the KNN algorithm. Comparative experiments on log sets from five supercomputers show that the proposed method can be applied effectively to log-based anomaly detection, and that its accuracy, recall, and F-measure are higher than those of the traditional keyword search method.
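
The distance-weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the weighting formula, so inverse-distance weighting is an assumption here, and the function name `weighted_knn_predict`, the `eps` parameter, and the toy feature vectors are hypothetical. The mean-shift selection of the training set is not shown.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Classify `query` by distance-weighted voting among its k nearest neighbors.

    train: list of (feature_vector, label) pairs
    query: feature vector with the same length as the training vectors
    Assumption: each neighbor votes with weight 1 / (distance + eps).
    """
    # Rank training samples by Euclidean distance to the query, keep the k closest.
    neighbors = sorted(
        ((math.dist(x, query), label) for x, label in train),
        key=lambda pair: pair[0],
    )[:k]

    # Closer neighbors cast larger votes; eps avoids division by zero on exact matches.
    votes = defaultdict(float)
    for distance, label in neighbors:
        votes[label] += 1.0 / (distance + eps)
    return max(votes, key=votes.get)


# Made-up, imbalanced toy data: four "normal" samples and one "abnormal" sample.
train = [
    ([1.0, 0.0], "normal"),
    ([1.2, 0.1], "normal"),
    ([0.9, 0.2], "normal"),
    ([1.1, 0.3], "normal"),
    ([5.0, 4.0], "abnormal"),
]

print(weighted_knn_predict(train, [4.5, 3.8], k=3))
```

In this toy data, two of the three nearest neighbors of the query are "normal", so an unweighted majority vote would pick "normal". The single "abnormal" neighbor is much closer, so its weight dominates the vote, which is the kind of effect distance weighting is meant to have on imbalanced samples.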

Updated: 2020-03-23