Valid Probabilistic Anomaly Detection Models for System Logs
Wireless Communications and Mobile Computing (IF 2.146), Pub Date: 2020-11-16, DOI: 10.1155/2020/8827185
Chunbo Liu 1, Lanlan Pan 2, Zhaojun Gu 1, Jialiang Wang 2, Yitong Ren 2, Zhi Wang 3

System logs record system status and important events in detail during system operation, and detecting anomalies in these logs is a common practice for modern large-scale distributed systems. However, the threshold-based classification models used for anomaly detection output only one of two values, normal or abnormal, and give no estimate of the probability that a prediction is correct. This paper adopts a statistical learning algorithm, the Venn-Abers predictor, to evaluate the confidence of prediction results in system log anomaly detection. The predictor calculates the probability distribution of labels for a set of samples and so provides, to some extent, a quality assessment of the predicted labels. Two Venn-Abers predictors, LR-VA and SVM-VA, are implemented on top of Logistic Regression and Support Vector Machine, respectively. The differences among algorithms are then taken into account to build a multimodel fusion algorithm by Stacking, and a Venn-Abers predictor based on the Stacking algorithm, called Stacking-VA, is implemented. The performance of four types of algorithms (unimodel, Venn-Abers predictor based on a unimodel, multimodel, and Venn-Abers predictor based on a multimodel) is compared in terms of validity and accuracy. Experiments are carried out on a log dataset from the Hadoop Distributed File System (HDFS). In the comparative experiments on unimodels, the validity of LR-VA and SVM-VA is better than that of the two corresponding underlying models. Compared with the underlying models, the accuracy of the SVM-VA predictor is better than that of the LR-VA predictor and, more significantly, the recall rate increases from 81% to 94%. In the experiments on multiple models, the algorithm based on Stacking multimodel fusion is significantly superior to the underlying classifiers. The average accuracy of Stacking-VA is above 0.95, and its predictions are more stable than those of LR-VA and SVM-VA. The experimental results show that the Venn-Abers predictor is a flexible tool that can make accurate and valid probability predictions in system log anomaly detection.
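
To make the idea of a Venn-Abers predictor more concrete, here is a minimal sketch of the inductive variant in plain Python. It is not the authors' implementation: the abstract does not say which variant was used, so the inductive setup, the function names (`isotonic_fit`, `venn_abers_interval`, `merge_probabilities`), and the toy calibration scores are all assumptions made for illustration. The scores stand in for the outputs of an underlying model such as Logistic Regression or an SVM. For each test score the sketch fits isotonic regression twice, once assuming the test label is 0 and once assuming it is 1, which yields a probability pair (p0, p1). The final step merges the pair with p1 / (1 - p0 + p1), one commonly used way to get a single probability.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, int]  # (score from the underlying model, label in {0, 1})


def isotonic_fit(points: Sequence[Sample]) -> List[Tuple[float, float]]:
    """Fit a non-decreasing function to (score, label) pairs with PAVA.

    Returns (score, fitted_value) pairs, one per distinct score.
    """
    # Group tied scores so that every distinct score gets one fitted value.
    grouped: Dict[float, Tuple[float, int]] = {}
    for score, label in points:
        total, count = grouped.get(score, (0.0, 0))
        grouped[score] = (total + label, count + 1)

    # Each block holds [sum of labels, number of samples, scores covered].
    blocks: List[list] = []
    for score in sorted(grouped):
        total, count = grouped[score]
        blocks.append([total, count, [score]])
        # Pool adjacent blocks while their means violate monotonicity.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            last_total, last_count, last_scores = blocks.pop()
            blocks[-1][0] += last_total
            blocks[-1][1] += last_count
            blocks[-1][2].extend(last_scores)

    return [(s, total / count) for total, count, scores in blocks for s in scores]


def venn_abers_interval(calibration: Sequence[Sample],
                        test_score: float) -> Tuple[float, float]:
    """Return (p0, p1): the calibrated probability of label 1 for the test
    score when the test sample is hypothetically labeled 0 and then 1."""
    pair = []
    for hypothetical_label in (0, 1):
        augmented = list(calibration) + [(test_score, hypothetical_label)]
        fitted = dict(isotonic_fit(augmented))
        pair.append(fitted[test_score])
    return pair[0], pair[1]


def merge_probabilities(p0: float, p1: float) -> float:
    """Collapse the pair into one probability (an assumed, common choice)."""
    return p1 / (1.0 - p0 + p1)


# Toy calibration set: hypothetical scores with 1 = abnormal, 0 = normal.
calibration_set = [(-2.0, 0), (-1.0, 0), (-0.5, 1), (0.3, 0), (0.8, 1), (1.5, 1)]
p0, p1 = venn_abers_interval(calibration_set, 0.5)
print(p0, p1, merge_probabilities(p0, p1))
```

The width of the interval, p1 - p0, gives a rough sense of how much the calibration data constrain the estimate: a wide gap, as in this six-sample toy set, suggests the probability is poorly determined.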

Updated: 2020-11-16