Confidence guided anomaly detection model for anti-concept drift in dynamic logs
Journal of Network and Computer Applications ( IF 8.7 ) Pub Date : 2020-04-14 , DOI: 10.1016/j.jnca.2020.102659
Xueshuo Xie , Zongming Jin , Jiming Wang , Lei Yang , Ye Lu , Tao Li

Log data records system state and runtime behavior and is commonly used to diagnose system failures and detect anomalies. However, the accuracy of log-based anomaly detection algorithms drops dramatically on dynamic logs because systems are more complex than ever before, a phenomenon known as concept drift. In this paper, we design a confidence-guided anomaly detection model that combines multiple algorithms, called Multi-CAD. We first propose a statistical value, p_value, that measures the non-conformity between logs and links each new log to previous logs. Multiple suitable algorithms can be chosen as non-conformity measures, and each is used to calculate scores for combined detection rather than to make a decision on its own. We then design a confidence-guided parameter adjustment method to counter concept drift in dynamic logs. Through a feedback mechanism, each trusted result, which contains a label, a non-conformity score, and a confidence, updates the score set under the corresponding label and serves as prior experience for subsequent detection. Finally, we demonstrate that Multi-CAD achieves balanced performance in precision, recall, and F_measure and detects actual anomalies on multiple datasets. Extensive experimental results show that Multi-CAD improves recall and F_measure by almost 20% on average compared with four typical algorithms on the HDFS benchmark dataset, where it achieves 98.2% precision, 95.2% recall, and 96.7% F_measure.
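
The abstract does not give the formula behind p_value or the confidence, so the following is only a minimal sketch of how such a scheme could look, assuming the standard conformal-prediction definitions: the p-value of a new score is the fraction of scores under a label that are at least as non-conforming, and the confidence is one minus the second-largest p-value. The function names, the `min_confidence` threshold, and the sample scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def p_value(score: float, previous_scores: List[float]) -> float:
    """Conformal p-value (assumed definition, not confirmed by the abstract):
    the fraction of scores, counting the new one, that are at least as
    non-conforming as `score`."""
    at_least = sum(1 for s in previous_scores if s >= score) + 1
    return at_least / (len(previous_scores) + 1)


def detect(
    score_sets: Dict[str, List[float]],
    new_scores: Dict[str, float],
    min_confidence: float = 0.75,  # hypothetical threshold
) -> Tuple[str, float, bool]:
    """Label a new log and feed trusted results back into the score set.

    `score_sets` maps each label to the non-conformity scores of previous
    logs; `new_scores` maps each label to the new log's score under that
    label. At least two labels are required.
    """
    p = {label: p_value(new_scores[label], score_sets[label]) for label in score_sets}
    ranked = sorted(p.items(), key=lambda item: item[1], reverse=True)
    label = ranked[0][0]
    # Assumed confidence: one minus the second-largest p-value.
    confidence = 1.0 - ranked[1][1]
    trusted = confidence >= min_confidence
    if trusted:
        # Feedback step: a trusted result becomes previous experience.
        score_sets[label].append(new_scores[label])
    return label, confidence, trusted


if __name__ == "__main__":
    score_sets = {"normal": [0.1, 0.2, 0.3, 0.4], "anomaly": [2.0, 2.5, 3.0, 3.5]}
    print(detect(score_sets, {"normal": 0.25, "anomaly": 5.0}))
```

In this sketch, results below the threshold are still labeled but are not added to the score set, so only trusted results influence later detections.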




Updated: 2020-04-14