Self-Supervised Log Parsing,arXiv - CS - Software Engineering


Self-Supervised Log Parsing
arXiv - CS - Software Engineering. Pub Date: 2020-03-17, DOI: arxiv-2003.07905
Sasho Nedelkoski, Jasmin Bogatinovski, Alexander Acker, Jorge Cardoso and Odej Kao

Logs are extensively used during the development and maintenance of software systems. They collect runtime events and allow tracking of code execution, which enables a variety of critical tasks such as troubleshooting and fault detection. However, large-scale software systems generate massive volumes of semi-structured log records, posing a major challenge for automated analysis. Parsing semi-structured records with free-form text log messages into structured templates is the first and crucial step that enables further analysis. Existing approaches rely on log-specific heuristics or manual rule extraction. They are often specialized for parsing certain log types, which limits both their performance scores and their generalization. We propose a novel parsing technique called NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling (MLM). In the process of parsing, the model extracts summarizations from the logs in the form of a vector embedding. This allows the MLM to be coupled, as pre-training, with a downstream anomaly detection task. We evaluate the parsing performance of NuLog on 10 real-world log datasets and compare the results with 12 parsing techniques. The results show that NuLog outperforms existing methods in parsing accuracy, with an average of 99%, and achieves the lowest edit distance to the ground truth templates. Additionally, two case studies are conducted to demonstrate the ability of the approach for log-based anomaly detection in both supervised and unsupervised scenarios. The results show that NuLog can be successfully used to support troubleshooting tasks. The implementation is available at https://github.com/nulog/nulog.
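To make the two technical ideas in the abstract concrete, the following is a minimal sketch, not NuLog's implementation. It shows how masked language modeling training samples could be derived from a single log message, and how an edit distance between a parsed template and a ground truth template can be computed. The whitespace tokenizer, the `<MASK>` symbol, the one-token-at-a-time masking scheme, the helper names, and the example log message are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import List, Tuple

MASK = "<MASK>"  # hypothetical mask symbol, not taken from the paper


def tokenize(message: str) -> List[str]:
    """Split a log message on whitespace (an assumed, simplistic tokenizer)."""
    return message.split()


def masked_samples(tokens: List[str]) -> List[Tuple[List[str], int, str]]:
    """Build one MLM sample per token: (masked sequence, position, target).

    A self-supervised model would be trained to predict `target` from the
    masked sequence, so no manual labels are needed.
    """
    samples = []
    for i, target in enumerate(tokens):
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        samples.append((masked, i, target))
    return samples


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance between two template strings."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Made-up log message, used only to show the shape of the samples.
    for masked, pos, target in masked_samples(tokenize("Deleting instance 42 files")):
        print(pos, masked, "->", target)
    print(edit_distance("Deleting instance <*> files",
                        "Deleting instance <*> file"))
```

An edit distance of 0 means the parsed template matches the ground truth exactly, so lower values indicate templates closer to the ground truth.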


Updated: 2020-03-20
