An Empirical Study on Log Level Prediction for Multi-Component Systems,IEEE Transactions on Software Engineering

IEEE Transactions on Software Engineering (IF 6.5), Pub Date: 2022-02-25, DOI: 10.1109/tse.2022.3154672
Youssef Esseddiq Ouatiti ₁, Mohammed Sayagh ₂, Noureddine Kerzazi ₃, Ahmed E. Hassan ₁

Logging statements are used to trace the execution of a software system. Practitioners rely on various pieces of logging information (e.g., the content of a log message) to choose an appropriate log level for each logging statement; the level is then used to adjust the verbosity of logs so that only important log messages are traced. Components of a multi-component system, such as OpenStack and its 28 components, can decide on log levels differently. For example, one component might aim to increase the verbosity of its log messages, while another component of the same system might aim to decrease it. Such different logging strategies can exist because each component can be developed and maintained by a different team. While prior work leveraged an ordinal regression model to recommend the appropriate log level for a new logging statement, its evaluation did not consider the particularities that each component can have within a multi-component system. For instance, the model might not perform well on every individual component. Likewise, interpreting that model can mislead the developers of a component that has its own logging strategy. In this paper, we quantify the impact of the particularities of each component of a multi-component system on the performance and interpretability of the log level prediction model of prior work. We observe that log level prediction models trained at the whole-project level (a.k.a. global models) achieve lower performance (AUC) on 72% to 100% of the components of our five evaluated multi-component systems than the same models achieve when evaluated on the whole multi-component system. We observe that models trained at the component level (a.k.a. local models) statistically outperform the global model on 33% to 77% of the components of our evaluated multi-component systems.
Furthermore, we observe that the rankings of the most important features obtained from the global models are statistically different from the feature importance rankings of 50% to 87% of the local models of our evaluated multi-component systems. Finally, we observe that 60% and 35% of the Spring and OpenStack components, respectively, do not have enough data points to train their own local models (a.k.a. data-lacking components). For such components, leveraging a peer-local model is more promising than using the global model.
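
The gap between whole-system and per-component performance described above can be illustrated with a minimal Python sketch. It is not the paper's pipeline: it assumes a simplified binary label (the study predicts ordinal log levels), a hand-written rank-based AUC, and made-up component names and scores chosen only to show the effect. The helper names `auc` and `auc_by_component` are hypothetical.

```python
from collections import defaultdict


def auc(labels, scores):
    """Binary AUC as the fraction of (positive, negative) pairs ranked
    correctly; tied scores count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined when one class is missing
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_component(records):
    """Group (component, label, score) records and compute one AUC each."""
    groups = defaultdict(lambda: ([], []))
    for component, label, score in records:
        groups[component][0].append(label)
        groups[component][1].append(score)
    return {c: auc(ys, ss) for c, (ys, ss) in groups.items()}


# Made-up scores from a single "global" model (illustrative assumption).
# The model separates the two components well, but not the labels
# inside each component.
records = [
    ("compute", 1, 0.80), ("compute", 1, 0.90), ("compute", 0, 0.85),
    ("network", 0, 0.10), ("network", 0, 0.20), ("network", 1, 0.15),
]

overall = auc([y for _, y, _ in records], [s for _, _, s in records])
per_component = auc_by_component(records)

print(f"whole system: {overall:.3f}")
for name, value in per_component.items():
    print(f"{name}: {value:.3f}")
```

On this toy data the whole-system AUC is higher than the AUC of either component, because the scores mostly reflect which component a statement belongs to. This is one way a global evaluation can look better than any per-component evaluation; the abstract does not say whether this is the mechanism behind the reported results.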

Updated: 2022-02-25
