Enhancing the analysis of software failures in cloud computing systems with deep learning,Journal of Systems and Software


Journal of Systems and Software (IF 3.5), Pub Date: 2021-07-12, DOI: 10.1016/j.jss.2021.111043
Domenico Cotroneo, Luigi De Simone, Pietro Liguori, Roberto Natella

Identifying the failure modes of cloud computing systems is a difficult and time-consuming task, due to the growing complexity of such systems and the large volume and noisiness of failure data. This paper presents a novel approach for analyzing failure data from cloud systems, in order to relieve human analysts of manually fine-tuning the data for feature engineering. The approach leverages Deep Embedded Clustering (DEC), a family of unsupervised clustering algorithms based on deep learning, which uses an autoencoder to optimize data dimensionality and inter-cluster variance. We applied the approach in the context of the OpenStack cloud computing platform, both on the raw failure data and in combination with an anomaly detection pre-processing algorithm. The results show that the performance of the proposed approach, in terms of purity of clusters, is comparable to, or in some cases even better than, manually fine-tuned clustering, thus avoiding the need for deep domain knowledge and reducing the effort required to perform the analysis. In all cases, the proposed approach provides better performance than unsupervised clustering when no feature engineering is applied to the data. Moreover, the distribution of failure modes obtained with the proposed approach is closer to the actual frequency of the failure modes.
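
The abstract reports results in terms of cluster purity. As an illustration only, the sketch below computes purity under its common textbook definition: each cluster is credited with its most frequent true label, and purity is the fraction of samples covered by those majority labels. The function name `cluster_purity` and the failure-mode labels ("crash", "hang", "leak") are hypothetical and not taken from the paper, which may define or weight the metric differently.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, true_labels):
    """Return the fraction of samples whose true label matches the
    majority label of the cluster they were assigned to."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if len(cluster_ids) == 0:
        raise ValueError("at least one sample is required")

    # Count how often each true label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts[cluster][label] += 1

    # Sum the size of the majority label in every cluster.
    majority_total = sum(
        counts.most_common(1)[0][1] for counts in label_counts.values()
    )
    return majority_total / len(cluster_ids)


# Hypothetical data: 8 failed experiments grouped into 3 clusters.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["crash", "crash", "hang", "hang", "hang", "leak", "leak", "crash"]
print(cluster_purity(clusters, labels))  # 0.75
```

A purity of 1.0 means every cluster contains a single failure mode. Purity alone rewards splitting the data into many small clusters, so it is usually read alongside the number of clusters produced.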


Updated: 2021-07-15
