TELESTO: A Graph Neural Network Model for Anomaly Classification in Cloud Services,arXiv - CS - Software Engineering



arXiv - CS - Software Engineering, Pub Date: 2021-02-25, DOI: arxiv-2102.12877
Dominik Scheinert, Alexander Acker

Deployment, operation and maintenance of large IT systems are becoming increasingly complex and put human experts under extreme stress when problems occur. Machine learning (ML) and artificial intelligence (AI) are therefore applied to IT system operation and maintenance, an approach summarized under the term AIOps. One specific direction aims at recognizing recurring anomaly types to enable automated remediation. However, IT-system-specific properties, especially frequent changes (e.g. software updates, reconfiguration or hardware modernization), make recognizing recurring anomaly types challenging. Current methods mainly assume a static dimensionality of the provided data. We propose a method that is invariant to dimensionality changes of the given data. Resource metric data such as CPU utilization and allocated memory are modelled as multivariate time series. Temporal and spatial features are extracted, and anomalies are subsequently classified, by TELESTO, our novel graph convolutional neural network (GCNN) architecture. The experimental evaluation is conducted in a real-world cloud testbed deployment hosting two applications. Classification results for anomalies injected on a Cassandra database node show that TELESTO outperforms the alternative GCNNs and achieves an overall classification accuracy of 85.1%. Classification results for the other nodes show accuracy values between 60% and 85%.
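
The abstract does not describe TELESTO's layers, so the following is only a minimal sketch of the general idea behind invariance to dimensionality changes, not the authors' architecture. It assumes each metric becomes a graph node with a fixed-length feature vector, applies one generic mean-aggregation graph convolution with shared weights, and then mean-pools over nodes. The feature choice, the fully connected graph, the random weights and all function names are hypothetical.

```python
import math
import random


def node_features(series):
    """Summarise one metric's time window as a fixed-length vector (assumed features)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    trend = series[-1] - series[0]
    return [mean, std, trend]


def fully_connected(n):
    """Undirected edges between every pair of the n metric nodes (an assumption)."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


def embed(window, edges, weights):
    """One mean-aggregation graph convolution followed by mean pooling over nodes.

    window  : list of time series, one per metric (any number of metrics)
    edges   : list of (i, j) node index pairs
    weights : F_in x F_out matrix shared by all nodes
    """
    x = [node_features(s) for s in window]
    n, f_in, f_out = len(x), len(weights), len(weights[0])

    # Neighbour sets, with a self-loop on every node.
    neighbours = [{i} for i in range(n)]
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)

    hidden = []
    for i in range(n):
        # Average the features of node i and its neighbours.
        agg = [sum(x[j][k] for j in neighbours[i]) / len(neighbours[i])
               for k in range(f_in)]
        # Shared linear map followed by ReLU.
        hidden.append([max(0.0, sum(agg[k] * weights[k][o] for k in range(f_in)))
                       for o in range(f_out)])

    # Mean pooling removes the dependence on the number of nodes.
    return [sum(h[o] for h in hidden) / n for o in range(f_out)]


random.seed(0)
W = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]

# Three metrics before a hypothetical software update, five afterwards.
before = [[0.2, 0.3, 0.9], [0.5, 0.5, 0.6], [0.1, 0.1, 0.2]]
after = before + [[0.0, 0.4, 0.8], [0.7, 0.7, 0.7]]

emb_before = embed(before, fully_connected(len(before)), W)
emb_after = embed(after, fully_connected(len(after)), W)
print(len(emb_before), len(emb_after))
```

Because the weight matrix acts on per-node features and the pooling averages over nodes, the embedding length depends only on the weight matrix, so a downstream classifier could in principle stay unchanged when metrics are added or removed.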


Updated: 2021-02-26
