Workflow-aware Automatic Fault Diagnosis for Microservice-based Applications with Statistics
IEEE Transactions on Network and Service Management (IF 5.3), Pub Date: 2020-12-01, DOI: 10.1109/tnsm.2020.3022028
Tao Wang, Wenbo Zhang, Jiwei Xu, Zeyu Gu

Microservice architectures bring many benefits, such as faster delivery, improved scalability, and greater autonomy, so they are widely adopted to develop and operate Internet-based applications. Effectively diagnosing faults in applications composed of many dynamic microservices has become key to guaranteeing application performance and reliability. Because a microservice behaves differently in the different workflows that process requests, existing approaches often cannot accurately locate the root cause of a fault in an application whose microservices interact in a dynamic deployment environment. We propose a workflow-aware, statistics-based automatic fault diagnosis approach for microservice-based applications. We characterize traces across microservices as calling trees, and then learn trace patterns as baselines. For faults that affect the workflows of processing requests, we estimate the workflows' anomaly degrees, and then locate the microservices causing the anomalies by using tree edit distance to compare current traces against the learned baselines. For performance anomalies that significantly increase response time, we employ principal component analysis to extract suspicious microservices whose response times fluctuate widely. Finally, we evaluate our approach on three typical microservice-based applications in a series of experiments. The results show that our approach can accurately locate the microservices causing anomalies.
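
The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a calling tree is a nested tuple `(service_name, children)`, uses a unit-cost ordered tree edit distance, and approximates principal component analysis by power iteration on the covariance matrix, ranking services by their loading on the first component. The service names, sample values, and helper names (`tree_edit_distance`, `first_principal_component`) are hypothetical.

```python
import math
from functools import lru_cache

# Assumption: a calling tree is (service_name, (child_tree, ...)),
# and a forest is a tuple of such trees.

def size(tree):
    """Number of nodes in a calling tree."""
    return 1 + sum(size(child) for child in tree[1])

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (small trees only)."""
    if not f and not g:
        return 0
    if not f:
        return sum(size(t) for t in g)   # insert everything in g
    if not g:
        return sum(size(t) for t in f)   # delete everything in f
    (v_label, v_children), (w_label, w_children) = f[-1], g[-1]
    # Delete the rightmost root of f; its children take its place.
    delete = forest_distance(f[:-1] + v_children, g) + 1
    # Insert the rightmost root of g.
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Match the two rightmost roots, relabeling if the names differ.
    match = (forest_distance(v_children, w_children)
             + forest_distance(f[:-1], g[:-1])
             + (0 if v_label == w_label else 1))
    return min(delete, insert, match)

def tree_edit_distance(a, b):
    return forest_distance((a,), (b,))

def first_principal_component(samples, iterations=200):
    """Leading eigenvector of the sample covariance matrix, by power iteration.

    samples: list of rows, one response time per service in each row.
    """
    n, m = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in samples]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    vec = [1.0] * m  # assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    return vec

# Hypothetical baseline trace and a current trace missing the payment call.
baseline = ("gateway", (("orders", (("db", ()),)), ("payment", ())))
current = ("gateway", (("orders", (("db", ()),)),))
print(tree_edit_distance(baseline, current))  # 1

# Hypothetical response times (ms) per service over five observations.
services = ["gateway", "orders", "payment"]
samples = [[30, 12, 20], [31, 12, 200], [30, 13, 25], [29, 12, 180], [30, 11, 22]]
loadings = first_principal_component(samples)
ranked = sorted(zip(services, loadings), key=lambda p: abs(p[1]), reverse=True)
print(ranked[0][0])  # payment
```

A distance of zero means the current trace matches the baseline pattern; larger distances indicate a structurally different workflow. The memoized recursion is only practical for small trees, and a production system would more likely use a polynomial-time algorithm such as Zhang-Shasha.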

Updated: 2020-12-01