Automated Analysis of Distributed Tracing: Challenges and Research Directions
Journal of Grid Computing ( IF 5.5 ) Pub Date : 2021-02-25 , DOI: 10.1007/s10723-021-09551-5
Andre Bento , Jaime Correia , Ricardo Filipe , Filipe Araujo , Jorge Cardoso

Microservice-based architectures are gaining popularity for their benefits in software development. Distributed tracing can help operators maintain observability in this highly distributed context, find problems such as high latency, and analyse their context and root cause. However, exploring and working with distributed tracing data is sometimes difficult because of its complexity and application specificity, the volume of information, and a lack of tools. The most common general-purpose tools for this kind of data focus on trace-level, human-readable visualisation. Unfortunately, these tools do not provide good ways to abstract, navigate, filter and analyse tracing data. Additionally, they do not automate or aid trace analysis, leaving administrators to do it themselves. In this paper we propose using tracing data to extract service metrics, dependency graphs and work-flows with the objective of detecting anomalous services and operation patterns. We implemented and published open-source prototype tools that process tracing data conforming to the OpenTracing standard, and developed anomaly detection methods. We validated our tools and methods against real data provided by a major cloud provider. Results show that there is an underused wealth of actionable information that can be extracted from both the metric and the morphological aspects of tracing data. In particular, our tools were able to detect anomalous behaviour and locate it in terms of the services and work-flows involved and the time frame. Furthermore, we identified some limitations of the OpenTracing format (as well as of the industry-accepted tracing abstractions), and provide suggestions to test trace quality and enhance the standard.
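As a rough illustration of how spans can yield a service dependency graph and per-service latency metrics, the sketch below assumes a simplified, hypothetical `Span` record (loosely modelled on OpenTracing's parent/child span references, not its actual data format) and flags anomalies with a naive z-score rule. It is not the authors' published tooling, and the z-score rule is a stand-in, not the detection method evaluated in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Optional, Set


@dataclass
class Span:
    # Hypothetical simplified span: which trace it belongs to, its own id,
    # the id of its parent span (None for the root), and timing data.
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    service: str
    operation: str
    start_us: int
    duration_us: int


def dependency_graph(spans: List[Span]) -> Dict[str, Set[str]]:
    """Build caller -> callee edges from parent/child links that cross services."""
    by_id = {(s.trace_id, s.span_id): s for s in spans}
    graph: Dict[str, Set[str]] = defaultdict(set)
    for s in spans:
        if s.parent_id is None:
            continue  # root span: no caller
        parent = by_id.get((s.trace_id, s.parent_id))
        if parent is not None and parent.service != s.service:
            graph[parent.service].add(s.service)
    return dict(graph)


def service_latencies(spans: List[Span]) -> Dict[str, List[int]]:
    """Group span durations by service (a simple service-level metric)."""
    lat: Dict[str, List[int]] = defaultdict(list)
    for s in spans:
        lat[s.service].append(s.duration_us)
    return dict(lat)


def anomalous_spans(spans: List[Span], threshold: float = 3.0) -> List[Span]:
    """Flag spans more than `threshold` population standard deviations above
    their service's mean duration (naive z-score rule, for illustration only)."""
    stats = {svc: (mean(v), pstdev(v)) for svc, v in service_latencies(spans).items()}
    flagged = []
    for s in spans:
        mu, sigma = stats[s.service]
        if sigma > 0 and (s.duration_us - mu) / sigma > threshold:
            flagged.append(s)
    return flagged
```

For a trace in which `frontend` calls `orders`, which in turn calls `db`, `dependency_graph` returns `{"frontend": {"orders"}, "orders": {"db"}}`. A per-service statistic like this ignores workload changes and call-path context, which is one reason richer morphological features (work-flows, graph shape) can be worth extracting as well.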




Updated: 2021-02-25