Pivot Tracing
ACM Transactions on Computer Systems (IF 1.5), Pub Date: 2018-12-05, DOI: 10.1145/3208104
Jonathan Mace, Ryan Roelke, Rodrigo Fonseca

Monitoring and troubleshooting distributed systems is notoriously difficult; potential problems are complex, varied, and unpredictable. The monitoring and diagnosis tools commonly used today (logs, counters, and metrics) have two important limitations: what gets recorded is defined a priori, and the information is recorded in a component- or machine-centric way, making it extremely hard to correlate events that cross these boundaries. This article presents Pivot Tracing, a monitoring framework for distributed systems that addresses both limitations by combining dynamic instrumentation with a novel relational operator: the happened-before join. Pivot Tracing lets users define arbitrary metrics at one point of the system at runtime, and select, filter, and group them by events meaningful at other parts of the system, even when those events cross component or machine boundaries. We have implemented a prototype of Pivot Tracing for Java-based systems and evaluated it on a heterogeneous Hadoop cluster comprising HDFS, HBase, MapReduce, and YARN. We show that Pivot Tracing can effectively identify a diverse range of root causes, such as software bugs, misconfiguration, and limping hardware. We also show that Pivot Tracing is dynamic and extensible, and that it enables cross-tier analysis between inter-operating applications with low execution overhead.
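
To make the idea of a happened-before join more concrete, the following is a minimal Python sketch, not the authors' implementation or API. It assumes one plausible evaluation strategy: tuples emitted at an upstream point are carried along with the request, so a downstream point can join its local event against everything that causally preceded it. The names `RequestContext`, `client_tracepoint`, `datanode_tracepoint`, and the sample process names and byte counts are all hypothetical.

```python
from collections import defaultdict


class RequestContext:
    """Hypothetical per-request context that travels with the request."""

    def __init__(self):
        self.tuples = []

    def pack(self, **fields):
        # Record a tuple observed at an upstream point in the request.
        self.tuples.append(dict(fields))

    def unpack(self):
        # Return every tuple that causally preceded the current point.
        return list(self.tuples)


def client_tracepoint(ctx, proc_name):
    # Upstream event: note which client process issued the request.
    ctx.pack(procName=proc_name)


def datanode_tracepoint(ctx, bytes_read, results):
    # Downstream event: join the local measurement with each earlier
    # tuple on this request's path, then group by the upstream field.
    for earlier in ctx.unpack():
        results[earlier["procName"]] += bytes_read


def run_demo():
    results = defaultdict(int)
    # Hypothetical workload: (client process name, bytes read downstream).
    requests = [("HGet", 100), ("MRSort", 4096), ("HGet", 50)]
    for proc_name, bytes_read in requests:
        ctx = RequestContext()  # one context per request
        client_tracepoint(ctx, proc_name)
        # ... the request would cross process or machine boundaries here ...
        datanode_tracepoint(ctx, bytes_read, results)
    return dict(results)
```

In this sketch, `run_demo()` groups a metric measured at the downstream point (bytes read) by an attribute that is only known at the upstream point (the client process name), which is the kind of cross-boundary grouping the abstract describes. A real system would additionally need to serialize the context across the network and install the two tracepoints dynamically.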

Updated: 2018-12-05