Chimbuko: A Workflow-Level Scalable Performance Trace Analysis Tool
arXiv - CS - Performance, Pub Date: 2020-08-31, DOI: arxiv-2008.13742
Sungsoo Ha, Wonyong Jeong, Gyorgy Matyasfalvi, Cong Xie, Kevin Huck, Jong Youl Choi, Abid Malik, Li Tang, Hubertus Van Dam, Line Pouchard, Wei Xu, Shinjae Yoo, Nicholas D'Imperio, Kerstin Kleese Van Dam

Because of the limits input/output systems currently impose on high-performance computing systems, a new generation of workflows that include online data reduction and analysis is emerging. Diagnosing their performance requires sophisticated performance analysis capabilities due to the complexity of execution patterns and underlying hardware, and no existing tool could handle the voluminous performance trace data needed to detect potential problems. This work introduces Chimbuko, a performance analysis framework that provides real-time, distributed, in situ anomaly detection. Trace data volumes are reduced to a scale that humans can process, without losing necessary details. Chimbuko supports online performance monitoring via a visualization module that presents the overall workflow anomaly distribution, call stacks, and timelines. Chimbuko also supports the capture and reduction of performance provenance. To the best of our knowledge, Chimbuko is the first online, distributed, and scalable workflow-level performance trace analysis framework, and we demonstrate the tool's usefulness on Oak Ridge National Laboratory's Summit system.
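The abstract does not spell out the detection rule, so the following is only a minimal sketch of what "distributed, in situ anomaly detection" with data reduction can look like, under stated assumptions. It assumes a simple statistical test (flag a function execution whose duration lies outside μ ± α·σ of that function's timing distribution), assumes each rank summarizes its timings locally as (count, mean, M2), and assumes a central aggregator merges those summaries into global statistics that the ranks then use. `RunStats`, `detect_anomalies`, and the threshold `alpha = 6.0` are hypothetical names and values chosen for illustration; they are not Chimbuko's API, and the tool's actual algorithm may differ.

```python
from dataclasses import dataclass
import math


@dataclass
class RunStats:
    """Running count/mean/variance of one function's execution time (Welford)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def push(self, x: float) -> None:
        # Welford's single-pass update: numerically stable, O(1) memory.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunStats") -> "RunStats":
        """Combine two partial summaries (Chan et al. parallel update)."""
        if other.n == 0:
            return RunStats(self.n, self.mean, self.m2)
        if self.n == 0:
            return RunStats(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunStats(n, mean, m2)

    @property
    def std(self) -> float:
        # Population standard deviation; zero until there are two samples.
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0


def detect_anomalies(times, global_stats, alpha=6.0):
    """Indices of executions outside mean +/- alpha*std (hypothetical rule).

    Only these executions would be kept; everything else can be discarded,
    which is where the data reduction comes from.
    """
    if global_stats.std == 0.0:
        return []
    lo = global_stats.mean - alpha * global_stats.std
    hi = global_stats.mean + alpha * global_stats.std
    return [i for i, t in enumerate(times) if t < lo or t > hi]


# Hypothetical execution times (arbitrary units) of one function on two ranks.
rank_times = {
    0: [10.0] * 50 + [10.5] * 50,
    1: [10.2] * 99 + [200.0],  # one suspiciously slow call
}

# Step 1 (on each rank): summarize local timings in O(1) space.
local = {}
for rank, times in rank_times.items():
    s = RunStats()
    for t in times:
        s.push(t)
    local[rank] = s

# Step 2 (aggregator): merge the small summaries into global statistics.
global_stats = RunStats()
for s in local.values():
    global_stats = global_stats.merge(s)

# Step 3 (back on each rank): keep only the anomalous executions.
flagged = {rank: detect_anomalies(times, global_stats)
           for rank, times in rank_times.items()}
print(flagged)  # {0: [], 1: [99]}
```

The design point this illustrates is that ranks exchange fixed-size summaries rather than raw traces, so communication cost does not grow with trace length, and only the flagged executions (plus whatever context is retained around them) need to be stored or visualized.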

Updated: 2020-09-01