Efficient Methods for Trace Analysis Parallelization
International Journal of Parallel Programming (IF 0.9), Pub Date: 2019-02-09, DOI: 10.1007/s10766-019-00631-4
Fabien Reumont-Locke, Naser Ezzati-Jivan, Michel R. Dagenais

Tracing provides a low-impact, high-resolution way to observe the execution of a system. As the amount of parallelism in traced systems increases, so does the volume of data generated by the trace. Most trace analysis tools run in a single thread, which limits their performance as the scale of data increases. In this paper, we explore parallelization as an approach to speed up system trace analysis. We propose a solution that uses inherent properties of the CTF trace format to create balanced, parallelizable workloads. Our solution takes into account key factors of parallelization, such as good load balancing, low synchronization overhead and efficient resolution of data dependencies. We also propose an algorithm to detect and resolve data dependencies during trace analysis with minimal locking and synchronization. Using this approach, we implement three trace analysis programs (event counting, CPU usage analysis and I/O usage analysis) to assess scalability in terms of parallel efficiency. The parallel implementations achieve a parallel efficiency above 56% on 32 cores, which translates to a speedup of 18 times the serial speed, when running the parallel trace analyses on trace data stored on consumer-grade solid-state storage devices. We also show the scalability and potential of our approach by measuring the effect of future improvements to trace decoding on parallel efficiency.
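
The two headline figures in the abstract are linked by the usual definition of parallel efficiency, E = S / p, where S is the speedup over the serial run and p is the number of cores. The sketch below works through that arithmetic. It assumes the paper uses this standard definition, which the abstract does not state explicitly, and the function names are illustrative only, not taken from the authors' code.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return E = S / p (assumed standard definition of parallel efficiency)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


def speedup_from_efficiency(efficiency: float, cores: int) -> float:
    """Invert the same relation: S = E * p."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return efficiency * cores


if __name__ == "__main__":
    # Figures quoted in the abstract: efficiency above 56% on 32 cores.
    print(speedup_from_efficiency(0.56, 32))
    # Conversely, an 18x speedup on 32 cores.
    print(parallel_efficiency(18, 32))
```

Under this definition, 56% efficiency on 32 cores corresponds to a speedup of roughly 17.9, and an 18-fold speedup corresponds to an efficiency of 56.25%, which is consistent with the abstract's "above 56%" and "18 times" figures.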

Updated: 2019-02-09