Annotation guided collection of context-sensitive parallel execution profiles
Formal Methods in System Design (IF 0.7), Pub Date: 2019-10-09, DOI: 10.1007/s10703-019-00341-0
Zachary Benavides, Keval Vora, Rajiv Gupta, Xiangyu Zhang

Studying the relative behavior of an application’s threads is critical to identifying performance bottlenecks and understanding their root causes. We present context-sensitive parallel (CSP) execution profiles, which capture the relative behavior of threads in terms of the user-selected code regions they execute. CSPs can be analyzed to compute the execution time the application spends in behavior states of interest. To capture execution context, code regions of interest can be given static and dynamic names using a versatile set of annotations. A CSP divides the execution time of a multithreaded application into a sequence of time intervals, called frames, during which no thread transitions between code regions. By appropriately selecting and naming code regions, the user can obtain a CSP that captures all occurrences of the desired behavior states. We provide the user with a powerful query language to facilitate the analysis of CSPs. Our implementation for collecting CSPs of C++ programs has low overhead and high accuracy: collecting CSPs for full executions of 12 Parsec programs increased execution time by at most 7%. The accuracy of CSPs was validated on common performance problems such as load imbalance across pipeline stages and the presence of straggler threads.
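
To make the notion of frames concrete, here is a minimal Python sketch, assuming a hypothetical trace in which each event records (timestamp, thread id, region entered). The trace format, the functions `build_frames`, `time_in_state` and `one_straggler`, and the region names "compute" and "barrier_wait" are all illustrative assumptions; this is not the authors' collection tool, annotation API, or query language. A plain Python predicate stands in for a query over behavior states, and region strings stand in for the static and dynamic names that annotations would produce.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical trace event: (timestamp, thread id, name of the region entered).
Event = Tuple[float, str, str]


@dataclass
class Frame:
    start: float
    end: float
    regions: Dict[str, str]  # thread id -> region executed throughout the frame

    @property
    def duration(self) -> float:
        return self.end - self.start


def build_frames(initial: Dict[str, str], events: List[Event],
                 end_time: float, start_time: float = 0.0) -> List[Frame]:
    """Split [start_time, end_time) into intervals in which no thread changes region."""
    current = dict(initial)
    frames: List[Frame] = []
    frame_start = start_time
    for time, thread, region in sorted(events):
        if current.get(thread) == region:
            continue  # not a real transition, so the current frame stays open
        if time > frame_start:
            # A transition ends the current frame; snapshot every thread's region.
            frames.append(Frame(frame_start, time, dict(current)))
            frame_start = time
        current[thread] = region  # simultaneous transitions share one frame boundary
    if end_time > frame_start:
        frames.append(Frame(frame_start, end_time, dict(current)))
    return frames


def time_in_state(frames: List[Frame],
                  predicate: Callable[[Dict[str, str]], bool]) -> float:
    """Total time spent in frames whose thread/region snapshot satisfies predicate."""
    return sum(f.duration for f in frames if predicate(f.regions))


def one_straggler(regions: Dict[str, str]) -> bool:
    """Hypothetical behavior state: one thread still computing while others wait."""
    computing = [t for t, r in regions.items() if r == "compute"]
    return len(computing) == 1 and any(r == "barrier_wait" for r in regions.values())


if __name__ == "__main__":
    trace = [(4.0, "T1", "barrier_wait"), (10.0, "T2", "barrier_wait")]
    frames = build_frames({"T1": "compute", "T2": "compute"}, trace, end_time=12.0)
    for f in frames:
        print(f.start, f.end, f.regions)
    print(time_in_state(frames, one_straggler))  # 6.0
```

In this toy trace T2 is the straggler: the frame [4, 10) is the only one in which exactly one thread is still computing while another waits, so the query reports 6.0 time units. Because frames only change at region transitions, any predicate over the per-thread snapshot can be evaluated once per frame rather than once per sample.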

Updated: 2019-10-09