GPA: A GPU Performance Advisor Based on Instruction Sampling

arXiv - CS - Performance. Pub Date: 2020-09-09, DOI: arxiv-2009.04061
Keren Zhou, Xiaozhu Meng, Ryuichi Sai, John Mellor-Crummey

Developing efficient GPU kernels can be difficult because of the complexity of GPU architectures and programming models. Existing performance tools only provide coarse-grained suggestions at the kernel level, if any. In this paper, we describe GPA, a performance advisor for NVIDIA GPUs that suggests potential code optimization opportunities at a hierarchy of levels, including individual lines, loops, and functions. To relieve users of the burden of interpreting performance counters and analyzing bottlenecks, GPA uses data flow analysis to approximately attribute measured instruction stalls to their root causes and uses information about a program's structure and the GPU to match inefficiency patterns with suggestions for optimization. To quantify each suggestion's potential benefits, we developed PC sampling-based performance models to estimate its speedup. Our experiments with benchmarks and applications show that GPA provides an insightful report to guide performance optimization. Using GPA, we obtained speedups on a Volta V100 GPU ranging from 1.03× to 3.86×, with a geometric mean of 1.22×.
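To make the idea of a sampling-based speedup estimate concrete, here is a minimal sketch in Python. It is not GPA's actual model, which the abstract does not spell out. It assumes the simplest possible estimate: if an optimization removed every PC sample attributed to one stall cause, run time would shrink in proportion to the samples removed. The function names and the sample counts in the usage lines are hypothetical. A geometric mean helper is included because the abstract summarizes its results that way.

```python
import math


def estimate_speedup(total_samples, removable_stall_samples):
    """Optimistic speedup if all the given stall samples were eliminated.

    Assumption (not taken from the paper): run time is proportional to the
    number of PC samples, so removing samples shortens run time linearly.
    """
    if not 0 <= removable_stall_samples < total_samples:
        raise ValueError("removable samples must be in [0, total_samples)")
    return total_samples / (total_samples - removable_stall_samples)


def geometric_mean(speedups):
    """Geometric mean of a non-empty sequence of positive speedups."""
    if not speedups or any(s <= 0 for s in speedups):
        raise ValueError("speedups must be a non-empty sequence of positives")
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Hypothetical counts: 1000 samples in total, 200 of them attributed to a
# stall cause that a suggested optimization might remove.
print(estimate_speedup(1000, 200))  # 1.25
```

Under this assumption the estimate is an upper bound: a real optimization rarely removes every stall of a given kind, and removing one stall can expose another.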

Updated: 2020-09-10