GPGPU Performance Estimation With Core and Memory Frequency Scaling
IEEE Transactions on Parallel and Distributed Systems (IF 5.6). Pub Date: 2020-06-24. DOI: 10.1109/tpds.2020.3004623
Qiang Wang , Xiaowen Chu

Contemporary graphics processing units (GPUs) support dynamic voltage and frequency scaling to balance computational performance and energy consumption. However, accurate and straightforward performance estimation for a given GPU kernel under different frequency settings is still lacking for real hardware, and such estimation is essential for determining the best frequency configuration for energy saving. In this article, we present a fine-grained analytical model to estimate the execution time of GPU kernels under both core and memory frequency scaling. Unlike cycle-level simulators, which are too slow to be practical on real hardware, our model only needs simple, one-off micro-benchmarks to extract a set of hardware parameters and kernel performance counters, without any source code analysis. Our experimental results show that the proposed performance model can capture the kernel performance scaling behaviors under different frequency settings and achieve decent accuracy (average errors of 3.85, 8.6, 8.82, and 8.83 percent on a set of 20 GPU kernels across four modern Nvidia GPUs).
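To make the idea of frequency-dependent execution-time estimation concrete, the sketch below shows a deliberately simplified model: it assumes the kernel's time is bounded by whichever of the compute pipeline (scaling with core frequency) or the memory pipeline (scaling with memory frequency) is slower. This is only an illustration of the general approach; the paper's actual fine-grained model, its parameters, and the names used here (KernelProfile, estimate_time_ms, the example cycle counts and frequencies) are not from the source.

```python
from dataclasses import dataclass


@dataclass
class KernelProfile:
    """Hypothetical per-kernel counters gathered once at a baseline setting."""
    compute_cycles: float  # core-clock cycles spent in the compute pipeline
    memory_cycles: float   # memory-clock cycles spent serving DRAM traffic


def estimate_time_ms(profile: KernelProfile, f_core_mhz: float, f_mem_mhz: float) -> float:
    """Estimate kernel time as the slower of the compute and memory pipelines
    at the chosen core/memory frequencies (a simple bound-based assumption)."""
    compute_ms = profile.compute_cycles / (f_core_mhz * 1e3)  # cycles / f_MHz -> us, /1e3 -> ms
    memory_ms = profile.memory_cycles / (f_mem_mhz * 1e3)
    return max(compute_ms, memory_ms)


if __name__ == "__main__":
    # Hypothetical kernel profiled once at a baseline frequency setting.
    prof = KernelProfile(compute_cycles=2.4e7, memory_cycles=1.8e7)
    for f_core, f_mem in [(1500, 7000), (1200, 7000), (1500, 5000)]:
        t = estimate_time_ms(prof, f_core, f_mem)
        print(f"core={f_core} MHz, mem={f_mem} MHz -> ~{t:.2f} ms")
```

A sweep like the loop above, repeated over all supported core/memory frequency pairs, is how such an estimate would typically be used to pick an energy-saving configuration without rerunning the kernel at every setting.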

Updated: 2020-06-24