Irregular Accesses Reorder Unit: Improving GPGPU Memory Coalescing for Graph-Based Workloads,arXiv - CS - Hardware Architecture



arXiv - CS - Hardware Architecture. Pub Date: 2020-07-14, DOI: arxiv-2007.07131
Albert Segura, Jose-Maria Arnau, Antonio Gonzalez

GPGPU architectures have become the dominant platform for parallel, high-performance computing, achieving wide adoption and empowering domains such as regular algebra, machine learning, image detection and self-driving cars. However, irregular applications struggle to fully realize GPGPU performance because of control flow divergence and memory divergence caused by irregular memory access patterns. To mitigate these issues, programmers must carefully consider architectural features and devote significant effort to modifying algorithms with complex optimization techniques, which shifts programmers' priorities yet still fails to fully overcome the shortcomings. We show that these inefficiencies prevail in graph-based irregular GPGPU applications, yet we find that it is possible to relax the strict relationship between a thread and the data it processes to enable new optimizations. Based on this key idea, we propose the Irregular accesses Reorder Unit (IRU), a novel hardware extension tightly integrated into the GPGPU pipeline. The IRU reorders the data processed by the threads on irregular accesses, which significantly improves memory coalescing and increases performance and energy efficiency. Additionally, the IRU can filter and merge duplicated irregular accesses, which further benefits graph-based irregular applications. Programmers can use the IRU through a simple API, or through compiler-generated code that uses the provided ISA extension instructions. We evaluate our proposal on state-of-the-art graph-based algorithms and a wide selection of applications. Results show that the IRU improves memory coalescing by 1.32x and reduces overall traffic in the memory hierarchy by 46%, which yields a 1.33x performance improvement and 13% energy savings, while incurring a small 5.6% area overhead.
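The effect of reordering and merging irregular accesses can be illustrated with a small software model. The sketch below is not the IRU hardware, its API, or its ISA extension, none of which are described on this page. It is a toy simulation that assumes 32-thread warps, 128-byte memory transactions and 4-byte elements, and it uses a plain sort as a stand-in for whatever reordering the hardware performs. All names (`transactions`, `reorder`, `reorder_and_merge`, `frontier`) are hypothetical.

```python
from typing import List

WARP_SIZE = 32    # threads per warp (assumed value)
LINE_BYTES = 128  # bytes served by one memory transaction (assumed value)
ELEM_BYTES = 4    # size of each accessed element (assumed value)


def transactions(indices: List[int]) -> int:
    """Count memory transactions if thread i loads data[indices[i]].

    Threads are grouped into consecutive warps; each warp needs one
    transaction per distinct memory line touched by its threads.
    """
    elems_per_line = LINE_BYTES // ELEM_BYTES
    total = 0
    for start in range(0, len(indices), WARP_SIZE):
        warp = indices[start:start + WARP_SIZE]
        total += len({i // elems_per_line for i in warp})
    return total


def reorder(indices: List[int]) -> List[int]:
    """Reassign elements to threads so that neighbours touch nearby data.

    Sorting is only a stand-in for hardware reordering; it is valid here
    because any thread may process any element of the work list.
    """
    return sorted(indices)


def reorder_and_merge(indices: List[int]) -> List[int]:
    """Additionally drop duplicated accesses before reordering."""
    return sorted(set(indices))


# Contrived work list: neighbouring threads hit different memory lines,
# and every element appears twice (as with shared neighbours in a graph).
frontier = [(t % 8) * 32 + (t // 8) for t in range(64)] * 2

if __name__ == "__main__":
    print("original  :", len(frontier), "accesses,",
          transactions(frontier), "transactions")
    print("reordered :", len(reorder(frontier)), "accesses,",
          transactions(reorder(frontier)), "transactions")
    merged = reorder_and_merge(frontier)
    print("merged    :", len(merged), "accesses,",
          transactions(merged), "transactions")
```

On this contrived input the modelled transaction count drops from 32 to 8 after reordering, and merging halves the number of accesses from 128 to 64. Sorting is not guaranteed to be optimal for every input, and these figures say nothing about the paper's measured results.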


Updated: 2020-07-15
