IBOM: An Integrated and Balanced On-Chip Memory for High Performance GPGPUs, IEEE Transactions on Parallel and Distributed Systems


IBOM: An Integrated and Balanced On-Chip Memory for High Performance GPGPUs
IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2018-03-01, DOI: 10.1109/tpds.2017.2773516
Jianfei Wang, Qin Wang, Li Jiang, Chao Li, Xiaoyao Liang, Naifeng Jing

GPGPU-accelerated computing has revolutionized a broad range of applications. Because it sits between the ever-growing computing capability and external memory, the on-chip memory is becoming increasingly important to GPGPU performance in general-purpose computing. The contemporary GPGPU on-chip memory design, however, is inherited from traditional CPUs and is suboptimal for SIMT (single instruction, multiple threads) execution. In particular, thrashing in the on-chip first-level data (L1D) cache, which results from insufficient capacity and imbalanced usage, leads to a low hit rate and limits overall performance. In this study, we rework the contemporary on-chip memory design and propose an integrated and balanced on-chip memory (IBOM) architecture for high-performance GPGPUs. IBOM first virtually enlarges the L1D cache with an integrated architecture that exploits the under-utilized register file (RF), using lightweight ISA, compiler, and microarchitecture support. Then, with sufficient capacity available, it improves cache usage with a set-balancing technique that exploits under-utilized cache sets. In the proposed IBOM design, register and cache accesses fit into normal pipeline operation with only simple changes. The design takes advantage of the size inversion in GPGPU on-chip memory and makes better use of these precious resources, delivering higher performance and energy efficiency even with a smaller on-chip memory. Experimental results show that IBOM offers an average 29.6 percent increase in L1D hit rate and, in turn, a 3X performance improvement for cache-sensitive applications.
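The thrashing problem named in the abstract can be illustrated with a small, generic cache model. The sketch below is not the IBOM mechanism and does not reproduce any of the paper's results; it only assumes a textbook set-associative LRU cache with modulo set indexing. The function name `hit_rate`, the 32-set geometry, the 128-byte line size, and the access trace are all hypothetical values chosen for illustration. It shows that when a working set concentrates on a single set and exceeds its associativity, the hit rate collapses even though the other sets sit idle, and that extra capacity for that set restores it.

```python
from collections import OrderedDict


def hit_rate(addresses, num_sets, ways, line_bytes=128):
    """Simulate a set-associative LRU cache and return its hit rate.

    Generic textbook model; all parameters are illustrative assumptions.
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in addresses:
        line = addr // line_bytes
        index = line % num_sets      # conventional modulo set indexing
        tag = line // num_sets
        cache_set = sets[index]
        if tag in cache_set:
            hits += 1
            cache_set.move_to_end(tag)         # mark as most recently used
        else:
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[tag] = True
    return hits / len(addresses)


# Hypothetical workload: 8 cache lines that all map to set 0, swept 100 times.
NUM_SETS, LINE_BYTES = 32, 128
stride = NUM_SETS * LINE_BYTES
trace = [i * stride for i in range(8)] * 100

# 8 lines compete for 4 ways in one set while the other 31 sets stay empty.
thrashing = hit_rate(trace, NUM_SETS, ways=4, line_bytes=LINE_BYTES)
# The same trace once the hot set can hold all 8 lines.
enough = hit_rate(trace, NUM_SETS, ways=8, line_bytes=LINE_BYTES)
print(thrashing, enough)
```

In this toy trace the cyclic sweep defeats LRU completely when the set is too small, which is the kind of capacity and balance problem the abstract attributes to the L1D cache; how IBOM actually borrows register file space and balances sets is described only in the paper itself.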


Updated: 2018-03-01
