Virtual-Cache: A cache-line borrowing technique for efficient GPU cache architectures
Microprocessors and Microsystems (IF 1.9), Pub Date: 2021-06-26, DOI: 10.1016/j.micpro.2021.104301
Bingchao Li , Jizeng Wei , Nam Sung Kim

GPUs provide megabytes of registers and shared memory to maintain the contexts of thousands of threads and to enable fast data sharing among the threads of a thread block, respectively. In addition, GPUs employ an L1 cache to serve memory requests with high bandwidth. However, the average L1 cache capacity per thread is very limited, causing cache thrashing that in turn impairs performance. Meanwhile, many registers and shared-memory entries are not assigned to any warp or thread block, and even assigned registers and shared memory sit idle once their warps or thread blocks finish. Exploiting these insights, we propose Virtual-Cache, which cost-effectively increases the effective size of the L1 cache by utilizing unassigned and released registers and shared memory as cache lines. Specifically, we leverage unassigned registers and shared memory to serve cache requests directly. Registers assigned to a warp can work as cache lines after the warp completes execution and before they are accessed again by a newly launched warp. Similarly, the shared memory of a thread block can serve cache requests from the time the thread block finishes until it is referenced by shared-memory instructions of a relaunched thread block. The register file, shared memory, and L1 cache remain physically independent but are logically unified into one large virtual cache with redesigned cache-line management. We develop the control and data paths that make the register file accessible to cache requests, borrowing an operand collector to serve them, and we likewise extend the control and data paths of the shared memory. Our evaluation shows that Virtual-Cache improves performance by 28% over a previously proposed cache management technique for cache-sensitive applications.
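The core borrowing idea — a cache whose capacity grows when register-file or shared-memory space is free and shrinks when a new warp or thread block reclaims that space — can be sketched as a minimal software model. This is a hypothetical, simulator-style illustration under our own assumptions (fully associative, LRU); all class and method names are illustrative and are not the paper's actual hardware design:

```python
# Hypothetical model of cache-line borrowing: an L1 cache whose
# line count grows with lines "loaned" by currently unassigned
# register-file / shared-memory space, and shrinks (with evictions)
# when a newly launched warp or thread block takes that space back.

class VirtualCache:
    def __init__(self, l1_lines):
        self.l1_lines = l1_lines     # native L1 capacity, in cache lines
        self.borrowed_lines = 0      # lines loaned by RF / shared memory
        self.store = {}              # tag -> data (fully associative model)
        self.lru = []                # tags in LRU order (front = oldest)

    def capacity(self):
        return self.l1_lines + self.borrowed_lines

    def lend(self, n):
        """RF/shared-memory space freed (e.g. a warp finished): grow the cache."""
        self.borrowed_lines += n

    def reclaim(self, n):
        """A new warp/thread block needs its space back: shrink and evict."""
        self.borrowed_lines -= min(n, self.borrowed_lines)
        while len(self.store) > self.capacity():
            victim = self.lru.pop(0)
            del self.store[victim]

    def access(self, tag):
        """Return True on hit; on miss, fill the line (evicting LRU if full)."""
        if tag in self.store:
            self.lru.remove(tag)     # move to most-recently-used position
            self.lru.append(tag)
            return True
        if len(self.store) >= self.capacity():
            victim = self.lru.pop(0)
            del self.store[victim]
        self.store[tag] = object()   # stand-in for the cached line's data
        self.lru.append(tag)
        return False
```

In this model, `lend` corresponds to a warp or thread block releasing its registers or shared memory, and `reclaim` to a relaunched warp or thread block referencing them again; accesses that would have thrashed a small L1 can hit in the borrowed lines in between.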




Updated: 2021-07-08