Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores
arXiv - CS - Hardware Architecture Pub Date : 2019-11-19 , DOI: arxiv-1911.08356 Fabian Schuiki, Florian Zaruba, Torsten Hoefler, Luca Benini
Single-issue processor cores are very energy efficient but suffer from the
von Neumann bottleneck, in that they must explicitly fetch and issue the
loads/stores necessary to feed their ALU/FPU. Each instruction spent on moving
data is a cycle not spent on computation, limiting ALU/FPU utilization to 33%
on reductions. We propose "Stream Semantic Registers" to boost utilization and
increase energy efficiency. SSR is a lightweight, non-invasive RISC-V ISA
extension which implicitly encodes memory accesses as register reads/writes,
eliminating a large number of loads/stores. We implement the proposed extension
in the RTL of an existing multi-core cluster and synthesize the design for a
modern 22nm technology. Our extension provides a significant, 2x to 5x,
architectural speedup across different kernels at a small 11% increase in core
area. Sequential code runs 3x faster on a single core, and 3x fewer cores are
needed in a cluster to achieve the same performance. The utilization increase
to almost 100% leads to a 2x energy efficiency improvement in a multi-core
cluster. The extension reduces instruction fetches by up to 3.5x and
instruction cache power consumption by up to 5.6x. Compilers can automatically
map loop nests to SSRs, making the changes transparent to the programmer.
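
The 33% figure for reductions can be checked with a small instruction-counting model. The sketch below is illustrative only and is not the authors' implementation: it assumes a dot product in which every element costs two explicit loads plus one fused multiply-add, and it ignores loop overhead such as branches and pointer increments. The names `InstructionCounts`, `dot_baseline`, `dot_streamed`, and the `setup_instructions` parameter are hypothetical. Python iterators stand in for stream registers, where reading the register implicitly yields the next memory element.

```python
from dataclasses import dataclass


@dataclass
class InstructionCounts:
    compute: int  # instructions that keep the FPU busy (multiply-adds here)
    total: int    # every instruction issued by the single-issue core

    @property
    def utilization(self) -> float:
        return self.compute / self.total


def dot_baseline(a, b):
    """Dot product with explicit loads: 2 loads + 1 multiply-add per element."""
    acc, compute, total = 0.0, 0, 0
    for i in range(len(a)):
        x = a[i]          # explicit load of a[i]
        total += 1
        y = b[i]          # explicit load of b[i]
        total += 1
        acc += x * y      # the only instruction doing useful arithmetic
        compute += 1
        total += 1
    return acc, InstructionCounts(compute, total)


def dot_streamed(a, b, setup_instructions=0):
    """Dot product where operand reads implicitly fetch the next element.

    setup_instructions is an assumed, one-time cost for configuring the
    streams; the real cost depends on the hardware and is not modeled here.
    """
    stream_a, stream_b = iter(a), iter(b)  # stand-ins for stream registers
    acc, compute, total = 0.0, 0, setup_instructions
    for _ in range(len(a)):
        acc += next(stream_a) * next(stream_b)  # reads replace the loads
        compute += 1
        total += 1
    return acc, InstructionCounts(compute, total)


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]
    b = [4.0, 5.0, 6.0]
    print(dot_baseline(a, b)[1].utilization)
    print(dot_streamed(a, b, setup_instructions=3)[1].utilization)
```

Under these assumptions the baseline spends one in three instructions on arithmetic regardless of vector length, while the streamed variant approaches full utilization as the vector grows and the one-time setup cost is amortized.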
Updated: 2020-04-02