Snitch: A tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads,arXiv - CS - Hardware Architecture



arXiv - CS - Hardware Architecture. Pub Date: 2020-02-24, DOI: arxiv-2002.10143
Florian Zaruba, Fabian Schuiki, Torsten Hoefler, and Luca Benini

Data-parallel applications, such as data analytics, machine learning, and scientific computing, are placing an ever-growing demand on floating-point operations per second on emerging systems. With increasing integration density, the quest for energy efficiency becomes the number one design concern. While dedicated accelerators provide high energy efficiency, they are over-specialized and hard to adjust to algorithmic changes. We propose an architectural concept that tackles the challenge of achieving extreme energy efficiency while still maintaining high flexibility as a general-purpose compute engine. The key idea is to pair a tiny 10 kGE control core, called Snitch, with a double-precision FPU to adjust the compute-to-control ratio. Minimizing non-FPU area and achieving high floating-point utilization have traditionally been a trade-off; with Snitch, we achieve both by enhancing the ISA with two minimally intrusive extensions: stream semantic registers (SSR) and a floating-point repetition instruction (FREP). SSRs allow the core to implicitly encode load/store instructions as register reads/writes, eliding many explicit memory instructions. The FREP extension decouples the floating-point and integer pipelines by sequencing instructions from a micro-loop buffer. These ISA extensions significantly reduce the pressure on the core and free it up for other tasks, making Snitch and the FPU effectively dual-issue at a minimal incremental cost of 3.2%. The two low-overhead ISA extensions make Snitch more flexible than a contemporary vector processor lane, achieving a $2\times$ energy-efficiency improvement. We have evaluated the proposed core and ISA extensions on an octa-core cluster in 22 nm technology. We achieve more than $5\times$ multi-core speed-up and a $3.5\times$ gain in energy efficiency on several parallel microkernels.
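
To make the effect of the two extensions concrete, here is a toy software model of a dot product. It is only a sketch under stated assumptions, not the authors' hardware or ISA encoding: the class `StreamRegister`, the three `dot_*` functions, and the per-iteration instruction counts (two loads, one fused multiply-add, one index update, one branch) are all hypothetical choices made for illustration, and stream setup cost is ignored.

```python
from dataclasses import dataclass


@dataclass
class StreamRegister:
    """Toy stand-in for a stream semantic register (hypothetical model).

    Each read returns the next element of a preconfigured stream, so the
    consumer never issues an explicit load.
    """
    data: list
    pos: int = 0

    def read(self) -> float:
        value = self.data[self.pos]
        self.pos += 1  # address generation is implicit in the read
        return value


def dot_baseline(a, b):
    """Scalar loop with explicit loads. Returns (result, issued instructions)."""
    acc, issued = 0.0, 0
    for i in range(len(a)):
        x = a[i]
        y = b[i]
        issued += 2          # two explicit loads (assumed)
        acc += x * y
        issued += 1          # one fused multiply-add
        issued += 2          # index update + branch (assumed)
    return acc, issued


def dot_ssr(a, b):
    """Loads become register reads; loop control is still issued by the core."""
    sa, sb = StreamRegister(a), StreamRegister(b)
    acc, issued = 0.0, 0
    for _ in range(len(a)):
        acc += sa.read() * sb.read()
        issued += 1          # one fused multiply-add, operands come from streams
        issued += 2          # counter update + branch (assumed)
    return acc, issued


def dot_ssr_frep(a, b):
    """A repetition instruction replays the buffered FMA; the core issues it once."""
    sa, sb = StreamRegister(a), StreamRegister(b)
    acc = 0.0
    issued = 2               # one repetition instruction + one buffered FMA (assumed)
    for _ in range(len(a)):  # replay is done by the sequencer, not the core
        acc += sa.read() * sb.read()
    return acc, issued


if __name__ == "__main__":
    a, b = [1, 2, 3, 4], [5, 6, 7, 8]
    for fn in (dot_baseline, dot_ssr, dot_ssr_frep):
        print(fn.__name__, fn(a, b))
```

In this model the number of instructions issued by the control core drops from five per element, to three per element, to a constant, which is the intuition behind freeing the integer core for other work while the FPU stays busy. The absolute counts depend entirely on the assumptions above and should not be read as measurements from the paper.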


Updated: 2020-10-09
