Similarity-Aware Architecture/Compiler Co-Designed Context-Reduction Framework for Modulo-Scheduled CGRA
Electronics (IF 2.9), Pub Date: 2021-09-09, DOI: 10.3390/electronics10182210
Zhongyuan Zhao, Weiguang Sheng, Jinchao Li, Pengfei Ye, Qin Wang, Zhigang Mao

Modulo-scheduled coarse-grained reconfigurable array (CGRA) processors have shown their potential for exploiting loop-level parallelism at high energy efficiency. However, these CGRAs need frequent reconfiguration during execution, which incurs large area and power overhead for context memory and context fetching. To tackle this challenge, this paper uses an architecture/compiler co-designed method for context reduction. From the architecture perspective, we carefully partition the context into several subsections and, whenever fetching a new context, fetch only the subsections that differ from the previous context word. We package each differing subsection with an opcode and an index value to form a context-fetching primitive (CFP), and we explore the hardware design space by providing both centralized and distributed CFP-fetching CGRAs to support this CFP-based context-fetching scheme. From the software side, we develop a similarity-aware tuning algorithm and integrate it into state-of-the-art modulo-scheduling and memory-access-conflict optimization algorithms. The whole compilation flow efficiently improves the similarity between consecutive contexts in each PE, reducing both context-fetching latency and context footprint. Experimental results show that our HW/SW co-designed framework improves area efficiency and energy efficiency by up to 34% and 21%, respectively, with only 2% performance overhead.
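The core idea of the CFP scheme described above — fetch only the subsections of a context word that differ from the previous one, each tagged with its index — can be sketched as a small delta-encoding routine. This is a minimal illustration, not the paper's actual encoding: the subsection count, subsection width, and the omission of the opcode field are all assumptions made here for clarity.

```python
# Hedged sketch of the CFP (context-fetching primitive) idea: split a context
# word into fixed-width subsections and emit (index, subsection) pairs only
# for subsections that changed relative to the previous context word.
# Field widths and counts below are illustrative assumptions.

def to_subsections(context_word, num_subsections=4, subsection_bits=8):
    """Split a context word into equal-width subsections (LSB first)."""
    mask = (1 << subsection_bits) - 1
    return [(context_word >> (i * subsection_bits)) & mask
            for i in range(num_subsections)]

def encode_cfps(prev_word, new_word, num_subsections=4, subsection_bits=8):
    """Emit (index, subsection) pairs only for subsections that changed."""
    prev = to_subsections(prev_word, num_subsections, subsection_bits)
    new = to_subsections(new_word, num_subsections, subsection_bits)
    return [(i, s) for i, (p, s) in enumerate(zip(prev, new)) if p != s]

def apply_cfps(prev_word, cfps, subsection_bits=8):
    """Rebuild the new context word from the previous one plus the CFPs."""
    word = prev_word
    mask = (1 << subsection_bits) - 1
    for i, s in cfps:
        word &= ~(mask << (i * subsection_bits))   # clear the old subsection
        word |= (s & mask) << (i * subsection_bits)  # write the new one
    return word

# Highly similar consecutive contexts need few CFPs: here only one of the
# four subsections differs, so a single primitive suffices.
prev, new = 0x11223344, 0x11223399
cfps = encode_cfps(prev, new)          # [(0, 0x99)]
assert apply_cfps(prev, cfps) == new
```

The compiler-side similarity-aware tuning the abstract mentions would, in this model, aim to schedule operations so that consecutive context words share as many subsections as possible, shrinking the CFP list per fetch.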
