A Dynamic General Accelerator for Integer and Fixed-Point Processing
IEEE Transactions on Very Large Scale Integration (VLSI) Systems (IF 2.8). Pub Date: 2020-12-01. DOI: 10.1109/tvlsi.2020.3023106
Ali A. D. Farahani , Hakem Beitollahi , Mahmood Fathi

Coarse-grained reconfigurable arrays (CGRAs) are used as low-power, high-performance accelerators in Internet of Things (IoT) and embedded-system processors to speed up computation-intensive tasks. These accelerators speed up loops consisting of integer and fixed-point instructions in computation-intensive applications such as multimedia, voice coding, and encryption algorithms. Designing an efficient compiler that can map compiled assembly code onto a CGRA is a serious challenge. Even when such a compiler exists, the compiled programs depend on the hardware of both the CGRA and the processor; that is, the accelerator is not transparent to compilers and applications. This article proposes a novel accelerator whose CGRA is concatenated with the main ALU of the processor to speed up the execution of integer and logical applications. The proposed accelerator and its hardware-implemented mapping technique are completely transparent to the compiler, OS, user, and applications, which removes the dependence problem described above. Simulation results indicate that, compared with the baseline architecture, the proposed architecture improves performance by 15.8% and energy consumption by 8% on average, while its area and power overheads, when the CGRA is used in its optimal configuration, are 4.47% and 4.43%, respectively.
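To make concrete the kind of integer and fixed-point loop the abstract refers to, here is a minimal, hypothetical C sketch of a Q15 fixed-point dot-product kernel, a pattern common in voice-coding and multimedia code. Each iteration is a short chain of integer ALU operations (multiply, add, and, after the loop, shift and saturate). The names `q15_dot` and `sat16` are illustrative assumptions, not taken from the paper, and the sketch does not model the proposed CGRA or its hardware mapping technique.

```c
#include <stdint.h>

/* Hypothetical example workload (not from the paper): a Q15 fixed-point
 * dot product, the kind of integer loop a CGRA-style accelerator targets. */

/* Saturate a wide intermediate value to the int16_t range. */
static int16_t sat16(int64_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Sum of a[i] * b[i] for Q15 inputs, returned as a saturated Q15 value. */
int16_t q15_dot(const int16_t *a, const int16_t *b, int n) {
    int64_t acc = 0;                         /* wide accumulator avoids overflow */
    for (int i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i]; /* Q15 * Q15 = Q30 product */
    }
    /* Q30 -> Q15 with round-to-nearest (assumes arithmetic right shift,
     * which mainstream compilers use for signed values), then saturate. */
    return sat16((acc + (1 << 14)) >> 15);
}
```

For example, three elements of 0.5 (16384 in Q15) dotted with themselves give 0.75, i.e. 24576 in Q15, while inputs near 1.0 saturate to 32767. The 64-bit accumulator is a design choice that keeps the multiply-accumulate chain free of intermediate overflow checks.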

Updated: 2020-12-01