An FPU design template to optimize the accuracy-efficiency-area trade-off
Sustainable Computing: Informatics and Systems (IF 3.8), Pub Date: 2020-10-27, DOI: 10.1016/j.suscom.2020.100450
Davide Zoni, Andrea Galimberti, William Fornaciari

Modern embedded systems are in charge of an increasing number of tasks that make extensive use of floating-point (FP) computations. The ever-increasing efficiency requirement, coupled with the additional computational effort that FP computations demand, motivates several microarchitectural optimizations of the floating-point unit (FPU). This manuscript presents a novel modular FPU microarchitecture that targets modern embedded systems and considers heterogeneous workloads, including both best-effort and accuracy-sensitive applications. The design optimizes a combined figure of merit covering energy-delay product (EDP), accuracy, and area by allowing the precision of each FP operation to be configured independently at design time, while the FP dynamic range is kept common to the entire FPU to deliver a simpler microarchitecture. To ensure the correct execution of accuracy-sensitive applications, a novel compiler pass replaces each FP operation for which only low-precision hardware support is offered with the corresponding soft-float function call. The assessment considers seven FPU variants encompassing three different state-of-the-art designs. The results on several representative use cases show that the binary32 FPU implementation offers an EDP gain of 15%, whereas when the FPU implements a mix of binary32 and bfloat16 operations, the EDP gain is 19%, the reduction in resource utilization is 21%, and the average accuracy loss is less than 2.5%. Moreover, the resource utilization of the proposed FPU variants is in line with that of an FPU employing state-of-the-art, highly specialized FP hardware accelerators. Based on the assessment, a set of guidelines is drawn to steer the design of FP hardware support in modern embedded systems.
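The shared dynamic range mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper and does not model the authors' hardware. It relies only on the standard formats: binary32 and bfloat16 both use an 8-bit exponent, so a bfloat16 value is the upper 16 bits of a binary32 word, and narrowing the precision only shortens the mantissa. The helper names `f32_bits`, `bits_f32`, and `to_bfloat16` are hypothetical, and round-to-nearest-even is an assumed rounding mode.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the binary32 bit pattern of x as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Interpret an unsigned 32-bit integer as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def to_bfloat16(x: float) -> float:
    """Round x to bfloat16 precision (assumed round-to-nearest-even).

    The result is returned as a binary32-representable float whose lower
    16 mantissa bits are zero, i.e. the value a bfloat16 datapath would hold.
    """
    bits = f32_bits(x)
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Keep NaN a NaN: force the quiet bit so truncation cannot yield infinity.
        return bits_f32((bits | 0x00400000) & 0xFFFF0000)
    # Add half of the discarded range, plus the kept LSB to break ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return bits_f32((bits + rounding_bias) & 0xFFFF0000)


if __name__ == "__main__":
    value = 3.14159
    narrowed = to_bfloat16(value)
    print(narrowed)                        # 3.140625
    print(abs(narrowed - value) / value)   # relative error, below 2**-8
    print(to_bfloat16(3.0e38))             # large magnitudes stay finite
```

Because the exponent field is unchanged, values near the top of the binary32 range remain representable after narrowing; only the relative precision drops to roughly 2 to 3 significant decimal digits.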




Updated: 2020-10-30