An FPU design template to optimize the accuracy-efficiency-area trade-off, Sustainable Computing: Informatics and Systems



Sustainable Computing: Informatics and Systems (IF 3.8), Pub Date: 2020-10-27, DOI: 10.1016/j.suscom.2020.100450
Davide Zoni, Andrea Galimberti, William Fornaciari

Modern embedded systems handle a growing number of tasks that rely heavily on floating-point (FP) computations. The ever-increasing efficiency requirement, coupled with the additional computational effort that FP computations demand, motivates several microarchitectural optimizations of the FPU. This manuscript presents a novel modular FPU microarchitecture that targets modern embedded systems and considers heterogeneous workloads, including both best-effort and accuracy-sensitive applications. The design optimizes the EDP-accuracy-area figure of merit, where EDP is the energy-delay product, by allowing the precision of each FP operation to be configured independently at design time, while the FP dynamic range is kept common to the entire FPU to deliver a simpler microarchitecture. To ensure the correct execution of accuracy-sensitive applications, a novel compiler pass substitutes each FP operation for which low-precision hardware support is offered with the corresponding soft-float function call. The assessment considers seven FPU variants encompassing three different state-of-the-art designs. The results on several representative use cases show that the binary32 FPU implementation offers an EDP gain of 15%, whereas an FPU implementing a mix of binary32 and bfloat16 operations offers an EDP gain of 19%, a 21% reduction in resource utilization, and an average accuracy loss below 2.5%. Moreover, the resource utilization of our FPU variants is in line with that of an FPU employing state-of-the-art, highly specialized FP hardware accelerators. Based on the assessment, a set of guidelines is drawn to steer the design of FP hardware support in modern embedded systems.
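
The abstract pairs binary32 with bfloat16 while keeping the dynamic range common to the whole FPU. One plausible reading is that both formats share the same 8-bit exponent, so a bfloat16 value is a binary32 value with the mantissa shortened from 23 to 7 bits. The sketch below only illustrates that relationship between the two standard formats in software; it is not the authors' hardware or compiler pass, and the helper names (`f32_bits`, `bits_f32`, `to_bfloat16`) are hypothetical.

```python
import struct

def f32_bits(x: float) -> int:
    """Bit pattern of x after rounding to IEEE 754 binary32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    """Value of a 32-bit pattern interpreted as binary32."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def to_bfloat16(x: float) -> float:
    """Round x to bfloat16 precision (round-to-nearest-even).

    Illustrative helper, not taken from the paper. The result is
    returned as a Python float whose low 16 binary32 bits are zero.
    """
    bits = f32_bits(x)
    # NaN: force a quiet NaN so truncation cannot turn it into infinity.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return bits_f32((bits | 0x00400000) & 0xFFFF0000)
    # Add a bias so that dropping the low 16 bits rounds to nearest,
    # with ties going to the value whose kept LSB is even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    return bits_f32((bits + bias) & 0xFFFF0000)

# The sign and 8-bit exponent fields survive unchanged for this input;
# only mantissa bits are lost.
x = 3.14159265
y = to_bfloat16(x)  # 3.140625
same_exponent = (f32_bits(x) >> 23) == (f32_bits(y) >> 23)
```

Because the exponent width is unchanged, a small magnitude such as `1e-30` stays nonzero after the conversion, and the cost shows up only as a relative error of at most about 2^-8 for normal values.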


Updated: 2020-10-30
