Arbitrary Reduced Precision for Fine-grained Accuracy and Energy Trade-offs
Microelectronics Reliability (IF 1.6), Pub Date: 2021-04-14, DOI: 10.1016/j.microrel.2021.114099
Noureddine Ait Said, Mounir Benabdenbi, Katell Morin-Allory

Full-precision Floating-Point Units (FPUs) can be a source of extensive hardware overhead (power consumption, area, memory footprint, etc.). As several modern applications feature an inherent tolerance to precision loss, a new computing paradigm has emerged: Transprecision Computing (TC). TC proposes several tools and techniques that trade precision for energy efficiency. However, most of these tools require developers to rewrite part or all of their existing software stacks, which is often infeasible or complex, or requires extensive development effort. In addition to being intrusive, TC tools can only simulate the impact of precision loss; they do not provide corresponding hardware designs that take advantage of the simulations.

This work proposes a non-intrusive, hardware-oriented approach that requires no modification of source code and applies approximations at the low level, in assembly. The approach can be used to approximate virtually all types of executable binaries (bare-metal applications, single-/multi-threaded user applications, OS/RTOS, etc.). We introduce AxQEMU, a tool based on the well-known QEMU dynamic binary translator. We demonstrate how our approach can determine the effects of floating-point (FP) approximations on application-level Quality of Result (QoR), and how it interfaces with other tools from the literature. A hardware-level case study on a 28-nm FD-SOI implementation is presented, demonstrating how fine-grained energy/accuracy trade-offs can be made thanks to floating-point arbitrary reduced precision (ARP). For instance, for the well-known arclength FP application, ARP achieved FPU computation energy savings of up to 19.4% with an accuracy threshold of 10 significant digits, and up to 60.7% with a 4-digit accuracy threshold. These savings compare favorably to the limited 7.7% saving afforded by standard variable-type optimization tools.
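To make the idea of arbitrary reduced precision concrete, the following is a minimal software sketch, not AxQEMU's implementation and not the paper's hardware design. It assumes that reducing precision means keeping only the top bits of the binary64 fraction and truncating the rest (the rounding behaviour is an assumption), and it applies that to a small kernel written in the style of the arclength benchmark. The names `reduce_precision`, `arclength`, and `matching_digits`, the kernel's exact form, and the step count are all hypothetical choices made for illustration.

```python
import math
import struct


def reduce_precision(x: float, mantissa_bits: int) -> float:
    """Keep the top `mantissa_bits` of the 52-bit binary64 fraction (truncation)."""
    if not 0 <= mantissa_bits <= 52:
        raise ValueError("mantissa_bits must be in [0, 52]")
    if math.isnan(x) or math.isinf(x):
        return x
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    drop = 52 - mantissa_bits
    mask = 0xFFFFFFFFFFFFFFFF & ~((1 << drop) - 1)  # clear the low fraction bits
    (y,) = struct.unpack("<d", struct.pack("<Q", bits & mask))
    return y


def arclength(n: int, q) -> float:
    """Polygonal arc length of g(x) = x + sum_{k=1..5} sin(2^k x) / 2^k on [0, pi].

    `q` is applied to the result of every FP operation to emulate a
    reduced-precision FPU (an assumption of this sketch).
    """
    def g(x: float) -> float:
        t, d = x, 1.0
        for _ in range(5):
            d = q(2.0 * d)
            t = q(t + q(q(math.sin(q(d * x))) / d))
        return t

    h = q(math.pi / n)
    total, prev = 0.0, g(0.0)
    for i in range(1, n + 1):
        cur = g(q(i * h))
        dy = q(cur - prev)
        total = q(total + q(math.sqrt(q(q(h * h) + q(dy * dy)))))
        prev = cur
    return total


def matching_digits(approx: float, ref: float) -> float:
    """Approximate number of significant decimal digits on which approx agrees with ref."""
    if approx == ref:
        return math.inf
    return -math.log10(abs(approx - ref) / abs(ref))


if __name__ == "__main__":
    reference = arclength(1000, lambda v: v)  # full binary64 precision
    for bits in (8, 16, 24, 32, 40, 52):
        result = arclength(1000, lambda v, b=bits: reduce_precision(v, b))
        print(bits, result, matching_digits(result, reference))
```

Sweeping the fraction width one bit at a time, rather than choosing only between standard types such as binary32 and binary64, is what makes the trade-off fine-grained: an accuracy threshold expressed in significant digits can be matched to the smallest width that still meets it. This sketch only estimates the accuracy side; the energy figures in the abstract come from the hardware case study.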



Updated: 2021-04-14