FixM: Code generation of fixed point mathematical functions

doi:10.1016/j.suscom.2020.100478

Sustainable Computing: Informatics and Systems

Volume 29, Part B, March 2021, 100478

https://doi.org/10.1016/j.suscom.2020.100478

Highlights

• We provide an approach to automatically generate code that implements mathematical functions using dynamic fixed point, i.e. fixed point with variable bit partitioning.
• We extended a compiler-based precision tuning framework to implement a proof-of-concept of our approach.
• We show that our solution improves energy efficiency and performance with respect to the state-of-the-art.

Abstract

Approximate computing has seen significant interest as a design philosophy oriented to performance and energy efficiency [1]. Precision tuning is an approximate computing technique that trades off the accuracy of operations for performance and energy by employing less precise data types, such as fixed point instead of floating point. However, the current state-of-the-art does not consider the possibility of optimizing mathematical functions whose computation is usually off-loaded to a library.

In this work we extend a precision-tuning framework to perform tuning of trigonometric functions as well. We developed a new mathematical function library, which is parameterizable at compile-time depending on the data type and works natively in the fixed point numeric representation. Through modification of a compiler pass, the parameterized implementations of these trigonometric functions are inserted into the program seamlessly during the precision tuning process.

Our approach, which we test on two microcontrollers with different architectures, achieves a speedup of up to 180% and energy savings of up to 60%, with a negligible cost in terms of error in the results.

Introduction

Error-tolerant applications are increasingly common. In the embedded systems field, error-tolerance often arises from applications dealing with interactions with the human senses (which are naturally error-tolerant), such as video or audio encoding, or from algorithmic properties, as in the case of some iterative refinement numerical algorithms [2], [3]. In High Performance Computing (HPC), error-tolerant applications often emerge from the need to carry out large scale simulations, where local errors do not significantly affect the overall picture, or can be handled a posteriori. Therefore, approximate computing techniques are coming under significant scrutiny by the HPC community [4], [5], [6]. In both domains, emerging AI applications often exhibit strong error-tolerance, as shown for both deep neural networks (DNNs) [7] and convolutional neural networks (CNNs) [8]. Proposals have been made at the hardware level to take advantage of inherent perceptual limitations, redundant data, or reduced precision input, as well as to reduce system costs and improve power efficiency [3]. At the same time, floating-point to fixed-point conversion tools allow us to trade off algorithm exactness for a more efficient implementation [9]. Several tools and methods for approximate computing have been proposed, ranging from the tuning of computation precision to more aggressive methods such as loop perforation.

Simple arithmetic operations such as multiplication and division can impact performance and energy significantly when moving from floating point to fixed point implementations [10]. Trigonometric functions can have an even larger impact, due to their inherently higher implementation cost, and the lack of hardware acceleration. This is especially true in embedded systems.

While on small code bases such trigonometric functions can easily be optimized by hand to employ fixed point representations, a scalable automatic approach is still missing in the state-of-the-art. Indeed, a traditional static, generic library would require a large number of different versions of each trigonometric function: one for each allowed fixed-point bit partitioning, for each allowed bit-width.
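To give a rough sense of how quickly a static library grows, the following sketch counts the variants of a single function under two assumptions that are ours, not the paper's: the allowed bit-widths are 8, 16, 32 and 64, and a word of width w may use 0 to w-1 fractional bits. The function names are hypothetical.

```python
# Assumed set of allowed word widths (illustrative only).
WIDTHS = (8, 16, 32, 64)

def same_format_variants(widths=WIDTHS):
    """Variants needed when input and output share one bit partitioning."""
    # One variant per (width, fractional bits) pair, fractional bits in 0..width-1.
    return sum(len(range(w)) for w in widths)

def mixed_format_variants(widths=WIDTHS):
    """Variants needed when input and output partitionings are independent."""
    # One variant per (width, input fractional bits, output fractional bits) triple.
    return sum(w * w for w in widths)

if __name__ == "__main__":
    print(same_format_variants(), mixed_format_variants())
```

Under these assumptions a single function already needs on the order of a hundred variants, and thousands if input and output formats may differ, which is why generating only the variants a program actually uses is attractive.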

The goal of this work is to extend the dynamic fixed point [11], [12] approach to mathematical library functions. In contrast with existing implementations [13], not only the application code is converted to use the fixed point representation, but the code of the mathematical library is converted as well.
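As a minimal illustration of dynamic fixed point, i.e. fixed point with variable bit partitioning, the sketch below tags every value with its own number of fractional bits and realigns the binary point when needed. The `Fixed` class and the `mul` helper are hypothetical names introduced here; Python integers are unbounded, so word-width overflow is deliberately not modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fixed:
    """Signed fixed point value: real value = raw / 2**frac_bits."""
    raw: int
    frac_bits: int

    @staticmethod
    def from_float(x: float, frac_bits: int) -> "Fixed":
        # Quantize to the nearest representable value.
        return Fixed(round(x * (1 << frac_bits)), frac_bits)

    def to_float(self) -> float:
        return self.raw / (1 << self.frac_bits)

    def convert(self, frac_bits: int) -> "Fixed":
        """Move the binary point by shifting the raw integer."""
        shift = frac_bits - self.frac_bits
        raw = self.raw << shift if shift >= 0 else self.raw >> -shift
        return Fixed(raw, frac_bits)

def mul(a: Fixed, b: Fixed, out_frac_bits: int) -> Fixed:
    """Multiply two values that may use different bit partitionings."""
    # The exact product carries a.frac_bits + b.frac_bits fractional bits.
    wide = Fixed(a.raw * b.raw, a.frac_bits + b.frac_bits)
    return wide.convert(out_frac_bits)

if __name__ == "__main__":
    small = Fixed.from_float(0.7071, 14)   # many fractional bits for a small value
    large = Fixed.from_float(1000.0, 4)    # few fractional bits for a large value
    print(mul(small, large, 5).to_float())
```

Each operand keeps the partitioning that suits its range, and only the result format has to be chosen per operation. A library function called from such code must therefore accept and return whatever formats the surrounding code uses.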

In order to achieve this conversion, we introduce a new code generation approach (FixM) that minimizes code duplication while fully supporting the dynamic fixed point algorithm design without compromises. The code size reduction can be extremely important in the case of sensor nodes and other low-end embedded systems, where program memory size is limited.

Its proof-of-concept implementation (FixMAGE) is an extension of the TAFFO [14] mixed-precision solution. It enables the automatic code generation of mathematical functions for the desired fixed point representations. The generated implementations leverage the classic CORDIC algorithm [15], which is widely used in industry [16].
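For readers unfamiliar with CORDIC, the following is a minimal sketch of the textbook rotation-mode algorithm for sine and cosine on fixed point integers. It is not the code generated by FixMAGE: the function name, the choice of one iteration per fractional bit, and the restriction of the input angle to roughly $[-\pi/2, \pi/2]$ are assumptions made for illustration.

```python
import math

def cordic_sincos(angle_raw, frac_bits, iterations=None):
    """Return (sin, cos) as integers scaled by 2**frac_bits.

    angle_raw is the angle in radians scaled by 2**frac_bits and is assumed
    to lie in about [-pi/2, pi/2], the basic CORDIC convergence range.
    """
    n = frac_bits if iterations is None else iterations
    one = 1 << frac_bits
    # Precomputed table of atan(2**-i) in the same fixed point format.
    atan_table = [round(math.atan(2.0 ** -i) * one) for i in range(n)]
    # Each micro-rotation stretches the vector; start from 1/gain to compensate.
    gain = 1.0
    for i in range(n):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = round(one / gain), 0, angle_raw
    for i in range(n):
        # Rotate towards z == 0 using only shifts and additions.
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - atan_table[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + atan_table[i]
    return y, x

if __name__ == "__main__":
    fb = 16
    s, c = cordic_sincos(round(0.5 * (1 << fb)), fb)
    print(s / (1 << fb), c / (1 << fb))
```

The loop uses only shifts, additions and one table lookup per iteration, which is what makes the method attractive on microcontrollers without an FPU. The table and the initial value of `x` depend on the number of fractional bits, so each bit partitioning needs its own constants.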

We test our compiler-based library generator on two embedded computing platforms using relevant benchmarks: applications from the AxBench suite [17], the most popular approximate computing benchmark, and the FBench benchmark [18]. Experimental results show that, in applications relying on trigonometric functions, these are the dominant component of the performance difference between (emulated) floating point and fixed point implementations. Thus, floating point to fixed point conversion brings little benefit unless trigonometric functions are optimized as well. With our approach we achieve speedups of up to 180%, while containing the average error (absolute and relative) to $10^{-3}$ or less. This enables energy savings of up to 60%.

The FixM approach is demonstrated on the sin and cos functions, but can be easily generalized to cover all trigonometric and hyperbolic functions, as well as other functions that can be computed using the CORDIC method.

The rest of this paper is organized as follows. In Section 2 we present our approach, providing all the technical details in Section 3. In Section 4, we provide an experimental evaluation of the proposed library and compiler support on a set of selected approximate computing benchmarks that employ trigonometric functions. Finally, in Section 5 we compare our approach and results with the state-of-the-art, and in Section 6 we draw some conclusions and propose future research directions.

Section snippets

Approach

FixM is our approach to on-demand code generation of mathematical functions using fixed point arithmetic. The traditional approach to fixed point arithmetic assumes the program uses a given fixed point data type, and the whole code generation is pivoted around this rigid assumption. Instead, in our proposal we use dynamic fixed point [12], a technique that proved to be effective to reduce the rounding error on the final output. Dynamic fixed point consists in changing the numerical

Implementation

The FixM approach to generating mathematical functions is based on two components. The first component consists of an automated precision tuning framework. The other component is a template-based code generator, called FixMAGE (FIXed point MAthematical function GEnerator).

In this section, we will illustrate the concepts and algorithms behind these two components, and how we integrated them in our solution.
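To make the idea of a template-based generator concrete, here is a hypothetical sketch that fills a C source template with constants computed for one requested fixed point format. The template text, the function `generate_sincos` and the emitted identifiers are invented for illustration and do not reproduce FixMAGE's actual templates or interface.

```python
import math

# Illustrative C template; doubled braces are literal braces in the output.
C_TEMPLATE = """\
/* sin/cos for {width}-bit words with {frac} fractional bits (illustrative) */
static const int{width}_t atan_tab_q{frac}[{n}] = {{ {table} }};

void sincos_q{frac}(int{width}_t z, int{width}_t *s, int{width}_t *c) {{
    int{width}_t x = {k}, y = 0;
    for (int i = 0; i < {n}; ++i) {{
        /* assumes >> on negative values is an arithmetic shift */
        int{width}_t dx = y >> i, dy = x >> i;
        if (z >= 0) {{ x -= dx; y += dy; z -= atan_tab_q{frac}[i]; }}
        else        {{ x += dx; y -= dy; z += atan_tab_q{frac}[i]; }}
    }}
    *s = y; *c = x;
}}
"""

def generate_sincos(width, frac):
    """Return C source for a CORDIC sin/cos specialized to one format."""
    # Keep at least one integer bit plus sign so that pi/2 is representable.
    if width not in (8, 16, 32, 64) or not 0 < frac <= width - 2:
        raise ValueError("unsupported fixed point format")
    n = frac                      # assumed: one iteration per fractional bit
    one = 1 << frac
    table = ", ".join(str(round(math.atan(2.0 ** -i) * one)) for i in range(n))
    gain = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(n))
    return C_TEMPLATE.format(width=width, frac=frac, n=n,
                             table=table, k=round(one / gain))

if __name__ == "__main__":
    print(generate_sincos(32, 16))
```

Only the formats requested by the precision tuner would need to be instantiated in such a scheme, so the program carries one specialized function per format in use rather than a full static library.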

Evaluation

To validate the efficacy of FixM in a precision tuning context, we partition embedded systems hardware architectures into two different classes.

The first class consists of embedded systems microcontrollers without a floating point unit (FPU). On this kind of hardware, floating point computation can be handled through an emulation library provided by the compiler. Our approach is based on the fixed point numeric representation, which is always supported since it relies on the

Comparative analysis with the state-of-the-art

The implementation of CORDIC algorithms is an engineering problem that has been addressed by several solutions in the state-of-the-art. However, most implementations share the same approach. We distinguish two main categories: hardware and software implementations.

Nowadays, the most common use case of custom CORDIC implementations in hardware comes from the generation of circuit descriptions for programmable devices, for example FPGAs. High Level Synthesis (HLS) is the task that aims at

Conclusions

We presented an approach and compiler pass to deploy mathematical function implementations specialized for dynamic fixed point, minimizing the amount of additional code generated with respect to current approaches.

We were able to achieve speedups up to approximately 180% on a microcontroller-based embedded system, with a negligible cost in terms of error, in benchmarks where trigonometric functions represent the majority of the computational effort, where state-of-the-art floating to

Declaration of interest

None declared.

Authors contribution

Daniele Cattaneo: Conceptualization, Methodology, Software, Writing – Original Draft, Writing – Review and Editing, Visualization, Supervision, Project administration. Michele Chiari: Software, Formal analysis, Writing – Review and Editing, Visualization, Supervision. Gabriele Magnani: Methodology, Software, Validation, Formal analysis, Investigation, Writing – Original Draft. Nicola Fossati: Validation, Investigation, Writing – Review and Editing. Stefano Cherubin: Conceptualization,

Declaration of Competing Interest

The authors report no declarations of interest.

Acknowledgments

Work supported by the FET-HPC project RECIPE, G.A. n. 801137.

References (29)

[1] A. Agrawal. Approximate computing: challenges and opportunities. 2016 IEEE International Conference on Rebooting Computing (ICRC) (2016)
[2] D.D. Thaker. Characterization of error-tolerant applications when protecting control data. 2006 IEEE International Symposium on Workload Characterization (2006)
[3] S. Mittal. A survey of techniques for approximate computing. ACM Comput. Surveys (2016)
[4] R. Iakymchuk. White Paper from Workshop on Large-Scale Parallel Numerical Computing Technology (LSPANC 2020): HPC and Computer Arithmetic Toward Minimal-Precision Computing (2020)
[5] C. Silvano. Autotuning and adaptivity in energy efficient HPC systems: the ANTAREX toolbox. Proceedings of the 15th ACM International Conference on Computing Frontiers, CF ’18 (2018)
[6] G. Tagliavini. A transprecision floating-point platform for ultra-low power computing. 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE) (2018)
[7] C. Chen. Exploiting approximate computing for deep learning acceleration. 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE) (2018)
[8] M.A. Hanif et al. Error resilience analysis for systematically employing approximate computing in convolutional neural networks. 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE) (2018)
[9] S. Cherubin et al. Tools for reduced precision computation: a survey. ACM Comput. Surveys (2020 Apr)
[10] S. Cherubin. Implications of Reduced-Precision Computations in HPC: Performance, Energy and Error. Parallel Computing is Everywhere, Vol. 32: Advances in Parallel Computing. International Conference on Parallel Computing (ParCo), September 2017 (2018)
[11] J.-F. Wang. Dynamic fixed-point arithmetic design of embedded svm-based speaker identification system. Advances in Neural Networks – ISNN 2010 (2010)
[12] D. Williamson. Dynamically scaled fixed point arithmetic. IEEE Pacific Rim Conference on Communications, Computers and Signal Processing Conference Proceedings, vol. 1 (1991)
[13] N. Iwanaga et al. Development of fixed-point trigonometric function library for high-level synthesis. The 1st IEEE/IIAE International Conference on Intelligent Systems and Image Processing 2013 (ICISIP2013) (2013)
[14] S. Cherubin. TAFFO: Tuning assistant for floating to fixed point optimization. IEEE Embedded Syst. Lett. (2019)
