FixM: Code generation of fixed point mathematical functions
Introduction
Error-tolerant applications are increasingly common. In the embedded systems field, error tolerance often arises either from applications that interact with the human senses (which are naturally error-tolerant), such as video or audio encoding, or from algorithmic properties, as in the case of some iterative refinement numerical algorithms [2], [3]. In High Performance Computing (HPC), error-tolerant applications often emerge from the need to carry out large scale simulations, where local errors do not significantly affect the overall picture or can be handled a posteriori. Therefore, approximate computing techniques are receiving significant attention from the HPC community [4], [5], [6]. In both domains, emerging AI applications often exhibit strong error tolerance, as shown for both deep neural networks (DNNs) [7] and convolutional neural networks (CNNs) [8]. Hardware-level proposals have been made to take advantage of inherent perceptual limitations, redundant data, or reduced precision input, as well as to reduce system costs and improve power efficiency [3]. At the same time, floating point to fixed point conversion tools make it possible to trade algorithm exactness for a more efficient implementation [9]. Several tools and methods for approximate computing have been proposed, ranging from the tuning of computation precision to more aggressive methods such as loop perforation.
Simple arithmetic operations such as multiplication and division can significantly affect performance and energy consumption when moving from floating point to fixed point implementations [10]. Trigonometric functions can have an even larger impact, due to their inherently higher implementation cost and the lack of hardware acceleration. This is especially true in embedded systems.
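To make the cost of these operations concrete, the following C sketch shows one common way to implement fixed point multiplication and division using only integer hardware. The Q15.16 format and the names `fx_mul` and `fx_div` are illustrative assumptions, not part of FixM; the sketch also assumes that right shifts of negative values are arithmetic, which is implementation-defined in C.

```c
#include <stdint.h>

/* Hypothetical Q15.16 format: 16 fractional bits in a signed 32-bit integer. */
#define FX_FRAC 16

/* Multiply two Q15.16 values. The exact product has 2*FX_FRAC fractional
   bits, so it is computed in 64 bits and shifted back to FX_FRAC bits.
   Assumes an arithmetic right shift (rounds toward minus infinity). */
int32_t fx_mul(int32_t a, int32_t b) {
    int64_t p = (int64_t)a * (int64_t)b;
    return (int32_t)(p >> FX_FRAC);
}

/* Divide two Q15.16 values. The dividend is pre-scaled by 2^FX_FRAC so the
   quotient keeps FX_FRAC fractional bits; C division truncates toward zero.
   The caller must ensure b != 0 and that the result fits in 32 bits. */
int32_t fx_div(int32_t a, int32_t b) {
    int64_t n = (int64_t)a * ((int64_t)1 << FX_FRAC);
    return (int32_t)(n / b);
}
```

For example, with 1.5 encoded as 98304 and 2.25 as 147456, `fx_mul` returns 221184, the encoding of 3.375. Each operation needs a double-width intermediate plus a shift or pre-scaling step, so the chosen bit partitioning directly shapes the generated code.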
While on small code bases such trigonometric functions can easily be optimized by hand to use fixed point representations, a scalable automatic approach is still missing from the state of the art. Indeed, a traditional static, generic library would require a large number of different versions of each trigonometric function: one for each allowed fixed point bit partitioning, for each allowed bit width.
The goal of this work is to extend the dynamic fixed point [11], [12] approach to mathematical library functions. In contrast with existing implementations [13], not only is the application code converted to use the fixed point representation, but the code of the mathematical library is converted as well.
In order to achieve this conversion, we introduce a new code generation approach (FixM) that minimizes code duplication while fully supporting dynamic fixed point algorithm design without compromises. Reducing code size is especially important for sensor nodes and other low-end embedded systems, where program memory is limited.
Its proof-of-concept implementation, FixMAGE, is an extension of the TAFFO [14] mixed-precision solution. It automatically generates mathematical functions for the desired fixed point representations. The generated implementations leverage the classic CORDIC algorithm [15], which is widely used in industry [16].
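For readers unfamiliar with CORDIC, the rotation mode used for sin and cos can be sketched in C as follows. This is not the code FixMAGE generates: the Q2.29 format, the 28 iterations, and the names `cordic_sincos`, `cordic_from_double`, and `cordic_to_double` are illustrative assumptions. The loop drives a residual angle `z` to zero using only shifts, additions, and a small table of arctangents, starting from the pre-compensated gain constant so that the final vector has unit length.

```c
#include <stdint.h>

/* Illustrative format for this sketch: Q2.29 (29 fractional bits, int32_t). */
#define CORDIC_FRAC 29
#define CORDIC_ITERS 28
#define CORDIC_ONE ((double)((int32_t)1 << CORDIC_FRAC))

/* Round a non-negative constant to Q2.29 at compile time. */
#define CORDIC_CONST(v) ((int32_t)((v) * CORDIC_ONE + 0.5))

/* atan(2^-i) for i = 0..15. For i >= 16, atan(2^-i) rounds to 2^-i in Q2.29. */
static const int32_t cordic_atan[16] = {
    CORDIC_CONST(0.78539816339744830962), CORDIC_CONST(0.46364760900080611621),
    CORDIC_CONST(0.24497866312686415417), CORDIC_CONST(0.12435499454676143503),
    CORDIC_CONST(0.06241880999595734847), CORDIC_CONST(0.03123983343026827625),
    CORDIC_CONST(0.01562372862047683081), CORDIC_CONST(0.00781234106010111130),
    CORDIC_CONST(0.00390623013196697182), CORDIC_CONST(0.00195312251647881869),
    CORDIC_CONST(0.00097656218955931944), CORDIC_CONST(0.00048828121119489830),
    CORDIC_CONST(0.00024414062014936177), CORDIC_CONST(0.00012207031189367021),
    CORDIC_CONST(0.00006103515617420877), CORDIC_CONST(0.00003051757811552610)
};

/* Conversions between double and Q2.29 (for setup and testing only). */
int32_t cordic_from_double(double v) {
    return (int32_t)(v * CORDIC_ONE + (v >= 0 ? 0.5 : -0.5));
}
double cordic_to_double(int32_t v) { return (double)v / CORDIC_ONE; }

/* Rotation-mode CORDIC. Valid for |theta| <= pi/2 (the iteration converges
   up to about 1.74 rad). Assumes arithmetic right shifts of negative values. */
void cordic_sincos(int32_t theta, int32_t *sin_out, int32_t *cos_out) {
    int32_t x = CORDIC_CONST(0.60725293500888125617); /* pre-compensated gain K */
    int32_t y = 0, z = theta;
    for (int i = 0; i < CORDIC_ITERS; i++) {
        int32_t a = (i < 16) ? cordic_atan[i] : ((int32_t)1 << (CORDIC_FRAC - i));
        int32_t xs = x >> i, ys = y >> i;           /* multiply by 2^-i */
        if (z >= 0) { x -= ys; y += xs; z -= a; }  /* rotate counter-clockwise */
        else        { x += ys; y -= xs; z += a; }  /* rotate clockwise */
    }
    *cos_out = x;
    *sin_out = y;
}
```

Every constant in the sketch depends on the number of fractional bits, which is why a generic library would need one copy of the table and loop per fixed point format.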
We test our compiler-based library generator on two embedded computing platforms using relevant benchmarks: applications from the AxBench suite [17], the most popular approximate computing benchmark suite, and the FBench benchmark [18]. Experimental results show that, in applications relying on trigonometric functions, these functions are the dominant component of the performance difference between (emulated) floating point and fixed point implementations. Thus, floating to fixed point conversion is pointless unless trigonometric functions are optimized as well; doing so achieves speedups of 180%, while containing the average error (absolute and relative) to or less. This enables energy savings of up to 60%.
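The excerpt does not define the average absolute and relative error precisely. Under the common assumption that they are means of per-output errors measured against the floating point reference run, they could be computed as in this sketch; the function `mean_errors` and the choice to skip zero-valued references in the relative error are hypothetical.

```c
#include <stddef.h>

static double abs_d(double v) { return v < 0 ? -v : v; }

/* Hypothetical helper: mean absolute error and mean relative error of the
   approximated outputs against a reference (e.g. the floating point run).
   Reference entries equal to zero are skipped for the relative error. */
void mean_errors(const double *ref, const double *approx, size_t n,
                 double *mean_abs, double *mean_rel) {
    double sum_abs = 0.0, sum_rel = 0.0;
    size_t n_rel = 0;
    for (size_t i = 0; i < n; i++) {
        double e = abs_d(approx[i] - ref[i]);
        sum_abs += e;
        if (ref[i] != 0.0) {
            sum_rel += e / abs_d(ref[i]);
            n_rel++;
        }
    }
    *mean_abs = n ? sum_abs / (double)n : 0.0;
    *mean_rel = n_rel ? sum_rel / (double)n_rel : 0.0;
}
```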
The FixM approach is demonstrated on the sin and cos functions, but can be easily generalized to cover all trigonometric and hyperbolic functions, as well as other functions that can be computed using the CORDIC method.
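As an illustration of this generalization (a sketch under our own assumptions, not something the excerpt shows FixM generating), the same shift-and-add loop run in vectoring mode computes the arctangent and the vector magnitude. The Q2.29 format and the name `cordic_atan_mag` are hypothetical; the sketch requires `x > 0`, inputs small enough that the CORDIC gain of about 1.647 does not overflow, and arithmetic right shifts.

```c
#include <stdint.h>

/* Illustrative format for this sketch: Q2.29 in an int32_t. */
#define VEC_FRAC 29
#define VEC_ITERS 28
#define VEC_ONE ((double)((int32_t)1 << VEC_FRAC))
#define VEC_CONST(v) ((int32_t)((v) * VEC_ONE + 0.5))

/* atan(2^-i) for i = 0..15; 2^-i is used for larger i. */
static const int32_t vec_atan[16] = {
    VEC_CONST(0.78539816339744830962), VEC_CONST(0.46364760900080611621),
    VEC_CONST(0.24497866312686415417), VEC_CONST(0.12435499454676143503),
    VEC_CONST(0.06241880999595734847), VEC_CONST(0.03123983343026827625),
    VEC_CONST(0.01562372862047683081), VEC_CONST(0.00781234106010111130),
    VEC_CONST(0.00390623013196697182), VEC_CONST(0.00195312251647881869),
    VEC_CONST(0.00097656218955931944), VEC_CONST(0.00048828121119489830),
    VEC_CONST(0.00024414062014936177), VEC_CONST(0.00012207031189367021),
    VEC_CONST(0.00006103515617420877), VEC_CONST(0.00003051757811552610)
};

/* Vectoring-mode CORDIC: rotates (x, y) until y reaches zero while
   accumulating the rotation angle. Returns atan(y/x) in *angle and
   sqrt(x^2 + y^2) in *mag, both in Q2.29. */
void cordic_atan_mag(int32_t x, int32_t y, int32_t *angle, int32_t *mag) {
    int32_t z = 0;
    for (int i = 0; i < VEC_ITERS; i++) {
        int32_t a = (i < 16) ? vec_atan[i] : ((int32_t)1 << (VEC_FRAC - i));
        int32_t xs = x >> i, ys = y >> i;
        if (y >= 0) { x += ys; y -= xs; z += a; }  /* rotate clockwise */
        else        { x -= ys; y += xs; z -= a; }  /* rotate counter-clockwise */
    }
    *angle = z;
    /* Remove the CORDIC gain by multiplying by K ~= 0.6072529 in 64 bits. */
    *mag = (int32_t)(((int64_t)x * VEC_CONST(0.60725293500888125617)) >> VEC_FRAC);
}
```

Only the update direction and the variable driven to zero change with respect to rotation mode, so a template for one mode can be adapted to the other with little extra code.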
The rest of this paper is organized as follows. In Section 2 we present our approach, providing all the technical details in Section 3. In Section 4, we provide an experimental evaluation of the proposed library and compiler support on a set of selected approximate computing benchmarks that employ trigonometric functions. Finally, in Section 5 we compare our approach and results with the state-of-the-art, and in Section 6 we draw some conclusions and propose future research directions.
Approach
FixM is our approach to on-demand code generation of mathematical functions using fixed point arithmetic. The traditional approach to fixed point arithmetic assumes that the program uses a given fixed point data type, and the whole code generation is pivoted around this rigid assumption. Instead, our proposal uses dynamic fixed point [12], a technique that has proven effective at reducing the rounding error on the final output. Dynamic fixed point consists in changing the numerical
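Because dynamic fixed point lets each value use its own point position, a precision tuner has to pick the number of fractional bits from each value's range. The following C sketch shows one simple way to do so; the function `frac_bits_for_range` is hypothetical and is not the algorithm used by TAFFO or FixM. It reserves one bit for the sign, gives the integer part just enough bits to cover the largest magnitude, and assigns the remaining bits to the fraction.

```c
/* Hypothetical helper: choose the number of fractional bits for a value
   whose magnitude never exceeds max_abs, in a signed word of `width` bits.
   Simplification: values below 1 are not given extra fractional bits. */
int frac_bits_for_range(double max_abs, int width) {
    int int_bits = 0;
    double limit = 1.0;              /* invariant: limit == 2^int_bits */
    while (limit <= max_abs) {       /* need 2^int_bits > max_abs */
        limit *= 2.0;
        int_bits++;
    }
    return width - 1 - int_bits;     /* one bit is reserved for the sign */
}
```

With this rule, a 32-bit value known to lie in [-100, 100] gets 24 fractional bits, while the output of sin, bounded by 1, gets 30. A function consuming the first and producing the second is thus specialized for a particular pair of formats, which is the combinatorial growth a static library would have to cover.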
Implementation
The FixM approach to generating mathematical functions is based on two components. The first component consists of an automated precision tuning framework. The other component is a template-based code generator, called FixMAGE (FIXed point MAthematical function GEnerator).
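The idea of template-based generation can be illustrated with a small C sketch that specializes a CORDIC constant table for a requested number of fractional bits and emits it as C source. This is not FixMAGE's actual template machinery; the function `emit_atan_table`, the emitted identifier `atan_q<F>`, and the 8-entry limit are hypothetical choices made for the example.

```c
#include <stdio.h>
#include <stddef.h>

/* atan(2^-i) for i = 0..7, used as the source of the specialized table. */
static const double atan_pow2[8] = {
    0.78539816339744830962, 0.46364760900080611621, 0.24497866312686415417,
    0.12435499454676143503, 0.06241880999595734847, 0.03123983343026827625,
    0.01562372862047683081, 0.00781234106010111130
};

/* Writes "static const int32_t atan_q<F>[<n>] = { ... };\n" into buf.
   Returns the text length, or -1 if arguments are out of range or buf
   is too small. */
int emit_atan_table(char *buf, size_t size, int frac_bits, int n) {
    if (n < 1 || n > 8 || frac_bits < 0 || frac_bits > 30) return -1;
    double scale = 1.0;
    for (int k = 0; k < frac_bits; k++) scale *= 2.0;   /* 2^frac_bits */
    int len = snprintf(buf, size, "static const int32_t atan_q%d[%d] = {", frac_bits, n);
    for (int i = 0; i < n; i++) {
        if (len < 0 || (size_t)len >= size) return -1;
        long v = (long)(atan_pow2[i] * scale + 0.5);     /* round to nearest */
        len += snprintf(buf + len, size - (size_t)len, "%s%ld", i ? ", " : " ", v);
    }
    if (len < 0 || (size_t)len >= size) return -1;
    len += snprintf(buf + len, size - (size_t)len, " };\n");
    return (len < 0 || (size_t)len >= size) ? -1 : len;
}
```

Requesting 8 fractional bits and 3 entries yields `static const int32_t atan_q8[3] = { 201, 119, 63 };`. Emitting source on demand means only the formats an application actually uses end up in program memory.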
In this section, we will illustrate the concepts and algorithms behind these two components, and how we integrated them in our solution.
Evaluation
To validate the efficacy of FixM in a precision tuning context, we partition embedded hardware architectures into two classes.
The first class consists of embedded microcontrollers without a floating point unit (FPU). On this kind of hardware, floating point computation can be handled through an emulation library provided by the compiler. Our approach is based on the fixed point numeric representation, which is always supported since it relies on the
Comparative analysis with the state-of-the-art
The implementation of CORDIC algorithms is an engineering problem that has been addressed by several solutions in the state-of-the-art. However, most implementations share the same approach. We distinguish two main categories: hardware and software implementations.
Nowadays, the most common use case of custom CORDIC implementations in hardware comes from the generation of circuit descriptions for programmable devices, for example FPGAs. High Level Synthesis (HLS) is the task that aims at
Conclusions
We presented an approach and compiler pass to deploy mathematical function implementations specialized for dynamic fixed point, minimizing the amount of additional code generated with respect to current approaches.
We were able to achieve speedups up to approximately on an microcontroller-based embedded system, with a negligible cost in terms of error, in benchmarks where trigonometric functions represent the majority of the computational effort, where state-of-the-art floating to
Declaration of interest
None declared.
Authors contribution
Daniele Cattaneo: Conceptualization, Methodology, Software, Writing – Original Draft, Writing – Review and Editing, Visualization, Supervision, Project administration. Michele Chiari: Software, Formal analysis, Writing – Review and Editing, Visualization, Supervision. Gabriele Magnani: Methodology, Software, Validation, Formal analysis, Investigation, Writing – Original Draft. Nicola Fossati: Validation, Investigation, Writing – Review and Editing. Stefano Cherubin: Conceptualization,
Declaration of Competing Interest
The authors report no declarations of interest.
Acknowledgments
Work supported by the FET-HPC project RECIPE, G.A. n. 801137.
References (29)
Approximate computing: challenges and opportunities. 2016 IEEE International Conference on Rebooting Computing (ICRC) (2016).
Characterization of error-tolerant applications when protecting control data. 2006 IEEE International Symposium on Workload Characterization (2006).
A survey of techniques for approximate computing. ACM Comput. Surveys (2016).
White Paper from Workshop on Large-Scale Parallel Numerical Computing Technology (LSPANC 2020): HPC and Computer Arithmetic Toward Minimal-Precision Computing (2020).
Autotuning and adaptivity in energy efficient HPC systems: the ANTAREX toolbox. Proceedings of the 15th ACM International Conference on Computing Frontiers, CF ’18 (2018).
A transprecision floating-point platform for ultra-low power computing. 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE) (2018).
Exploiting approximate computing for deep learning acceleration. 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE) (2018).
Error resilience analysis for systematically employing approximate computing in convolutional neural networks. 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE) (2018).
Tools for reduced precision computation: a survey. ACM Comput. Surveys (Apr. 2020).
Implications of Reduced-Precision Computations in HPC: Performance, Energy and Error. Parallel Computing is Everywhere, Vol. 32: Advances in Parallel Computing. International Conference on Parallel Computing (ParCo), September 2017 (2018).
Dynamic fixed-point arithmetic design of embedded svm-based speaker identification system. Advances in Neural Networks – ISNN 2010.
Dynamically scaled fixed point arithmetic. IEEE Pacific Rim Conference on Communications, Computers and Signal Processing Conference Proceedings, vol. 1.
Development of fixed-point trigonometric function library for high-level synthesis. The 1st IEEE/IIAE International Conference on Intelligent Systems and Image Processing 2013 (ICISIP2013).
TAFFO: Tuning assistant for floating to fixed point optimization. IEEE Embedded Syst. Lett.