Elsevier

Integration

Volume 76, January 2021, Pages 1-12

READ: A fixed restoring array based accuracy-configurable approximate divider for energy efficiency

https://doi.org/10.1016/j.vlsi.2020.08.002

Highlights

  • A fixed restoring array based approximate divider is proposed and presented with detailed error and circuit analysis for CMOS-45 nm technology node.

  • The presented approximate divider is reconfigurable at runtime, which enables accuracy configurability by varying a design parameter.

  • Reconfigurable half and full subtractor cells are proposed which achieve two modes of operation, approximate and exact, with energy savings.

  • A scalable and hardware-efficient overflow detector circuit for the divider design is proposed.

  • The approximate divider designs are evaluated for image processing applications and found to be satisfactory in terms of Quality of Service (QoS).

Abstract

Energy efficiency has emerged as one of the most essential design parameters in contemporary computing system design. Approximate computing is a new computing paradigm that achieves energy efficiency by trading accuracy for energy, area, and latency improvements in error-resilient applications. This paper proposes the Reconfigurable Energy-efficient Approximate Divider (READ), which achieves several energy–quality configurable modes using a fixed restoring array divider architecture. Conventional approximate binary dividers require different hardware configurations to achieve distinct energy–quality trade-off points, which reduces hardware flexibility, especially for modern embedded systems. READ achieves energy efficiency while meeting the dynamically varying accuracy requirements of the target application. READ uses reconfigurable subtractor cells that work in either accurate or approximate mode under the control of subtractor cell controller logic. The paper also introduces an overflow detector design that uses minimal hardware resources. A comprehensive accuracy and hardware evaluation at the CMOS 45-nm technology node is performed for the proposed dividers as well as other state-of-the-art divider designs. Compared to the accurate 16/8 divider design, the proposed divider improves energy efficiency by 49% and is 1.26x faster, while introducing minimal errors. The proposed divider is demonstrated in image processing tasks, where it has a nominal effect on output quality.

Introduction

There is a persistent demand for energy-efficient yet compute-intensive designs for modern applications such as multimedia, digital signal processing (DSP), and machine learning. Improvements in technology nodes, voltage scaling, and parallel architectures alone are insufficient to achieve transformational energy reductions. For such applications, a modest accuracy loss may be acceptable, since limited human perception means it does not noticeably degrade output quality. Alternative computing techniques are therefore needed to cope with the high energy requirements of emerging applications. One such class of techniques is approximate computing, in which full-precision computing blocks are replaced with their approximate counterparts [1]. However, approximate computing trades output accuracy against circuit parameters (area, delay, and power). Both the computational quality required and the energy available to an application vary at runtime. Designs that support different accuracy levels are therefore in high demand and are useful for both error-tolerant and error-sensitive applications. Reconfigurable approximate circuits avoid unnecessary computational effort while reusing a single hardware design. Reconfigurability can be introduced such that quality degrades gracefully with minimal energy consumption.

Low-power and fast arithmetic units have been proposed in the literature for compute-intensive applications. These arithmetic units are mostly over-engineered, because most applications dealing with real-time data such as audio, video, and images are error-resilient (due to the analog nature of the data) and can thus be explored for approximation. A vast literature on approximate adders and multipliers applies approximate computing techniques from the device level to the architecture level. Approximate multi-bit adders proposed in [2] use carry speculation as an approximation technique to obtain the desired circuit metrics. Five approximate mirror adders (MAs) are proposed in [3] using logic simplification at the transistor level, thereby reducing hardware complexity. A reconfigurable approximate carry-lookahead adder is introduced in [4]; it uses power gating in approximate mode to save power. Approximate multipliers have also been widely discussed. The approximate multiplier in [5] applies truncation and rounding to the input operands, resulting in an inexact multiplication. In [6], multiplier components such as adders and compressors are simplified for use in approximate multipliers, while approximate Booth multipliers are discussed in [7]. In comparison, approximate dividers have been explored only recently and remain an active research area.

Motivation: Of the four basic arithmetic operations, division is performed infrequently, yet its power consumption, latency, and complexity are greater than those of the other operations. A hardware divider is costly and power hungry because division algorithms are complex. For example, the integer divide instruction (IDIV) of the AMD Family 15h processors has an average latency of 12 cycles for 16-bit division, considerably more than the 3 cycles of the integer multiplication (IMUL) instruction at the same width [8]. Advances in camera and sensing technologies have increased the demand for higher-resolution multimedia content, which requires real-time encoding and decoding of high-volume data. This is even more critical for devices with limited battery capacity, such as smartphones and portable electronics. Division is used in many image/video processing tasks such as compression, pixel division, and foreground extraction. As digital image/video and signal processing techniques become more prevalent, the importance of arithmetic operations like division grows. This creates the need for an energy-efficient divider design with simplified and reusable hardware. Approximate computing can be leveraged to improve circuit performance by accepting inaccuracies in the calculations [9]. Most prior approximate divider designs work efficiently only in approximate mode; operating them accurately is either infeasible or energy-inefficient compared to the exact counterpart. Thus, this paper proposes reconfigurable energy-efficient approximate divider hardware, leveraging approximate computing methods, for error-tolerant as well as accuracy-critical applications.

Different approximate hardware dividers have been proposed [10], [11], [12], [13], [14], [15], [16] based on techniques such as cell replacement, cell truncation, multiplicative division, high-radix division, and lookup tables (LUTs). Most of these dividers are fully approximate and can only be used in error-resilient applications. The accuracy of an arithmetic design needs to be tunable because application-level quality is input dependent or can change over time [17]. Thus, there is a need for compact underlying hardware that can be configured according to energy or quality requirements. Existing approximate dividers must be hardwired again for each approximate mode, which diminishes the energy-saving potential of the approximated design. For real-time embedded systems, it is impractical to implement a separate hardware configuration for every approximate mode. Overflow detection is a crucial component of division that recent literature on approximate dividers does not discuss. Neglecting overflow conditions may lead to erroneous outcomes, so a proper overflow detection design is essential for a divider hardware implementation. Reconfigurability offers a useful design solution: the approximate design can also work accurately when required, with significant energy savings.

The issues above motivate a reconfigurable energy-efficient approximate divider that can modulate its degree of approximation. To summarize, the contributions of this paper are as follows:

  • READ is introduced as an energy-efficient approximate integer divider that offers several energy–quality trade-off modes using a fixed restoring-array hardware structure.

  • Reconfigurable approximate half and full subtractor cells (RHSCs and RSCs, respectively) are introduced. They work efficiently in both accurate and approximate modes, depending on control signals generated by the subtractor cell control logic, which is easily tunable at runtime.

  • An efficient and scalable overflow detection logic design is proposed which, when integrated with the reconfigurable subtractor cells, forms a complete approximate divider architecture.

  • READ is extensively analyzed in terms of error metrics and circuit parameters at the CMOS 45-nm technology node. Realistic power estimation is performed considering all the relevant circuit parameters. The trade-off between energy and accuracy is investigated with suitable graphs and tables. The accuracy results of READ for various approximate modes are presented and validated by comparison with state-of-the-art approximate dividers.

  • Finally, application-level evaluation of the proposed and approximate dividers for image processing applications demonstrates minuscule degradation of the output accuracy.

This paper is organized as follows. Section 2 provides a brief review of the existing works on approximate dividers with a description of exact restoring array-based dividers. Section 3 introduces the proposed design (READ) along with reconfigurable subtractor cells, overflow detector, and controller logic. In Section 4, the relevant accuracy and circuit metrics are simulated to assess the performance of exact and approximate dividers. Energy–quality trade-off is also discussed for approximate designs. Simulation results for various image processing tasks are presented in Section 5. Section 6 concludes the paper.
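As a rough illustration of how error metrics for an approximate divider can be computed, the sketch below exhaustively evaluates a small 2n/n divider in Python. The metric names (MED, NMED, MRED) are common in the approximate-arithmetic literature, but whether the paper reports exactly these is an assumption. The function `approx_quotient` is a hypothetical stand-in that simply zeroes the k least significant quotient bits; it is not READ's approximation scheme, and `k` only mimics an accuracy knob.

```python
def approx_quotient(dividend: int, divisor: int, k: int) -> int:
    """Hypothetical stand-in approximation (NOT READ's scheme):
    compute the exact quotient, then zero its k least significant bits."""
    return ((dividend // divisor) >> k) << k


def error_metrics(n: int, k: int) -> dict:
    """Exhaustively compare exact and approximate n-bit quotients
    over all non-overflowing 2n/n input pairs."""
    max_quotient = (1 << n) - 1
    total_ed = 0        # sum of error distances |exact - approx|
    total_red = 0.0     # sum of relative error distances
    count = 0
    for divisor in range(1, 1 << n):
        # dividend < divisor * 2^n guarantees the quotient fits in n bits
        for dividend in range(divisor << n):
            exact = dividend // divisor
            ed = abs(exact - approx_quotient(dividend, divisor, k))
            total_ed += ed
            if exact != 0:
                # when exact == 0 the stand-in is also 0, so the error is 0
                total_red += ed / exact
            count += 1
    med = total_ed / count
    return {
        "MED": med,                    # mean error distance
        "NMED": med / max_quotient,    # MED normalized by the largest quotient
        "MRED": total_red / count,     # mean relative error distance
    }


if __name__ == "__main__":
    for k in range(4):
        print(k, error_metrics(4, k))
```

Sweeping `k` in this way produces one accuracy point per setting, which is the kind of data an energy versus quality plot would pair with synthesized power numbers.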

Section snippets

Background

In this section, related work on approximate dividers is briefly reviewed, followed by a concise description of accurate restoring dividers.
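For reference, exact restoring division of a 2n-bit dividend by an n-bit divisor can be sketched in software as follows. This is a behavioral model of the textbook algorithm, not the paper's circuit: the name `restoring_divide` and its return convention are illustrative assumptions, and the overflow test shown (upper n bits of the dividend not smaller than the divisor) is the standard condition for the quotient not fitting in n bits rather than the proposed detector hardware.

```python
def restoring_divide(dividend: int, divisor: int, n: int):
    """Exact 2n/n restoring division.

    Returns (quotient, remainder, overflow). On overflow the quotient
    and remainder are None.
    """
    if not 0 <= dividend < (1 << (2 * n)):
        raise ValueError("dividend must fit in 2n bits")
    if not 0 <= divisor < (1 << n):
        raise ValueError("divisor must fit in n bits")

    # The quotient fits in n bits only if the upper half of the dividend
    # is smaller than the divisor (this also rejects divisor == 0).
    if (dividend >> n) >= divisor:
        return None, None, True

    remainder = dividend >> n   # partial remainder starts as the upper n bits
    quotient = 0
    for i in range(n - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        trial = remainder - divisor        # trial subtraction (one array row)
        if trial >= 0:
            remainder = trial              # keep the difference, quotient bit 1
            quotient = (quotient << 1) | 1
        else:
            quotient <<= 1                 # restore: keep old remainder, bit 0
    return quotient, remainder, False


if __name__ == "__main__":
    print(restoring_divide(200, 13, 4))
    print(restoring_divide(208, 13, 4))
```

Each loop iteration corresponds to one row of subtractor cells in an array divider, with the restore step played by selecting the old partial remainder when the trial subtraction borrows.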

Proposed reconfigurable energy-efficient approximate divider (READ)

This section introduces and discusses the hardware implementation of READ, a reconfigurable approximate divider. The READ architecture comprises reconfigurable subtractor cells (RSCs), overflow detector logic (OFD), and subtractor cell controller logic (SCL). READ computes an n-bit quotient for 2n/n division because, for most applications, quotient computation is prioritized over remainder calculation [16]. READ-8 and READ-16 refer to the 8/4 and 16/8 divider designs, respectively. The term

Results & analysis

All the 16/8 approximate and exact dividers and READ-16 are simulated for realistic power estimation as shown in Fig. 11. The register transfer level (RTL) descriptions of the proposed and related approximate designs are written in Verilog HDL. AXDr3, proposed in [16], is implemented at the gate level rather than the transistor level for consistency with the other designs. The circuits are synthesized with the NanGate CMOS 45-nm technology library [28] using Cadence RTL Compiler with relevant input files as shown in

Application analysis

This section evaluates the approximate and exact dividers in the context of image processing applications, which are inherently error-tolerant. These applications include JPEG compression, change detection, and background removal, all of which use division as a fundamental operation. The results are evaluated for 16/8 division in terms of Quality of Service (QoS) parameters such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM). A
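As a minimal sketch of one of the QoS metrics named above, PSNR between a reference image and an image produced with an approximate divider can be computed as below. The function name `psnr`, the flat pixel-sequence input, and the default 8-bit peak value of 255 are assumptions made for illustration; the paper's actual evaluation setup is not shown in this excerpt.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], test: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized
    flattened images: 10 * log10(max_value^2 / MSE)."""
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    exact_pixels = [10, 20, 30, 40]
    approx_pixels = [11, 19, 30, 40]
    print(psnr(exact_pixels, approx_pixels))
```

A higher PSNR means the approximate result is closer to the exact one; SSIM needs local window statistics and is usually taken from an image processing library rather than written by hand.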

Conclusion

A reconfigurable energy-efficient approximate divider (READ) using a fixed restoring array with various quality-configurability modes was proposed and analyzed. Reconfigurable half and full subtractor cells were designed that can work in exact or approximate mode, enabling accuracy configurability. Overflow detection hardware based on carry-lookahead logic, a crucial part of division computation, was also proposed. READ can be configured for any divider design (2n/n), simply by

CRediT authorship contribution statement

Neelam Arya: Conceptualization, Methodology, Writing - original draft, Formal analysis. Teena Soni: Software, Data curation, Validation. Manisha Pattanaik: Investigation, Supervision. G.K. Sharma: Supervision, Resources.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (33)

  • Liu W. et al., A retrospective and prospective view of approximate computing [point of view], Proceedings of the IEEE (2020)
  • Ebrahimi-Azandaryani F. et al., Block-based carry speculative approximate adder for energy-efficient applications, IEEE Trans. Circuits Syst. II (2019)
  • Gupta V. et al., Low-power digital signal processing using approximate adders, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. (2012)
  • Akbari O. et al., RAP-CLA: A reconfigurable approximate carry look-ahead adder, IEEE Trans. Circuits Syst. II (2016)
  • Vahdat S. et al., TOSAM: An energy-efficient truncation- and rounding-based scalable approximate multiplier, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. (2019)
  • Venkatachalam S. et al., Design of power and area efficient approximate multipliers, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. (2017)
  • Qian L. et al., Design and evaluation of an approximate Wallace-Booth multiplier
  • Advanced Micro Devices, Software Optimization Guide for AMD Family 15h Processors (2014)
  • Arya N. et al., Quality driven energy aware approximated core transform architecture for HEVC standard
  • Hashemi S. et al., A low-power dynamic divider for approximate applications
  • Imani M. et al., CADE: Configurable approximate divider for energy efficiency
  • Jiang H. et al., Adaptive approximation in arithmetic circuits: A low-power unsigned divider design
  • Chen L. et al., Design, evaluation and application of approximate high-radix dividers, IEEE Trans. Multi-Scale Comput. Syst. (2018)
  • Jiang H. et al., Low-power unsigned divider and square root circuit designs using adaptive approximation, IEEE Trans. Comput. (2019)
  • Vahdat S. et al., TruncApp: A truncation-based approximate divider for energy efficient DSP applications
  • Chen L. et al., On the design of approximate restoring dividers for error-tolerant applications, IEEE Trans. Comput. (2015)