Configurable DSI partitioned approximate multiplier

doi:10.1016/j.future.2020.09.008

Future Generation Computer Systems

Volume 115, February 2021, Pages 100-114

https://doi.org/10.1016/j.future.2020.09.008

Highlights

• It is error-configurable and based on omitting partial product generation for LSBs.
• It provides a trade-off between hardware resources, accuracy, delay and power.
• The goal is to reduce all hardware metrics while keeping accuracy satisfactory.
• Many error metrics, test cases, and architectures have been used to validate our evaluations.

Abstract

Approximate computing targets error-tolerant applications that can accept some loss of accuracy. It improves metrics such as dynamic power, delay, and area. Multipliers are key elements of arithmetic logic units and are used in many applications such as Digital Signal Processing (DSP). Hence, it is vital to develop a robust strategy to take advantage of approximate computing in multipliers. In this paper, a novel algorithm is presented for the approximate multiplication of unsigned numbers. The proposed approach is error-configurable and provides a trade-off between hardware resources, accuracy, delay, and power. In addition, it can be adjusted based on target systems or applications. Compared to the accurate multiplier and other configurable algorithm instances, the proposed method improves these metrics when a low-power configuration is used. According to experimental results, the average error rate is 1.04% for 16-bit multiplication. The improvements in error, delay, area, and dynamic power of the proposed 16-bit multiplier are 0.02% to 16.71%, 23.8% to 70.6%, -11% to 34.1%, and 42.9% to 81.1%, respectively. Moreover, the proposed multiplier has been employed in Discrete Cosine Transform (DCT) applications with good results.

Introduction

In recent years, CMOS technology has been moving toward smaller geometries, and transistor sizes have shrunk considerably. As a result, the number of transistors on a chip has reached billions and the complexity of CMOS integrated circuits has increased. Newer circuits also operate at higher frequencies and at lower power-supply and subthreshold voltages. Accordingly, power consumption grows in high-density chips and becomes a major concern in highly integrated nanoscale designs. Furthermore, the ever-increasing demand for higher computing power is a driving force toward ultra-low-power design strategies. To improve energy efficiency, designers have turned to optimization methods at several levels, from the system level down to the transistor device level. Many techniques are available for reducing power consumption, each offering a trade-off between power and performance. A potential solution for lowering power dissipation is to employ approximate circuit designs [1], [2], [3].

Approximate Computing [4] is an emerging trend in digital design that exploits the inherent error tolerance of many applications to gain improvements in area, speed and/or power by giving up some computational accuracy. It produces inaccurate but acceptable results rather than exact ones, and it yields low-power, small-area designs [5], [6], [7], [8]. Approximate computation is especially suitable for applications that target human senses, because people do not notice small errors. Applications such as image processing, recognition, Digital Signal Processing (DSP), web search algorithms, machine learning and data mining are inherently error-tolerant and do not require perfect accuracy in computation. Computing units are key components of modern electronic embedded devices. For these applications, approximate circuits are a promising alternative for reducing area, power and delay in digital systems that can tolerate some loss of accuracy, thereby achieving better energy efficiency [9].

Approximation can be applied to arithmetic units at different design abstraction levels, including the circuit, logic, and architecture levels, as well as the algorithm and software layers [5]. Using approximation in arithmetic building blocks such as adders and multipliers at different design levels has been suggested in [10], [11], [12], [13], [14]. Among the arithmetic operations, multiplication has always been considered a complex block that increases the complexity of the design. Multipliers are the most widely executed arithmetic blocks of an ALU in a wide range of applications including multimedia, wireless communication, machine learning, data mining, etc. [15]. Multiplication is one of the most area-consuming arithmetic operations in high-performance circuits, and it is a frequent target of efforts aimed at improving ALU performance. Therefore, decreasing the complexity of multipliers may reduce the power consumption of the overall system. Hence, approximate multiplier design has become an important research subject in recent years [16]. Fig. 1 shows that a multiplier includes at least three stages:

1. Partial products generation
2. Partial products reduction
3. Carry-propagate addition

Approximations in multipliers can be conducted in any of these stages [16].

The array multiplier is well known for its regular architecture. The circuit is based on the add-and-shift algorithm. Each partial product is generated by multiplying the multiplicand by one multiplier bit. The partial products are shifted according to their bit positions and then added. The addition can be performed with a normal carry-propagate adder. For j multiplier bits and k multiplicand bits, we need $j \times k$ AND gates and $(j - 1)$ k-bit adders to produce a product of $j + k$ bits. Fig. 2 shows a typical organization of a 4-bit array multiplier with exact accuracy.
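The add-and-shift scheme described above can be illustrated with a short behavioral sketch in Python. This is only a software model of the exact array multiplier, not a hardware description; the function name `array_multiply` and its parameters are assumptions introduced for illustration.

```python
def array_multiply(multiplicand: int, multiplier: int, k: int, j: int) -> int:
    """Exact unsigned multiply modeled as j shifted partial products.

    k is the multiplicand width and j the multiplier width, in bits.
    (Hypothetical helper, not taken from the paper.)
    """
    assert 0 <= multiplicand < (1 << k), "multiplicand must fit in k bits"
    assert 0 <= multiplier < (1 << j), "multiplier must fit in j bits"

    product = 0
    for i in range(j):
        bit = (multiplier >> i) & 1
        # One row of k AND gates: every multiplicand bit AND-ed with
        # multiplier bit i gives the i-th partial product.
        partial = multiplicand if bit else 0
        # Shift the partial product to its bit position and accumulate it.
        product += partial << i
    return product


print(array_multiply(13, 11, k=4, j=4))  # 13 * 11
```

The loop body runs j times and each row models k AND gates, which matches the $j \times k$ gate count quoted above; the result always fits in $j + k$ bits.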

Most studies have focused on applying approximation to stages 2 and 3 of a multiplier, i.e., using approximate adders to add the generated partial products. Configurability has also received much less attention from researchers. In this paper, a novel approximate multiplier is proposed that reduces the number of partial products, i.e., it applies approximation to stage 1 of a multiplier by introducing a unique truncation strategy. The proposed approximate multiplier is configurable and can be adjusted based on the application's requirements. This configurability provides a trade-off among average error, area, delay and dynamic power.
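To make the idea of stage-1 approximation concrete, the sketch below drops every partial-product bit that falls in the lowest columns of the product. It is a generic column-truncation model written for illustration only; it is not the paper's DSI partitioning, whose details are not given in this excerpt. The name `truncated_multiply` and the parameter `skip_columns` are assumptions.

```python
def truncated_multiply(a: int, b: int, n: int, skip_columns: int) -> int:
    """Approximate n-bit unsigned multiply that never generates
    partial-product bits in columns below `skip_columns`.

    Generic illustration of stage-1 truncation (hypothetical helper),
    not the method proposed in the paper.
    """
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n)
    product = 0
    for i in range(n):                  # multiplier bit index
        if not (b >> i) & 1:
            continue
        for j in range(n):              # multiplicand bit index
            # The bit a_j AND b_i has weight 2**(i + j); skip it when
            # it lands in one of the truncated low-order columns.
            if (a >> j) & 1 and i + j >= skip_columns:
                product += 1 << (i + j)
    return product


exact = 15 * 15
approx = truncated_multiply(15, 15, n=4, skip_columns=2)
print(exact, approx, exact - approx)
```

With `skip_columns=0` the function is exact. Raising it removes more AND gates and adder cells at the cost of a larger, always non-negative error, which is the kind of accuracy versus hardware trade-off a configurable design exposes.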

The rest of this paper is organized as follows. Section 2 surveys prior works. In Section 3, the proposed method is described. The experimental results and application-based evaluation are presented in Section 4. Finally, Section 5 presents the conclusion and future work.

Section snippets

Related works

In this section, some previous works in the field of approximate adders and multipliers are briefly reviewed [3], [5], [14], [17], [18]. Adders and multipliers are widely used in the computing units of microprocessors, multimedia systems, and Digital Signal Processors (DSPs) [19]. In recent years, driven by system and user requirements, adders and multipliers have received considerable attention. The multiplier is an important arithmetic unit in most applications, but it consumes much power. As

Overview

The proposed architecture for approximate multiplication focuses on decreasing power consumption, delay, area, and the average and maximum error distances. The configurability feature offers a compromise among these criteria, allowing the user to choose the configuration that best balances performance and accuracy. The proposed approximate multiplier is designed for unsigned numbers and ignores some particular partial products without extra errors.

Most significant bits of

Simulation setup

In this paper, simulations were carried out at two levels of abstraction. Hardware models were written in VHDL and synthesized using Altera Quartus II, targeting the Cyclone IV family of Altera FPGAs. Behavioral simulations were performed with our own tool, called Configurable Direct-Search-Ignore (CDSI). The CDSI tool was developed in C++ with a graphical user interface that lets the user select custom configurations and generate different error reports.
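As a rough illustration of what a behavioral error report can contain, the following Python sketch exhaustively compares an approximate multiplier against the exact product and collects the mean error distance, the maximum error distance, and the mean relative error distance. It is not the CDSI tool; `error_report` and the stand-in multiplier `drop_low_bits` are hypothetical names introduced here.

```python
from itertools import product
from typing import Callable, Dict


def error_report(approx: Callable[[int, int], int], n: int) -> Dict[str, float]:
    """Exhaustively evaluate an n-bit unsigned approximate multiplier.

    Hypothetical helper for illustration; not part of the CDSI tool.
    """
    total_ed = 0        # sum of error distances |exact - approx|
    max_ed = 0          # largest error distance seen
    total_red = 0.0     # sum of relative error distances
    nonzero = 0         # operand pairs with a non-zero exact product
    count = 0
    for a, b in product(range(1 << n), repeat=2):
        exact = a * b
        ed = abs(exact - approx(a, b))
        total_ed += ed
        max_ed = max(max_ed, ed)
        if exact:
            total_red += ed / exact
            nonzero += 1
        count += 1
    return {
        "mean_ed": total_ed / count,
        "max_ed": max_ed,
        "mean_red": total_red / nonzero,
    }


def drop_low_bits(a: int, b: int, t: int = 2) -> int:
    """Stand-in approximate multiplier: zero the t low bits of each operand."""
    mask = ~((1 << t) - 1)
    return (a & mask) * (b & mask)


print(error_report(drop_low_bits, n=4))
```

Exhaustive enumeration is practical only for small widths; for 16-bit operands a random sample of input pairs would be the usual substitute.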

Fig. 6 presents the design

Conclusion and future work

In this paper, a configurable approximate multiplier is introduced that can adjust metrics such as dynamic power consumption, area, delay, and accuracy based on requirements. The proposed multiplier is based on omitting partial product generation for LSBs. Furthermore, operand swapping and Ex. bit assumption have been used to reduce the average error rate and bias error. The goal of the proposed method is to reduce all hardware metrics along with keeping accuracy satisfaction

CRediT authorship contribution statement

Fahimeh Hajizadeh: Conceptualization, Methodology, Software, Writing - original draft. Mohammadreza Binesh Marvasti: Supervision, Resources, Investigation, Writing - review & editing. Seyyed Amir Asghari: Supervision, Resources, Investigation. Mostafa Abbas Mollaei: Software, Data curation. Amir M. Rahmani: Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fahimeh Hajizadeh received her B.Sc. degree in computer hardware engineering from Shahid Beheshti University, Tehran, Iran, in 2015, and the M.Sc. degree in computer architecture engineering from Kharazmi University, Tehran, Iran, in 2019. Her research interests include approximate computing, hardware security, image processing.

References (35)

A. Sunny et al., Area efficient high speed approximate multiplier with carry predictor, Proc. Technol. (2016)
International Technology Roadmap for Semiconductors (ITRS) (2019)
C. Kozyrakis, Advancing computer systems without technology progress, in: 2013 IEEE International Symposium on...
J. Han, M. Orshansky, Approximate computing: An emerging paradigm for energy-efficient design, in: 2013 18th IEEE...
Q. Xu et al., Approximate computing: A survey, IEEE Des. Test (2016)
V. Gupta et al., Low-power digital signal processing using approximate adders, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. (2013)
A. Becher, J. Echavarria, D. Ziener, S. Wildermann, J. Teich, A LUT-based approximate adder, in: 2016 IEEE 24th Annual...
O. Akbari et al., RAP-CLA: A reconfigurable Approximate Carry look-ahead Adder, IEEE Trans. Circuits Syst. II (2018)
M. Shafique, R. Hafiz, S. Rehman, W. El-Harouni, J. Henkel, Invited: Cross-layer approximate computing: From logic to...
H.A.F. Almurib, T.N. Kumar, F. Lombardi, Inexact designs for approximate low power addition by cell replacement, in:...
P. Kulkarni, P. Gupta, M. Ercegovac, Trading accuracy for power with an under designed multiplier architecture, in:...
Khaing Yin Kyaw, Wang Ling Goh, Kiat Seng Yeo, Low-power high-speed multiplier for error-tolerant application, in: 2010...
A.B. Kahng, S. Kang, Accuracy-configurable adder for approximate arithmetic designs, in: DAC Design Automation...
S. Narayanamoorthy et al., Energy-efficient approximate multiplication for digital signal processing and classification applications, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. (2015)
Z. Yang, A. Jain, J. Liang, J. Han, F. Lombardi, Approximate XOR/XNOR-based adders for inexact computing, in: 2013 13th...
N. Petra et al., Truncated binary multipliers with variable correction and Minimum Mean Square Error, IEEE Trans. Circuits Syst. I. Regul. Pap. (2010)
H. Jiang, C. Liu, N. Maheshwari, F. Lombardi, J. Han, A comparative evaluation of approximate multipliers, in: 2016...


Mohammadreza Binesh Marvasti received the M.Sc. degree from Department of ECE University of Tehran, Iran, in 2007 and the Ph.D. degree in ECE from McMaster University, Canada, in 2013. His research interests include Computer Architecture, Low-Power Digital Design, FPGAs, Approximate Computing, and On-chip Interconnection Network. He has served as a faculty member in the Department of Electrical and Computer Engineering at Kharazmi University.

Seyyed Amir Asghari received his B.Sc. degree in 2007 (hardware engineering major), M.Sc. and Ph.D. in 2009 and 2013 respectively (computer architecture major) from Amirkabir University of Technology. His current research interests include fault-tolerant design and real-time embedded system design. He has served as a faculty member in the Department of Electrical and Computer Engineering at Kharazmi University.

Mostafa Abbas Mollaei received his B.Sc. degree in computer hardware engineering from Shahid Beheshti University, Tehran, Iran, in 2015, and the M.Sc. degree in computer architecture engineering from Tehran University, Tehran, Iran, in 2018. His research interests include hardware security, approximate computing, embedded system design.

Amir M. Rahmani is currently Marie Curie Global Fellow at University of California Irvine (USA) and TU Wien (Austria). He is also an adjunct professor (Docent) at the University of Turku (Finland). His work spans self-aware computing, healthcare Internet-of-Things, wearable sensor design, and Fog/Edge Computing. He is the Associate Editor of ACM Transactions on Computing for Healthcare.
