Configurable DSI partitioned approximate multiplier
Introduction
In recent years, CMOS technology has moved toward smaller geometries, and transistor sizes have shrunk considerably. As a result, the number of transistors on a chip has reached the billions and the complexity of CMOS integrated circuits has increased. Newer circuits also operate at higher frequencies and at lower power supply and subthreshold voltages. Accordingly, power consumption grows in high-density chips and becomes a major concern in highly integrated nanoscale designs. Furthermore, the ever-increasing demand for higher computing power is a driving force toward ultra-low-power design strategies. To improve energy efficiency, designers have turned to optimization methods at several levels, from the system level down to the transistor device level. Many techniques are available for reducing power consumption, each offering a trade-off between power and performance. One potential way to lower power dissipation is to employ approximate circuit designs [1], [2], [3].
Approximate computing [4] is an emerging trend in digital design that exploits the inherent error tolerance of many applications to gain improvements in area, speed and/or power by sacrificing computational accuracy. It produces inaccurate but acceptable results rather than exact ones, and in return yields low-power, small-size designs [5], [6], [7], [8]. It is especially suitable for applications whose output is consumed by human senses, because people do not perceive small errors. Applications such as image processing, recognition, Digital Signal Processing (DSP), web search, machine learning and data mining are inherently error-tolerant and do not require perfect accuracy in computation. Computing units are key components of modern electronic embedded devices. For these applications, approximate circuits are a promising alternative for reducing area, power and delay in digital systems that can tolerate some loss of accuracy, thereby achieving better energy efficiency [9].
Approximation can be applied to arithmetic units at different design abstraction levels, including the circuit, logic, and architecture levels, as well as the algorithm and software layers [5]. The use of approximation in arithmetic building blocks such as adders and multipliers at different design levels has been suggested in [10], [11], [12], [13], [14]. Among the arithmetic operations, multiplication has always been regarded as a complex block that increases the complexity of the design. Multipliers are among the most widely used arithmetic blocks of an ALU in a wide range of applications, including multimedia, wireless communication, machine learning, data mining, etc. [15]. Multiplication is one of the most area-consuming arithmetic operations in high-performance circuits and is a frequent target of efforts to improve ALU performance. Therefore, decreasing the complexity of multipliers may reduce the power consumption of the overall system. Hence, approximate multiplier design has become an important research subject in recent years [16]. As Fig. 1 shows, a multiplier includes at least three stages:
1. Partial products generation
2. Partial products reduction
3. Carry-propagate addition
Approximations in multipliers can be conducted in any of these stages [16].
The array multiplier is well known for its regular architecture. The circuit is based on the add-and-shift algorithm. Each partial product is generated by multiplying the multiplicand by one multiplier bit. The partial products are shifted according to their bit order and then added. The addition can be performed with a normal carry-propagate adder. For j multiplier bits and k multiplicand bits, we need j × k AND gates and j − 1 k-bit adders to produce a product of j + k bits. Fig. 2 shows a typical organization of an exact 4-bit array multiplier.
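The add-and-shift organization described above can be illustrated with a short behavioral sketch in Python. It is not taken from the paper; the function names `partial_products` and `array_multiply` and the default widths are assumptions chosen for illustration, and the model describes only the arithmetic, not the gate-level structure of Fig. 2.

```python
def partial_products(multiplicand: int, multiplier: int, j: int) -> list[int]:
    """Return the j shifted partial products of an unsigned multiplication.

    Row i is the multiplicand ANDed with multiplier bit i (k AND gates in
    hardware), shifted left by i positions according to its bit order.
    """
    rows = []
    for i in range(j):
        bit = (multiplier >> i) & 1
        row = multiplicand if bit else 0  # AND of every multiplicand bit with bit i
        rows.append(row << i)
    return rows


def array_multiply(multiplicand: int, multiplier: int, k: int = 4, j: int = 4) -> int:
    """Exact unsigned product of a k-bit multiplicand and a j-bit multiplier."""
    if not (0 <= multiplicand < (1 << k) and 0 <= multiplier < (1 << j)):
        raise ValueError("operand does not fit in the given width")
    # Summing j rows corresponds to the chain of j - 1 adders in the array.
    return sum(partial_products(multiplicand, multiplier, j))
```

For 4-bit operands the largest product, 15 × 15, still fits in j + k = 8 bits, which matches the product width stated above.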
Most studies have focused on applying approximation to stages 2 and 3 of a multiplier, i.e., using approximate adders to add the generated partial products. Configurability has also received much less attention from researchers. In this paper, a novel approximate multiplier is proposed that reduces the number of partial products, i.e., it applies approximation to stage 1 of a multiplier by introducing a unique truncation strategy. The proposed approximate multiplier is configurable and can be adjusted to the application's requirements. This configurability provides a trade-off among average error, area, delay and dynamic power.
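To make the idea of approximating stage 1 concrete, the following Python sketch skips the generation of low-weight partial-product bits. It is a generic column-truncation scheme, not the truncation strategy proposed in this paper, whose details are not reproduced here. The function name `truncated_multiply` and the parameter `dropped_columns` are hypothetical.

```python
def truncated_multiply(a: int, b: int, width: int = 8, dropped_columns: int = 4) -> int:
    """Approximate unsigned product that never generates low-weight bits.

    The partial-product bit a_m AND b_i has weight 2**(i + m). Bits whose
    column index i + m is below `dropped_columns` are simply not generated,
    so the result never exceeds the exact product.
    """
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operand does not fit in the given width")
    result = 0
    for i in range(width):              # multiplier bit index
        if not (b >> i) & 1:
            continue
        for m in range(width):          # multiplicand bit index
            if i + m >= dropped_columns and (a >> m) & 1:
                result += 1 << (i + m)  # keep only the retained columns
    return result
```

Setting `dropped_columns` to 0 recovers the exact product, while larger values trade accuracy for fewer AND gates and a smaller reduction tree. This is the kind of adjustable trade-off that a configurable design exposes.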
The rest of this paper is organized as follows. Section 2 surveys prior works. In Section 3, the proposed method is described. The experimental results and application-based evaluation are presented in Section 4. Finally, Section 5 presents the conclusion and future work.
Related works
In this section, some previous works in the field of approximate adders and multipliers are briefly reviewed [3], [5], [14], [17], [18]. Adders and multipliers are widely used in the computing units of microprocessors, multimedia systems, and Digital Signal Processors (DSPs) [19]. In recent years, system and user requirements have drawn considerable attention to adders and multipliers. The multiplier is an important arithmetic unit in most applications, but it consumes considerable power. As
Overview
The proposed architecture for approximate multiplication focuses on decreasing power consumption, delay, area, and the average and maximum error distances. Its configurability offers a compromise among these criteria, so that a configuration can be chosen that best balances performance and accuracy for a given application. The proposed approximate multiplier is designed for unsigned numbers and ignores certain partial products without introducing extra error.
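The average and maximum error distances mentioned above can be measured exhaustively for small operand widths. The Python sketch below assumes that the error distance is the absolute difference between the exact and approximate products. The helper `error_metrics` and the stand-in multiplier `drop_lsb_multiply` are hypothetical and are not the proposed design.

```python
from itertools import product


def error_metrics(approx_mul, width: int) -> dict:
    """Compare an approximate unsigned multiplier against exact products.

    Every pair of `width`-bit operands is evaluated, so this is only
    practical for small widths.
    """
    distances = [
        abs(a * b - approx_mul(a, b))
        for a, b in product(range(1 << width), repeat=2)
    ]
    return {
        "mean_error_distance": sum(distances) / len(distances),
        "max_error_distance": max(distances),
        "error_rate": sum(d != 0 for d in distances) / len(distances),
    }


def drop_lsb_multiply(a: int, b: int) -> int:
    """Stand-in approximate multiplier: ignores the least significant bit of a."""
    return (a & ~1) * b


metrics = error_metrics(drop_lsb_multiply, width=4)
```

Passing a differently configured multiplier to the same helper makes it possible to compare configurations on equal terms.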
Most significant bits of
Simulation setup
In this paper, simulations were performed at two levels of abstraction. The hardware designs were described in VHDL and synthesized using Altera Quartus II, targeting the Cyclone IV family of Altera FPGAs. Behavioral simulations were performed with our in-house tool, called Configurable Direct-Search-Ignore (CDSI). The CDSI tool was developed in C++ with a graphical user interface, which lets the user select customized configurations and generate the corresponding error reports.
Fig. 6 presents the design
Conclusion and future work
In this paper, a configurable approximate multiplier is introduced that can adjust metrics such as dynamic power consumption, area, delay, and accuracy according to the requirements. The proposed multiplier is based on omitting partial product generation for the LSBs. Furthermore, operand swapping and the Ex. bit assumption are used to reduce the average error rate and the bias error. The goal of the proposed method is to reduce all hardware metrics along with keeping accuracy satisfaction
CRediT authorship contribution statement
Fahimeh Hajizadeh: Conceptualization, Methodology, Software, Writing - original draft. Mohammadreza Binesh Marvasti: Supervision, Resources, Investigation, Writing - review & editing. Seyyed Amir Asghari: Supervision, Resources, Investigation. Mostafa Abbas Mollaei: Software, Data curation. Amir M. Rahmani: Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (35)
- et al., Area efficient high speed approximate multiplier with carry predictor, Proc. Technol. (2016)
- International Technology Roadmap for Semiconductors (ITRS) (2019)
- C. Kozyrakis, Advancing computer systems without technology progress, in: 2013 IEEE International Symposium on...
- J. Han, M. Orshansky, Approximate computing: An emerging paradigm for energy-efficient design, in: 2013 18th IEEE...
- et al., Approximate computing: A survey, IEEE Des. Test (2016)
- et al., Low-power digital signal processing using approximate adders, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. (2013)
- A. Becher, J. Echavarria, D. Ziener, S. Wildermann, J. Teich, A LUT-based approximate adder, in: 2016 IEEE 24th Annual...
- et al., RAP-CLA: A reconfigurable Approximate Carry look-ahead Adder, IEEE Trans. Circuits Syst. II (2018)
- M. Shafique, R. Hafiz, S. Rehman, W. El-Harouni, J. Henkel, Invited: Cross-layer approximate computing: From logic to...
- H.A.F. Almurib, T.N. Kumar, F. Lombardi, Inexact designs for approximate low power addition by cell replacement, in:...
- Energy-efficient approximate multiplication for digital signal processing and classification applications, IEEE Trans. Very Large Scale Integr. (VLSI) Syst.
- Truncated binary multipliers with variable correction and Minimum Mean Square Error, IEEE Trans. Circuits Syst. I. Regul. Pap.
Cited by (4)
CNTFET-based digital arithmetic circuit designs in ternary logic with improved performance
2024, e-Prime - Advances in Electrical Engineering, Electronics and Energy
A power constrained approximate multiplier with a high level of configurability
2022, Microprocessors and Microsystems
Citation Excerpt: If greater input is considered the second operand in the proposed method, the average error rate will reduce. This technique is used in [11] as a swapping mode. Using the swapping mode in the proposed approximate multiplier causes a lower average error rate than the normal mode.
Editorial: Special issue on Advancing on Approximate Computing: Methodologies, Architectures and Algorithms
2021, Future Generation Computer Systems
Citation Excerpt: The experimental result demonstrates significant energy savings against negligible accuracy losses for image classification and speech recognition problems. Hajizadeh et al. [3] discuss, in the paper titled Configurable DSI Partitioned Approximate Multiplier, an approximate binary multiplier that is error-configurable and provides a trade-off between hardware resources, accuracy, delay, and power consumption. The proposed approach exploits the Configurable Direct-Search-Ignore (CDSI) partition and the authors show a 16 bit unsigned approximate multiplier against other approximate multipliers.
Adaptable Approximate Multiplier Design Based on Input Distribution and Polarity
2022, IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Fahimeh Hajizadeh received her B.Sc. degree in computer hardware engineering from Shahid Beheshti University, Tehran, Iran, in 2015, and the M.Sc. degree in computer architecture engineering from Kharazmi University, Tehran, Iran, in 2019. Her research interests include approximate computing, hardware security, image processing.
Mohammadreza Binesh Marvasti received the M.Sc. degree from the Department of ECE, University of Tehran, Iran, in 2007 and the Ph.D. degree in ECE from McMaster University, Canada, in 2013. His research interests include Computer Architecture, Low-Power Digital Design, FPGAs, Approximate Computing, and On-chip Interconnection Networks. He has served as a faculty member in the Department of Electrical and Computer Engineering at Kharazmi University.
Seyyed Amir Asghari received his B.Sc. degree in 2007 (hardware engineering major), M.Sc. and Ph.D. in 2009 and 2013 respectively (computer architecture major) from Amirkabir University of Technology. His current research interests include fault-tolerant design and real-time embedded system design. He has served as a faculty member in the Department of Electrical and Computer Engineering at Kharazmi University.
Mostafa Abbas Mollaei received his B.Sc. degree in computer hardware engineering from Shahid Beheshti University, Tehran, Iran, in 2015, and the M.Sc. degree in computer architecture engineering from Tehran University, Tehran, Iran, in 2018. His research interests include hardware security, approximate computing, embedded system design.
Amir M. Rahmani is currently Marie Curie Global Fellow at University of California Irvine (USA) and TU Wien (Austria). He is also an adjunct professor (Docent) at the University of Turku (Finland). His work spans self-aware computing, healthcare Internet-of-Things, wearable sensor design, and Fog/Edge Computing. He is the Associate Editor of ACM Transactions on Computing for Healthcare.