research-article

A Reconfigurable Multiplier for Signed Multiplications with Asymmetric Bit-Widths

Published: 30 June 2021

Abstract

Multiplications are common in quantized CNNs, filters, reconfigurable cores, and similar designs that are widely deployed in mobile and embedded applications. Most multipliers are designed to perform multiplications with symmetric bit-widths, i.e., n- by n-bit multiplication. Such designs incur extra area overhead and performance loss when m- by n-bit multiplications (m > n) are mapped onto the same hardware, making these operations inefficient. A reconfigurable multiplier that accommodates operands with both symmetric and asymmetric bit-widths is therefore highly desirable, but challenging to design. In this work, we propose a reconfigurable approximate multiplier to support multiplications at various precisions, i.e., bit-widths. Prior work on approximate adders assumes a uniform weight distribution with bit-wise independence. In contrast, scenarios such as a quantized CNN may have a centralized weight distribution that follows a Gaussian-like distribution with correlated adjacent bits. Thus, a new block-based approximate adder is also proposed as part of the multiplier to ensure energy-efficient operation with an awareness of the bit-wise correlation. Our experimental results show that the proposed approximate adder reduces the error rate by 76% to 98% relative to a state-of-the-art approximate adder for Gaussian-like distribution scenarios. Evaluation results show that the proposed multiplier is 19% faster and saves 22% more power than a Xilinx multiplier IP at the same bit precision. When deployed in a Gaussian filter for image processing tasks, it achieves a peak signal-to-noise ratio (PSNR) of 23.94 dB, comparable to the 24.10 dB of the accurate multiplier.
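
To make the symmetric-versus-asymmetric distinction concrete, the following is a minimal Python sketch of how a signed 2n- by n-bit product can be assembled from two n-bit-wide partial products. It is a generic, exact arithmetic identity, not the architecture proposed in the article, which the abstract does not describe. The function names `split_signed` and `mul_asymmetric` are hypothetical, and the choice m = 2n is an assumption made for illustration.

```python
def split_signed(a: int, n: int) -> tuple[int, int]:
    """Split a signed 2n-bit value into (signed high half, unsigned low half).

    The identity a == hi * 2**n + lo holds for every integer a.
    """
    lo = a & ((1 << n) - 1)  # low n bits, interpreted as unsigned
    hi = a >> n              # arithmetic shift: the sign stays in the high half
    return hi, lo


def mul_asymmetric(a: int, b: int, n: int) -> int:
    """Exact signed 2n-by-n-bit product built from two n-bit-wide products.

    hi * b is a signed-by-signed product; lo * b is unsigned-by-signed.
    """
    hi, lo = split_signed(a, n)
    return ((hi * b) << n) + lo * b


if __name__ == "__main__":
    # 8-bit by 4-bit example (n = 4): both lines print the same value.
    print(mul_asymmetric(-77, 5, 4))
    print(-77 * 5)
```

The low half must be treated as unsigned while the high half carries the sign, which is one reason a plain n- by n-bit signed multiplier cannot simply be reused twice for the wider operand.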

          • Published in

            ACM Journal on Emerging Technologies in Computing Systems, Volume 17, Issue 4
            October 2021
            446 pages
            ISSN: 1550-4832
            EISSN: 1550-4840
            DOI: 10.1145/3472280
            Editor: Ramesh Karri

            Copyright © 2021 Association for Computing Machinery.

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 30 June 2021
            • Accepted: 1 December 2020
            • Revised: 1 August 2020
            • Received: 1 April 2020
            Qualifiers

            • research-article
            • Refereed
