A hybrid precision low power computing-in-memory architecture for neural networks
Microprocessors and Microsystems ( IF 1.9 ) Pub Date : 2020-10-20 , DOI: 10.1016/j.micpro.2020.103351
Rui Xu , Linfeng Tao , Tianqi Wang , Xi Jin , Chenxia Li , Zhengda Li , Jun Ren

Recently, non-volatile memory-based computing-in-memory has been regarded as a promising candidate for ultra-low-power AI chips. Implementations based on both binarized (BIN) and multi-bit (MB) schemes have been proposed for DNNs/CNNs. However, both schemes face challenges in accuracy and power efficiency in practical use. This paper proposes a hybrid-precision architecture and circuit-level techniques to overcome these challenges. According to measured experimental results, a test chip based on the proposed architecture achieves (1) configurable precision ranging from binarized weights and inputs up to 8-bit inputs, 5-bit weights, and 7-bit outputs, (2) an accuracy-loss reduction of 86% to 96% for multiple complex CNNs, and (3) a power efficiency of 2.15 TOPS/W in a 0.22 μm CMOS process, which greatly reduces cost compared to digital designs of similar power efficiency. With a more advanced process, the architecture can achieve higher power efficiency: according to our estimation, over 20 TOPS/W is achievable with a 55 nm CMOS process.
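To illustrate the two precision modes the abstract contrasts, the sketch below models a multiply-accumulate in BIN mode (sign-only weights and inputs) and in MB mode at the chip's quoted maximum precision (8-bit inputs, 5-bit weights, 7-bit outputs). Uniform symmetric quantization and the rescaling of the accumulator are assumptions for illustration; the paper's actual circuit-level scheme is not reproduced here.

```python
def quantize(x, bits, x_max=1.0):
    """Uniformly quantize x (in [-x_max, x_max]) to a signed `bits`-bit level."""
    levels = 2 ** (bits - 1) - 1              # e.g. 127 for 8 bits
    q = round(x / x_max * levels)
    return max(-levels, min(levels, q))

def binarize(x):
    """BIN mode: keep only the sign, as in binarized neural networks."""
    return 1 if x >= 0 else -1

def mac_mb(inputs, weights, in_bits=8, w_bits=5, out_bits=7):
    """MB-mode multiply-accumulate, quantizing the result to `out_bits`."""
    acc = sum(quantize(i, in_bits) * quantize(w, w_bits)
              for i, w in zip(inputs, weights))
    # Rescale the integer accumulator into the output's dynamic range
    # (an assumed normalization, not taken from the paper).
    acc_max = len(inputs) * (2 ** (in_bits - 1) - 1) * (2 ** (w_bits - 1) - 1)
    return quantize(acc / acc_max, out_bits)

def mac_bin(inputs, weights):
    """BIN mode: accumulate products of signs (XNOR-popcount style)."""
    return sum(binarize(i) * binarize(w) for i, w in zip(inputs, weights))
```

BIN mode discards all magnitude information, which is why the paper reports an accuracy gap on complex CNNs; the hybrid architecture lets layers that need it run in MB mode instead.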
