Edge AI without Compromise: Efficient, Versatile and Accurate Neurocomputing in Resistive Random-Access Memory
arXiv - CS - Hardware Architecture, Pub Date: 2021-08-17, DOI: arxiv-2108.07879
Weier Wan (Stanford University), Rajkumar Kubendran (University of California San Diego; University of Pittsburgh), Clemens Schaefer (University of Notre Dame), S. Burc Eryilmaz (Stanford University), Wenqiang Zhang (Tsinghua University), Dabin Wu (Tsinghua University), Stephen Deiss (University of California San Diego), Priyanka Raina (Stanford University), He Qian (Tsinghua University), Bin Gao (Tsinghua University), Siddharth Joshi (University of Notre Dame; University of California San Diego), Huaqiang Wu (Tsinghua University), H.-S. Philip Wong (Stanford University), Gert Cauwenberghs (University of California San Diego)

Realizing today's cloud-level artificial intelligence functionalities directly on devices distributed at the edge of the internet calls for edge hardware capable of processing multiple modalities of sensory data (e.g., video, audio) at unprecedented energy-efficiency. AI hardware architectures today cannot meet the demand due to a fundamental "memory wall": data movement between separate compute and memory units consumes substantial energy and incurs long latency. Resistive random-access memory (RRAM) based compute-in-memory (CIM) architectures promise to bring orders-of-magnitude energy-efficiency improvement by performing computation directly within memory. However, conventional approaches to CIM hardware design limit the functional flexibility necessary for processing diverse AI workloads, and must overcome hardware imperfections that degrade inference accuracy. Such trade-offs between efficiency, versatility and accuracy cannot be addressed by isolated improvements on any single level of the design. By co-optimizing across all hierarchies of the design, from algorithms and architecture to circuits and devices, we present NeuRRAM, the first multimodal edge AI chip using RRAM CIM to simultaneously deliver a high degree of versatility for diverse model architectures, record energy-efficiency $5\times$ to $8\times$ better than prior art across various computational bit-precisions, and inference accuracy comparable to software models with 4-bit weights on all measured standard AI benchmarks, including accuracy of 99.0% on MNIST and 85.7% on CIFAR-10 image classification, 84.7% accuracy on Google speech command recognition, and a 70% reduction in image reconstruction error on a Bayesian image recovery task. This work paves the way towards building highly efficient and reconfigurable edge AI hardware platforms for the more demanding and heterogeneous AI applications of the future.

Updated: 2021-08-19