A Heterogeneous In-Memory Computing Cluster for Flexible End-to-End Inference of Real-World Deep Neural Networks
IEEE Journal on Emerging and Selected Topics in Circuits and Systems (IF 3.7), Pub Date: 2022-04-28, DOI: 10.1109/jetcas.2022.3170152
Angelo Garofalo 1, Gianmarco Ottavi 1, Francesco Conti 1, Geethan Karunaratne 2, Irem Boybat 2, Luca Benini 1, Davide Rossi 1

Deployment of modern TinyML tasks on small battery-constrained IoT devices requires high computational energy efficiency. Analog In-Memory Computing (IMC) using non-volatile memory (NVM) promises major efficiency improvements in deep neural network (DNN) inference and serves as on-chip memory storage for DNN weights. However, IMC’s functional flexibility limitations and their impact on performance, energy, and area efficiency are not yet fully understood at the system level. To target practical end-to-end IoT applications, IMC arrays must be enclosed in heterogeneous programmable systems, introducing new system-level challenges which we aim to address in this work. We present a heterogeneous tightly coupled clustered architecture integrating 8 RISC-V cores, an in-memory computing accelerator (IMA), and digital accelerators. We benchmark the system on a highly heterogeneous workload, the Bottleneck layer from MobileNetV2, showing 11.5× performance and 9.5× energy efficiency improvements compared to highly optimized parallel execution on the cores. Furthermore, we explore the IMC array resources required for end-to-end inference of a full mobile-grade DNN (MobileNetV2) by scaling up our heterogeneous architecture to a multi-array accelerator. Our results show that, on end-to-end inference of MobileNetV2, our solution is one order of magnitude better in execution latency than existing programmable architectures and two orders of magnitude better than state-of-the-art heterogeneous solutions integrating in-memory computing analog cores.
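The Bottleneck layer mentioned in the abstract is the MobileNetV2 inverted-residual block. The PyTorch sketch below (illustrative only, not taken from the paper; channel sizes and parameter names are assumptions) shows why the workload is heterogeneous: the 1×1 pointwise convolutions are dense matrix products that map naturally onto an analog IMC array, while the 3×3 depthwise convolution has low arithmetic intensity per weight and is typically better served by the digital cores or accelerators.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2 bottleneck (inverted-residual) block, for illustration."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, expand: int = 6):
        super().__init__()
        hidden = in_ch * expand
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 expansion: dense GEMM, IMC-array-friendly
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise: low weight reuse, better suited to digital datapaths
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 projection: dense GEMM, IMC-array-friendly
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.block(x)
        return x + y if self.use_residual else y


# Example: one bottleneck stage, 16 -> 24 channels at stride 2
x = torch.randn(1, 16, 56, 56)
print(InvertedResidual(16, 24, stride=2)(x).shape)  # torch.Size([1, 24, 28, 28])
```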

Updated: 2024-08-26