End-to-end 100-TOPS/W Inference With Analog In-Memory Computing: Are We There Yet?
arXiv - CS - Hardware Architecture. Pub Date: 2021-09-03, DOI: arXiv:2109.01404
Gianmarco Ottavi, Geethan Karunaratne, Francesco Conti, Irem Boybat, Luca Benini, Davide Rossi

In-Memory Acceleration (IMA) promises major efficiency improvements in deep neural network (DNN) inference, but challenges remain in integrating IMA within a digital system. We propose a heterogeneous architecture coupling 8 RISC-V cores with an IMA in a shared-memory cluster, analyzing the benefits and trade-offs of in-memory computing on the realistic use case of a MobileNetV2 bottleneck layer. We explore several IMA integration strategies, analyzing performance, area, and energy efficiency. We show that while pointwise layers achieve significant speed-ups over a software implementation, on depthwise layers the inability to efficiently map parameters onto the accelerator leads to a significant trade-off between throughput and area. We propose a hybrid solution in which pointwise convolutions are executed on the IMA and depthwise convolutions on the cluster cores, achieving a 3x speed-up over software execution while saving 50% of the area compared to an all-in IMA solution with similar performance.
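To make the mapping asymmetry concrete, below is a minimal NumPy sketch of a MobileNetV2 bottleneck block (pointwise expansion, 3x3 depthwise, pointwise projection), annotated with the hybrid split the abstract describes. The dimensions, function names, and mapping comments are illustrative assumptions for this sketch, not the authors' code or the paper's actual layer sizes.

```python
# Illustrative sketch (not the authors' code): a MobileNetV2 bottleneck
# block in NumPy, annotated with the hybrid IMA/core mapping described
# in the abstract. Layer sizes and names are assumptions for readability.
import numpy as np

def pointwise_conv(x, w):
    # 1x1 convolution: a dense matrix multiply per pixel. This dense
    # matrix-vector pattern maps efficiently onto the analog in-memory
    # accelerator (IMA) crossbar.
    # x: (H, W, C_in), w: (C_in, C_out)
    return x @ w

def depthwise_conv(x, w):
    # 3x3 depthwise convolution: each channel is filtered independently,
    # so the flattened weight matrix is mostly zeros. Mapping it onto a
    # crossbar wastes array area, which is why the hybrid scheme runs
    # this stage in software on the RISC-V cluster cores.
    # x: (H, W, C), w: (3, 3, C); 'same' padding, stride 1.
    H, W, C = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += xp[i:i+H, j:j+W, :] * w[i, j, :]
    return out

def bottleneck(x, w_expand, w_dw, w_project):
    # Expansion (pointwise)  -> IMA
    h = np.maximum(pointwise_conv(x, w_expand), 0.0)   # ReLU
    # Depthwise              -> RISC-V cores (software)
    h = np.maximum(depthwise_conv(h, w_dw), 0.0)       # ReLU
    # Projection (pointwise) -> IMA, linear (no activation)
    return pointwise_conv(h, w_project)

# Toy dimensions for demonstration only.
rng = np.random.default_rng(0)
x = rng.standard_normal((14, 14, 32))
y = bottleneck(x,
               rng.standard_normal((32, 192)),    # expand 32 -> 192
               rng.standard_normal((3, 3, 192)),  # 3x3 depthwise
               rng.standard_normal((192, 64)))    # project 192 -> 64
print(y.shape)  # (14, 14, 64)
```

Under this split, both pointwise stages are large dense matrix products suited to the crossbar, while the sparse depthwise stage stays on the cores, which is the trade-off behind the reported 3x speed-up at half the area of an all-in IMA design.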

Updated: 2021-09-06