IMCA: An Efficient In-Memory Convolution Accelerator
IEEE Transactions on Very Large Scale Integration (VLSI) Systems (IF 2.8) Pub Date: 2021-01-14, DOI: 10.1109/tvlsi.2020.3047641
Hasan Erdem Yantir, Ahmed M. Eltawil, Khaled N. Salama

Traditional convolutional neural network (CNN) architectures suffer from two bottlenecks: computational complexity and memory access cost. In this study, an efficient in-memory convolution accelerator (IMCA) is proposed based on associative in-memory processing to alleviate these two problems directly. In the IMCA, the convolution operations are performed directly inside the memory as in-place operations. The proposed memory computational structure allows for a significant improvement in the key computational efficiency metric, TOPS/W. Furthermore, due to its unconventional computation style, the IMCA can take advantage of many potential opportunities, such as constant multiplication, bit-level sparsity, and dynamic approximate computing, which, while supported by traditional architectures, require extra overhead to exploit, thus reducing any potential gains. The proposed accelerator architecture exhibits significant efficiency in terms of area and performance, achieving around 0.65 GOPS and 1.64 TOPS/W at 16-bit fixed-point precision with an area of less than 0.25 mm².
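To illustrate the bit-level-sparsity opportunity the abstract mentions, the sketch below shows (in plain Python, not the paper's hardware) the bit-serial, shift-and-add style of multiply-accumulate that associative in-memory processors typically rely on. The function name and structure are illustrative assumptions, not the IMCA design: the point is that each zero bit in a weight costs no add pass, so sparser operands finish in fewer operations.

```python
def bit_serial_mac(weights, activations, bits=16):
    """Dot product via bit-serial shift-and-add, one pass per weight bit.

    Illustrative sketch only: zero weight bits are skipped, mirroring how
    bit-level sparsity reduces work in associative in-memory processing.
    """
    acc = 0
    for b in range(bits):                  # one pass per bit position
        for w, a in zip(weights, activations):
            if (w >> b) & 1:               # zero bits contribute no work
                acc += a << b              # add the shifted multiplicand
    return acc

# A 3-tap window, as in one convolution step:
print(bit_serial_mac([3, 5, 2], [4, 1, 7]))  # 3*4 + 5*1 + 2*7 = 31
```

In hardware, the inner additions happen in parallel across memory rows, so the latency is governed by the number of nonzero bit passes rather than by the number of multiplications.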
