IMCA: An Efficient In-Memory Convolution Accelerator
IEEE Transactions on Very Large Scale Integration (VLSI) Systems (IF 2.8) Pub Date: 2021-01-14, DOI: 10.1109/tvlsi.2020.3047641
Hasan Erdem Yantir, Ahmed M. Eltawil, Khaled N. Salama

Traditional convolutional neural network (CNN) architectures suffer from two bottlenecks: computational complexity and memory access cost. In this study, an efficient in-memory convolution accelerator (IMCA) is proposed based on associative in-memory processing to alleviate these two problems directly. In the IMCA, the convolution operations are performed directly inside the memory as in-place operations. The proposed memory computational structure allows for a significant improvement in the key computational efficiency metric, TOPS/W. Furthermore, due to its unconventional computation style, the IMCA can take advantage of many potential opportunities, such as constant multiplication, bit-level sparsity, and dynamic approximate computing, which, while supported by traditional architectures, require extra overhead to exploit, thus reducing any potential gains. The proposed accelerator architecture exhibits significant efficiency in terms of area and performance, achieving around 0.65 GOPS and 1.64 TOPS/W at 16-bit fixed-point precision with an area of less than 0.25 mm².
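To illustrate the bit-level-sparsity opportunity the abstract mentions, the sketch below shows (in plain Python, not the paper's hardware) the bit-serial, shift-and-add style of multiply-accumulate that associative in-memory processors typically rely on. The function name and structure are illustrative assumptions, not the IMCA design: the point is that each zero bit in a weight costs no add pass, so sparser operands finish in fewer operations.

```python
def bit_serial_mac(weights, activations, bits=16):
    """Dot product via bit-serial shift-and-add, one pass per weight bit.

    Illustrative sketch only: zero weight bits are skipped, mirroring how
    bit-level sparsity reduces work in associative in-memory processing.
    """
    acc = 0
    for b in range(bits):                  # one pass per bit position
        for w, a in zip(weights, activations):
            if (w >> b) & 1:               # zero bits contribute no work
                acc += a << b              # add the shifted multiplicand
    return acc

# A 3-tap window, as in one convolution step:
print(bit_serial_mac([3, 5, 2], [4, 1, 7]))  # 3*4 + 5*1 + 2*7 = 31
```

In hardware, the inner additions happen in parallel across memory rows, so the latency is governed by the number of nonzero bit passes rather than by the number of multiplications.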
