CIMAT: A Compute-In-Memory Architecture for On-chip Training Based on Transpose SRAM Arrays
IEEE Transactions on Computers (IF 3.6) Pub Date: 2020-01-01, DOI: 10.1109/tc.2020.2980533
Hongwu Jiang, Xiaochen Peng, Shanshi Huang, Shimeng Yu

Rapid development in deep neural networks (DNNs) is enabling many intelligent applications. However, on-chip training of DNNs is challenging due to its extensive computation and memory bandwidth requirements. To overcome the memory-wall bottleneck, the compute-in-memory (CIM) approach exploits analog computation along the bit lines of the memory array, significantly speeding up vector-matrix multiplication. So far, most CIM-based architectures target inference engines only, with weights trained offline. In this article, we propose CIMAT, a CIM Architecture for Training. At the bitcell level, we design two transpose SRAM bitcells (7T and 8T) to implement the bi-directional vector-matrix multiplication needed for feedforward (FF) and backpropagation (BP). Moreover, we design the peripheral circuitry, mapping strategy, and data flow for the BP process and weight update to support CIM-based on-chip training. To further improve training performance, we explore pipeline optimization of the proposed architecture. We design the CIMAT architecture in a 7 nm CMOS technology with 7T/8T transpose SRAM arrays that support bi-directional parallel read. We evaluate 8-bit training of ResNet-18 on ImageNet, showing that the 7T-based design achieves 3.38× higher energy efficiency (∼6.02 TOPS/W), 4.34× the frame rate (∼4,020 fps), and only 50 percent of the chip size compared to a baseline architecture with a conventional 6T SRAM array that supports row-by-row read only. The 8T-based architecture performs even better, reaching ∼10.79 TOPS/W and ∼48,335 fps with 74 percent of the baseline's chip area.
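To see why training, unlike inference, needs the weight array read in both directions, consider the following minimal NumPy sketch. It is not from the paper; the variable names (W, x, delta) and the plain SGD update are illustrative assumptions. FF reads W along one direction (y = W @ x), while BP needs the transposed view of the same weights (W.T @ delta); a transpose-readable 7T/8T bitcell provides both reads in place, rather than duplicating W or reading it out row by row.

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 64))   # weight matrix stored in one SRAM array
x = rng.standard_normal(64)          # input activation vector (FF)
delta = rng.standard_normal(128)     # error vector from the layer above (BP)

# Feedforward: read the array along one direction
y = W @ x                            # shape (128,)

# Backpropagation: the *same* weights, read transposed
err = W.T @ delta                    # shape (64,)

# Weight update: outer product of error and stored activation
lr = 0.01                            # illustrative learning rate
W -= lr * np.outer(delta, x)

print(y.shape, err.shape, W.shape)   # (128,) (64,) (128, 64)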

Updated: 2020-01-01