Accelerating Deep Neural Network In-situ Training with Non-volatile and Volatile Memory Based Hybrid Precision Synapses
IEEE Transactions on Computers (IF 3.6) Pub Date: 2020-08-01, DOI: 10.1109/tc.2020.3000218
Yandong Luo , Shimeng Yu

Compute-in-memory (CIM) with emerging non-volatile memories (eNVMs) is time- and energy-efficient for deep neural network (DNN) inference. However, challenges remain for DNN in-situ training with eNVMs due to their asymmetric weight-update behavior, high programming latency, and energy consumption. To overcome these challenges, a hybrid precision synapse combining eNVMs with a capacitor has been proposed. It leverages the symmetric and fast weight update of the volatile capacitor, as well as the non-volatility and large dynamic range of the eNVMs. In this article, a DNN in-situ training architecture with hybrid precision synapses is proposed and a system-level benchmark is conducted. First, the circuit modules required for in-situ training with hybrid precision synapses are designed and the system architecture is proposed. Then, the impact of different weight precision configurations, the weight transfer interval, and limited capacitor retention time on training accuracy is investigated by incorporating hardware properties into a TensorFlow simulation. Finally, the system-level benchmark is conducted at the 32 nm technology node in a modified NeuroSim simulator, comparing the hybrid precision synapse with baseline designs based solely on eNVM or SRAM technology. The benchmark results show that a CIM accelerator based on the hybrid precision synapse achieves at least 3.07x and 2.89x better training energy efficiency than its eNVM and SRAM counterparts at the 32 nm node, respectively. Compared with GPU and TPU, 227x and 33.8x better energy efficiency are obtained. The scaling trend of the hybrid precision synapse is projected toward the 7 nm node, and a comparison with state-of-the-art 7 nm SRAM technology is made.
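The core mechanism described above — fast symmetric gradient updates on a volatile capacitor, periodically transferred into quantized non-volatile eNVM conductance — can be illustrated with a small behavioral sketch. This is not the paper's implementation; the class name, the parameter values (number of eNVM levels, capacitor retention factor, transfer interval), and the leakage model are all illustrative assumptions.

```python
import numpy as np

# Illustrative parameters (assumptions, not values from the paper).
ENVM_LEVELS = 32        # conductance quantization levels of the eNVM
CAP_DECAY = 0.999       # per-step capacitor retention (leakage) factor
TRANSFER_INTERVAL = 8   # weight-transfer interval, in training steps
LR = 0.1                # learning rate

class HybridSynapse:
    """Behavioral model: weight = W_envm (coarse, non-volatile)
    + W_cap (fine, fast, volatile)."""

    def __init__(self, shape, w_max=1.0):
        self.w_max = w_max
        self.w_envm = np.zeros(shape)  # quantized non-volatile part
        self.w_cap = np.zeros(shape)   # symmetric, fast capacitor part
        self.step = 0

    def weight(self):
        return self.w_envm + self.w_cap

    def update(self, grad):
        # The fast, symmetric gradient update lands on the capacitor.
        self.w_cap -= LR * grad
        # The capacitor is volatile: its charge leaks every step.
        self.w_cap *= CAP_DECAY
        self.step += 1
        if self.step % TRANSFER_INTERVAL == 0:
            self.transfer()

    def transfer(self):
        # Move accumulated capacitor weight into the eNVM, quantized to
        # the eNVM's conductance levels; the residue stays on the capacitor.
        q = self.w_max / ENVM_LEVELS
        delta = np.round(self.w_cap / q) * q
        self.w_envm = np.clip(self.w_envm + delta, -self.w_max, self.w_max)
        self.w_cap -= delta
```

A short usage example: training a scalar weight toward a target with plain SGD through the hybrid synapse converges despite the leakage and the quantized transfers, because the fine-grained residue is retained on the capacitor between transfer intervals.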

Updated: 2020-08-01