Implementation of an On-Chip Learning Neural Network IC Using Highly Linear Charge Trap Device, IEEE Transactions on Circuits and Systems I: Regular Papers


Implementation of an On-Chip Learning Neural Network IC Using Highly Linear Charge Trap Device
IEEE Transactions on Circuits and Systems I: Regular Papers (IF 5.1), Pub Date: 2021-05-21, DOI: 10.1109/tcsi.2021.3071872
Jong-Moon Choi, Do-Wan Kwon, Je-Joong Woo, Eun-Je Park, Kee-Won Kwon

This paper presents an IC implementation of an on-chip learning neural network accelerator using highly linear, CMOS-compatible floating-gate charge trap devices. A simple learning algorithm based on winner-take-all and competitive learning is proposed to enable fast and power-efficient hardware. The algorithm was analyzed in MATLAB with a behavioral model of emerging non-volatile memory. The linearity, symmetry, and cycle-to-cycle variation of the multi-bit switching characteristic affect training accuracy. The proposed content-aware programming technique, which uses a modulated column line driver, provides flexibility for real-time training while maintaining device linearity, even though a different update step must be applied to every unit cell in every training iteration. The prototype IC is built on a processing-in-memory structure for energy-efficient computing, in which the cell arrays are divided into 4 sub-blocks to reduce I-R drop. Fabricated in 180 nm CMOS technology, the prototype IC consumes 353.3 pJ in inference mode and 898.2 pJ in training mode, which corresponds to 95.05 TOPS/W and 38.03 TOPS/W, respectively. The fully integrated non-volatile AI IC with an on-chip solution is demonstrated with a throughput of 1343.2 GOPS.
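
As a rough illustration of the kind of learning rule the abstract names, the following is a minimal sketch of textbook winner-take-all competitive learning with quantized weights. It is not the authors' algorithm or circuit: the function names, the 16-level weight range, the fixed step size, and the toy patterns are all assumptions made for illustration. The integer weight levels stand in for a multi-level memory cell with an ideal linear and symmetric step response, which is the property the abstract identifies as important for training accuracy.

```python
def winner_take_all(weights, x):
    """Return the index of the unit whose weight row has the largest dot product with x.

    Ties are resolved in favor of the lowest index.
    """
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def competitive_update(weights, x, step=1, w_min=0, w_max=15):
    """Move only the winning unit's weights one quantized step toward the input.

    Weights are integer levels in [w_min, w_max] (an assumed 16-level cell).
    Each update adds or subtracts the same fixed step, modeling an ideal
    linear, symmetric device; real devices deviate from this.
    """
    k = winner_take_all(weights, x)
    row = weights[k]
    for j, xi in enumerate(x):
        target = w_max if xi else w_min
        if row[j] < target:
            row[j] = min(row[j] + step, w_max)
        elif row[j] > target:
            row[j] = max(row[j] - step, w_min)
    return k


# Toy example (hypothetical data): two binary patterns, two output units.
pattern_a = [1, 1, 0, 0]
pattern_b = [0, 0, 1, 1]
weights = [[8, 8, 7, 7], [7, 7, 8, 8]]  # slightly biased mid-range start

for _ in range(10):  # ten passes over both patterns
    competitive_update(weights, pattern_a)
    competitive_update(weights, pattern_b)

print(weights)
print(winner_take_all(weights, pattern_a), winner_take_all(weights, pattern_b))
```

Because only the winner is updated and every update is a single fixed step, the rule needs no multipliers or gradient computation, which is one reason this family of algorithms is attractive for compact hardware. A nonlinear or asymmetric step response would correspond to making `step` depend on the current weight level or on the update direction.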


Updated: 2021-06-08
