PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-efficient ReRAM
IEEE Transactions on Computers (IF 3.6), Pub Date: 2020-08-01, DOI: 10.1109/tc.2020.2998456
Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Sapan Agarwal, Matthew Marinella, Martin Foltin, John Paul Strachan, Dejan Milojicic, Wen-Mei Hwu, Kaushik Roy

The wide adoption of deep neural networks has been accompanied by ever-increasing energy and performance demands due to the expensive nature of training them. Numerous special-purpose architectures have been proposed to accelerate training: both digital and hybrid digital-analog using resistive RAM (ReRAM) crossbars. ReRAM-based accelerators have demonstrated the effectiveness of ReRAM crossbars at performing matrix-vector multiplication operations that are prevalent in training. However, they still suffer from inefficiency due to the use of serial reads and writes for performing the weight gradient and update step. A few works have demonstrated the possibility of performing outer products in crossbars, which can be used to realize the weight gradient and update step without the use of serial reads and writes. However, these works have been limited to low precision operations which are not sufficient for typical training workloads. Moreover, they have been confined to a limited set of training algorithms for fully-connected layers only. To address these limitations, we propose a bit-slicing technique for enhancing the precision of ReRAM-based outer products, which is substantially different from bit-slicing for matrix-vector multiplication only. We incorporate this technique into a crossbar architecture with three variants catered to different training algorithms. To evaluate our design on different types of layers in neural networks (fully-connected, convolutional, etc.) and training algorithms, we develop PANTHER, an ISA-programmable training accelerator with compiler support. Our design can also be integrated into other accelerators in the literature to enhance their efficiency. Our evaluation shows that PANTHER achieves up to 8.02×, 54.21×, and 103× energy reductions as well as 7.16×, 4.02×, and 16× execution time reductions compared to digital accelerators, ReRAM-based accelerators, and GPUs, respectively.
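The two techniques the abstract turns on, realizing the weight gradient/update step as a crossbar outer product and bit-slicing weights across low-precision cells, can be made concrete with a short sketch. The NumPy snippet below is illustrative only: it assumes unsigned fixed-point weights, uses hypothetical helper names (outer_product_update, slice_weights, combine_slices) that do not come from the paper, and models the arithmetic a crossbar would perform rather than the hardware itself.

```python
import numpy as np

# --- Outer-product weight update (one SGD step on a fully-connected layer) ---
# In backpropagation the weight gradient of a layer is the outer product of
# the layer's input activations x and its error signal delta:
#     dW = outer(delta, x),   W <- W - lr * dW
# This rank-1 structure is what allows a ReRAM crossbar to apply the gradient
# and update step in place, without serial reads and writes.
def outer_product_update(W, x, delta, lr=0.01):
    dW = np.outer(delta, x)              # shape (out_dim, in_dim)
    return W - lr * dW

# --- Bit-slicing weights across low-precision crossbars ---
# A ReRAM cell stores only a few bits, so a higher-precision weight is split
# into several b-bit slices, one crossbar per slice; recombining the slices
# with powers of 2**b recovers the full value. (Signed weights and carry
# handling between slices are omitted here for brevity.)
def slice_weights(W_int, num_slices, bits_per_slice):
    mask = (1 << bits_per_slice) - 1
    return [(W_int >> (s * bits_per_slice)) & mask for s in range(num_slices)]

def combine_slices(slices, bits_per_slice):
    return sum(s << (i * bits_per_slice) for i, s in enumerate(slices))

# Quick check: slicing then recombining 16-bit weights is lossless.
rng = np.random.default_rng(0)
W_int = rng.integers(0, 2**16, size=(4, 3), dtype=np.int64)
slices = slice_weights(W_int, num_slices=8, bits_per_slice=2)
assert np.array_equal(combine_slices(slices, bits_per_slice=2), W_int)
```

In hardware, each slice would occupy its own crossbar and the rank-1 update would be applied to all slices in parallel; managing precision across slices during updates is where, per the abstract, bit-slicing for outer products differs substantially from the better-studied matrix-vector case.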
