ERA-LSTM: An Efficient ReRAM-Based Architecture for Long Short-Term Memory
IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2020-06-01, DOI: 10.1109/tpds.2019.2962806
Jianhui Han, He Liu, Mingyu Wang, Zhaolin Li, Youhui Zhang

Processing-in-memory (PIM) architecture based on resistive random access memory (ReRAM) crossbars is a promising solution to the memory bottleneck that long short-term memory (LSTM) faces. Based on a dataflow analysis of the LSTM computing paradigm, this article proposes using ReRAM-based analog approximate computing to perform the LSTM-specific element-wise computation. Combined with the dot-product computation implemented with ReRAM crossbars, this yields a new LSTM processing tile that significantly reduces the demand for analog-to-digital converters (ADCs), which account for the major part of the power consumption in existing designs. Next, we elaborate on a mapping scheme to efficiently deploy large-scale LSTM onto multiple processing tiles. Finally, an architecture enhancement is proposed to support crossbar-friendly LSTM pruning and further improve efficiency. The overall design is named ERA-LSTM. Our evaluation shows that it outperforms two state-of-the-art FPGA-based LSTM accelerators by 103.6 and 35.9 times, respectively; compared with a state-of-the-art ReRAM-based LSTM accelerator with digital element-wise computation, it is 6.1 times more efficient. Moreover, our experiments demonstrate that the impact of hardware constraints and approximation errors on inference accuracy can be effectively reduced by the proposed fine-tuning scheme and by optimizing the design of the approximator.
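
To make the two kinds of LSTM computation named in the abstract concrete, here is a minimal software sketch of one standard LSTM time step. It is not the ERA-LSTM hardware or the authors' code: the function names (`matvec`, `lstm_step`), the gate keys, and the weight layout are assumptions chosen for illustration. It only marks which part is dot-product work (the kind the abstract assigns to ReRAM crossbars) and which part is element-wise work (the kind the abstract proposes to approximate in analog).

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, v):
    # Dot-product stage: one matrix-vector product per gate.
    # (Illustrative stand-in for what a crossbar would compute.)
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x, h_prev, c_prev, W, b):
    """One standard LSTM time step.

    W maps a gate key ("i", "f", "o", "g") to a matrix with one row per
    hidden unit and len(x) + len(h_prev) columns; b maps the same keys
    to bias vectors. This layout is an assumption of the sketch.
    """
    z = list(x) + list(h_prev)  # concatenate input and previous hidden state

    # Dot-product stage: pre-activations for all four gates.
    pre = {
        gate: [p + bias for p, bias in zip(matvec(W[gate], z), b[gate])]
        for gate in ("i", "f", "o", "g")
    }

    # Element-wise stage: nonlinearities and per-unit multiply/add.
    i = [sigmoid(v) for v in pre["i"]]    # input gate
    f = [sigmoid(v) for v in pre["f"]]    # forget gate
    o = [sigmoid(v) for v in pre["o"]]    # output gate
    g = [math.tanh(v) for v in pre["g"]]  # candidate cell state

    c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c


# Usage: two inputs, one hidden unit, all-zero weights and biases.
W = {gate: [[0.0, 0.0, 0.0]] for gate in ("i", "f", "o", "g")}
b = {gate: [0.0] for gate in ("i", "f", "o", "g")}
h, c = lstm_step([1.0, 2.0], [0.0], [1.0], W, b)
```

In this software form the element-wise stage looks cheap, but in a crossbar design every value passed from the analog dot-product stage to digital logic needs conversion, which is why the abstract singles out ADC demand as the cost to reduce.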

Updated: 2020-06-01