Balancing Computation Loads and Optimizing Input Vector Loading in LSTM Accelerators
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems ( IF 2.7 ) Pub Date : 2020-09-01 , DOI: 10.1109/tcad.2019.2926482
Junki Park , Wooseok Yi , Daehyun Ahn , Jaeha Kung , Jae-Joon Kim

The long short-term memory (LSTM) is a widely used neural network model for processing time-varying data. To reduce the memory requirement, pruning is often applied to the weight matrices of the LSTM, which makes the matrices sparse. In this paper, we present a new sparse matrix format, named rearranged compressed sparse column (RCSC), to maximize the inference speed of the LSTM hardware accelerator. The RCSC format speeds up inference by: 1) evenly distributing the computation loads across processing elements (PEs) and 2) reducing input vector load misses in the local buffer. We also propose a hardware architecture adopting a hierarchical input buffer to further reduce the pipeline stalls that cannot be handled by the RCSC format alone. Simulation results for various datasets show that the combined use of the RCSC format and the proposed hardware achieves, on average, a 2× reduction in inference runtime compared to the previous work.
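The load-balancing idea behind the abstract — distributing nonzeros of a pruned weight matrix evenly across PEs — can be illustrated with a minimal sketch. Note that this is an assumption-laden toy version: the function name `balance_rows_to_pes`, the greedy heaviest-first assignment, and the row-wise partitioning are illustrative choices, not the paper's actual RCSC rearrangement scheme.

```python
import numpy as np

def balance_rows_to_pes(weight, num_pes):
    """Greedily assign each row of a sparse weight matrix to the PE with the
    lightest load so far, so that nonzero counts (and thus multiply-accumulate
    work) are roughly even across PEs. Illustrative only; the RCSC format in
    the paper uses its own column-rearrangement scheme."""
    nnz_per_row = (weight != 0).sum(axis=1)
    order = np.argsort(-nnz_per_row)          # heaviest rows first
    loads = [0] * num_pes                     # running nonzero count per PE
    assignment = {}                           # row index -> PE index
    for r in order:
        pe = loads.index(min(loads))          # pick the least-loaded PE
        assignment[int(r)] = pe
        loads[pe] += int(nnz_per_row[r])
    return assignment, loads

# Toy pruned weight matrix: ~20% of entries nonzero.
rng = np.random.default_rng(0)
w = (rng.random((16, 32)) < 0.2).astype(float)
assignment, loads = balance_rows_to_pes(w, num_pes=4)
print(loads)  # per-PE nonzero counts, roughly equal
```

Without such a rearrangement, a plain round-robin row assignment can leave one PE with far more nonzeros than the others, and every PE then stalls waiting for the slowest one each cycle.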

Updated: 2020-09-01