An 8.93 TOPS/W LSTM Recurrent Neural Network Accelerator Featuring Hierarchical Coarse-Grain Sparsity for On-Device Speech Recognition
IEEE Journal of Solid-State Circuits (IF 4.6), Pub Date: 2020-05-18, DOI: 10.1109/jssc.2020.2992900
Deepak Kadetotad , Shihui Yin , Visar Berisha , Chaitali Chakrabarti , Jae-sun Seo

Long short-term memory (LSTM) is a type of recurrent neural network (RNN) that is widely used for time-series data and speech applications due to its high accuracy on such tasks. However, LSTMs pose difficulties for efficient hardware implementation because they require a large amount of weight storage and exhibit high computational complexity. Prior works have proposed compression techniques to alleviate the storage/computation requirements of LSTMs, but elementwise sparsity schemes incur sizable index-memory overhead, and structured compression techniques report limited compression ratios. In this article, we present an energy-efficient LSTM RNN accelerator featuring an algorithm-hardware co-optimized memory compression technique called hierarchical coarse-grain sparsity (HCGS). Aided by HCGS-based blockwise recursive weight compression, we demonstrate LSTM networks with up to 16× fewer weights while achieving minimal error-rate degradation. The prototype chip, fabricated in 65-nm LP CMOS, achieves up to 8.93 TOPS/W for real-time speech recognition using compressed LSTMs based on HCGS. HCGS-based LSTMs have demonstrated energy-efficient speech recognition with low error rates on the TIMIT, TED-LIUM, and LibriSpeech data sets.
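The abstract describes HCGS only at a high level. As a rough illustration of the idea of hierarchical, blockwise sparsity, the NumPy sketch below builds a two-level block mask over a weight matrix: coarse blocks are selected first, and finer blocks are selected inside each surviving coarse block. The block sizes, keep ratios, random selection, and the hcgs_mask helper are assumptions made for illustration, not the paper's actual configuration or training procedure; keeping 1-of-4 blocks at both levels simply matches the 16× weight reduction mentioned in the abstract.

```python
import numpy as np

def hcgs_mask(rows, cols, block_sizes=(64, 16), keep_ratios=(0.25, 0.25), seed=0):
    """Build an illustrative two-level hierarchical coarse-grain sparsity mask.

    Level 1: tile the columns of each row-block into coarse blocks and keep a
    fraction of them. Level 2: tile each kept coarse block into finer blocks
    and keep a fraction of those. Keeping 1-of-4 blocks at both levels leaves
    ~1/16 of the weights (the 16x reduction cited in the abstract). All
    parameter values here are assumptions, not the paper's configuration.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((rows, cols), dtype=bool)
    b0, b1 = block_sizes
    r0, r1 = keep_ratios
    for i in range(0, rows, b0):
        # Level 1: pick which coarse column-blocks this row-block keeps.
        coarse_cols = np.arange(0, cols, b0)
        kept_coarse = rng.choice(coarse_cols,
                                 size=max(1, int(len(coarse_cols) * r0)),
                                 replace=False)
        for j in kept_coarse:
            # Level 2: inside each kept coarse block, keep a subset of fine blocks.
            fine_cols = np.arange(j, min(j + b0, cols), b1)
            kept_fine = rng.choice(fine_cols,
                                   size=max(1, int(len(fine_cols) * r1)),
                                   replace=False)
            for k in kept_fine:
                mask[i:i + b0, k:k + b1] = True
    return mask

# Example: mask one (hypothetical) LSTM gate weight matrix and check density.
W = np.random.randn(512, 512).astype(np.float32)
mask = hcgs_mask(*W.shape)
W_sparse = W * mask
print("kept fraction:", mask.mean())  # ~0.0625, i.e., roughly 16x fewer weights
```

Because the surviving weights form whole blocks rather than scattered elements, only block indices need to be stored, which is the source of the low index-memory overhead claimed for HCGS relative to elementwise sparsity.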
