Grow and Prune Compact, Fast, and Accurate LSTMs
IEEE Transactions on Computers (IF 3.7), Pub Date: 2020-03-01, DOI: 10.1109/tc.2019.2954495
Xiaoliang Dai, Hongxu Yin, Niraj K. Jha

Long short-term memory (LSTM) has been widely used for sequential data modeling. Researchers have increased LSTM depth by stacking LSTM cells to improve performance. This incurs model redundancy, increases run-time delay, and makes the LSTMs more prone to overfitting. To address these problems, we propose a hidden-layer LSTM (H-LSTM) that adds hidden layers to LSTM's original one-level nonlinear control gates. H-LSTM increases accuracy while employing fewer external stacked layers, thus reducing the number of parameters and run-time latency significantly. We employ grow-and-prune (GP) training to iteratively adjust the hidden layers through gradient-based growth and magnitude-based pruning of connections. This learns both the weights and the compact architecture of H-LSTM control gates. We have GP-trained H-LSTMs for image captioning, speech recognition, and neural machine translation applications. For the NeuralTalk architecture on the MSCOCO dataset, our three models reduce the number of parameters by 38.7× [floating-point operations (FLOPs) by 45.5×], run-time latency by 4.5×, and improve the CIDEr-D score by 2.8 percent, respectively. For the DeepSpeech2 architecture on the AN4 dataset, the first model we generated reduces the number of parameters by 19.4× and run-time latency by 37.4 percent. The second model reduces the word error rate (WER) from 12.9 to 8.7 percent. For the encoder-decoder sequence-to-sequence network on the IWSLT 2014 German-English dataset, the first model we generated reduces the number of parameters by 10.8× and run-time latency by 14.2 percent. The second model increases the BLEU score from 30.02 to 30.98. Thus, GP-trained H-LSTMs can be seen to be compact, fast, and accurate.
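The abstract describes two components: an H-LSTM cell whose control gates contain hidden layers, and grow-and-prune (GP) training, which alternates gradient-based growth of connections with magnitude-based pruning. The following is a minimal PyTorch sketch of both ideas; the one-hidden-layer gate structure, the ReLU activation, the prune/grow ratios, and the names HLSTMCell, magnitude_prune, and gradient_grow are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class HLSTMCell(nn.Module):
    """Sketch of a hidden-layer LSTM (H-LSTM) cell.

    Each control gate is computed by a small multi-layer network
    (here, one hidden layer with ReLU) instead of the single affine
    transformation used in a standard LSTM cell. Sizes and activation
    are illustrative assumptions.
    """

    def __init__(self, input_size, hidden_size, gate_hidden_size):
        super().__init__()

        def gate_net():
            # [x_t, h_{t-1}] -> hidden layer -> gate pre-activation
            return nn.Sequential(
                nn.Linear(input_size + hidden_size, gate_hidden_size),
                nn.ReLU(),
                nn.Linear(gate_hidden_size, hidden_size),
            )

        self.forget_gate = gate_net()
        self.input_gate = gate_net()
        self.output_gate = gate_net()
        self.cell_update = gate_net()

    def forward(self, x, state):
        h, c = state
        z = torch.cat([x, h], dim=1)
        f = torch.sigmoid(self.forget_gate(z))
        i = torch.sigmoid(self.input_gate(z))
        o = torch.sigmoid(self.output_gate(z))
        g = torch.tanh(self.cell_update(z))
        c_next = f * c + i * g          # standard LSTM cell-state update
        h_next = o * torch.tanh(c_next)
        return h_next, c_next


def magnitude_prune(weight, mask, prune_ratio):
    """Deactivate the smallest-magnitude connections that are still active."""
    active_vals = weight[mask.bool()].abs()
    k = int(prune_ratio * active_vals.numel())
    if k == 0:
        return mask
    threshold = active_vals.kthvalue(k).values
    return mask * (weight.abs() > threshold).float()


def gradient_grow(grad, mask, grow_ratio):
    """Reactivate pruned connections with the largest gradient magnitude."""
    inactive = mask == 0
    k = int(grow_ratio * inactive.sum().item())
    if k == 0:
        return mask
    scores = (grad.abs() * inactive.float()).flatten()
    new_mask = mask.clone().flatten()
    new_mask[torch.topk(scores, k).indices] = 1.0
    return new_mask.view_as(mask)
```

In a GP training loop of this kind, the masks would typically be applied as `weight * mask` in the forward pass, with growth and pruning steps interleaved with ordinary gradient updates; the ratios and scheduling above are placeholders.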

Updated: 2020-03-01