Working Memory Connections for LSTM
Neural Networks (IF 6.0), Pub Date: 2021-09-04, DOI: 10.1016/j.neunet.2021.08.030
Federico Landi, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara

Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of gating mechanisms to mitigate exploding and vanishing gradients when learning long-term dependencies. For this reason, LSTMs and other gated RNNs are widely adopted, being the de facto standard for many sequence modeling tasks. Although the memory cell inside the LSTM contains essential information, it is not allowed to influence the gating mechanism directly. In this work, we improve the gate potential by including information coming from the internal cell state. The proposed modification, named Working Memory Connection, consists of adding a learnable nonlinear projection of the cell content into the network gates. This modification fits into the classical LSTM gates without any assumption on the underlying task, and is particularly effective when dealing with longer sequences. Previous research efforts in this direction, dating back to the early 2000s, could not bring consistent improvements over the vanilla LSTM. As part of this paper, we identify a key issue with those earlier connections that heavily limits their effectiveness, preventing a successful integration of the knowledge coming from the internal cell state. We show through extensive experimental evaluation that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks. Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
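To make the modification concrete, below is a minimal PyTorch sketch of an LSTM cell with Working Memory Connections, under one plausible reading of the abstract: the previous cell state is squashed with tanh and then linearly projected into the input, forget, and output gates, so the gates see a bounded version of the cell content (the bounding nonlinearity being what distinguishes this from the unbounded peephole connections of the early 2000s). Class and parameter names are illustrative, and details such as which cell state the output gate sees may differ from the paper's exact formulation.

```python
import torch
import torch.nn as nn


class WMCLSTMCell(nn.Module):
    """Illustrative LSTM cell with Working Memory Connections (sketch)."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.x2g = nn.Linear(input_size, 4 * hidden_size)   # input -> four pre-activations
        self.h2g = nn.Linear(hidden_size, 4 * hidden_size)  # hidden -> four pre-activations
        # Working Memory Connection: learnable projection of the squashed
        # cell state into the three sigmoid gates (input, forget, output).
        self.c2g = nn.Linear(hidden_size, 3 * hidden_size, bias=False)

    def forward(self, x, state):
        h, c = state
        i, f, g, o = (self.x2g(x) + self.h2g(h)).chunk(4, dim=-1)
        # tanh bounds the cell content before it reaches the gates; this is
        # the nonlinear projection the abstract describes.
        ci, cf, co = self.c2g(torch.tanh(c)).chunk(3, dim=-1)
        i = torch.sigmoid(i + ci)              # input gate
        f = torch.sigmoid(f + cf)              # forget gate
        c = f * c + i * torch.tanh(g)          # standard LSTM cell update
        o = torch.sigmoid(o + co)              # output gate (sees the previous cell here)
        h = o * torch.tanh(c)
        return h, (h, c)


# Usage: one step over a batch of 8 inputs.
cell = WMCLSTMCell(input_size=32, hidden_size=64)
x = torch.randn(8, 32)
h = c = torch.zeros(8, 64)
out, (h, c) = cell(x, (h, c))
```

A sequence model would simply iterate this cell over time steps; the point of the sketch is only that each gate now receives a third, bounded term derived from the cell state, on top of the usual input and hidden-state contributions.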




Updated: 2021-09-20