Implicit Bias of Linear RNNs
arXiv - CS - Neural and Evolutionary Computing · Pub Date: 2021-01-19 · arXiv:2101.07833
Melikasadat Emami, Mojtaba Sahraee-Ardakan, Parthe Pandit, Sundeep Rangan, Alyson K. Fletcher

Contemporary wisdom based on empirical studies suggests that standard recurrent neural networks (RNNs) do not perform well on tasks requiring long-term memory. However, precise reasoning for this behavior is still unknown. This paper provides a rigorous explanation of this property in the special case of linear RNNs. Although this work is limited to linear RNNs, even these systems have traditionally been difficult to analyze due to their non-linear parameterization. Using recently-developed kernel regime analysis, our main result shows that linear RNNs learned from random initializations are functionally equivalent to a certain weighted 1D-convolutional network. Importantly, the weightings in the equivalent model cause an implicit bias to elements with smaller time lags in the convolution and hence, shorter memory. The degree of this bias depends on the variance of the transition kernel matrix at initialization and is related to the classic exploding and vanishing gradients problem. The theory is validated in both synthetic and real data experiments.
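To make the convolutional equivalence concrete, below is a minimal NumPy sketch (illustrative only, not the authors' code): a linear RNN h_t = A h_{t-1} + B x_t, y_t = C h_t, unrolled over time, computes a 1D convolution of the input with the impulse-response kernel w_k = C A^k B. When the transition matrix A is initialized with small variance, the kernel magnitude decays geometrically in the lag k, which is the short-memory bias described above. All dimensions and variable names are arbitrary choices for illustration.

```python
import numpy as np

# Minimal sketch: a linear RNN unrolled in time equals a 1D convolution
# with kernel w_k = C A^k B.  With a small-variance initialization of A
# (spectral radius < 1), w_k shrinks with the lag k, i.e. shorter memory.

rng = np.random.default_rng(0)
d, T = 16, 50                                        # hidden size, sequence length
A = rng.normal(scale=0.3 / np.sqrt(d), size=(d, d))  # transition matrix (small variance)
B = rng.normal(size=(d, 1))                          # input map
C = rng.normal(size=(1, d))                          # readout
x = rng.normal(size=T)                               # scalar input sequence

# Run the RNN recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t.
h = np.zeros((d, 1))
y_rnn = np.zeros(T)
for t in range(T):
    h = A @ h + B * x[t]
    y_rnn[t] = (C @ h).item()

# Equivalent 1D convolution with kernel w_k = C A^k B.
w = np.array([(C @ np.linalg.matrix_power(A, k) @ B).item() for k in range(T)])
y_conv = np.array([sum(w[k] * x[t - k] for k in range(t + 1)) for t in range(T)])

print(np.allclose(y_rnn, y_conv))  # True: the RNN output is exactly a convolution
print(np.abs(w[:5]))               # kernel magnitude decays as the lag k grows
```

Increasing the initialization variance of A slows the decay of w_k, consistent with the paper's point that the degree of the bias depends on the variance of the transition matrix at initialization and is tied to exploding/vanishing gradients.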

Updated: 2021-01-21