Parallelizing Legendre Memory Unit Training
arXiv - CS - Artificial Intelligence. Pub Date: 2021-02-22, DOI: arxiv-2102.11417
Narsimha Chilkuri, Chris Eliasmith

Recently, a new recurrent neural network (RNN) named the Legendre Memory Unit (LMU) was proposed and shown to achieve state-of-the-art performance on several benchmark datasets. Here we leverage the linear time-invariant (LTI) memory component of the LMU to construct a simplified variant that can be parallelized during training (and yet executed as an RNN during inference), thus overcoming a well known limitation of training RNNs on GPUs. We show that this reformulation, which can be applied generally to any deep network whose recurrent components are linear, makes training up to 200 times faster. Second, to validate its utility, we compare its performance against the original LMU and a variety of published LSTM and transformer networks on seven benchmarks, ranging from psMNIST to sentiment analysis to machine translation. We demonstrate that our models exhibit superior performance on all datasets, often using fewer parameters. For instance, our LMU sets a new state-of-the-art result on psMNIST, and uses half the parameters while outperforming DistilBERT and LSTM models on IMDB sentiment analysis.
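The property that enables the speedup is that the LMU's memory update is a fixed linear recurrence, m_t = A m_{t-1} + B x_t, which unrolls to m_t = Σ_{j≤t} A^(t-j) B x_j and can therefore be evaluated for all time steps at once instead of step by step. The sketch below illustrates this idea on a toy LTI system in NumPy; it is not the authors' implementation, and the matrices A and B, the random test signal, and the function names are illustrative stand-ins.

```python
import numpy as np

def lmu_memory_sequential(A, B, x):
    """Step-by-step LTI memory update m_t = A m_{t-1} + B x_t (RNN-style execution)."""
    d = A.shape[0]
    m = np.zeros(d)
    out = np.empty((len(x), d))
    for t, x_t in enumerate(x):
        m = A @ m + B * x_t
        out[t] = m
    return out                                     # shape (T, d)

def lmu_memory_parallel(A, B, x):
    """Same trajectory without recurrence.

    Unrolling gives m_t = sum_{j<=t} A^(t-j) B x_j. Because A and B do not
    depend on t, the impulse responses A^k B can be precomputed once, turning
    the whole memory trajectory into a single batched matrix product -- the
    property that allows training to parallelize over time steps on a GPU.
    """
    T, d = len(x), A.shape[0]
    impulse = np.stack([np.linalg.matrix_power(A, k) @ B for k in range(T)])  # (T, d)
    H = np.zeros((T, T, d))
    for t in range(T):
        H[t, :t + 1] = impulse[:t + 1][::-1]       # H[t, j] = A^(t-j) B for j <= t
    return np.einsum('tjd,j->td', H, x)            # shape (T, d)

# Quick check that the two forms agree on a random toy system and input signal.
rng = np.random.default_rng(0)
d, T = 4, 16
A = rng.normal(size=(d, d)) * 0.3                  # toy state matrix (stand-in for the LMU's A)
B = rng.normal(size=d)                             # toy input vector (stand-in for the LMU's B)
x = rng.normal(size=T)                             # scalar input signal of length T
assert np.allclose(lmu_memory_sequential(A, B, x), lmu_memory_parallel(A, B, x))
```

In practice the unrolled form is evaluated as a convolution or dense matrix product over the whole sequence during training, while the step-by-step recurrence is retained for inference, as the abstract describes.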

Updated: 2021-02-24