Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method
IEEE Transactions on Knowledge and Data Engineering (IF 8.9). Pub Date: 2020-02-01, DOI: 10.1109/tkde.2018.2883613
Xu Sun, Xuancheng Ren, Shuming Ma, Bingzhen Wei, Wei Li, Jingjing Xu, Houfeng Wang, Yi Zhang

We propose a simple yet effective technique to simplify both the training and the resulting model of neural networks. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified so that only the top-$k$ elements (in terms of magnitude) are kept. As a result, only $k$ rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction in the computational cost. Based on the sparsified gradients, we further simplify the model by eliminating the rows or columns that are seldom updated, which reduces the computational cost in both training and decoding, and can accelerate decoding in real-world applications. Surprisingly, experimental results demonstrate that most of the time we only need to update fewer than 5 percent of the weights at each back propagation pass. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given. The model simplification results show that the model can be adaptively simplified, often by around 9x, without any loss in accuracy or even with improved accuracy.
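The core idea is straightforward to sketch in code. Below is a minimal NumPy illustration (not the authors' implementation) of a single linear layer whose backward pass keeps only the top-$k$ magnitudes of the output gradient, so only the corresponding $k$ columns of the weight matrix receive updates; an update counter then supports the model-simplification step of pruning columns that are seldom touched. The names topk_sparsify, SparseLinear, update_count, and prune are illustrative assumptions, and plain SGD is used for brevity.

import numpy as np

def topk_sparsify(grad, k):
    """Keep only the k largest-magnitude entries along the last axis,
    zeroing everything else (illustrative helper, not the paper's code)."""
    sparse = np.zeros_like(grad)
    idx = np.argsort(np.abs(grad), axis=-1)[..., -k:]
    np.put_along_axis(sparse, idx, np.take_along_axis(grad, idx, axis=-1), axis=-1)
    return sparse

class SparseLinear:
    """A linear layer y = x W + b whose backward pass uses only the top-k
    elements of the output gradient, so only the matching k columns of W
    (and entries of b) are modified per pass."""

    def __init__(self, d_in, d_out, k, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(d_in, d_out))
        self.b = np.zeros(d_out)
        self.k, self.lr = k, lr
        # how often each output column has actually been updated;
        # used later to prune seldom-updated columns (model simplification)
        self.update_count = np.zeros(d_out, dtype=int)

    def forward(self, x):
        self.x = x
        return x @ self.W + self.b

    def backward(self, grad_y):
        # training simplification: sparsify the back-propagated gradient
        g = topk_sparsify(grad_y, self.k)
        # gradient w.r.t. the input, computed before W is changed
        grad_x = g @ self.W.T
        # columns touched in this pass (with a mini-batch the union over
        # examples may exceed k; the per-example gradient is still top-k sparse)
        active = np.any(g != 0, axis=0)
        self.update_count += active
        # only the active columns of W and b receive an SGD update
        self.W[:, active] -= self.lr * (self.x.T @ g)[:, active]
        self.b[active] -= self.lr * g.sum(axis=0)[active]
        return grad_x

    def prune(self, min_updates):
        # model simplification: drop output columns that were rarely updated
        keep = self.update_count >= min_updates
        self.W, self.b = self.W[:, keep], self.b[keep]
        self.update_count = self.update_count[keep]
        return keep

As a usage sketch, layer = SparseLinear(d_in=8, d_out=64, k=3) can be trained with repeated forward/backward calls; layer.update_count then reveals which columns were rarely touched, and layer.prune(min_updates=...) removes them. The roughly 9x reduction and the accuracy figures quoted in the abstract come from the paper's experiments, not from this toy sketch.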

Updated: 2020-02-01