Adam revisited: a weighted past gradients perspective
Frontiers of Computer Science (IF 4.2) Pub Date: 2020-01-03, DOI: 10.1007/s11704-019-8457-x
Hui Zhong , Zaiyi Chen , Chuan Qin , Zai Huang , Vincent W. Zheng , Tong Xu , Enhong Chen

Adaptive learning rate methods have been successfully applied in many fields, especially in training deep neural networks. Recent results have shown that adaptive methods with exponentially increasing weights on squared past gradients (e.g., ADAM, RMSPROP) may fail to converge to the optimal solution. Though many algorithms, such as AMSGRAD and ADAMNC, have been proposed to fix the non-convergence issues, achieving a data-dependent regret bound similar to or better than that of ADAGRAD remains a challenge for these methods. In this paper, we propose a novel adaptive method, the weighted adaptive algorithm (WADA), to tackle the non-convergence issues. Unlike AMSGRAD and ADAMNC, we adopt a more mildly growing weighting strategy on the squared past gradients, in which the weights grow linearly. Based on this idea, we propose the weighted adaptive gradient method framework (WAGMF) and implement the WADA algorithm within this framework. Moreover, we prove that WADA can achieve a weighted data-dependent regret bound, which can be better than the original regret bound of ADAGRAD when the gradients decrease rapidly. This bound may partially explain the good performance of ADAM in practice. Finally, extensive experiments demonstrate the effectiveness of WADA and its variants in comparison with several variants of ADAM on convex problems and in training deep neural networks.
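To make the weighting comparison concrete, below is a minimal NumPy sketch contrasting the exponentially weighted accumulation of squared past gradients used by ADAM/RMSPROP with a linearly growing weighting of the kind the abstract describes. The linear scheme and its normalization here are assumptions made purely for illustration; the actual WAGMF/WADA update rules (step sizes, momentum, bias handling) are specified in the paper itself.

import numpy as np

def adam_style_second_moment(grads, beta2=0.999):
    # Exponential moving average of squared gradients (ADAM / RMSPROP style):
    # the squared gradient from step i carries weight (1 - beta2) * beta2**(t - i),
    # so recent gradients receive exponentially larger weight than older ones.
    v = np.zeros_like(grads[0])
    for g in grads:
        v = beta2 * v + (1.0 - beta2) * g ** 2
    return v

def linearly_weighted_second_moment(grads):
    # Illustrative linearly growing weighting: the squared gradient from step i
    # carries weight i, normalized by the total weight 1 + 2 + ... + t.
    # This only sketches the "milder" weighting idea; it is not the exact WADA update.
    num = np.zeros_like(grads[0])
    total_weight = 0.0
    for i, g in enumerate(grads, start=1):
        num += i * g ** 2
        total_weight += i
    return num / total_weight

# Toy comparison on gradients that shrink over time, the regime in which the
# abstract says a weighted data-dependent bound can improve on ADAGRAD's bound.
rng = np.random.default_rng(0)
grads = [rng.normal(size=3) / (t + 1) for t in range(1000)]
print("exponential weighting:", adam_style_second_moment(grads))
print("linear weighting:     ", linearly_weighted_second_moment(grads))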

Updated: 2020-01-03