AdaDB: An Adaptive Gradient Method with Data-Dependent Bound
Neurocomputing (IF 6), Pub Date: 2021-01-01, DOI: 10.1016/j.neucom.2020.07.070
Liu Yang, Deng Cai

Abstract: Adaptive gradient methods such as Adam, RMSProp, and AdaGrad play an essential role in training very deep neural networks. The learning rates of these optimizers change adaptively to accelerate training, and in many deep learning tasks, such as classification and NLP, they converge much faster than SGD. However, recent work has pointed out that adaptive methods can fail to converge to a critical point in some situations and suffer from poor generalization in many deep learning tasks. In this study, we propose AdaDB, an adaptive optimizer with a data-dependent bound on the learning rate: every element of the learning-rate vector is constrained between a dynamic, data-dependent upper bound and a constant lower bound. We also give a theoretical proof of the convergence of AdaDB in the non-convex setting. Our experiments show that AdaDB is able to close the generalization gap between Adam and SGD, and that it converges considerably faster than Adam.
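The abstract only describes the bounding mechanism in words. Below is a minimal NumPy sketch of the general idea, assuming an Adam-style base update: each element of the step-size vector is clipped between a constant lower bound and a dynamic upper bound computed from observed gradient statistics. The function name adadb_like_step, the hyperparameter defaults, and in particular the upper-bound formula are illustrative assumptions, not the bound derived in the paper.

```python
import numpy as np

def adadb_like_step(param, grad, m, v, t,
                    lr=1e-3, beta1=0.9, beta2=0.999,
                    eps=1e-8, lower_bound=1e-4):
    """One Adam-style update whose per-element step size is clipped into
    [lower_bound, upper_bound(data)]. The upper bound below is a stand-in
    illustration, not the bound actually derived in the AdaDB paper."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)

    # Raw Adam per-element step size.
    step_size = lr / (np.sqrt(v_hat) + eps)

    # Hypothetical data-dependent upper bound: tied to the average
    # second-moment statistic of this parameter tensor (an assumption,
    # chosen only so the bound reacts to the observed gradients).
    upper_bound = max(lr / (np.sqrt(v_hat.mean()) + eps), lower_bound)

    # Constrain every element between the constant lower bound and the
    # dynamic upper bound, then apply the update.
    step_size = np.clip(step_size, lower_bound, upper_bound)
    param = param - step_size * m_hat
    return param, m, v


# Tiny usage sketch on the quadratic loss f(w) = 0.5 * ||w||^2.
w = np.array([1.0, -2.0, 0.5])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 501):
    grad = w                                    # gradient of the quadratic
    w, m, v = adadb_like_step(w, grad, m, v, t)
print(w)                                        # w shrinks toward the origin
```

The only difference from plain Adam in this sketch is the element-wise clip: as training data changes the second-moment statistics, the upper bound moves with them, while the constant lower bound keeps every coordinate's step size away from zero.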

Updated: 2021-01-01