VR-SGD: A Simple Stochastic Variance Reduction Method for Machine Learning
IEEE Transactions on Knowledge and Data Engineering (IF 8.9), Pub Date: 2020-01-01, DOI: 10.1109/tkde.2018.2878765
Fanhua Shang, Kaiwen Zhou, Hongying Liu, James Cheng, Ivor W. Tsang, Lijun Zhang, Dacheng Tao, Licheng Jiao

In this paper, we propose a simple variant of the original SVRG, called variance reduced stochastic gradient descent (VR-SGD). Unlike the choices of the snapshot and starting points in SVRG and its proximal variant, Prox-SVRG, the two vectors of VR-SGD are set to the average and the last iterate of the previous epoch, respectively. These settings allow us to use much larger learning rates, and also make our convergence analysis more challenging. We also design two different update rules for smooth and non-smooth objective functions, respectively, which means that VR-SGD can tackle non-smooth and/or non-strongly convex problems directly, without any reduction techniques. Moreover, we analyze the convergence properties of VR-SGD for strongly convex problems and show that VR-SGD attains linear convergence. Unlike most algorithms, which have no convergence guarantees for non-strongly convex problems, we also provide convergence guarantees for VR-SGD in this case, and empirically verify that VR-SGD with varying learning rates achieves performance similar to its momentum-accelerated variant, which has the optimal convergence rate $\mathcal{O}(1/T^2)$. Finally, we apply VR-SGD to solve various machine learning problems, such as convex and non-convex empirical risk minimization, and leading eigenvalue computation. Experimental results show that VR-SGD converges significantly faster than SVRG and Prox-SVRG, and usually outperforms state-of-the-art accelerated methods, e.g., Katyusha.
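To make the epoch structure concrete, below is a minimal NumPy sketch of the smooth-case VR-SGD loop described in the abstract: each epoch computes a full gradient at a snapshot point, the next snapshot is set to the average of the current epoch's iterates, and the next epoch starts from the last iterate rather than being reset. The L2-regularized logistic regression objective, synthetic data, step size, and epoch length are illustrative assumptions for this sketch, not the paper's experimental setup.

```python
import numpy as np

def logistic_grad(w, X, y, lam):
    """Gradient of L2-regularized logistic loss (labels in {-1, +1})."""
    s = 1.0 / (1.0 + np.exp(y * (X @ w)))       # sigmoid of the negative margin
    return -(X.T @ (y * s)) / len(y) + lam * w

def vr_sgd(X, y, lam=1e-3, eta=0.4, epochs=30, seed=0):
    """Sketch of the VR-SGD loop: snapshot = average of the previous epoch's
    iterates, starting point = its last iterate (hyperparameters are assumptions)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = n                        # inner-loop length: one pass per epoch (assumption)
    w = np.zeros(d)              # starting point
    w_snap = w.copy()            # snapshot point
    for _ in range(epochs):
        full_grad = logistic_grad(w_snap, X, y, lam)   # full gradient at the snapshot
        w_sum = np.zeros(d)
        for _ in range(m):
            i = rng.integers(n)
            xi, yi = X[i:i + 1], y[i:i + 1]
            # variance-reduced stochastic gradient (smooth update rule)
            g = (logistic_grad(w, xi, yi, lam)
                 - logistic_grad(w_snap, xi, yi, lam)
                 + full_grad)
            w = w - eta * g
            w_sum += w
        w_snap = w_sum / m       # next snapshot: average of this epoch's iterates
        # next epoch starts from the last iterate w (no reset, unlike SVRG)
    return w

if __name__ == "__main__":
    # toy usage on synthetic data
    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 20))
    y = np.sign(X @ rng.standard_normal(20) + 0.1 * rng.standard_normal(500))
    w = vr_sgd(X, y)
    print("final gradient norm:", np.linalg.norm(logistic_grad(w, X, y, 1e-3)))
```

In this sketch the only changes relative to a plain SVRG loop are the two highlighted choices of snapshot and starting point; the variance-reduced gradient itself is the standard SVRG estimator.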

Updated: 2020-01-01