On the Adaptivity of Stochastic Gradient-Based Optimization
SIAM Journal on Optimization (IF 2.6), Pub Date: 2020-05-27, DOI: 10.1137/19m1256919
Lihua Lei, Michael I. Jordan

SIAM Journal on Optimization, Volume 30, Issue 2, Page 1473-1500, January 2020.
Stochastic gradient-based optimization has been a core enabling methodology in applications to large-scale problems in machine learning and related areas. Despite this progress, the gap between theory and practice remains significant, with theoreticians pursuing mathematical optimality at the cost of obtaining specialized procedures in different regimes (e.g., modulus of strong convexity, magnitude of target accuracy, signal-to-noise ratio), and with practitioners not readily able to know which regime is appropriate to their problem, and seeking broadly applicable algorithms that are reasonably close to optimality. To bridge these perspectives it is necessary to study algorithms that are adaptive to different regimes. We present the stochastically controlled stochastic gradient (SCSG) method for composite convex finite-sum optimization problems and show that it is adaptive to both strong convexity and target accuracy. The adaptivity is achieved by batch variance reduction with adaptive batch sizes and a novel technique, which we refer to as geometrization, and which sets the length of each epoch as a geometric random variable. The algorithm achieves strictly better theoretical complexity than other existing adaptive algorithms, while the tuning parameters of the algorithm depend only on the smoothness parameter of the objective.
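The two ingredients named in the abstract, batch variance reduction with growing batch sizes and geometrization, can be illustrated with a minimal sketch. This is not the paper's algorithm as stated: it assumes a smooth least-squares objective with no composite (proximal) term, and the function names, the batch growth factor of 1.5, the single-sample inner step, and the choice of expected epoch length equal to the batch size are all illustrative assumptions. The only tuning input is the step size, which a user would set from the smoothness constant.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def geometric_epoch_length(mean, rng):
    """Draw N in {0, 1, 2, ...} with P(N >= k) = (1 - p)^k, p = 1 / (mean + 1).

    The expectation of N is `mean`. Inverse-transform sampling is used
    because the standard library has no geometric sampler.
    """
    p = 1.0 / (mean + 1.0)
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
    return int(math.log(u) / math.log(1.0 - p))


def scsg_sketch(rows, targets, step, n_epochs=40, batch0=8, growth=1.5, seed=0):
    """Hypothetical SCSG-style loop for f(x) = (1/2n) * sum_i (a_i . x - y_i)^2."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    x = [0.0] * d
    batch = float(batch0)
    for _ in range(n_epochs):
        size = min(n, int(batch))
        idx = rng.sample(range(n), size)
        anchor = list(x)
        # Batch gradient at the anchor point (the variance-reduction reference).
        g = [0.0] * d
        for i in idx:
            resid = dot(rows[i], anchor) - targets[i]
            for k in range(d):
                g[k] += resid * rows[i][k] / size
        # Geometrization: the epoch length is a geometric random variable.
        for _ in range(geometric_epoch_length(size, rng)):
            a = rows[rng.randrange(n)]
            # grad_i(x) - grad_i(anchor) + g, simplified for least squares.
            diff = dot(a, x) - dot(a, anchor)
            x = [x[k] - step * (diff * a[k] + g[k]) for k in range(d)]
        batch *= growth  # assumed schedule: geometrically growing batch size
    return x
```

With rows scaled to unit norm, each per-sample loss has smoothness constant 1, so a step such as 0.1 is a conservative choice in this sketch. The batch schedule and epoch-length distribution used in the paper may differ from the ones assumed here.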


Updated: 2020-07-23