Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems
IEEE Transactions on Signal Processing (IF 5.4), Pub Date: 2022-06-27, DOI: 10.1109/tsp.2022.3186526
Zhan Gao, Alec Koppel, Alejandro Ribeiro

Stochastic gradient descent is a canonical tool for addressing stochastic optimization problems, which form the bedrock of modern machine learning. In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence against the fact that a constant step-size learns faster, but only up to a limiting error. To do so, rather than fixing the mini-batch size and the step-size at the outset, we propose a strategy that allows these parameters to evolve adaptively. Specifically, the batch-size is set to a piecewise-constant increasing sequence, where each increase occurs once a suitable error criterion is satisfied, and the step-size is selected as the one that yields the fastest convergence. The overall algorithm, the two-scale adaptive (TSA) scheme, is developed for both convex and non-convex problems. It retains exact convergence and, more importantly, achieves the optimal rate of error decrease together with an overall reduction in computation. Furthermore, we extend the TSA method to a generalized adaptive batching framework, a generic methodology that applies modularly to any stochastic algorithm trading off convergence rate against stochastic variance. We evaluate the TSA method on image classification with the MNIST and CIFAR-10 datasets, comparing it against standard SGD and existing adaptive batch-size methods to corroborate the theoretical findings.
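
The sketch below illustrates the general idea described in the abstract: run SGD with a constant step-size, and grow the batch size in a piecewise-constant fashion whenever a measured error criterion is met. The least-squares objective, the doubling rule, and the threshold schedule are illustrative assumptions for this sketch only; they are not the paper's TSA scheme or its specific error criterion.

```python
import numpy as np

# Minimal sketch of adaptive batch-size SGD (assumed setup, not the TSA algorithm).
rng = np.random.default_rng(0)
dim, n_samples = 10, 10_000
A = rng.normal(size=(n_samples, dim))
x_true = rng.normal(size=dim)
y = A @ x_true + 0.1 * rng.normal(size=n_samples)   # noisy linear model

def stochastic_grad(x, batch):
    """Mini-batch gradient of the least-squares loss 0.5 * ||A x - y||^2 / n."""
    Ab, yb = A[batch], y[batch]
    return Ab.T @ (Ab @ x - yb) / len(batch)

x = np.zeros(dim)
batch_size, step_size = 8, 0.01
threshold = 1.0                      # illustrative error criterion for growing the batch

for it in range(2000):
    batch = rng.choice(n_samples, size=batch_size, replace=False)
    g = stochastic_grad(x, batch)
    x -= step_size * g

    # Piecewise-constant increase: once the measured gradient norm falls below
    # the current threshold (a proxy for hitting the noise floor of the current
    # batch size), double the batch and tighten the criterion for the next phase.
    if np.linalg.norm(g) < threshold and batch_size < n_samples:
        batch_size = min(2 * batch_size, n_samples)
        threshold /= 2

print("batch size at exit:", batch_size)
print("distance to x_true:", np.linalg.norm(x - x_true))
```

Keeping the step-size constant within each phase and increasing the batch instead reduces gradient variance without sacrificing the faster progress of a constant step-size, which is the trade-off the abstract highlights.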

Updated: 2022-06-27