Variance Counterbalancing for Stochastic Large-scale Learning
International Journal of Artificial Intelligence Tools (IF 1.0), Pub Date: 2020-05-20, DOI: 10.1142/s0218213020500104
Pola Lydia Lagari, Lefteri H. Tsoukalas, Isaac E. Lagaris

Stochastic Gradient Descent (SGD) is perhaps the most frequently used method for large-scale training. A common example is training a neural network over a large data set, which amounts to minimizing the corresponding mean squared error (MSE). Since the convergence of SGD is rather slow, acceleration techniques based on the notion of “mini-batches” have been developed. All of them, however, mimic SGD and impose diminishing step sizes as a means of inhibiting large variations in the MSE objective. In this article, we introduce random sets of mini-batches instead of individual mini-batches. We employ an objective function that minimizes the average MSE and its variance over these sets, thus eliminating the need for systematic step-size reduction. This approach permits the use of state-of-the-art optimization methods, far more efficient than gradient descent, and yields a significant performance enhancement.
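The following is a minimal sketch of the idea described in the abstract, not the authors' exact formulation: over a random set of mini-batches b, the objective is roughly F(w) = mean_b MSE_b(w) + λ · Var_b MSE_b(w), and because the set is fixed during each inner optimization, a standard quasi-Newton method (here SciPy's L-BFGS-B) can be applied without a diminishing step size. The linear model, the penalty weight `lam`, the batch size, and the number of mini-batches per set are illustrative assumptions.

```python
# Sketch of a variance-counterbalancing objective optimized with L-BFGS-B.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic regression data set: y = X w_true + noise
n_samples, n_features = 10_000, 20
X = rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)
y = X @ w_true + 0.1 * rng.normal(size=n_samples)

batch_size, n_batches_per_set = 64, 8
lam = 1.0  # weight on the variance term (assumed value)

def sample_batch_set():
    """Draw a random set of mini-batches (index arrays only)."""
    return [rng.choice(n_samples, size=batch_size, replace=False)
            for _ in range(n_batches_per_set)]

def objective(w, batch_set):
    """Average mini-batch MSE plus lam times its variance over the set."""
    mses = np.array([np.mean((X[idx] @ w - y[idx]) ** 2) for idx in batch_set])
    return mses.mean() + lam * mses.var()

# Outer loop: each iteration fixes a random set of mini-batches and runs a
# few L-BFGS-B steps on the combined objective -- no diminishing step size.
w = np.zeros(n_features)
for it in range(20):
    batch_set = sample_batch_set()
    res = minimize(objective, w, args=(batch_set,),
                   method="L-BFGS-B", options={"maxiter": 10})
    w = res.x
    full_mse = np.mean((X @ w - y) ** 2)
    print(f"iter {it:2d}  full-data MSE = {full_mse:.5f}")
```

In this sketch the variance term discourages parameter updates that fit some mini-batches at the expense of others, which is what allows the fixed-set subproblem to be handed to a second-order-style optimizer instead of plain gradient descent; the paper's actual algorithm and hyperparameters may differ.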

Updated: 2020-05-20