Sharp characterization of optimal minibatch size for stochastic finite sum convex optimization
Knowledge and Information Systems (IF 2.7) Pub Date: 2021-07-12, DOI: 10.1007/s10115-021-01593-1
Atsushi Nitanda, Tomoya Murata, Taiji Suzuki

The minibatching technique has been widely adopted in stochastic first-order methods because of its computational efficiency in parallel computing for large-scale machine learning and data mining. Indeed, increasing the minibatch size decreases the iteration complexity (the number of minibatch queries) required to converge, which in turn reduces the running time when each minibatch is processed in parallel. However, this gain typically saturates once the minibatch size becomes too large, and the total computational complexity (the number of accesses to individual examples) deteriorates. Hence, determining an appropriate minibatch size, which controls the trade-off between the iteration and total computational complexities, is important for maximizing the performance of a method with as few computational resources as possible. In this study, we define the optimal minibatch size as the minimum minibatch size for which there exists a stochastic first-order method that achieves the optimal iteration complexity, and we call such a method an optimal minibatch method. Moreover, we show that Katyusha (in: Proceedings of annual ACM SIGACT symposium on theory of computing, vol 49, pp 1200–1205, ACM, 2017), DASVRDA (Murata and Suzuki, in: Advances in neural information processing systems, vol 30, pp 608–617, 2017), and the proposed method, a combination of Acc-SVRG (Nitanda, in: Advances in neural information processing systems, vol 27, pp 1574–1582, 2014) with APPA (Cotter et al., in: Advances in neural information processing systems, vol 27, pp 3059–3067, 2014), are optimal minibatch methods. In experiments, we compare optimal minibatch methods with several competitors on \(L_1\)- and \(L_2\)-regularized logistic regression problems and observe that the iteration complexities of optimal minibatch methods decrease linearly as the minibatch size increases, up to reasonable minibatch sizes, after which they attain the best possible iteration complexities. This confirms the computational efficiency of optimal minibatch methods suggested by the theory.
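
To make the iteration-versus-total-complexity trade-off concrete, the following is a minimal Python sketch, not the authors' implementation, of minibatch SVRG applied to an \(L_2\)-regularized logistic regression objective of the kind used in the experiments. The problem sizes, the step-size rule step = min(0.04 * b, 0.5) (which grows with the minibatch size b until it saturates), and the stopping tolerance are all illustrative assumptions rather than values from the paper.

```python
import numpy as np

def reg_logistic_loss(w, X, y, lam):
    """(1/n) sum_i log(1 + exp(-y_i x_i^T w)) + (lam/2) ||w||^2."""
    return np.mean(np.logaddexp(0.0, -y * (X @ w))) + 0.5 * lam * np.dot(w, w)

def reg_logistic_grad(w, X, y, lam):
    """Gradient of the loss above over the given rows (also used on minibatches)."""
    margins = y * (X @ w)
    coef = -y * 0.5 * (1.0 - np.tanh(0.5 * margins))  # = -y * sigmoid(-margins), numerically stable
    return X.T @ coef / len(y) + lam * w

def minibatch_svrg(X, y, lam, b, step, f_star, eps, max_epochs=500, seed=0):
    """Minibatch SVRG; returns (minibatch queries, example accesses) needed to
    bring the objective within eps of f_star, or None if max_epochs is reached."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w_snap = np.zeros(d)
    queries = accesses = 0
    inner = max(n // b, 1)  # roughly one effective pass over the data per epoch
    for _ in range(max_epochs):
        mu = reg_logistic_grad(w_snap, X, y, lam)  # full gradient at the snapshot
        accesses += n
        w = w_snap.copy()
        for _ in range(inner):
            idx = rng.choice(n, size=b, replace=False)
            # Variance-reduced gradient estimate (the lam terms telescope correctly).
            g = (reg_logistic_grad(w, X[idx], y[idx], lam)
                 - reg_logistic_grad(w_snap, X[idx], y[idx], lam) + mu)
            w -= step * g
            queries += 1
            accesses += 2 * b  # two minibatch gradient evaluations per inner step
        w_snap = w
        if reg_logistic_loss(w_snap, X, y, lam) - f_star < eps:
            return queries, accesses
    return None

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d, lam, eps = 1000, 10, 1e-2, 1e-3
    X = rng.normal(size=(n, d))
    y = np.sign(X @ rng.normal(size=d) + 0.5 * rng.normal(size=n))

    # Reference optimum via many full-gradient steps (cheap at this scale).
    w = np.zeros(d)
    for _ in range(3000):
        w -= 1.0 * reg_logistic_grad(w, X, y, lam)
    f_star = reg_logistic_loss(w, X, y, lam)

    # Step size grows with b (lower gradient variance) until it saturates.
    for b in [1, 4, 16, 64, 256, n]:
        out = minibatch_svrg(X, y, lam, b, min(0.04 * b, 0.5), f_star, eps)
        if out is not None:
            q, a = out
            print(f"b={b:4d}  minibatch queries={q:6d}  example accesses={a:8d}")
```

Under these illustrative assumptions, the printed counts typically show the number of minibatch queries falling roughly in proportion to b up to a moderate batch size and then flattening, while total example accesses eventually grow with b, mirroring the saturation behavior described in the abstract.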

Updated: 2021-07-13