Variance Reduction via Primal-Dual Accelerated Dual Averaging for Nonsmooth Convex Finite-Sums
arXiv - CS - Numerical Analysis Pub Date : 2021-02-26 , DOI: arxiv-2102.13643
Chaobing Song, Stephen J. Wright, Jelena Diakonikolas

We study structured nonsmooth convex finite-sum optimization, which appears widely in machine learning applications, including support vector machines and least absolute deviation. For the primal-dual formulation of this problem, we propose a novel algorithm called \emph{Variance Reduction via Primal-Dual Accelerated Dual Averaging (\vrpda)}. In the nonsmooth and general convex setting, \vrpda~has overall complexity $O(nd\log\min \{1/\epsilon, n\} + d/\epsilon )$ in terms of the primal-dual gap, where $n$ denotes the number of samples, $d$ the dimension of the primal variables, and $\epsilon$ the desired accuracy. In the nonsmooth and strongly convex setting, the overall complexity of \vrpda~becomes $O(nd\log\min\{1/\epsilon, n\} + d/\sqrt{\epsilon})$ in terms of both the primal-dual gap and the distance between iterate and optimal solution. Both of these results for \vrpda~improve significantly, and in a much simpler and more direct way, on the state-of-the-art complexity estimates, which are $O(nd\log \min\{1/\epsilon, n\} + \sqrt{n}d/\epsilon)$ for the nonsmooth and general convex setting and $O(nd\log \min\{1/\epsilon, n\} + \sqrt{n}d/\sqrt{\epsilon})$ for the nonsmooth and strongly convex setting. Moreover, both complexities are better than \emph{lower} bounds for general convex finite sums that lack the particular (common) structure that we consider. Our theoretical results are supported by numerical experiments, which confirm the competitive performance of \vrpda~compared to state-of-the-art methods.
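To make the improvement concrete, the following back-of-the-envelope sketch (not from the paper; constant factors are ignored and the sample values of $n$, $d$, $\epsilon$ are arbitrary assumptions) evaluates the dominant terms of the two general-convex complexity estimates side by side. The gain comes from the second term: $d/\epsilon$ for \vrpda{} versus $\sqrt{n}\,d/\epsilon$ for prior methods.

```python
import math

def vrpda_complexity(n, d, eps):
    # General convex estimate from the abstract: O(nd log min(1/eps, n) + d/eps)
    return n * d * math.log(min(1.0 / eps, n)) + d / eps

def prior_complexity(n, d, eps):
    # Prior state-of-the-art estimate: O(nd log min(1/eps, n) + sqrt(n) d/eps)
    return n * d * math.log(min(1.0 / eps, n)) + math.sqrt(n) * d / eps

# Hypothetical problem sizes, chosen only for illustration.
n, d, eps = 10**6, 10**3, 1e-6
ratio = vrpda_complexity(n, d, eps) / prior_complexity(n, d, eps)
print(f"VRPDA / prior cost ratio: {ratio:.4f}")
```

For accuracy-dominated regimes ($d/\epsilon \gg nd\log n$), the ratio approaches $1/\sqrt{n}$, which matches the abstract's claim that the improvement is largest exactly where the $1/\epsilon$ term dominates.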

Updated: 2021-03-01