Easily parallelizable and distributable class of algorithms for structured sparsity, with optimal acceleration
Journal of Computational and Graphical Statistics (IF 2.4), Pub Date: 2019-05-28, DOI: 10.1080/10618600.2019.1592757
Seyoon Ko, Donghyeon Yu, Joong-Ho Won

Abstract: Many statistical learning problems can be posed as minimization of a sum of two convex functions, one typically a composition of nonsmooth and linear functions. Examples include regression under structured sparsity assumptions. Popular algorithms for solving such problems, for example, ADMM, often involve nontrivial optimization subproblems or smoothing approximation. We consider two classes of primal–dual algorithms that do not incur these difficulties, and unify them from a perspective of monotone operator theory. From this unification, we propose a continuum of preconditioned forward–backward operator splitting algorithms amenable to parallel and distributed computing. For the entire region of convergence of the whole continuum of algorithms, we establish its rates of convergence. For some known instances of this continuum, our analysis closes the gap in theory. We further exploit the unification to propose a continuum of accelerated algorithms. We show that the whole continuum attains the theoretically optimal rate of convergence. The scalability of the proposed algorithms, as well as their convergence behavior, is demonstrated up to 1.2 million variables with a distributed implementation. The code is available at https://github.com/kose-y/dist-primal-dual. Supplemental materials for this article are available online.
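To make the problem class concrete: the abstract describes problems of the form minimize_x f(x) + g(Kx), where f and g are convex, g is nonsmooth, and K is a linear map. The sketch below is a minimal, self-contained NumPy illustration of this structure using the classical Chambolle–Pock primal–dual method, a well-known member of the primal–dual family the abstract refers to, applied to 1D total-variation denoising (a simple structured-sparsity instance). It only illustrates a prototypical primal–dual iteration on this problem class; it is not the authors' preconditioned, accelerated, or distributed algorithm (see the linked repository for that), and all function names and the test signal here are invented for this example.

import numpy as np

def first_diff(x):
    # K x: first differences of x
    return x[1:] - x[:-1]

def first_diff_T(y):
    # K^T y: adjoint of the first-difference operator
    out = np.zeros(y.size + 1)
    out[:-1] -= y
    out[1:] += y
    return out

def chambolle_pock_tv(b, lam, n_iter=500):
    # Solves minimize_x 0.5*||x - b||^2 + lam*||K x||_1 (1D TV denoising).
    # ||K||^2 <= 4 for the first-difference operator, so tau = sigma = 0.49
    # satisfies the step-size condition tau * sigma * ||K||^2 < 1.
    tau = sigma = 0.49
    x = b.copy()
    x_bar = x.copy()
    y = np.zeros(b.size - 1)  # dual variable, lives in the range of K
    for _ in range(n_iter):
        # Dual step: prox of g* is projection onto the l-inf ball of radius lam.
        y = np.clip(y + sigma * first_diff(x_bar), -lam, lam)
        # Primal step: prox of tau*f for f(x) = 0.5*||x - b||^2 in closed form.
        x_new = (x - tau * first_diff_T(y) + tau * b) / (1.0 + tau)
        # Extrapolation with theta = 1.
        x_bar = 2.0 * x_new - x
        x = x_new
    return x

# Usage: denoise a noisy piecewise-constant signal.
rng = np.random.default_rng(0)
signal = np.repeat([0.0, 2.0, -1.0, 1.0], 50)
b = signal + 0.3 * rng.standard_normal(signal.size)
x_hat = chambolle_pock_tv(b, lam=1.0)
print("RMSE:", np.sqrt(np.mean((x_hat - signal) ** 2)))

Note that both the operator K and its adjoint are applied matrix-free, which is the property that makes this family of iterations easy to parallelize and distribute: each step consists only of elementwise proximal updates and products with K and K^T.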

Updated: 2019-05-28