
Distributed algorithms for convex problems with linear coupling constraints

Journal of Global Optimization

Abstract

Distributed and parallel algorithms have been investigated frequently in recent years, in particular for applications such as machine learning. Nonetheless, only a small subclass of the optimization algorithms in the literature can be easily distributed, due, e.g., to the presence of coupling constraints that make all the variables dependent on each other with respect to the feasible set. Augmented Lagrangian methods are among the most widely used techniques to deal with coupling constraints, namely by moving such constraints into the objective function in a structured, well-studied manner. Unfortunately, standard augmented Lagrangian methods lead to a nested scheme, since a subproblem must be solved (at least inexactly) at each iteration, which can make the algorithm inefficient. To fill this gap, we propose an augmented Lagrangian method for convex problems with linear coupling constraints that can be distributed and requires a single gradient projection step at every iteration. We give a formal convergence proof to at least \(\varepsilon \)-approximate solutions of the problem and a detailed analysis of how the parameters of the algorithm influence the value of the approximation parameter \(\varepsilon \). Furthermore, we introduce a distributed version of the algorithm that allows the data to be partitioned and the computation to be carried out in a parallel fashion.


References

  1. Aussel, D., Sagratella, S.: Sufficient conditions to compute any solution of a quasivariational inequality via a variational inequality. Math. Methods Oper. Res. 85(1), 3–18 (2017)

  2. Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)

  3. Bertsekas, D.P.: Convex Optimization Algorithms. Athena Scientific, Nashua, NH, USA (2015)

  4. Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods, vol. 23. Prentice Hall, Englewood Cliffs (1989)

  5. Birgin, E.G., Martinez, J.M.: Practical Augmented Lagrangian Methods for Constrained Optimization, vol. 10. SIAM, Philadelphia (2014)

  6. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J., et al.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 3(1), 1–122 (2011)

  7. Cannelli, L., Facchinei, F., Scutari, G.: Multi-agent asynchronous nonconvex large-scale optimization. In: 2017 IEEE 7th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pp. 1–5. IEEE (2017)

  8. Cassioli, A., Di Lorenzo, D., Sciandrone, M.: On the convergence of inexact block coordinate descent methods for constrained optimization. Eur. J. Oper. Res. 231(2), 274–281 (2013)

  9. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2(3), 27 (2011)

  10. Clarke, F.H.: Optimization and Nonsmooth Analysis, vol. 5. SIAM, Philadelphia (1990)

  11. Daneshmand, A., Sun, Y., Scutari, G., Facchinei, F., Sadler, B.M.: Decentralized dictionary learning over time-varying digraphs (2018). arXiv preprint arXiv:1808.05933

  12. Di Pillo, G., Lucidi, S.: On exact augmented Lagrangian functions in nonlinear programming. In: Di Pillo, G., Giannessi, F. (eds.) Nonlinear Optimization and Applications, pp. 85–100. Springer, Boston (1996)

  13. Di Pillo, G., Lucidi, S.: An augmented Lagrangian function with improved exactness properties. SIAM J. Optim. 12(2), 376–406 (2002)

  14. Facchinei, F., Kanzow, C.: Generalized Nash equilibrium problems. 4OR 5(3), 173–210 (2007)

  15. Facchinei, F., Kanzow, C., Karl, S., Sagratella, S.: The semismooth Newton method for the solution of quasi-variational inequalities. Comput. Optim. Appl. 62(1), 85–109 (2015)

  16. Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer, Berlin (2007)

  17. Facchinei, F., Sagratella, S.: On the computation of all solutions of jointly convex generalized Nash equilibrium problems. Optim. Lett. 5(3), 531–547 (2011)

  18. Facchinei, F., Scutari, G., Sagratella, S.: Parallel selective algorithms for nonconvex big data optimization. IEEE Trans. Signal Process. 63(7), 1874–1889 (2015)

  19. García, R., Marín, A., Patriksson, M.: Column generation algorithms for nonlinear optimization, I: convergence analysis. Optimization 52(2), 171–200 (2003)

  20. Gondzio, J., Grothey, A.: Exploiting structure in parallel implementation of interior point methods for optimization. Comput. Manag. Sci. 6(2), 135–160 (2009)

  21. Harchaoui, Z., Juditsky, A., Nemirovski, A.: Conditional gradient algorithms for machine learning. In: NIPS Workshop on Optimization for ML, vol. 3, pp. 3-2 (2012)

  22. Hong, M., Luo, Z.Q.: On the linear convergence of the alternating direction method of multipliers. Math. Program. 162(1–2), 165–199 (2017)

  23. Jaggi, M.: Revisiting Frank-Wolfe: projection-free sparse convex optimization. ICML 1, 427–435 (2013)

  24. Lacoste-Julien, S., Jaggi, M.: On the global linear convergence of Frank-Wolfe optimization variants. In: Advances in Neural Information Processing Systems, pp. 496–504 (2015)

  25. Latorre, V., Sagratella, S.: A canonical duality approach for the solution of affine quasi-variational inequalities. J. Global Optim. 64(3), 433–449 (2016)

  26. Lin, C.J., Lucidi, S., Palagi, L., Risi, A., Sciandrone, M.: Decomposition algorithm model for singly linearly-constrained problems subject to lower and upper bounds. J. Optim. Theory Appl. 141(1), 107–126 (2009)

  27. Lucidi, S.: New results on a class of exact augmented Lagrangians. J. Optim. Theory Appl. 58(2), 259–282 (1988)

  28. Lucidi, S., Palagi, L., Risi, A., Sciandrone, M.: A convergent decomposition algorithm for support vector machines. Comput. Optim. Appl. 38(2), 217–234 (2007)

  29. Mangasarian, O.: Machine learning via polyhedral concave minimization. In: Fischer, H., Riedmüller, B., Schäffler, S. (eds.) Applied Mathematics and Parallel Computing, pp. 175–188. Springer, Berlin (1996)

  30. Manno, A., Palagi, L., Sagratella, S.: Parallel decomposition methods for linearly constrained problems subject to simple bound with application to the SVMs training. Comput. Optim. Appl. 71(1), 115–145 (2018)

  31. Manno, A., Sagratella, S., Livi, L.: A convergent and fully distributable SVMs training algorithm. In: 2016 International Joint Conference on Neural Networks (IJCNN), pp. 3076–3080. IEEE (2016)

  32. Ouyang, H., Gray, A.: Fast stochastic Frank-Wolfe algorithms for nonlinear SVMs. In: Proceedings of the 2010 SIAM International Conference on Data Mining, pp. 245–256. SIAM (2010)

  33. Piccialli, V., Sciandrone, M.: Nonlinear optimization and support vector machines. 4OR 16(2), 111–149 (2018). https://doi.org/10.1007/s10288-018-0378-2

  34. Rockafellar, R.T.: Augmented Lagrange multiplier functions and duality in nonconvex programming. SIAM J. Control 12(2), 268–285 (1974)

  35. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, vol. 317. Springer, Berlin (2009)

  36. Sagratella, S.: Algorithms for generalized potential games with mixed-integer variables. Comput. Optim. Appl. 68(3), 689–717 (2017)

  37. Scutari, G., Facchinei, F., Lampariello, L.: Parallel and distributed methods for constrained nonconvex optimization—part I: theory. IEEE Trans. Signal Process. 65(8), 1929–1944 (2016)

  38. Scutari, G., Facchinei, F., Lampariello, L., Sardellitti, S., Song, P.: Parallel and distributed methods for constrained nonconvex optimization—part II: applications in communications and machine learning. IEEE Trans. Signal Process. 65(8), 1945–1960 (2016)

  39. Scutari, G., Facchinei, F., Lampariello, L., Song, P.: Parallel and distributed methods for nonconvex optimization. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 840–844. IEEE (2014)

  40. Woodsend, K., Gondzio, J.: Hybrid MPI/OpenMP parallel linear support vector machine training. J. Mach. Learn. Res. 10(Aug), 1937–1953 (2009)


Author information

Corresponding author

Correspondence to Simone Sagratella.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The work of the authors was partially supported by the Grant: “Finanziamenti di ateneo per la ricerca scientifica 2018” n. RP11816432902D1E, Sapienza University of Rome.

The classical gradient projection algorithm defined in (2) is not distributable

Consider the general case in which S is not separable due to the presence of the constraints h that couple the different blocks of variables \(x_{(\nu )}\), i.e. \(m > 0\). To solve Problem (1), one could think of employing the following naive parallel version of the classical gradient projection algorithm (whose original generic iteration is defined in (2)):

$$\begin{aligned} x^{k+1}_{(\nu )} = {\mathscr {P}}_{S_\nu (x^{k}_{(\nu )})} \left[ x^{k}_{(\nu )} - \alpha _k \nabla f(x^k)_{(\nu )} \right] , \qquad \nu = 1, \ldots , N, \end{aligned}$$

where \(x^k \in S\) and the decomposed subsets \(S_\nu \) are defined in the following way

$$\begin{aligned} S_\nu (x_{(\nu )}^k) \triangleq \left\{ x_{(\nu )} \in X_\nu \, : \, A_{*(\nu )} x_{(\nu )} = A_{*(\nu )} x^k_{(\nu )} \right\} \subseteq \mathfrak {R}^{n_\nu }, \qquad \nu = 1, \ldots , N. \end{aligned}$$

Let \(\{x^k\}\) be the sequence produced by this algorithm. The sets \(S_\nu \) are fixed during the iterations and depend only on the starting point \(x^0\):

$$\begin{aligned} S_\nu (x^{k}_{(\nu )}) = S_\nu (x_{(\nu )}^0) = S_\nu ^0, \qquad \forall \, k \ge 0, \qquad \nu = 1, \ldots , N. \end{aligned}$$
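To make the update above concrete, the following minimal sketch (ours, purely illustrative) implements one iteration of the naive parallel scheme under the simplifying assumption that each \(X_\nu = \mathfrak {R}^{n_\nu }\), so that the projection onto the affine set \(S_\nu ^0\) admits a closed form; the function names are not taken from the paper.

```python
import numpy as np

def project_affine(z, A_nu, b_nu):
    # Euclidean projection of z onto the affine set {x : A_nu @ x = b_nu}:
    #   P(z) = z - A_nu^T (A_nu A_nu^T)^+ (A_nu z - b_nu),
    # using the pseudo-inverse so that a rank-deficient A_nu is handled as well.
    correction = A_nu.T @ np.linalg.pinv(A_nu @ A_nu.T) @ (A_nu @ z - b_nu)
    return z - correction

def naive_parallel_step(x_blocks, grad_blocks, A_blocks, b_blocks, alpha):
    # One iteration of the naive scheme: every block nu independently takes a
    # gradient step and projects back onto its own fixed set
    #   S_nu^0 = {x_nu : A_nu @ x_nu = b_nu},  with b_nu = A_nu @ x0_nu.
    return [project_affine(x - alpha * g, A_nu, b_nu)
            for x, g, A_nu, b_nu in zip(x_blocks, grad_blocks, A_blocks, b_blocks)]
```

Here b_blocks stores the fixed right-hand sides \(A_{*(\nu )} x^0_{(\nu )}\), which is precisely why the sets \(S_\nu ^0\) never change along the iterations.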

A fixed point \({\overline{x}}\) of \(\{x^k\}\) is therefore a solution of the following variational inequality problem (see e.g. [1, 15, 16, 25]):

$$\begin{aligned} {\overline{x}} \in X, \;\; h({\overline{x}}) = 0, \qquad \nabla f({\overline{x}})^T (x - {\overline{x}}) \ge 0, \qquad \forall \, x \in \prod _{\nu =1}^N S_\nu ^0. \end{aligned}$$
(10)

On the other hand, a solution of Problem (1), which is a fixed point of the iterations defined in (2), is a solution \(x^*\) of the following, different variational inequality:

$$\begin{aligned} x^* \in X, \;\; h(x^*) = 0, \qquad \nabla f(x^*)^T (x - x^*) \ge 0, \qquad \forall \, x \in S. \end{aligned}$$
(11)

Notice that the point \({\overline{x}}\), being a solution of the variational inequality (10), need not be a solution of the variational inequality (11), and therefore of Problem (1). This is due to the fact that, for any feasible starting guess \(x^0 \in S\), we only obtain the inclusion

$$\begin{aligned} \displaystyle \prod _{\nu = 1}^N S_\nu ^{0} \subseteq S, \end{aligned}$$

but not, in general, the reverse one. In fact, the fixed point \({\overline{x}}\) of \(\{x^k\}\) is only an equilibrium of the (potential) generalized Nash equilibrium problem (see e.g. [14, 17, 36]) whose generic player \(\nu \in \{1,\ldots , N\}\) solves the following optimization problem, parametric with respect to the blocks of variables \(x_{(\mu )}\) of the other players \(\mu \ne \nu \):

$$\begin{aligned} \begin{array}{rl} \underset{x_{(\nu )}}{\text{ minimize }} \;\; &{} f(x)\\ \text {s.t. } \; &{} h(x) = 0, \;\; x_{(\nu )} \in X_\nu . \end{array} \end{aligned}$$
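To see how strict the inclusion can be, consider the following toy instance (ours, not taken from the paper): two scalar blocks, the single coupling constraint \(x_1 + x_2 = 0\), and \(X_\nu = \mathfrak {R}\). Starting from \(x^0 = (0,0)\), each set \(S_\nu ^0\) collapses to the singleton \(\{0\}\), so the naive scheme never moves, even though the coupled problem has a strictly better solution.

```python
import numpy as np

# Toy instance (illustrative only): minimize f(x) = x1^2 + (x2 - 2)^2
# subject to the single coupling constraint x1 + x2 = 0, with X_nu = R.
grad_f = lambda x: np.array([2.0 * x[0], 2.0 * (x[1] - 2.0)])

x0 = np.array([0.0, 0.0])       # feasible starting point (x1 + x2 = 0)
# With A = [1, 1] and scalar blocks, S_nu^0 = {x_nu : x_nu = x0_nu} = {0}, so
# prod_nu S_nu^0 = {(0, 0)} is a strict subset of S = {x : x1 + x2 = 0}.
# Every naive step projects each block back onto the singleton {0}, hence:
x_bar = x0                      # (0, 0) is a fixed point of the naive scheme

x_star = np.array([-1.0, 1.0])  # solution of the coupled problem: f(x_star) = 2 < 4 = f(x_bar)
# x_bar violates VI (11): the directional derivative towards x_star is negative.
print(grad_f(x_bar) @ (x_star - x_bar))   # prints -4.0
```

Note that \({\overline{x}} = (0,0)\) is indeed an equilibrium of the generalized Nash game above, since each player's feasible set reduces to the singleton \(\{0\}\) once the other block is fixed, yet it is not a solution of Problem (1).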

Cite this article

Colombo, T., Sagratella, S. Distributed algorithms for convex problems with linear coupling constraints. J Glob Optim 77, 53–73 (2020). https://doi.org/10.1007/s10898-019-00792-z
