
Distributed algorithms for convex problems with linear coupling constraints

Journal of Global Optimization

Abstract

Distributed and parallel algorithms have been investigated frequently in recent years, in particular for applications such as machine learning. Nonetheless, only a small subclass of the optimization algorithms in the literature can be easily distributed, due, e.g., to the presence of coupling constraints that make all the variables dependent on each other with respect to the feasible set. Augmented Lagrangian methods are among the most widely used techniques to deal with coupling constraints, namely by moving such constraints into the objective function in a structured, well-studied manner. Unfortunately, standard augmented Lagrangian methods lead to a nested scheme, since a subproblem must be solved (at least inexactly) at each iteration, which can make the algorithm inefficient. To fill this gap, we propose an augmented Lagrangian method for convex problems with linear coupling constraints that can be distributed and requires a single gradient projection step at every iteration. We give a formal convergence proof to at least \(\varepsilon \)-approximate solutions of the problem and a detailed analysis of how the parameters of the algorithm influence the value of the approximation parameter \(\varepsilon \). Furthermore, we introduce a distributed version of the algorithm that allows the data to be partitioned and the computation to be carried out in a parallel fashion.


References

  1. Aussel, D., Sagratella, S.: Sufficient conditions to compute any solution of a quasivariational inequality via a variational inequality. Math. Methods Oper. Res. 85(1), 3–18 (2017)

  2. Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)

  3. Bertsekas, D.P.: Convex Optimization Algorithms. Athena Scientific, Nashua, NH, USA (2015)

  4. Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods, vol. 23. Prentice Hall, Englewood Cliffs (1989)

  5. Birgin, E.G., Martinez, J.M.: Practical Augmented Lagrangian Methods for Constrained Optimization, vol. 10. SIAM, Philadelphia (2014)

  6. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J., et al.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 3(1), 1–122 (2011)

  7. Cannelli, L., Facchinei, F., Scutari, G.: Multi-agent asynchronous nonconvex large-scale optimization. In: 2017 IEEE 7th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pp. 1–5. IEEE (2017)

  8. Cassioli, A., Di Lorenzo, D., Sciandrone, M.: On the convergence of inexact block coordinate descent methods for constrained optimization. Eur. J. Oper. Res. 231(2), 274–281 (2013)

  9. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2(3), 27 (2011)

  10. Clarke, F.H.: Optimization and Nonsmooth Analysis, vol. 5. SIAM, Philadelphia (1990)

  11. Daneshmand, A., Sun, Y., Scutari, G., Facchinei, F., Sadler, B.M.: Decentralized dictionary learning over time-varying digraphs (2018). arXiv preprint arXiv:1808.05933

  12. Di Pillo, G., Lucidi, S.: On exact augmented Lagrangian functions in nonlinear programming. In: Di Pillo, G., Giannessi, F. (eds.) Nonlinear Optimization and Applications, pp. 85–100. Springer, Boston (1996)

  13. Di Pillo, G., Lucidi, S.: An augmented Lagrangian function with improved exactness properties. SIAM J. Optim. 12(2), 376–406 (2002)

  14. Facchinei, F., Kanzow, C.: Generalized Nash equilibrium problems. 4OR 5(3), 173–210 (2007)

  15. Facchinei, F., Kanzow, C., Karl, S., Sagratella, S.: The semismooth Newton method for the solution of quasi-variational inequalities. Comput. Optim. Appl. 62(1), 85–109 (2015)

  16. Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer, Berlin (2007)

  17. Facchinei, F., Sagratella, S.: On the computation of all solutions of jointly convex generalized Nash equilibrium problems. Optim. Lett. 5(3), 531–547 (2011)

  18. Facchinei, F., Scutari, G., Sagratella, S.: Parallel selective algorithms for nonconvex big data optimization. IEEE Trans. Signal Process. 63(7), 1874–1889 (2015)

  19. García, R., Marín, A., Patriksson, M.: Column generation algorithms for nonlinear optimization, I: convergence analysis. Optimization 52(2), 171–200 (2003)

  20. Gondzio, J., Grothey, A.: Exploiting structure in parallel implementation of interior point methods for optimization. Comput. Manag. Sci. 6(2), 135–160 (2009)

  21. Harchaoui, Z., Juditsky, A., Nemirovski, A.: Conditional gradient algorithms for machine learning. In: NIPS Workshop on Optimization for ML, vol. 3, pp. 3-2 (2012)

  22. Hong, M., Luo, Z.Q.: On the linear convergence of the alternating direction method of multipliers. Math. Program. 162(1–2), 165–199 (2017)

  23. Jaggi, M.: Revisiting Frank-Wolfe: projection-free sparse convex optimization. ICML 1, 427–435 (2013)

  24. Lacoste-Julien, S., Jaggi, M.: On the global linear convergence of Frank-Wolfe optimization variants. In: Advances in Neural Information Processing Systems, pp. 496–504 (2015)

  25. Latorre, V., Sagratella, S.: A canonical duality approach for the solution of affine quasi-variational inequalities. J. Global Optim. 64(3), 433–449 (2016)

  26. Lin, C.J., Lucidi, S., Palagi, L., Risi, A., Sciandrone, M.: Decomposition algorithm model for singly linearly-constrained problems subject to lower and upper bounds. J. Optim. Theory Appl. 141(1), 107–126 (2009)

  27. Lucidi, S.: New results on a class of exact augmented Lagrangians. J. Optim. Theory Appl. 58(2), 259–282 (1988)

  28. Lucidi, S., Palagi, L., Risi, A., Sciandrone, M.: A convergent decomposition algorithm for support vector machines. Comput. Optim. Appl. 38(2), 217–234 (2007)

  29. Mangasarian, O.: Machine learning via polyhedral concave minimization. In: Fischer, H., Riedmüller, B., Schäffler, S. (eds.) Applied Mathematics and Parallel Computing, pp. 175–188. Springer, Berlin (1996)

  30. Manno, A., Palagi, L., Sagratella, S.: Parallel decomposition methods for linearly constrained problems subject to simple bound with application to the SVMs training. Comput. Optim. Appl. 71(1), 115–145 (2018)

  31. Manno, A., Sagratella, S., Livi, L.: A convergent and fully distributable SVMs training algorithm. In: 2016 International Joint Conference on Neural Networks (IJCNN), pp. 3076–3080. IEEE (2016)

  32. Ouyang, H., Gray, A.: Fast stochastic Frank-Wolfe algorithms for nonlinear SVMs. In: Proceedings of the 2010 SIAM International Conference on Data Mining, pp. 245–256. SIAM (2010)

  33. Piccialli, V., Sciandrone, M.: Nonlinear optimization and support vector machines. 4OR 16(2), 111–149 (2018). https://doi.org/10.1007/s10288-018-0378-2

  34. Rockafellar, R.T.: Augmented Lagrange multiplier functions and duality in nonconvex programming. SIAM J. Control 12(2), 268–285 (1974)

  35. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, vol. 317. Springer, Berlin (2009)

  36. Sagratella, S.: Algorithms for generalized potential games with mixed-integer variables. Comput. Optim. Appl. 68(3), 689–717 (2017)

  37. Scutari, G., Facchinei, F., Lampariello, L.: Parallel and distributed methods for constrained nonconvex optimization—part I: theory. IEEE Trans. Signal Process. 65(8), 1929–1944 (2016)

  38. Scutari, G., Facchinei, F., Lampariello, L., Sardellitti, S., Song, P.: Parallel and distributed methods for constrained nonconvex optimization—part II: applications in communications and machine learning. IEEE Trans. Signal Process. 65(8), 1945–1960 (2016)

  39. Scutari, G., Facchinei, F., Lampariello, L., Song, P.: Parallel and distributed methods for nonconvex optimization. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 840–844. IEEE (2014)

  40. Woodsend, K., Gondzio, J.: Hybrid MPI/OpenMP parallel linear support vector machine training. J. Mach. Learn. Res. 10(Aug), 1937–1953 (2009)


Author information

Corresponding author

Correspondence to Simone Sagratella.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The work of the authors was partially supported by the Grant: “Finanziamenti di ateneo per la ricerca scientifica 2018” n. RP11816432902D1E, Sapienza University of Rome.

The classical gradient projection algorithm defined in (2) is not distributable

Consider the general case in which S is not separable due to the presence of the constraints h that couple the different blocks of variables \(x_{(\nu )}\), i.e. \(m > 0\). To solve Problem (1), one could think of employing the following naive parallel version of the classical gradient projection algorithm (whose original generic iteration is defined in (2)):

$$\begin{aligned} x^{k+1}_{(\nu )} = {\mathscr {P}}_{S_\nu (x^{k}_{(\nu )})} \left[ x^{k}_{(\nu )} - \alpha _k \nabla f(x^k)_{(\nu )} \right] , \qquad \nu = 1, \ldots , N, \end{aligned}$$

where \(x^k \in S\) and the decomposed subsets \(S_\nu \) are defined in the following way

$$\begin{aligned} S_\nu (x_{(\nu )}^k) \triangleq \left\{ x_{(\nu )} \in X_\nu \, : \, A_{*(\nu )} x_{(\nu )} = A_{*(\nu )} x^k_{(\nu )} \right\} \subseteq \mathfrak {R}^{n_\nu }, \qquad \nu = 1, \ldots , N. \end{aligned}$$

Let \(\{x^k\}\) be the sequence produced by this algorithm. The sets \(S_\nu \) are fixed during the iterations and depend only on the starting point \(x^0\):

$$\begin{aligned} S_\nu (x^{k}_{(\nu )}) = S_\nu (x_{(\nu )}^0) = S_\nu ^0, \qquad \forall \, k \ge 0, \qquad \nu = 1, \ldots , N. \end{aligned}$$
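To make the update above concrete, the following minimal sketch (ours, purely illustrative) implements one iteration of the naive parallel scheme under the simplifying assumption that each \(X_\nu = \mathfrak {R}^{n_\nu }\), so that the projection onto the affine set \(S_\nu ^0\) admits a closed form; the function names are not taken from the paper.

```python
import numpy as np

def project_affine(z, A_nu, b_nu):
    # Euclidean projection of z onto the affine set {x : A_nu @ x = b_nu}:
    #   P(z) = z - A_nu^T (A_nu A_nu^T)^+ (A_nu z - b_nu),
    # using the pseudo-inverse so that a rank-deficient A_nu is handled as well.
    correction = A_nu.T @ np.linalg.pinv(A_nu @ A_nu.T) @ (A_nu @ z - b_nu)
    return z - correction

def naive_parallel_step(x_blocks, grad_blocks, A_blocks, b_blocks, alpha):
    # One iteration of the naive scheme: every block nu independently takes a
    # gradient step and projects back onto its own fixed set
    #   S_nu^0 = {x_nu : A_nu @ x_nu = b_nu},  with b_nu = A_nu @ x0_nu.
    return [project_affine(x - alpha * g, A_nu, b_nu)
            for x, g, A_nu, b_nu in zip(x_blocks, grad_blocks, A_blocks, b_blocks)]
```

Here b_blocks stores the fixed right-hand sides \(A_{*(\nu )} x^0_{(\nu )}\), which is precisely why the sets \(S_\nu ^0\) never change along the iterations.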

A fixed point \({\overline{x}}\) of \(\{x^k\}\) is therefore a solution of the following variational inequality problem (see e.g. [1, 15, 16, 25]):

$$\begin{aligned} {\overline{x}} \in X, \;\; h({\overline{x}}) = 0, \qquad \nabla f({\overline{x}})^T (x - {\overline{x}}) \ge 0, \qquad \forall \, x \in \prod _{\nu =1}^N S_\nu ^0. \end{aligned}$$
(10)

On the other hand, a solution of Problem (1), which is a fixed point of the iterations defined in (2), is a solution \(x^*\) of the following, different variational inequality:

$$\begin{aligned} x^* \in X, \;\; h(x^*) = 0, \qquad \nabla f(x^*)^T (x - x^*) \ge 0, \qquad \forall \, x \in S. \end{aligned}$$
(11)

Notice that the point \({\overline{x}}\), being a solution of the variational inequality (10), need not be a solution of the variational inequality (11), and therefore of Problem (1). This is due to the fact that, for any feasible starting guess \(x^0 \in S\), we only obtain the inclusion

$$\begin{aligned} \displaystyle \prod _{\nu = 1}^N S_\nu ^{0} \subseteq S, \end{aligned}$$

but not, in general, the reverse one. In fact, the fixed point \({\overline{x}}\) of \(\{x^k\}\) is only an equilibrium of the (potential) generalized Nash equilibrium problem (see e.g. [14, 17, 36]) whose generic player \(\nu \in \{1,\ldots , N\}\) solves the following optimization problem, parametric with respect to the blocks of variables \(x_{(\mu )}\) of the other players \(\mu \ne \nu \):

$$\begin{aligned} \begin{array}{rl} \underset{x_{(\nu )}}{\text{ minimize }} \;\; &{} f(x)\\ \text {s.t. } \; &{} h(x) = 0, \;\; x_{(\nu )} \in X_\nu . \end{array} \end{aligned}$$
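To see how strict the inclusion can be, consider the following toy instance (ours, not taken from the paper): two scalar blocks, the single coupling constraint \(x_1 + x_2 = 0\), and \(X_\nu = \mathfrak {R}\). Starting from \(x^0 = (0,0)\), each set \(S_\nu ^0\) collapses to the singleton \(\{0\}\), so the naive scheme never moves, even though the coupled problem has a strictly better solution.

```python
import numpy as np

# Toy instance (illustrative only): minimize f(x) = x1^2 + (x2 - 2)^2
# subject to the single coupling constraint x1 + x2 = 0, with X_nu = R.
grad_f = lambda x: np.array([2.0 * x[0], 2.0 * (x[1] - 2.0)])

x0 = np.array([0.0, 0.0])       # feasible starting point (x1 + x2 = 0)
# With A = [1, 1] and scalar blocks, S_nu^0 = {x_nu : x_nu = x0_nu} = {0}, so
# prod_nu S_nu^0 = {(0, 0)} is a strict subset of S = {x : x1 + x2 = 0}.
# Every naive step projects each block back onto the singleton {0}, hence:
x_bar = x0                      # (0, 0) is a fixed point of the naive scheme

x_star = np.array([-1.0, 1.0])  # solution of the coupled problem: f(x_star) = 2 < 4 = f(x_bar)
# x_bar violates VI (11): the directional derivative towards x_star is negative.
print(grad_f(x_bar) @ (x_star - x_bar))   # prints -4.0
```

Note that \({\overline{x}} = (0,0)\) is indeed an equilibrium of the generalized Nash game above, since each player's feasible set reduces to the singleton \(\{0\}\) once the other block is fixed, yet it is not a solution of Problem (1).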

Cite this article

Colombo, T., Sagratella, S. Distributed algorithms for convex problems with linear coupling constraints. J Glob Optim 77, 53–73 (2020). https://doi.org/10.1007/s10898-019-00792-z
