
An efficient adaptive accelerated inexact proximal point method for solving linearly constrained nonconvex composite problems


Abstract

This paper proposes an efficient adaptive variant of a quadratic penalty accelerated inexact proximal point (QP-AIPP) method proposed earlier by the authors. Both the QP-AIPP method and its variant solve linearly set constrained nonconvex composite optimization problems using a quadratic penalty approach where the generated penalized subproblems are solved by a variant of the underlying AIPP method. The variant, in turn, solves a given penalized subproblem by generating a sequence of proximal subproblems which are then solved by an accelerated composite gradient algorithm. The main difference between AIPP and its variant is that the proximal subproblems in the former are always convex while the ones in the latter are not necessarily convex due to the fact that their prox parameters are chosen as aggressively as possible so as to improve efficiency. The possibly nonconvex proximal subproblems generated by the AIPP variant are also tentatively solved by a novel adaptive accelerated composite gradient algorithm based on the validity of some key convergence inequalities. As a result, the variant generates a sequence of proximal subproblems where the stepsizes are adaptively changed according to the responses obtained from the calls to the accelerated composite gradient algorithm. Finally, numerical results are given to demonstrate the efficiency of the proposed AIPP and QP-AIPP variants.
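
The adaptive mechanism summarized above amounts to an outer proximal-point loop whose prox stepsize is increased or decreased depending on whether the inner accelerated composite gradient (ACG) call certifies its key convergence inequalities. The following minimal Python sketch is purely illustrative: the helper solve_acg, the update factors, and the tolerance are hypothetical placeholders, not the concrete rules used by the method in the paper.

```python
# Illustrative sketch of an adaptive proximal-point outer loop; all names,
# update factors, and tolerances below are placeholders, not the paper's rules.

def adaptive_outer_loop(z0, lam0, solve_acg, tol=1e-4, max_iters=100):
    """solve_acg(z_prev, lam) is assumed to attempt the proximal subproblem
    min_z  lam * phi(z) + 0.5 * ||z - z_prev||^2
    and to return (z_new, residual, ok), where ok=False signals that the key
    convergence inequalities failed for this (possibly nonconvex) subproblem."""
    z, lam = z0, lam0
    for _ in range(max_iters):
        z_new, residual, ok = solve_acg(z, lam)
        if not ok:
            lam *= 0.5          # stepsize was too aggressive: shrink and retry
            continue
        if residual <= tol:     # approximate stationarity reached
            return z_new, lam
        z, lam = z_new, 2.0 * lam  # accept the iterate and try a larger stepsize
    return z, lam
```

In this reading, a failed ACG call is used only to trigger a stepsize reduction, which is the sense in which the proximal subproblems are solved "tentatively".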


Notes

  1. See the MovieLens 100K dataset, containing 610 users and 9724 movies, available at https://grouplens.org/datasets/movielens/.

References

  1. Beck, A.: First-Order Methods in Optimization, vol. 25. SIAM, Philadelphia (2017)

  2. Aybat, N.S., Iyengar, G.: A first-order smoothed penalty method for compressed sensing. SIAM J. Optim. 21(1), 287–313 (2011)

  3. Aybat, N.S., Iyengar, G.: A first-order augmented Lagrangian method for compressed sensing. SIAM J. Optim. 22(2), 429–459 (2012)

  4. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  5. Carmon, Y., Duchi, J.C., Hinder, O., Sidford, A.: Accelerated methods for nonconvex optimization. SIAM J. Optim. 28(2), 1751–1772 (2018)

  6. Cartis, C., Gould, N., Toint, P.: On the complexity of steepest descent, Newton’s and regularized Newton’s methods for nonconvex unconstrained optimization problems. SIAM J. Optim. 20(6), 2833–2852 (2010)

  7. Chen, Y., Lan, G., Ouyang, Y.: Optimal primal-dual methods for a class of saddle point problems. SIAM J. Optim. 24(4), 1779–1814 (2014)

  8. Drusvyatskiy, D., Paquette, C.: Efficiency of minimizing compositions of convex functions and smooth maps. Math. Program. 178, 1–56 (2018)

  9. Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156, 59–99 (2016)

  10. Ghadimi, S., Lan, G., Zhang, H.: Generalized uniformly optimal methods for nonlinear programming. J. Sci. Comput. 79(3), 1854–1881 (2019)

  11. Gu, Q., Wang, Z., Liu, H.: Sparse PCA with oracle property. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 1529–1537. Curran Associates Inc, Red Hook (2014)

  12. He, Y., Monteiro, R.D.C.: Accelerating block-decomposition first-order methods for solving composite saddle-point and two-player Nash equilibrium problems. SIAM J. Optim. 25(4), 2182–2211 (2015)

  13. He, Y., Monteiro, R.D.C.: An accelerated HPE-type algorithm for a class of composite convex-concave saddle-point problems. SIAM J. Optim. 26(1), 29–56 (2016)

  14. Kolossoski, O., Monteiro, R.D.C.: An accelerated non-Euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems. Optim. Methods Softw. 32(6), 1244–1272 (2017)

  15. Kong, W., Melo, J.G., Monteiro, R.D.C.: Complexity of a quadratic penalty accelerated inexact proximal point method for solving linearly constrained nonconvex composite programs. SIAM J. Optim. 29(4), 2566–2593 (2019)

  16. Lan, G., Monteiro, R.D.C.: Iteration-complexity of first-order penalty methods for convex programming. Manuscript, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30332 (2008)

  17. Lan, G., Monteiro, R.D.C.: Iteration-complexity of first-order penalty methods for convex programming. Math. Program. 138(1), 115–139 (2013)

  18. Lan, G., Monteiro, R.D.C.: Iteration-complexity of first-order augmented Lagrangian methods for convex programming. Math. Program. 155(1), 511–547 (2016)

  19. Li, H., Lin, Z.: Accelerated proximal gradient methods for nonconvex programming. Adv. Neural Inf. Process. Syst. 28, 379–387 (2015)

  20. Liang, J., Monteiro, R.D.C., Sim, C.K.: A FISTA-type accelerated gradient algorithm for solving smooth nonconvex composite optimization problems. arXiv preprint arXiv:1905.07010 (2019)

  21. Liu, Y.F., Liu, X., Ma, S.: On the nonergodic convergence rate of an inexact augmented Lagrangian framework for composite convex programming. Math. Oper. Res. 44(2), 632–650 (2019)

  22. Lu, Z., Zhou, Z.: Iteration-complexity of first-order augmented Lagrangian methods for convex conic programming. Available on arXiv:1803.09941 (2018)

  23. Monteiro, R.D.C., Ortiz, C., Svaiter, B.F.: An adaptive accelerated first-order method for convex optimization. Comput. Optim. Appl. 64, 31–73 (2016)

  24. Monteiro, R.D.C., Svaiter, B.F.: Iteration-complexity of a Newton proximal extragradient method for monotone variational inequalities and inclusion problems. SIAM J. Optim. 22(3), 914–935 (2012)

  25. Monteiro, R.D.C., Svaiter, B.F.: An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM J. Optim. 23(2), 1092–1125 (2013)

  26. Necoara, I., Patrascu, A., Glineur, F.: Complexity of first-order inexact Lagrangian and penalty methods for conic convex programming. Optim. Methods Softw. 34, 1–31 (2017)

  27. Nesterov, Y.E.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publ, Boston (2004)

  28. Nesterov, Y.E., Polyak, B.T.: Cubic regularization of Newton method and its global performance. Math. Program. 108(1), 177–205 (2006)

  29. Paquette, C., Lin, H., Drusvyatskiy, D., Mairal, J., Harchaoui, Z.: Catalyst for gradient-based nonconvex optimization. In: AISTATS 2018-21st International Conference on Artificial Intelligence and Statistics, pp 1–10 (2018)

  30. Patrascu, A., Necoara, I., Tran-Dinh, Q.: Adaptive inexact fast augmented Lagrangian methods for constrained convex optimization. Optim. Lett. 11(3), 609–626 (2017)

  31. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)

  32. Solodov, M.V., Svaiter, B.F.: A hybrid approximate extragradient-proximal point algorithm using the enlargement of a maximal monotone operator. Set-Valued Var. Anal. 7(4), 323–345 (1999)

  33. Tran-Dinh, Q.: Proximal alternating penalty algorithms for nonsmooth constrained convex optimization. Comput. Optim. Appl. 72(1), 1–43 (2019)

  34. Xu, Y.: Iteration complexity of inexact augmented Lagrangian methods for constrained convex programming. Available on arXiv:1711.05812 (2017)

  35. Yao, Q., Kwok, J.T.: Efficient learning with a family of nonconvex regularizers by redistributing nonconvexity. J. Mach. Learn. Res. 18(179), 1–52 (2017)

Acknowledgements

Weiwei Kong: This author acknowledges the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), [funding reference number PGSD3-516700-2018], and the partial support of ONR Grant N00014-18-1-2077. Renato D.C. Monteiro: The work of this author was partially supported by ONR Grant N00014-18-1-2077. Jefferson G. Melo: The work of this author was partially supported by CNPq Grant 312559/2019-4 and FAPEG/GO.

Author information

Corresponding author

Correspondence to Weiwei Kong.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

This appendix contains proofs and statements of several technical results used in the main body of the paper, organized into four subsections. The first consists of proofs about the refinement procedure of Sect. 2; the second consists of proofs about the R-ACG algorithm of Sect. 3; the third contains technical results related to Sect. 5; and the fourth presents additional numerical comparisons with the AIPP method of [15].

1.1 Properties of the refinement procedure

Proof of Proposition 2.1

It follows from [15, Lemma 19] with \((f,h,L)=(f_{{\lambda }},h_{{\lambda }},M_{{\lambda }})\) that \(\varDelta \ge 0\) and

$$\begin{aligned} M_{{\lambda }}(z-{\hat{z}})\in \nabla f_{{\lambda }}(z)+\partial h_{{\lambda }}({\hat{z}})={\lambda }\nabla g(z)+(z-z^{-}-v)+{\lambda }\partial h({\hat{z}}). \end{aligned}$$

Dividing by \({\lambda }\) and rearranging terms yields

$$\begin{aligned} \frac{1}{{\lambda }}\left[ M_{{\lambda }}(z-{\hat{z}})+(v+z^{-}-z)\right] -\nabla g(z)\in \partial h({\hat{z}}). \end{aligned}$$

Adding \(\nabla g({\hat{z}})\) to both sides and using the definition of \({\hat{v}}\) gives

$$\begin{aligned} {\hat{v}}=\frac{1}{{\lambda }}\left[ M_{{\lambda }}(z-{\hat{z}})+(v+z^{-}-z)\right] +\nabla g({\hat{z}})-\nabla g(z)\in \nabla g({\hat{z}})+\partial h({\hat{z}}), \end{aligned}$$

which is the inclusion in (2.8).

We now bound \({\lambda }\Vert {\hat{v}}\Vert\). Since [15, Lemma 19] implies that \(\Vert z-{\hat{z}}\Vert \le \sqrt{2M_{{\lambda }}^{-1}\varDelta }\) and \(\nabla g\) is M-Lipschitz continuous, it follows that

$$\begin{aligned} {\lambda }\Vert {\hat{v}}\Vert&\le \Vert M_{{\lambda }}(z-{\hat{z}})\Vert +\Vert v+z^{-}-z\Vert +{\lambda }\Vert \nabla g({\hat{z}})-\nabla g(z)\Vert \\&\le \sqrt{2M_{{\lambda }}\varDelta }+\Vert v+z^{-}-z\Vert +{\lambda }M\Vert {\hat{z}}-z\Vert \le \sqrt{2M_{{\lambda }}\varDelta }+\Vert v+z^{-}-z\Vert +M_{{\lambda }}\Vert {\hat{z}}-z\Vert \\&\le \sqrt{2M_{{\lambda }}\varDelta }+\Vert v+z^{-}-z\Vert +M_{{\lambda }}\sqrt{2M_{{\lambda }}^{-1}\varDelta } =\Vert v+z^{-}-z\Vert +2\sqrt{2M_{{\lambda }}\varDelta }, \end{aligned}$$

which is the inequality in (2.8). \(\square\)
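
As a small illustration of the quantities appearing in (2.8), the following sketch evaluates \({\hat{v}}\) and the right-hand side of the bound just derived. It assumes access to \(\nabla g\) and to a point \({\hat{z}}\) already produced by the refinement step; the function name and signature are hypothetical and not part of the paper.

```python
import numpy as np

# Hypothetical helper mirroring the formulas in (2.8): given the refinement
# output z_hat and the data (lam, M_lam, z_minus, z, v, Delta, grad_g), it
# returns v_hat and the upper bound on lam * ||v_hat|| established above.
def refined_residual(lam, M_lam, z_minus, z, z_hat, v, Delta, grad_g):
    v_hat = (M_lam * (z - z_hat) + (v + z_minus - z)) / lam \
            + grad_g(z_hat) - grad_g(z)
    bound = np.linalg.norm(v + z_minus - z) + 2.0 * np.sqrt(2.0 * M_lam * Delta)
    return v_hat, bound  # Proposition 2.1 guarantees lam * ||v_hat|| <= bound
```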

1.2 Properties of the R-ACG algorithm

Proof of Proposition 3.2(a)

Let \(\ell\) denote the quantity in (3.15). Assume that the R-ACG algorithm has performed \(\ell\) iterations without declaring failure. In view of step 2 of the R-ACG algorithm, it follows that both (3.10) and (3.11) hold for every \(1\le j\le \ell\). We will show that it must stop successfully at the end of the \(\ell ^{\mathrm{th}}\) iteration, and hence that the conclusion of the proposition holds. Indeed, note that (3.14), (3.15), and the fact that \(\log (1+t) \ge t/2\) for all \(t\in [0,1]\) imply that

$$\begin{aligned} A_\ell \ge \frac{2}{1+ 2{{\widetilde{M}}}}\left( 1+ \frac{1}{2} \sqrt{\frac{1}{1+2{{\widetilde{M}}}}}\right) ^{2(\ell -1)}\ge 2C > 2. \end{aligned}$$
(A.1)

Combining the triangle inequality, (3.10), the fact that \(2/A_\ell \le 1/C\) and \((2/A_\ell )^2< 2/A_\ell < 1\) from (A.1), and the relation \((a+b)^2\le 2(a^2+b^2)\) for all \(a, b \in \mathfrak {R}\), we obtain

$$\begin{aligned} \Vert u_\ell \Vert ^2+2\eta _\ell&\le \max \{1/A_\ell ^2,1/(2A_\ell )\}(\Vert A_\ell u_\ell \Vert ^{2}+4A_\ell \eta _\ell )\\&\le \max \{1/A_\ell ^2,1/(2A_\ell )\}(2\Vert A_\ell u_\ell +x_\ell -x_0\Vert ^{2} +2\Vert x_\ell -x_0\Vert ^{2}+4A_\ell \eta _\ell )\\&\le \max \{(2/A_\ell )^2,2/A_\ell \}\Vert x_\ell -x_0\Vert ^{2} \le \frac{1}{C}\Vert x_\ell -x_0\Vert ^{2}. \end{aligned}$$

On the other hand, using the triangle inequality and the fact that \((a+b)^2\le (1+s)a^2+(1+1/s)b^2\) for every \((a,b,s)\in \mathfrak {R}\times \mathfrak {R}\times \mathfrak {R}_{++}\) (under the choice of \(s=1/(\sqrt{C}-1)\)), we obtain

$$\begin{aligned} \Vert x_\ell -x_0\Vert ^{2}\le \frac{\sqrt{C}}{\sqrt{C}-1}\Vert x_0-x_\ell +u_\ell \Vert ^2 +\sqrt{C}\Vert u_\ell \Vert ^2. \end{aligned}$$

Combining the previous estimates, we then conclude that

$$\begin{aligned} \Vert u_\ell \Vert ^2+2\eta _\ell \le \frac{1}{C-\sqrt{C}}\Vert x_0-x_\ell +u_\ell \Vert ^2 + \frac{1}{\sqrt{C}}\Vert u_\ell \Vert ^2, \end{aligned}$$
(A.2)

which, after a simple algebraic manipulation, easily implies that

$$\begin{aligned} \frac{1}{\sqrt{C}-1}\Vert x_{0}-x_{\ell }+u_{\ell }\Vert ^{2} \ge 2\sqrt{C}\eta _{\ell }+\left( \sqrt{C}-1\right) \Vert u_{\ell }\Vert ^{2} \ge \left( \sqrt{C}-1\right) \left( \Vert u_{\ell }\Vert ^{2} + 2\eta _{\ell } \right) . \end{aligned}$$
(A.3)
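
For completeness, the algebraic manipulation referred to above consists of moving the term \(\Vert u_{\ell }\Vert ^{2}/\sqrt{C}\) in (A.2) to the left-hand side and multiplying the resulting inequality by \(\sqrt{C}\):

$$\begin{aligned} \left( 1-\frac{1}{\sqrt{C}}\right) \Vert u_{\ell }\Vert ^{2}+2\eta _{\ell }\le \frac{1}{C-\sqrt{C}}\Vert x_{0}-x_{\ell }+u_{\ell }\Vert ^{2} \;\Longrightarrow \; \left( \sqrt{C}-1\right) \Vert u_{\ell }\Vert ^{2}+2\sqrt{C}\eta _{\ell }\le \frac{\sqrt{C}}{C-\sqrt{C}}\Vert x_{0}-x_{\ell }+u_{\ell }\Vert ^{2}=\frac{1}{\sqrt{C}-1}\Vert x_{0}-x_{\ell }+u_{\ell }\Vert ^{2}, \end{aligned}$$

which is the first inequality in (A.3); the second inequality in (A.3) then follows since \(2\sqrt{C}\eta _{\ell }\ge 2(\sqrt{C}-1)\eta _{\ell }\) whenever \(\eta _{\ell }\ge 0\).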

Using the first term in the maximum of (3.16) together with the second inequality of (A.3) immediately implies that (3.12) holds with \(j=\ell\). To show that (3.13) holds at \(j=\ell\), observe that the definition of \(\psi\) in (3.3), (3.11) with \(j=\ell\), the second inequality of (A.3), and the second term in the maximum of (3.16) imply that

$$\begin{aligned} \widetilde{\phi }(x_{0})-\widetilde{\phi }(x_{\ell })&\ge \left\langle u_{\ell },x_{0}-x_{\ell }\right\rangle -\eta _{\ell }+\frac{1}{2}\Vert x_{\ell }-x_{0}\Vert ^{2} = \frac{1}{2}\left[ \Vert x_{0}-x_{\ell }+u_{\ell }\Vert ^{2}-\left( \Vert u_{\ell }\Vert ^{2} +2\eta _{\ell }\right) \right] \\&\ge \frac{1}{2} \left[ 1 - \left( \sqrt{C}-1\right) ^{-2}\right] \Vert x_{0} -x_{\ell }+u_{\ell }\Vert ^{2} \ge \frac{1}{\theta }\Vert x_{0}-x_{\ell }+u_{\ell }\Vert ^{2}. \end{aligned}$$

\(\square\)

1.3 Results related to Sect. 5

Lemma A.1

Assume that \(f, h:\mathfrak {R}^{n}\mapsto (-\infty ,\infty ]\) satisfy assumptions (C1) and (C3) in Sect. 5, and that, in addition, f is lower semicontinuous on \(\mathrm {cl}\,(\mathrm {dom}\,h)\). Then, \(\varphi := f+h\) is a proper lower semicontinuous function which has a global minimum over \(\mathfrak {R}^n\).

Proof

Suppose \(\bar{z}\in \mathfrak {R}^{n}\backslash \mathrm {cl}\,(\mathrm {dom}\,h)\). Since \(\mathrm {cl}\,(\mathrm {dom}\,h)\) is closed, there exists \(\varepsilon >0\) such that every \(u\) satisfying \(\Vert u-{\bar{z}}\Vert < \varepsilon\) lies in \(\mathfrak {R}^{n}\backslash \mathrm {cl}\,(\mathrm {dom}\,h)\) and, consequently, satisfies \(h(u) = \infty\) and \(\varphi (u)=\infty\). Hence, \(\liminf _{u\rightarrow \bar{z}}\varphi (u)=\infty =\varphi (\bar{z})\). Now suppose \(\bar{z}\in \mathrm {cl}\,(\mathrm {dom}\,h)\). By the lower semicontinuity of f and h we have

$$\begin{aligned} \liminf _{u\rightarrow \bar{z}}\varphi (u)\ge \liminf _{u\rightarrow \bar{z}}f(u)+\liminf _{u\rightarrow \bar{z}}h(u)\ge f(\bar{z})+h(\bar{z})=\varphi (\bar{z}) \end{aligned}$$

and, since f is differentiable on \(\mathrm {dom}\,h\), the function \(\varphi\) is proper lower semicontinuous with \(\mathrm {dom}\,\varphi = \mathrm {dom}\,h\). The last statement of the lemma follows from the well-known fact that a proper lower semicontinuous function whose domain is bounded, as is the case for \(\mathrm {dom}\,\varphi\), always attains its infimum. \(\square\)

1.4 Comparison with the AIPP method

This subsection presents computational results that compare the AIPP method of [15] with the R-AIPPc method described at the beginning of Sect. 6. The main problem of interest in this subsection is the quadratic matrix problem described in Sect. 6.1.1.

We now describe the particular implementation of the AIPP method used in this subsection, which differs from its description in [15] in two ways. First, its innermost subroutine, namely the ACG method, stops immediately when a quadruple \(({\lambda }_k, z_k, v_k, \varepsilon _k)\) satisfying (2.14) is found. Second, at each iteration k of the method, a triple \(({{\hat{z}}}, {{\hat{v}}}, \varDelta )\) is generated from the refinement procedure of Sect. 2 by setting \(({{\hat{z}}}, {{\hat{v}}}, \varDelta ) = RP({\lambda }_k, z_{k-1}, z_k, v_k)\), and the method stops with the desired output when \({{\hat{v}}}\) satisfies condition (6.1).
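
The following pseudocode summarizes the above implementation. The helpers acg_until_2_14, RP, and meets_6_1 are hypothetical stand-ins for, respectively, the ACG subroutine stopped as soon as (2.14) holds, the refinement procedure of Sect. 2, and the termination test (6.1); they are not part of the paper's notation.

```python
# Illustrative outer loop for the AIPP implementation described above.
def aipp_outer_loop(z0, lam, acg_until_2_14, RP, meets_6_1, max_iters=1000):
    z_prev = z0
    for k in range(1, max_iters + 1):
        # run ACG on the k-th proximal subproblem until (2.14) is satisfied
        lam_k, z_k, v_k, eps_k = acg_until_2_14(z_prev, lam)
        # refine the ACG output and test the termination criterion (6.1)
        z_hat, v_hat, Delta = RP(lam_k, z_prev, z_k, v_k)
        if meets_6_1(v_hat):
            return z_hat, v_hat
        z_prev = z_k
    raise RuntimeError("termination criterion (6.1) not met within max_iters")
```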

All experiment parameters for the R-AIPPc method and the problem instances are as described in Sect. 6.1.1, while the AIPP method uses the parameter input \((\sigma ,{\lambda })=(0.3, 1/(2m))\).

We now present the numerical results for this set of problem instances (Table 16).

Table 16 Iteration counts for QM problems with fixed m

About this article

Cite this article

Kong, W., Melo, J.G. & Monteiro, R.D.C. An efficient adaptive accelerated inexact proximal point method for solving linearly constrained nonconvex composite problems. Comput Optim Appl 76, 305–346 (2020). https://doi.org/10.1007/s10589-020-00188-w

