Abstract
We demonstrate that difficult non-convex non-smooth optimization problems, such as Nash equilibrium problems and anisotropic as well as isotropic Potts segmentation models, can be written in terms of generalized conjugates of convex functionals. These, in turn, can be formulated as saddle-point problems involving convex non-smooth functionals and a general smooth but non-bilinear coupling term. We then show through detailed convergence analysis that a conceptually straightforward extension of the primal–dual proximal splitting method of Chambolle and Pock is applicable to the solution of such problems. Under sufficient local strong convexity assumptions on the functionals—but still with a non-bilinear coupling term—we even demonstrate local linear convergence of the method. We illustrate these theoretical results numerically on the aforementioned example problems.
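In the notation used throughout the appendices below, where G and \(F^*\) are the convex non-smooth functionals and K is the smooth but possibly non-bilinear coupling term, the saddle-point problems in question take the form
$$\min _x \max _y\; G(x) + K(x, y) - F^*(y),$$
which reduces to the bilinear setting of Chambolle and Pock [8] when \(K(x,y)=\langle Ax,y\rangle \) for a linear operator A.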
References
Aragón Artacho, F.J., Geoffroy, M.H.: Characterization of metric regularity of subdifferentials. J. Convex Anal. 15(2), 365–380 (2008)
Aragón Artacho, F.J., Geoffroy, M.H.: Metric subregularity of the convex subdifferential in Banach spaces. J. Nonlinear Convex Anal. 15(1), 35–47 (2014)
Attouch, H., Bolte, J., Svaiter, B.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Progr. 137(1–2), 91–129 (2013). https://doi.org/10.1007/s10107-011-0484-9
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics, 2nd edn. Springer, New York (2017)
Benning, M., Knoll, F., Schönlieb, C.B., Valkonen, T.: Preconditioned ADMM with nonlinear operator constraint. In: L. Bociu, J.A. Désidéri, A. Habbal (eds.) System Modeling and Optimization: 27th IFIP TC 7 Conference, CSMO 2015, Sophia Antipolis, France, June 29–July 3, 2015, Revised Selected Papers, pp. 117–126. Springer International Publishing (2016). https://tuomov.iki.fi/m/nonlinearADMM.pdf
Borzì, A., Kanzow, C.: Formulation and numerical solution of Nash equilibrium multiobjective elliptic control problems. SIAM J. Control Optim. 51(1), 718–744 (2013). https://doi.org/10.1137/120864921
Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20(1), 89–97 (2004). https://doi.org/10.1023/B:JMIV.0000011325.36760.1e
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011). https://doi.org/10.1007/s10851-010-0251-1
Clason, C., Kunisch, K.: A convex analysis approach to multi-material topology optimization. ESAIM Math. Modell. Numer. Anal. 50(6), 1917–1936 (2016). https://doi.org/10.1051/m2an/2016012
Clason, C., Valkonen, T.: Primal-dual extragradient methods for nonlinear nonsmooth PDE-constrained optimization. SIAM J. Optim. 27(3), 1313–1339 (2017). https://doi.org/10.1137/16M1080859
Clason, C., Mazurenko, S., Valkonen, T.: Acceleration and global convergence of a first-order primal-dual method for nonconvex problems. SIAM J. Optim. 29, 933–963 (2019). https://doi.org/10.1137/18M1170194
Clason, C., Mazurenko, S., Valkonen, T.: Julia codes for “primal-dual proximal splitting and generalized conjugation in non-smooth non-convex optimization”. Online resource on Zenodo (2020). https://doi.org/10.5281/zenodo.3647614
Drori, Y., Sabach, S., Teboulle, M.: A simple algorithm for a class of nonsmooth convex-concave saddle-point problems. Oper. Res. Lett. 43(2), 209–214 (2015). https://doi.org/10.1016/j.orl.2015.02.001
Ekeland, I., Temam, R.: Convex Analysis and Variational Problems. SIAM, Philadelphia (1999)
Elster, K.H., Wolf, A.: Recent Results on Generalized Conjugate Functions, pp. 67–78. Springer, New York (1988)
Facchinei, F., Kanzow, C.: Generalized Nash equilibrium problems. Ann. Oper. Res. 175, 177–211 (2010). https://doi.org/10.1007/s10479-009-0653-x
Flåm, S.D., Antipin, A.S.: Equilibrium programming using proximal-like algorithms. Math. Progr. 78(1, Ser. A), 29–41 (1997). https://doi.org/10.1007/BF02614504
Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741 (1984). https://doi.org/10.1109/TPAMI.1984.4767596
Hamedani, E.Y., Aybat, N.S.: A primal-dual algorithm for general convex-concave saddle point problems (2018)
He, N., Juditsky, A., Nemirovski, A.: Mirror prox algorithm for multi-term composite minimization and semi-separable problems. Comput. Optim. Appl. 61(2), 275–319 (2015). https://doi.org/10.1007/s10589-014-9723-3
He, Y., Monteiro, R.D.: An accelerated HPE-type algorithm for a class of composite convex-concave saddle-point problems. SIAM J. Optim. 26(1), 29–56 (2016). https://doi.org/10.1137/14096757X
Juditsky, A., Nemirovski, A.: First Order Methods for Nonsmooth Convex Large-Scale Optimization, I: General Purpose Methods, pp. 121–148. MIT Press, Cambridge (2011)
Juditsky, A., Nemirovski, A.: First Order Methods for Nonsmooth Convex Large-Scale Optimization, II: Utilizing Problems Structure, pp. 149–183. MIT Press, Cambridge (2011)
Kolossoski, O., Monteiro, R.: An accelerated non-Euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems. Optim. Methods Softw. 32(6), 1244–1272 (2017). https://doi.org/10.1080/10556788.2016.1266355
Krawczyk, J.B., Uryasev, S.: Relaxation algorithms to find Nash equilibria with economic applications. Environ. Model. Assess. 5(1), 63–73 (2000). https://doi.org/10.1023/A:1019097208499
Martinez-Legaz, J.E.: Generalized convex duality and its economic applications. In: Hadjisavvas, N., Komlósi, S., Schaible, S. (eds.) Handbook of Generalized Convexity and Generalized Monotonicity, pp. 237–292. Springer, New York (2005)
Nemirovski, A.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004). https://doi.org/10.1137/S1052623403425629
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Progr. 103(1), 127–152 (2005). https://doi.org/10.1007/s10107-004-0552-5
Nikaidô, H., Isoda, K.: Note on non-cooperative convex games. Pac. J. Math. 5, 807–815 (1955). https://doi.org/10.2140/pjm.1955.5.807
Rasband, W.S.: ImageJ. https://imagej.nih.gov/ij/
Rosen, J.B.: Existence and uniqueness of equilibrium points for concave \(n\)-person games. Econometrica 33, 520–534 (1965). https://doi.org/10.2307/1911749
Singer, I.: Duality for Nonconvex Approximation and Optimization. Springer, New York (2006). https://doi.org/10.1007/0-387-28395-1
Storath, M., Weinmann, A., Demaret, L.: Jump-sparse and sparse recovery using Potts functionals. IEEE Trans. Signal Process. 62(14), 3654–3666 (2014). https://doi.org/10.1109/TSP.2014.2329263
Storath, M., Weinmann, A., Frikel, J., Unser, M.: Joint image reconstruction and segmentation using the Potts model. Inverse Probl. 31(2), 025003 (2015). https://doi.org/10.1088/0266-5611/31/2/025003
Valkonen, T.: A primal-dual hybrid gradient method for non-linear operators with applications to MRI. Inverse Probl. 30(5), 055012 (2014). https://doi.org/10.1088/0266-5611/30/5/055012
Valkonen, T.: Testing and non-linear preconditioning of the proximal point method. Appl. Math. Optim. (2018). https://doi.org/10.1007/s00245-018-9541-6
Valkonen, T., Pock, T.: Acceleration of the PDHGM on partially strongly convex functions. J. Math. Imaging Vis. 59, 394–414 (2017). https://doi.org/10.1007/s10851-016-0692-2
von Heusinger, A., Kanzow, C.: Optimization reformulations of the generalized Nash equilibrium problem using Nikaido-Isoda-type functions. Comput. Optim. Appl. 43(3), 353–377 (2009). https://doi.org/10.1007/s10589-007-9145-6
Acknowledgements
In the first stages of the research, T. Valkonen and S. Mazurenko were supported by the EPSRC First Grant EP/P021298/1, “PARTIAL Analysis of Relations in Tasks of Inversion for Algorithmic Leverage”. Later, T. Valkonen was supported by the Academy of Finland grants 314701 and 320022. C. Clason was supported by the German Research Foundation (DFG) under grant Cl 487/2-1. We thank the anonymous reviewers for insightful comments.
Appendices
A Data Statement for the EPSRC
The source codes for the numerical experiments are on Zenodo at [12].
Reductions of the Three-Point Condition
The following two propositions demonstrate that Assumption 3.2 (iv) is closely related to standard second-order optimality conditions, i.e., that the Hessian is positive definite at the solution \({\widehat{u}}\).
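For orientation, the generic form of such a condition for a smooth function \(f\) with a local minimizer \({\widehat{x}}\) is (stated here only as a reminder, not as part of the assumptions)
$$\langle \nabla ^2 f({\widehat{x}})h, h\rangle \ge \gamma \Vert h\Vert ^2 \quad \text {for all } h \text { and some } \gamma >0;$$
the propositions below show how local variants of this condition for \(K\) imply the three-point estimates (15a) and (15b).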
Proposition A.1
Suppose Assumption 3.2 (ii) (locally Lipschitz gradients of K) holds in some neighborhood \({\mathcal {U}}\) of \({\widehat{u}}\), and for some \(\xi _x\in {\mathbb {R}}\), \(\gamma _x>0\),
Then (15a) holds in \({\mathcal {U}}\) with \(\theta _x=2(\gamma _x-\alpha )L_{yx}^{-1}\), and \(\lambda _x=L_x({\widehat{y}})^2(2\alpha )^{-1}\) for any \(\alpha \in (0,\gamma _x]\).
Proof
An application of Cauchy’s and Young’s inequalities with any factor \(\alpha >0\), Assumption 3.2 (ii), and (66) yields the estimate
At the same time, using (16),
Therefore (15a) holds if we take \(\theta _x\le 2(\gamma _x-\alpha )L_{yx}^{-1}\) and \(\lambda _x=L_x({\widehat{y}})^2(2\alpha )^{-1}\). \(\square \)
Proposition A.2
Suppose Assumption 3.2 (ii) (locally Lipschitz gradients of K) holds in some neighborhood \({\mathcal {U}}\) of \({\widehat{u}}\) with \(L_y(x)\le {\bar{L}}_y\), and that
for some constant \(L_{xy} \ge 0\). Assume, moreover, for some \(\xi _y\in {\mathbb {R}}\), \(\gamma _y>0\) that
Then (15b) holds in \({\mathcal {U}}\) with \(\theta _y=2(\gamma _y-\alpha _1)(1+\alpha _2)^{-1} L_{xy}^{-1}\), and \(\lambda _y=({\bar{L}}_y^2(2\alpha _1)^{-1}+(1+\alpha _2^{-1})L_{xy}\theta _y)\) for any \(\alpha _1\in (0,\gamma _y]\), \(\alpha _2>0\).
Proof
An application of Cauchy’s and Young’s inequalities with any factor \(\alpha >0\), Assumption 3.2 (ii), and (67) yields the estimate
At the same time, using (16) and Young’s inequality for any \(\alpha _2>0\),
Therefore (15b) holds if we take \(\theta _y\le 2\frac{\gamma _y-\alpha _1}{(1+\alpha _2)L_{xy}}\) and \(\lambda _y=\frac{{\bar{L}}_y^2}{2\alpha _1}+(1+\alpha _2^{-1})L_{xy}\theta _y\). \(\square \)
Relaxations of the Three-Point Condition
In all the results of this paper, Assumption 3.2 (iv) can be generalized to the following three-point condition similar to the one used in [11].
Assumption B.1
The functional \(K \in C^1(X\times Y)\), and there exists a neighborhood
for some \(\rho _x,\rho _y>0\) such that for all \(u',u \in {\mathcal {U}}(\rho _x,\rho _y)\), the following property holds:
(iv*) (three-point condition) There exist \(\theta _x,\theta _y > 0\), \(\lambda _x,\lambda _y\ge 0\), \(\xi _x,\xi _y\in {\mathbb {R}}\), and \(p_x,p_y\in [1,2]\) such that
$$\begin{aligned}&\langle K_x(x',{\widehat{y}})-K_x({\widehat{x}},{\widehat{y}}),x-{\widehat{x}}\rangle +\xi _x\Vert x-{\widehat{x}}\Vert ^2\\&\quad \ge \theta _x\Vert K_y({\widehat{x}},y)-K_y(x,y)-K_{yx}(x,y)({\widehat{x}}-x)\Vert ^{p_x} -\frac{\lambda _x}{2}\Vert x-x'\Vert ^2, \end{aligned} \tag{69a}$$
$$\begin{aligned}&\langle K_y(x,y)-K_y(x,y')+K_y({\widehat{x}},{\widehat{y}})-K_y({\widehat{x}},y),y-{\widehat{y}}\rangle +\xi _y\Vert y-{\widehat{y}}\Vert ^2 \\&\quad \ge \theta _y\Vert K_x(x',{\widehat{y}})-K_x(x',y')-K_{xy}(x',y')({\widehat{y}}-y')\Vert ^{p_y} -\frac{\lambda _y}{2}\Vert y-y'\Vert ^2. \end{aligned} \tag{69b}$$
This assumption introduces exponents \(p_x,p_y\in [1,2]\), while in Assumption 3.2 (iv) we had \(p_x=p_y=1\). For instance, in [11, Appendix B] we verified Assumption B.1 with \(p_x=2\) for the case \(K(x,y)=\langle A(x),y\rangle \) for the reconstruction of the phase and amplitude of a complex number. This relaxation mainly affects the proof of Step 4 in Theorem 4.2, which now requires a few intermediate derivations.
Corollary B.1
The results of Theorem 4.2 continue to hold if Assumption 3.2 (iv) is replaced with Assumption B.1 (iv*) for some \(p_x,p_y\in [1,2]\), where in case \(p_y \in (1, 2]\), (24d) is replaced by
and in case \(p_x \in (1, 2]\), (24e) is replaced by
Proof
The beginning of the proof follows the exact same steps as in the proof of Theorem 4.2 up until (30). We now use Assumption B.1 (iv*) to further bound \(D_x\) and \(D_y\) similarly to (31) and (32). From (69a),
The following generalized Young's inequality, valid for any positive \(a\), \(b\), \(p\), and \(q\) with \(q^{-1}+p^{-1}=1\), allows for our choice of varying \(p_x\in [1,2]\):
$$ab \le \frac{a^p}{p} + \frac{b^q}{q}. \tag{72}$$
Applying this inequality with \(p=p_x\) and any \(\zeta _x>0\) to the last term of (71), we arrive at the estimate
We now use \(u^{i+1}\in {\mathcal {U}}(\rho _x,\rho _y)\) for some \(\rho _x,\rho _y\ge 0\), and \(\omega _i^{-1} \le {\underline{\omega }}^{-1}\) to obtain
If \(p_x=1\), we use the assumed inequality \(\theta _x \ge \rho _y{\underline{\omega }}^{-1}\) from (24e) to show that the right-hand side of (73) is non-negative for any \(\zeta _x>0\). Otherwise we take \(\zeta _x :=({\underline{\omega }}\theta _xp_x^{p_x}\rho _y^{p_x-2})^{1/(1-p_x)}\) to ensure the right-hand side of (73) is zero. In either case, \(\theta _x-\rho _y^{2-p_x}(p_x^{p_x}{\underline{\omega }}\zeta _x^{p_x-1})^{-1}\ge 0\) and hence
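To see that this choice of \(\zeta _x\) indeed zeroes the right-hand side of (73) when \(p_x\in (1,2]\), note that (a one-line check not spelled out above)
$$\zeta _x^{p_x-1}=({\underline{\omega }}\theta _xp_x^{p_x}\rho _y^{p_x-2})^{-1}, \quad \text {so}\quad \rho _y^{2-p_x}(p_x^{p_x}{\underline{\omega }}\zeta _x^{p_x-1})^{-1} = \rho _y^{2-p_x}\cdot \frac{{\underline{\omega }}\theta _xp_x^{p_x}\rho _y^{p_x-2}}{p_x^{p_x}{\underline{\omega }}} = \theta _x.$$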
Analogously, from (69b) and Cauchy’s inequality,
This has a structure similar to (71) with \(\omega _i\) now as a multiplier. Hence, we apply a similar generalized Young’s inequality to the last term with any \(\zeta _y>0\). Noting that \(\omega _i\le {\overline{\omega }}\), we use the following bound similar to (73):
The last inequality holds for any \(\zeta _y>0\) if \(p_y=1\) due to the assumed \(\theta _y \ge {\overline{\omega }}\rho _x\) from (24d); otherwise, we set \(\zeta _y :=(\theta _yp_y^{p_y}\rho _x^{p_y-2}{\overline{\omega }}^{-1})^{1/(1-p_y)}\). We then obtain that
Combining (30), (74), and (75), we can thus bound
where in the final step we have also used (70) and the selected \(\zeta _x\) and \(\zeta _y\) if \(p_x>1\) or \(p_y>1\) or both. Thus we obtain exactly the same lower bound as in (33). We then continue along the rest of the proof of Theorem 4.2 to obtain the claim. \(\square \)
It is worth observing that when \(p_x \in (1, 2]\) or \(p_y\in (1,2]\), the inequalities (70) do not directly bound the respective \(\rho _y\) or \(\rho _x\). Hence, we do not need to initialize the corresponding variable locally, unlike when \(p_x=1\) or \(p_y=1\). On the other hand, sufficient strong convexity is required from the corresponding G and \(F^*\).
We start with the lemma ensuring that the iterates stay in the initial neighborhood of the saddle point.
Corollary B.2
The results of Lemma 5.2 continue to hold if the corresponding conditions of Theorem 4.2 are replaced with those in Corollary B.1.
Proof
The proof repeats that of Lemma 5.2, applying Corollary B.1 instead of Theorem 4.2 in Step 2. \(\square \)
We next extend the results of Section 6 to arbitrary choices of both \(p_x \in [1,2]\) and \(p_y \in [1,2]\). This mainly consists of verifying (70a) when \(p_y \ne 1\) and (70b) when \(p_x \ne 1\). Note that it is possible to take \(p_x=1\) and \(p_y \ne 1\), or vice versa, as long as the corresponding conditions are satisfied.
Corollary B.3
The results of Theorem 6.1 continue to hold if Assumption 3.2 (iv) is replaced with Assumption B.1 (iv*) for some \(p_x,p_y\in [1,2]\), where in case \(p_y \in (1, 2]\), (46a) is replaced with
and in case \(p_x \in (1, 2]\), (46b) is replaced with
Proof
Since conditions (77) are sufficient for (70) with \({\overline{\omega }}={\underline{\omega }}=1\) to hold, we can repeat the proof of Theorem 6.1 replacing the references to Theorem 4.2 by references to Corollary B.1 up until (50). If \(p_x>1\), we now obtain a lower bound on \(d_i^x\) by arguing as in (71)–(73) with \({\widehat{u}}\) replaced by \({\bar{u}}\). Specifically, using (16), Assumption B.1 (iv*) at \({\bar{u}}\), and the generalized Young’s inequality (72), we obtain for any \(\zeta _x>0\) that
Inserting \(\zeta _x=(\theta _xp_x^{p_x}(2\rho _y)^{p_x-2})^{1/(1-p_x)}\) and \(\Vert y^{i+1}-{\bar{y}}\Vert \le 2\rho _y\), we eliminate the first term on the right-hand side. Likewise, if \(p_y>1\), similar steps applied to \(d_i^y\) result in
for \(\zeta _y=(\theta _yp_y^{p_y}(2\rho _x)^{p_y-2})^{1/(1-p_y)}\). Using \(\Vert u^{i+1}-u^i\Vert \rightarrow 0\) and the selection of \(\zeta _x\) and \(\zeta _y\), we then obtain the desired estimate \(\limsup _{i\rightarrow \infty }~q_i:=\limsup _{i\rightarrow \infty }~(d_i^x + d_i^y + O(\Vert u^{i+1}-u^i\Vert ))\le 0\). \(\square \)
Corollary B.4
The results of Theorem 6.3 continue to hold if Assumption 3.2 (iv) is replaced with Assumption B.1 (iv*) for some \(p_x,p_y\in [1,2]\), where in case \(p_y \in (1, 2]\), (51a) is replaced for some \({{\tilde{\gamma }}}_G > 0\) with
and in case \(p_x \in (1, 2]\), (51b) is replaced with
Proof
Conditions (78) are sufficient for (70) with \({\overline{\omega }}={\underline{\omega }}=1\) to hold; therefore, we can repeat the proof of Theorem 6.3 replacing the references to Theorem 4.2 by references to Corollary B.1. \(\square \)
Corollary B.5
The results of Theorem 6.4 continue to hold if Assumption 3.2 (iv) is replaced with Assumption B.1 (iv*) for some \(p_x,p_y\in [1,2]\), where in case \(p_y \in (1, 2]\), (54a) is replaced for some \({{\tilde{\gamma }}}_G>0\) with
and in case \(p_x \in (1, 2]\), (54b) is replaced for some \({{\tilde{\gamma }}}_{F^*}>0\) with
Proof
Conditions (79) are sufficient for (70) with \({\overline{\omega }}={\underline{\omega }}=\omega \) to hold; therefore, we can repeat the proof of Theorem 6.4 replacing the references to Theorem 4.2 by references to Corollary B.1. \(\square \)
Corollary B.6
The results of Proposition 6.6 continue to hold if the corresponding conditions of Theorems 6.1, 6.3, or 6.4 are replaced with those in Corollaries B.3, B.4, or B.5.
Proof
The proof repeats that of Proposition 6.6. \(\square \)
Verification of Conditions for Step Function Presentation and Potts Model
Throughout this section, we set \(\rho (t) :=2t-t^2\) and \(\kappa (x, y) :=\rho (\langle x,y\rangle )\) for \(x, y \in {\mathbb {R}}^m\). Then \(\rho '(t)=2(1-t)\), so that
$$\kappa _x(x,y)=2(1-\langle x,y\rangle )y,\quad \kappa _y(x,y)=2(1-\langle x,y\rangle )x,\quad \kappa _{yx}(x,y)=2(1-\langle x,y\rangle )\mathrm {Id}-2x\otimes y, \tag{80}$$
where \(a\otimes b\in {\mathbb {R}}^{m\times m}\) is the tensor product of two vectors a and b, i.e., the matrix with entries \((a\otimes b)_{ij}=a_ib_j\).
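As a quick sanity check of these formulas (a minimal sketch, not part of the paper's published codes [12]), the derivatives can be verified by finite differences:

```julia
# Finite-difference check of the derivatives of κ(x, y) = ρ(⟨x, y⟩) with ρ(t) = 2t - t².
using LinearAlgebra

ρ(t) = 2t - t^2
κ(x, y) = ρ(dot(x, y))
κx(x, y) = 2(1 - dot(x, y)) * y                           # κ_x from (80)
κy(x, y) = 2(1 - dot(x, y)) * x                           # κ_y from (80)
κyx(x, y) = 2(1 - dot(x, y)) * I(length(x)) - 2 * x * y'  # κ_yx from (80); x ⊗ y = x * y'

x, y = randn(3), randn(3)
h = 1e-6 * randn(3)
@assert isapprox(κ(x + h, y) - κ(x, y), dot(κx(x, y), h); rtol = 1e-3)  # first-order in x
@assert isapprox(κy(x + h, y) - κy(x, y), κyx(x, y) * h; rtol = 1e-3)   # κ_yx = ∂κ_y/∂x
```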
The following lemma verifies Assumption 3.2 for \(K=\kappa \).
Lemma C.1
Let \(R_K>2\), and suppose \({\widehat{x}},{\widehat{y}}\in {\mathbb {R}}^m\) for \(m \ge 1\) with
Then the function \(K=\kappa \) defined above satisfies Assumption 3.2 for some \(\theta _x, \theta _y >0\) and some \(\rho _x,\rho _y>0\) dependent on \(R_K\) with
as well as the constants \(\xi _x,\xi _y \in {\mathbb {R}}\), \(\lambda _x,\lambda _y \ge 0\) satisfying \(\lambda _x\xi _x > 2(\lambda _x+|{\widehat{y}}|_2^2)|{\widehat{y}}|_2^2\), \(\xi _y > 0\), and \(\lambda _y > |{\widehat{x}}|_2^2\).
Proof
First, Assumption 3.2 (i) holds everywhere since \(K\in C^\infty ({\mathbb {R}}^m\times {\mathbb {R}}^m)\). To verify Assumption 3.2 (ii), we observe using (80) that
Hence \(L_x\), \(L_y\), and \(L_{yx}\) are as claimed.
To verify Assumption 3.2 (iii), we first of all observe using (81) that
Therefore \({\sup _{(x, y) \in \mathbb {B}({\widehat{x}}, \rho _x) \times \mathbb {B}({\widehat{y}}, \rho _y)} |\kappa _{xy}(x, y)|_2 \le R_K}\) for some \({\rho _x, \rho _y>0}\) dependent on \(R_K>2\).
Finally, to verify Assumption 3.2 (iv), we start with (15a), i.e.,
Expanding the equation using (80), (82), and
we require that
Taking any \(\alpha >0\), this will hold by Cauchy's and Young's inequalities if \(\xi _x \ge (2+\alpha )|{\widehat{y}}|_2^2 + 2\theta _x|y|_2\) and \(\lambda _x/2 \ge \alpha ^{-1}|{\widehat{y}}|_2^2\). If \(|{\widehat{y}}|_2=0\), these clearly hold for some \(\alpha ,\theta _x>0\). Otherwise, solving the latter for \(\alpha \) as an equality, i.e., taking \(\alpha =2\lambda ^{-1}_x|{\widehat{y}}|_2^2\), the former holds if \(\xi _x \ge 2(1+\lambda ^{-1}_x|{\widehat{y}}|_2^2)|{\widehat{y}}|_2^2 + 2\theta _x|y|_2\). If \(\lambda _x\xi _x > 2(\lambda _x+|{\widehat{y}}|_2^2)|{\widehat{y}}|_2^2\), this holds for some \(\theta _x,\rho _x,\rho _y>0\) in a neighborhood \({\mathbb {B}({\widehat{x}}, \rho _x) \times \mathbb {B}({\widehat{y}}, \rho _y)}\) of \(({\widehat{x}}, {\widehat{y}})\).
It remains to verify (15b), i.e.,
Again, using (80) and (82) we expand this as
Rearranging the \(\theta _y\)-term, we see that this holds if
Rearranging and estimating the first term as
and then using Young’s inequality on both parts, we obtain the condition
If \(\xi _y > 0\) and \(\lambda _y > |{\widehat{x}}|_2^2\), this holds for some \(\theta _y,\rho _y,\rho _x>0\) in \({\mathbb {B}({\widehat{x}}, \rho _x) \times \mathbb {B}({\widehat{y}}, \rho _y)}\). \(\square \)
We comment on the condition (81) on the primal–dual solution pair \({\widehat{x}}, {\widehat{y}}\in {\mathbb {R}}^m\). First, for \(m=1\), this condition reduces to \({\widehat{x}}{\widehat{y}}\in [0, 1]\). This is necessarily satisfied in the case of the step function (where \(f^*=\delta _{[0, \infty )}\)) and in the case of the \(\ell ^0\) function (where \(f^*=0\)), as in both cases \({\widehat{x}}{\widehat{y}}\in \{0, 1\}\) by the dual optimality condition \(\kappa _y({\widehat{x}}, {\widehat{y}}) \in \partial f^*({\widehat{y}})\). Furthermore, if we take \(f^*_\gamma =\frac{\gamma }{2}|\,\varvec{\cdot }\,|_2^2\) for some \(\gamma \ge 0\), then for any \(m\ge 1\) the dual optimality condition reads \(2{\widehat{x}}(1-\langle {\widehat{x}},{\widehat{y}}\rangle )=\gamma {\widehat{y}}\), i.e., \({\widehat{y}}=2{\widehat{x}}(\gamma +2|{\widehat{x}}|_2^2)^{-1}\), for which (81) is easily verified.
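Indeed, inserting this expression for \({\widehat{y}}\) gives (a short computation not spelled out above)
$$\langle {\widehat{x}},{\widehat{y}}\rangle = \frac{2|{\widehat{x}}|_2^2}{\gamma +2|{\widehat{x}}|_2^2} \in [0, 1),$$
consistent with the reduction \({\widehat{x}}{\widehat{y}}\in [0,1]\) of (81) noted above for \(m=1\).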
The following lemma shows that Assumption 3.2 remains valid if we include a linear operator in the primal component.
Lemma C.2
Let \(K(x, y)={\tilde{K}}(Ax, y)\) for some \({A \in \mathbb {L}(X; Z)}\) and \({\tilde{K}} \in C^1(Z \times Y)\) on Hilbert spaces X, Y, Z. Suppose \({\tilde{K}}\) satisfies Assumption 3.2 at \(({\widehat{z}}, {\widehat{y}}) :=(A {\widehat{x}}, {\widehat{y}})\). Mark the corresponding constants with a tilde: \({\tilde{L}}_z\), \({\tilde{R}}_K\), and so on. Then K satisfies Assumption 3.2 with \(R_K :={\tilde{R}}_K \Vert A\Vert \); \(\xi _x=\Vert A\Vert {{\tilde{\xi }}}_z\), \(\xi _y={{\tilde{\xi }}}_y\); \(\lambda _x=\Vert A\Vert {{\tilde{\lambda }}}_z\), \(\lambda _y={{\tilde{\lambda }}}_y\); \(\theta _x={{\tilde{\theta }}}_z\), \(\theta _y={{\tilde{\theta }}}_y\Vert A\Vert ^{-1}\); \(\rho _x=\Vert A\Vert ^{-1}{{\tilde{\rho }}}_x\), and \(\rho _y={{\tilde{\rho }}}_y\) as well as
Proof
Observe first of all that by the chain rule,
$$K_x(x, y) = A^* {\tilde{K}}_z(Ax, y) \quad \text {and}\quad K_y(x, y) = {\tilde{K}}_y(Ax, y),$$
and hence Assumption 3.2 (i) holds for K if it holds for \({\tilde{K}}\).
Let now Assumption 3.2 (ii) hold for \({\tilde{K}}\) with \({\tilde{L}}_x\), \({\tilde{L}}_y\), and \({\tilde{L}}_{yx}\). Observing that
Assumption 3.2 (ii) thus also holds with the function of (84). Similarly in Assumption 3.2 (iii), we can take \(R_K :={\tilde{R}}_K \Vert A\Vert \).
Finally, we expand Assumption 3.2 (iv) for K as
and
where \(z=Ax\), \(z'=Ax'\), and \({\widehat{z}}=A{\widehat{x}}\). Since \(\Vert z-z'\Vert \le \Vert A\Vert \Vert x-x'\Vert \), etc., this follows from Assumption 3.2 (iv) for \({\tilde{K}}\) with the constants as claimed. \(\square \)
Applying this lemma to \({\tilde{K}}(z, y)=\sum _{k=1}^n \kappa (z_k, y_k)\), we can thus lift the scalar estimates for \(K=\kappa \) as in (80) to the corresponding estimates on \(K(x, y) :=\sum _{k=1}^n \kappa ([D_h x]_k, y_k)\) as used in the Potts model example.
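As an illustration of this lifting (a minimal sketch under the assumptions of a 1-D signal, scalar components, and a forward-difference matrix with zero boundary; the names Dh, K, and Kx are ours and not from the paper's codes [12]):

```julia
# Lifting κ through a linear operator A = D_h as in Lemma C.2, for
# K(x, y) = Σ_k κ([D_h x]_k, y_k) with scalar components (m = 1).
using LinearAlgebra, SparseArrays

n = 8
Dh = spdiagm(0 => -ones(n), 1 => ones(n - 1))   # forward differences, zero boundary
ρ(t) = 2t - t^2
K(x, y) = sum(ρ(z * w) for (z, w) in zip(Dh * x, y))
# Chain rule from Lemma C.2: K_x(x, y) = D_h' applied to κ_z(D_h x, y) componentwise.
Kx(x, y) = Dh' * (2 .* (1 .- (Dh * x) .* y) .* y)

x, y = randn(n), randn(n)
h = 1e-6 * randn(n)
@assert isapprox(K(x + h, y) - K(x, y), dot(Kx(x, y), h); rtol = 1e-3)
```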
Keywords
- Nonsmooth optimization
- Primal-dual method
- Non-convex-concave saddle-points
- Generalized conjugate
- Potts model
- Nash equilibria