Abstract
We demonstrate that difficult non-convex non-smooth optimization problems, such as Nash equilibrium problems and anisotropic as well as isotropic Potts segmentation models, can be written in terms of generalized conjugates of convex functionals. These, in turn, can be formulated as saddle-point problems involving convex non-smooth functionals and a general smooth but non-bilinear coupling term. We then show through detailed convergence analysis that a conceptually straightforward extension of the primal–dual proximal splitting method of Chambolle and Pock is applicable to the solution of such problems. Under sufficient local strong convexity assumptions on the functionals—but still with a non-bilinear coupling term—we even demonstrate local linear convergence of the method. We illustrate these theoretical results numerically on the aforementioned example problems.
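In the notation used throughout the appendices below, where G and \(F^*\) are the convex non-smooth functionals and K is the smooth but possibly non-bilinear coupling term, the saddle-point problems in question take the form
$$\min _x \max _y\; G(x) + K(x, y) - F^*(y),$$
which reduces to the bilinear setting of Chambolle and Pock [8] when \(K(x,y)=\langle Ax,y\rangle \) for a linear operator A.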
References
Aragón Artacho, F.J., Geoffroy, M.H.: Characterization of metric regularity of subdifferentials. J. Convex Anal. 15(2), 365–380 (2008)
Aragón Artacho, F.J., Geoffroy, M.H.: Metric subregularity of the convex subdifferential in Banach spaces. J. Nonlinear Convex Anal. 15(1), 35–47 (2014)
Attouch, H., Bolte, J., Svaiter, B.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Progr. 137(1–2), 91–129 (2013). https://doi.org/10.1007/s10107-011-0484-9
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics, 2nd edn. Springer, New York (2017)
Benning, M., Knoll, F., Schönlieb, C.B., Valkonen, T.: Preconditioned ADMM with nonlinear operator constraint. In: L. Bociu, J.A. Désidéri, A. Habbal (eds.) System Modeling and Optimization: 27th IFIP TC 7 Conference, CSMO 2015, Sophia Antipolis, France, June 29–July 3, 2015, Revised Selected Papers, pp. 117–126. Springer International Publishing (2016). https://tuomov.iki.fi/m/nonlinearADMM.pdf
Borzì, A., Kanzow, C.: Formulation and numerical solution of Nash equilibrium multiobjective elliptic control problems. SIAM J. Control Optim. 51(1), 718–744 (2013). https://doi.org/10.1137/120864921
Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20(1), 89–97 (2004). https://doi.org/10.1023/B:JMIV.0000011325.36760.1e
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011). https://doi.org/10.1007/s10851-010-0251-1
Clason, C., Kunisch, K.: A convex analysis approach to multi-material topology optimization. ESAIM Math. Modell. Numer. Anal. 50(6), 1917–1936 (2016). https://doi.org/10.1051/m2an/2016012
Clason, C., Valkonen, T.: Primal-dual extragradient methods for nonlinear nonsmooth PDE-constrained optimization. SIAM J. Optim. 27(3), 1313–1339 (2017). https://doi.org/10.1137/16M1080859
Clason, C., Mazurenko, S., Valkonen, T.: Acceleration and global convergence of a first-order primal-dual method for nonconvex problems. SIAM J. Optim. 29, 933–963 (2019). https://doi.org/10.1137/18M1170194
Clason, C., Mazurenko, S., Valkonen, T.: Julia codes for “primal-dual proximal splitting and generalized conjugation in non-smooth non-convex optimization”. Online resource on Zenodo (2020). https://doi.org/10.5281/zenodo.3647614
Drori, Y., Sabach, S., Teboulle, M.: A simple algorithm for a class of nonsmooth convex-concave saddle-point problems. Oper. Res. Lett. 43(2), 209–214 (2015). https://doi.org/10.1016/j.orl.2015.02.001
Ekeland, I., Temam, R.: Convex Analysis and Variational Problems. SIAM, Philadelphia (1999)
Elster, K.H., Wolf, A.: Recent Results on Generalized Conjugate Functions, pp. 67–78. Springer, New York (1988)
Facchinei, F., Kanzow, C.: Generalized Nash equilibrium problems. Ann. Oper. Res. 175, 177–211 (2010). https://doi.org/10.1007/s10479-009-0653-x
Flåm, S.D., Antipin, A.S.: Equilibrium programming using proximal-like algorithms. Math. Progr. 78(1, Ser. A), 29–41 (1997). https://doi.org/10.1007/BF02614504
Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741 (1984). https://doi.org/10.1109/TPAMI.1984.4767596
Hamedani, E.Y., Aybat, N.S.: A primal-dual algorithm for general convex-concave saddle point problems (2018)
He, N., Juditsky, A., Nemirovski, A.: Mirror prox algorithm for multi-term composite minimization and semi-separable problems. Comput. Optim. Appl. 61(2), 275–319 (2015). https://doi.org/10.1007/s10589-014-9723-3
He, Y., Monteiro, R.D.: An accelerated HPE-type algorithm for a class of composite convex-concave saddle-point problems. SIAM J. Optim. 26(1), 29–56 (2016). https://doi.org/10.1137/14096757X
Juditsky, A., Nemirovski, A.: First Order Methods for Nonsmooth Convex Large-Scale Optimization, I: General Purpose Methods, pp. 121–148. MIT Press, Cambridge (2011)
Juditsky, A., Nemirovski, A.: First Order Methods for Nonsmooth Convex Large-Scale Optimization, II: Utilizing Problems Structure, pp. 149–183. MIT Press, Cambridge (2011)
Kolossoski, O., Monteiro, R.: An accelerated non-Euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems. Optim. Methods Softw. 32(6), 1244–1272 (2017). https://doi.org/10.1080/10556788.2016.1266355
Krawczyk, J.B., Uryasev, S.: Relaxation algorithms to find Nash equilibria with economic applications. Environ. Model. Assess. 5(1), 63–73 (2000). https://doi.org/10.1023/A:1019097208499
Martinez-Legaz, J.E.: Generalized convex duality and its economic applications. In: Hadjisavvas, N., Komlósi, S., Schaible, S. (eds.) Handbook of Generalized Convexity and Generalized Monotonicity, pp. 237–292. Springer, New York (2005)
Nemirovski, A.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004). https://doi.org/10.1137/S1052623403425629
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Progr. 103(1), 127–152 (2005). https://doi.org/10.1007/s10107-004-0552-5
Nikaidô, H., Isoda, K.: Note on non-cooperative convex games. Pac. J. Math. 5, 807–815 (1955). https://doi.org/10.2140/pjm.1955.5.807
Rasband, W.S.: ImageJ. https://imagej.nih.gov/ij/
Rosen, J.B.: Existence and uniqueness of equilibrium points for concave \(n\)-person games. Econometrica 33, 520–534 (1965). https://doi.org/10.2307/1911749
Singer, I.: Duality for Nonconvex Approximation and Optimization. Springer, New York (2006). https://doi.org/10.1007/0-387-28395-1
Storath, M., Weinmann, A., Demaret, L.: Jump-sparse and sparse recovery using Potts functionals. IEEE Trans. Signal Process. 62(14), 3654–3666 (2014). https://doi.org/10.1109/TSP.2014.2329263
Storath, M., Weinmann, A., Frikel, J., Unser, M.: Joint image reconstruction and segmentation using the Potts model. Inverse Probl. 31(2), 025003 (2015). https://doi.org/10.1088/0266-5611/31/2/025003
Valkonen, T.: A primal-dual hybrid gradient method for non-linear operators with applications to MRI. Inverse Probl. 30(5), 055012 (2014). https://doi.org/10.1088/0266-5611/30/5/055012
Valkonen, T.: Testing and non-linear preconditioning of the proximal point method. Appl. Math. Optim. (2018). https://doi.org/10.1007/s00245-018-9541-6
Valkonen, T., Pock, T.: Acceleration of the PDHGM on partially strongly convex functions. J. Math. Imaging Vis. 59, 394–414 (2017). https://doi.org/10.1007/s10851-016-0692-2
von Heusinger, A., Kanzow, C.: Optimization reformulations of the generalized Nash equilibrium problem using Nikaido-Isoda-type functions. Comput. Optim. Appl. 43(3), 353–377 (2009). https://doi.org/10.1007/s10589-007-9145-6
Acknowledgements
In the first stages of the research, T. Valkonen and S. Mazurenko were supported by the EPSRC First Grant EP/P021298/1, “PARTIAL Analysis of Relations in Tasks of Inversion for Algorithmic Leverage”. Later, T. Valkonen was supported by the Academy of Finland grants 314701 and 320022. C. Clason was supported by the German Research Foundation (DFG) under grant Cl 487/2-1. We thank the anonymous reviewers for insightful comments.
Appendices
A Data Statement for the EPSRC
The source codes for the numerical experiments are on Zenodo at [12].
Reductions of the Three-Point Condition
The following two propositions demonstrate that Assumption 3.2 (iv) is closely related to standard second-order optimality conditions, i.e., that the Hessian is positive definite at the solution \({\widehat{u}}\).
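For orientation, the generic form of such a condition for a smooth function \(f\) with a local minimizer \({\widehat{x}}\) is (stated here only as a reminder, not as part of the assumptions)
$$\langle \nabla ^2 f({\widehat{x}})h, h\rangle \ge \gamma \Vert h\Vert ^2 \quad \text {for all } h \text { and some } \gamma >0;$$
the propositions below show how local variants of this condition for \(K\) imply the three-point estimates (15a) and (15b).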
Proposition A.1
Suppose Assumption 3.2 (ii) (locally Lipschitz gradients of K) holds in some neighborhood \({\mathcal {U}}\) of \({\widehat{u}}\), and for some \(\xi _x\in {\mathbb {R}}\), \(\gamma _x>0\),
Then (15a) holds in \({\mathcal {U}}\) with \(\theta _x=2(\gamma _x-\alpha )L_{yx}^{-1}\), and \(\lambda _x=L_x({\widehat{y}})^2(2\alpha )^{-1}\) for any \(\alpha \in (0,\gamma _x]\).
Proof
An application of Cauchy’s and Young’s inequalities with any factor \(\alpha >0\), Assumption 3.2 (ii), and (66) yields the estimate
At the same time, using (16),
Therefore (15a) holds if we take \(\theta _x\le 2(\gamma _x-\alpha )L_{yx}^{-1}\) and \(\lambda _x=L_x({\widehat{y}})^2(2\alpha )^{-1}\). \(\square \)
Proposition A.2
Suppose Assumption 3.2 (ii) (locally Lipschitz gradients of K) holds in some neighborhood \({\mathcal {U}}\) of \({\widehat{u}}\) with \(L_y(x)\le {\bar{L}}_y\), and that
for some constant \(L_{xy} \ge 0\). Assume, moreover, for some \(\xi _y\in {\mathbb {R}}\), \(\gamma _y>0\) that
Then (15b) holds in \({\mathcal {U}}\) with \(\theta _y=2(\gamma _y-\alpha _1)(1+\alpha _2)^{-1} L_{xy}^{-1}\), and \(\lambda _y=({\bar{L}}_y^2(2\alpha _1)^{-1}+(1+\alpha _2^{-1})L_{xy}\theta _y)\) for any \(\alpha _1\in (0,\gamma _y]\), \(\alpha _2>0\).
Proof
An application of Cauchy’s and Young’s inequalities with any factor \(\alpha >0\), Assumption 3.2 (ii), and (67) yields the estimate
At the same time, using (16) and Young’s inequality for any \(\alpha _2>0\),
Therefore (15b) holds if we take \(\theta _y\le 2\frac{\gamma _y-\alpha _1}{(1+\alpha _2)L_{xy}}\) and \(\lambda _y=\frac{{\bar{L}}_y^2}{2\alpha _1}+(1+\alpha _2^{-1})L_{xy}\theta _y\). \(\square \)
Relaxations of the Three-Point Condition
In all the results of this paper, Assumption 3.2 (iv) can be generalized to the following three-point condition similar to the one used in [11].
Assumption B.1
The functional \(K \in C^1(X\times Y)\), and there exists a neighborhood
for some \(\rho _x,\rho _y>0\) such that for all \(u',u \in {\mathcal {U}}(\rho _x,\rho _y)\), the following property holds:
(iv*) (three-point condition) There exist \(\theta _x,\theta _y > 0\), \(\lambda _x,\lambda _y\ge 0\), \(\xi _x,\xi _y\in {\mathbb {R}}\), and \(p_x,p_y\in [1,2]\) such that
$$\begin{aligned}&\langle K_x(x',{\widehat{y}})-K_x({\widehat{x}},{\widehat{y}}),x-{\widehat{x}}\rangle +\xi _x\Vert x-{\widehat{x}}\Vert ^2\\&\quad \ge \theta _x\Vert K_y({\widehat{x}},y)-K_y(x,y)-K_{yx}(x,y)({\widehat{x}}-x)\Vert ^{p_x} -\frac{\lambda _x}{2}\Vert x-x'\Vert ^2, \end{aligned} \tag{69a}$$
$$\begin{aligned}&\langle K_y(x,y)-K_y(x,y')+K_y({\widehat{x}},{\widehat{y}})-K_y({\widehat{x}},y),y-{\widehat{y}}\rangle +\xi _y\Vert y-{\widehat{y}}\Vert ^2 \\&\quad \ge \theta _y\Vert K_x(x',{\widehat{y}})-K_x(x',y')-K_{xy}(x',y')({\widehat{y}}-y')\Vert ^{p_y} -\frac{\lambda _y}{2}\Vert y-y'\Vert ^2. \end{aligned} \tag{69b}$$
This assumption introduces exponents \(p_x,p_y\in [1,2]\), while in Assumption 3.2 (iv) we had \(p_x=p_y=1\). For instance, in [11, Appendix B] we verified Assumption B.1 with \(p_x=2\) for the case \(K(x,y)=\langle A(x),y\rangle \) for the reconstruction of the phase and amplitude of a complex number. This relaxation mainly affects the proof of Step 4 in Theorem 4.2, which now requires a few intermediate derivations.
Corollary B.1
The results of Theorem 4.2 continue to hold if Assumption 3.2 (iv) is replaced with Assumption B.1 (iv*) for some \(p_x,p_y\in [1,2]\), where in case \(p_y \in (1, 2]\), (24d) is replaced by
and in case \(p_x \in (1, 2]\), (24e) is replaced by
Proof
The beginning of the proof follows the exact same steps as in the proof of Theorem 4.2 up until (30). We now use Assumption B.1 (iv*) to further bound \(D_x\) and \(D_y\) similarly to (31) and (32). From (69a),
The following generalized Young's inequality, valid for any positive \(a\), \(b\), \(p\), and \(q\) with \(q^{-1}+p^{-1}=1\), allows for our choice of varying \(p_x\in [1,2]\):
$$ab \le \frac{a^p}{p} + \frac{b^q}{q}. \tag{72}$$
Applying this inequality with \(p=p_x\) and any \(\zeta _x>0\) to the last term of (71), we arrive at the estimate
We now use \(u^{i+1}\in {\mathcal {U}}(\rho _x,\rho _y)\) for some \(\rho _x,\rho _y\ge 0\), and \(\omega _i^{-1} \le {\underline{\omega }}^{-1}\) to obtain
If \(p_x=1\), we use the assumed inequality \(\theta _x \ge \rho _y{\underline{\omega }}^{-1}\) from (24e) to show that the right-hand side of (73) is non-negative for any \(\zeta _x>0\). Otherwise we take \(\zeta _x :=({\underline{\omega }}\theta _xp_x^{p_x}\rho _y^{p_x-2})^{1/(1-p_x)}\) to ensure the right-hand side of (73) is zero. In either case, \(\theta _x-\rho _y^{2-p_x}(p_x^{p_x}{\underline{\omega }}\zeta _x^{p_x-1})^{-1}\ge 0\) and hence
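To see that this choice of \(\zeta _x\) indeed zeroes the right-hand side of (73) when \(p_x\in (1,2]\), note that (a one-line check not spelled out above)
$$\zeta _x^{p_x-1}=({\underline{\omega }}\theta _xp_x^{p_x}\rho _y^{p_x-2})^{-1}, \quad \text {so}\quad \rho _y^{2-p_x}(p_x^{p_x}{\underline{\omega }}\zeta _x^{p_x-1})^{-1} = \rho _y^{2-p_x}\cdot \frac{{\underline{\omega }}\theta _xp_x^{p_x}\rho _y^{p_x-2}}{p_x^{p_x}{\underline{\omega }}} = \theta _x.$$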
Analogously, from (69b) and Cauchy’s inequality,
This has a structure similar to (71) with \(\omega _i\) now as a multiplier. Hence, we apply a similar generalized Young’s inequality to the last term with any \(\zeta _y>0\). Noting that \(\omega _i\le {\overline{\omega }}\), we use the following bound similar to (73):
The last inequality holds for any \(\zeta _y>0\) if \(p_y=1\) due to the assumed \(\theta _y \ge {\overline{\omega }}\rho _x\) from (24d); otherwise, we set \(\zeta _y :=(\theta _yp_y^{p_y}\rho _x^{p_y-2}{\overline{\omega }}^{-1})^{1/(1-p_y)}\). We then obtain that
Combining (30), (74), and (75), we can thus bound
where in the final step we have also used (70) and the selected \(\zeta _x\) and \(\zeta _y\) if \(p_x>1\) or \(p_y>1\) or both. Thus we obtain exactly the same lower bound as in (33). We then continue along the rest of the proof of Theorem 4.2 to obtain the claim. \(\square \)
It is worth observing that when \(p_x \in (1, 2]\) or \(p_y\in (1,2]\), the inequalities (70) do not directly bound the respective \(\rho _y\) or \(\rho _x\). Hence, we do not need to initialize the corresponding variable locally, unlike when \(p_x=1\) or \(p_y=1\). On the other hand, sufficient strong convexity is required from the corresponding G and \(F^*\).
We start with the lemma ensuring that the iterates stay in the initial neighborhood of the saddle point.
Corollary B.2
The results of Lemma 5.2 continue to hold if the corresponding conditions of Theorem 4.2 are replaced with those in Corollary B.1.
Proof
The proof repeats that of Lemma 5.2, applying Corollary B.1 instead of Theorem 4.2 in Step 2. \(\square \)
We next extend the results of Section 6 to arbitrary choices of both \(p_x \in [1,2]\) and \(p_y \in [1,2]\). This mainly consists of verifying (70a) when \(p_y \ne 1\) and (70b) when \(p_x \ne 1\). Note that it is possible to take \(p_x=1\) and \(p_y \ne 1\), or vice versa, as long as the corresponding conditions are satisfied.
Corollary B.3
The results of Theorem 6.1 continue to hold if Assumption 3.2 (iv) is replaced with Assumption B.1 (iv*) for some \(p_x,p_y\in [1,2]\), where in case \(p_y \in (1, 2]\), (46a) is replaced with
and in case \(p_x \in (1, 2]\), (46b) is replaced with
Proof
Since conditions (77) are sufficient for (70) with \({\overline{\omega }}={\underline{\omega }}=1\) to hold, we can repeat the proof of Theorem 6.1 replacing the references to Theorem 4.2 by references to Corollary B.1 up until (50). If \(p_x>1\), we now obtain a lower bound on \(d_i^x\) by arguing as in (71)–(73) with \({\widehat{u}}\) replaced by \({\bar{u}}\). Specifically, using (16), Assumption B.1 (iv*) at \({\bar{u}}\), and the generalized Young’s inequality (72), we obtain for any \(\zeta _x>0\) that
Inserting \(\zeta _x=(\theta _xp_x^{p_x}(2\rho _y)^{p_x-2})^{1/(1-p_x)}\) and \(\Vert y^{i+1}-{\bar{y}}\Vert \le 2\rho _y\), we eliminate the first term on the right-hand side. Likewise, if \(p_y>1\), similar steps applied to \(d_i^y\) result in
for \(\zeta _y=(\theta _yp_y^{p_y}(2\rho _x)^{p_y-2})^{1/(1-p_y)}\). Using \(\Vert u^{i+1}-u^i\Vert \rightarrow 0\) and the selection of \(\zeta _x\) and \(\zeta _y\), we then obtain the desired estimate \(\limsup _{i\rightarrow \infty }~q_i:=\limsup _{i\rightarrow \infty }~(d_i^x + d_i^y + O(\Vert u^{i+1}-u^i\Vert ))\le 0\). \(\square \)
Corollary B.4
The results of Theorem 6.3 continue to hold if Assumption 3.2 (iv) is replaced with Assumption B.1 (iv*) for some \(p_x,p_y\in [1,2]\), where in case \(p_y \in (1, 2]\), (51a) is replaced for some \({{\tilde{\gamma }}}_G > 0\) with
and in case \(p_x \in (1, 2]\), (51b) is replaced with
Proof
Conditions (78) are sufficient for (70) with \({\overline{\omega }}={\underline{\omega }}=1\) to hold; therefore, we can repeat the proof of Theorem 6.3 replacing the references to Theorem 4.2 by references to Corollary B.1. \(\square \)
Corollary B.5
The results of Theorem 6.4 continue to hold if Assumption 3.2 (iv) is replaced with Assumption B.1 (iv*) for some \(p_x,p_y\in [1,2]\), where in case \(p_y \in (1, 2]\), (54a) is replaced for some \({{\tilde{\gamma }}}_G>0\) with
and in case \(p_x \in (1, 2]\), (54b) is replaced for some \({{\tilde{\gamma }}}_{F^*}>0\) with
Proof
Conditions (79) are sufficient for (70) with \({\overline{\omega }}={\underline{\omega }}=\omega \) to hold; therefore, we can repeat the proof of Theorem 6.4 replacing the references to Theorem 4.2 by references to Corollary B.1. \(\square \)
Corollary B.6
The results of Proposition 6.6 continue to hold if the corresponding conditions of Theorems 6.1, 6.3, or 6.4 are replaced with those in Corollaries B.3, B.4, or B.5.
Proof
The proof repeats that of Proposition 6.6. \(\square \)
Verification of Conditions for Step Function Presentation and Potts Model
Throughout this section, we set \(\rho (t) :=2t-t^2\) and \(\kappa (x, y) :=\rho (\langle x,y\rangle )\) for \(x, y \in {\mathbb {R}}^m\). Then \(\rho '(t)=2(1-t)\), so that
$$\kappa _x(x,y)=2(1-\langle x,y\rangle )y,\quad \kappa _y(x,y)=2(1-\langle x,y\rangle )x,\quad \kappa _{yx}(x,y)=2(1-\langle x,y\rangle )\mathrm {Id}-2x\otimes y, \tag{80}$$
where \(a\otimes b\in {\mathbb {R}}^{m\times m}\) is the tensor product of two vectors a and b, i.e., the matrix with entries \((a\otimes b)_{ij}=a_ib_j\).
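As a quick sanity check of these formulas (a minimal sketch, not part of the paper's published codes [12]), the derivatives can be verified by finite differences:

```julia
# Finite-difference check of the derivatives of κ(x, y) = ρ(⟨x, y⟩) with ρ(t) = 2t - t².
using LinearAlgebra

ρ(t) = 2t - t^2
κ(x, y) = ρ(dot(x, y))
κx(x, y) = 2(1 - dot(x, y)) * y                           # κ_x from (80)
κy(x, y) = 2(1 - dot(x, y)) * x                           # κ_y from (80)
κyx(x, y) = 2(1 - dot(x, y)) * I(length(x)) - 2 * x * y'  # κ_yx from (80); x ⊗ y = x * y'

x, y = randn(3), randn(3)
h = 1e-6 * randn(3)
@assert isapprox(κ(x + h, y) - κ(x, y), dot(κx(x, y), h); rtol = 1e-3)  # first-order in x
@assert isapprox(κy(x + h, y) - κy(x, y), κyx(x, y) * h; rtol = 1e-3)   # κ_yx = ∂κ_y/∂x
```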
The following lemma verifies Assumption 3.2 for \(K=\kappa \).
Lemma C.1
Let \(R_K>2\), and suppose \({\widehat{x}},{\widehat{y}}\in {\mathbb {R}}^m\) for \(m \ge 1\) with
Then the function \(K=\kappa \) defined above satisfies Assumption 3.2 for some \(\theta _x, \theta _y >0\) and some \(\rho _x,\rho _y>0\) dependent on \(R_K\) with
as well as the constants \(\xi _x,\xi _y \in {\mathbb {R}}\), \(\lambda _x,\lambda _y \ge 0\) satisfying \(\lambda _x\xi _x > 2(\lambda _x+|{\widehat{y}}|_2^2)|{\widehat{y}}|_2^2\), \(\xi _y > 0\), and \(\lambda _y > |{\widehat{x}}|_2^2\).
Proof
First, Assumption 3.2 (i) holds everywhere since \(K\in C^\infty ({\mathbb {R}}^m\times {\mathbb {R}}^m)\). To verify Assumption 3.2 (ii), we observe using (80) that
Hence \(L_x\), \(L_y\), and \(L_{yx}\) are as claimed.
To verify Assumption 3.2 (iii), we first of all observe using (81) that
Therefore \({\sup _{(x, y) \in \mathbb {B}({\widehat{x}}, \rho _x) \times \mathbb {B}({\widehat{y}}, \rho _y)} |\kappa _{xy}(x, y)|_2 \le R_K}\) for some \({\rho _x, \rho _y>0}\) dependent on \(R_K>2\).
Finally, to verify Assumption 3.2 (iv), we start with (15a), i.e.,
Expanding the equation using (80), (82), and
we require that
Taking any \(\alpha >0\), this will hold by Cauchy's and Young's inequalities if \(\xi _x \ge (2+\alpha )|{\widehat{y}}|_2^2 + 2\theta _x|y|_2\) and \(\lambda _x/2 \ge \alpha ^{-1}|{\widehat{y}}|_2^2\). If \(|{\widehat{y}}|_2=0\), these clearly hold for some \(\alpha ,\theta _x>0\). Otherwise, solving the latter for \(\alpha \) as an equality, i.e., taking \(\alpha =2\lambda ^{-1}_x|{\widehat{y}}|_2^2\), the former holds if \(\xi _x \ge 2(1+\lambda ^{-1}_x|{\widehat{y}}|_2^2)|{\widehat{y}}|_2^2 + 2\theta _x|y|_2\). If \(\lambda _x\xi _x > 2(\lambda _x+|{\widehat{y}}|_2^2)|{\widehat{y}}|_2^2\), this holds for some \(\theta _x,\rho _x,\rho _y>0\) in a neighborhood \({\mathbb {B}({\widehat{x}}, \rho _x) \times \mathbb {B}({\widehat{y}}, \rho _y)}\) of \(({\widehat{x}}, {\widehat{y}})\).
It remains to verify (15b), i.e.,
Again, using (80) and (82) we expand this as
Rearranging the \(\theta _y\)-term, we see that this holds if
Rearranging and estimating the first term as
and then using Young’s inequality on both parts, we obtain the condition
If \(\xi _y > 0\) and \(\lambda _y > |{\widehat{x}}|_2^2\), this holds for some \(\theta _y,\rho _y,\rho _x>0\) in \({\mathbb {B}({\widehat{x}}, \rho _x) \times \mathbb {B}({\widehat{y}}, \rho _y)}\). \(\square \)
We comment on the condition (81) on the primal–dual solution pair \({\widehat{x}}, {\widehat{y}}\in {\mathbb {R}}^m\). First, for \(m=1\), this condition reduces to \({\widehat{x}}{\widehat{y}}\in [0, 1]\). This is necessarily satisfied in the case of the step function (where \(f^*=\delta _{[0, \infty )}\)) and in the case of the \(\ell ^0\) function (where \(f^*=0\)), as in both cases \({\widehat{x}}{\widehat{y}}\in \{0, 1\}\) by the dual optimality condition \(\kappa _y({\widehat{x}}, {\widehat{y}}) \in \partial f^*({\widehat{y}})\). Furthermore, if we take \(f^*_\gamma =\frac{\gamma }{2}|\,\varvec{\cdot }\,|_2^2\) for some \(\gamma \ge 0\), then for any \(m\ge 1\) the dual optimality condition reads \(2{\widehat{x}}(1-\langle {\widehat{x}},{\widehat{y}}\rangle )=\gamma {\widehat{y}}\), i.e., \({\widehat{y}}=2{\widehat{x}}(\gamma +2|{\widehat{x}}|_2^2)^{-1}\), for which (81) is easily verified.
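Indeed, inserting this expression for \({\widehat{y}}\) gives (a short computation not spelled out above)
$$\langle {\widehat{x}},{\widehat{y}}\rangle = \frac{2|{\widehat{x}}|_2^2}{\gamma +2|{\widehat{x}}|_2^2} \in [0, 1),$$
consistent with the reduction \({\widehat{x}}{\widehat{y}}\in [0,1]\) of (81) noted above for \(m=1\).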
The following lemma shows that Assumption 3.2 remains valid if we include a linear operator in the primal component.
Lemma C.2
Let \(K(x, y)={\tilde{K}}(Ax, y)\) for some \({A \in \mathbb {L}(X; Z)}\) and \({\tilde{K}} \in C^1(Z \times Y)\) on Hilbert spaces X, Y, Z. Suppose \({\tilde{K}}\) satisfies Assumption 3.2 at \(({\widehat{z}}, {\widehat{y}}) :=(A {\widehat{x}}, {\widehat{y}})\). Mark the corresponding constants with a tilde: \({\tilde{L}}_z\), \({\tilde{R}}_K\), and so on. Then K satisfies Assumption 3.2 with \(R_K :={\tilde{R}}_K \Vert A\Vert \); \(\xi _x=\Vert A\Vert {{\tilde{\xi }}}_z\), \(\xi _y={{\tilde{\xi }}}_y\); \(\lambda _x=\Vert A\Vert {{\tilde{\lambda }}}_z\), \(\lambda _y={{\tilde{\lambda }}}_y\); \(\theta _x={{\tilde{\theta }}}_z\), \(\theta _y={{\tilde{\theta }}}_y\Vert A\Vert ^{-1}\); \(\rho _x=\Vert A\Vert ^{-1}{{\tilde{\rho }}}_x\), and \(\rho _y={{\tilde{\rho }}}_y\) as well as
Proof
Observe first of all that by the chain rule,
$$K_x(x, y) = A^* {\tilde{K}}_z(Ax, y) \quad \text {and}\quad K_y(x, y) = {\tilde{K}}_y(Ax, y),$$
and hence Assumption 3.2 (i) holds for K if it holds for \({\tilde{K}}\).
Let now Assumption 3.2 (ii) hold for \({\tilde{K}}\) with \({\tilde{L}}_x\), \({\tilde{L}}_y\), and \({\tilde{L}}_{yx}\). Observing that
Assumption 3.2 (ii) thus also holds with the function of (84). Similarly in Assumption 3.2 (iii), we can take \(R_K :={\tilde{R}}_K \Vert A\Vert \).
Finally, we expand Assumption 3.2 (iv) for K as
and
where \(z=Ax\), \(z'=Ax'\), and \({\widehat{z}}=A{\widehat{x}}\). Since \(\Vert z-z'\Vert \le \Vert A\Vert \Vert x-x'\Vert \), etc., this follows from Assumption 3.2 (iv) for \({\tilde{K}}\) with the constants as claimed. \(\square \)
Applying this lemma to \({\tilde{K}}(z, y)=\sum _{k=1}^n \kappa (z_k, y_k)\), we can thus lift the scalar estimates for \(K=\kappa \) as in (80) to the corresponding estimates on \(K(x, y) :=\sum _{k=1}^n \kappa ([D_h x]_k, y_k)\) as used in the Potts model example.
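As an illustration of this lifting (a minimal sketch under the assumptions of a 1-D signal, scalar components, and a forward-difference matrix with zero boundary; the names Dh, K, and Kx are ours and not from the paper's codes [12]):

```julia
# Lifting κ through a linear operator A = D_h as in Lemma C.2, for
# K(x, y) = Σ_k κ([D_h x]_k, y_k) with scalar components (m = 1).
using LinearAlgebra, SparseArrays

n = 8
Dh = spdiagm(0 => -ones(n), 1 => ones(n - 1))   # forward differences, zero boundary
ρ(t) = 2t - t^2
K(x, y) = sum(ρ(z * w) for (z, w) in zip(Dh * x, y))
# Chain rule from Lemma C.2: K_x(x, y) = D_h' applied to κ_z(D_h x, y) componentwise.
Kx(x, y) = Dh' * (2 .* (1 .- (Dh * x) .* y) .* y)

x, y = randn(n), randn(n)
h = 1e-6 * randn(n)
@assert isapprox(K(x + h, y) - K(x, y), dot(Kx(x, y), h); rtol = 1e-3)
```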
Keywords
- Nonsmooth optimization
- Primal-dual method
- Non-convex-concave saddle-points
- Generalized conjugate
- Potts model
- Nash equilibria