
Composite convex optimization with global and local inexact oracles

Abstract

We introduce new global and local inexact oracle concepts for a wide class of convex functions in composite convex minimization. Such inexact oracles naturally arise in many situations, including primal–dual frameworks, barrier smoothing, and inexact evaluations of gradients and Hessians. We also provide examples showing that the class of convex functions equipped with these new inexact oracles is larger than the standard self-concordant and Lipschitz-gradient function classes. Further, we investigate several properties of convex and/or self-concordant functions under our inexact oracles, which are useful for algorithmic development. Next, we apply our theory to develop inexact proximal Newton-type schemes for minimizing general composite convex optimization problems equipped with such inexact oracles. Our theoretical results consist of new optimization algorithms accompanied by global convergence guarantees for a wide class of composite convex optimization problems. When the first objective term is additionally self-concordant, we establish different local convergence results for our method. In particular, we prove that, depending on the choice of accuracy levels of the inexact second-order oracles, we obtain local convergence rates ranging from linear and superlinear to quadratic. In special cases where convergence bounds are known, our theory recovers the best-known rates. We also apply our settings to derive a new primal–dual method for composite convex minimization problems involving linear operators. Finally, we present some representative numerical examples to illustrate the benefit of the new algorithms.

References

  1. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  2. Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton University Press, Princeton (2009)

  3. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, vol. 3. SIAM, Philadelphia (2001)

  4. Bogolubsky, L., Dvurechenskii, P., Gasnikov, A., Gusev, G., Nesterov, Y., Raigorodskii, A., Tikhonov, A., Zhukovskii, M.: Learning supervised PageRank with gradient-based and gradient-free optimization methods. In: Advances in Neural Information Processing Systems, pp. 4914–4922 (2016)

  5. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  6. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)

  7. Conn, A.R., Scheinberg, K., Vicente, L.N.: Introduction to Derivative-Free Optimization. SIAM, Philadelphia (2008)

  8. d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008)

  9. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1–2), 37–75 (2014)

  10. Dvurechensky, P., Gasnikov, A.: Stochastic intermediate gradient method for convex problems with stochastic inexact oracle. J. Optim. Theory Appl. 171(1), 121–145 (2016)

  11. Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3), 432–441 (2008)

  12. Gao, W., Goldfarb, D.: Quasi-Newton methods: superlinear convergence without linesearch for self-concordant functions. Optim. Method Softw. 34(1), 194–217 (2019)

  13. Harmany, Z.T., Marcia, R.F., Willett, R.M.: This is SPIRAL-TAP: sparse Poisson intensity reconstruction algorithms—theory and practice. IEEE Trans. Image Process. 21(3), 1084–1096 (2012)

  14. Hsieh, C.J., Sustik, M.A., Dhillon, I.S., Ravikumar, P.: Sparse inverse covariance matrix estimation using quadratic approximation. Adv. Neural Inf. Process. Syst. 24, 1–18 (2011)

  15. Lefkimmiatis, S., Unser, M.: Poisson image reconstruction with Hessian Schatten-norm regularization. IEEE Trans. Image Process. 22(11), 4314–4327 (2013)

  16. Li, J., Andersen, M., Vandenberghe, L.: Inexact proximal Newton methods for self-concordant functions. Math. Methods Oper. Res. 85(1), 19–41 (2017)

  17. Li, L., Toh, K.C.: An inexact interior-point method for \(\ell _1\)-regularized sparse covariance selection. Math. Program. Comput. 2(3), 291–315 (2010)

  18. Lu, Z.: Randomized block proximal damped Newton method for composite self-concordant minimization. SIAM J. Optim. 27(3), 1910–1942 (2017)

  19. Marron, S.J., Todd, M.J., Ahn, J.: Distance-weighted discrimination. J. Am. Stat. Assoc. 102(480), 1267–1271 (2007)

  20. Necoara, I., Patrascu, A., Glineur, F.: Complexity of first-order inexact Lagrangian and penalty methods for conic convex programming. Optim. Method Softw. 34(2), 305–335 (2019)

  21. Necoara, I., Suykens, J.A.K.: Interior-point Lagrangian decomposition method for separable convex optimization. J. Optim. Theory Appl. 143(3), 567–588 (2009)

  22. Nemirovskii, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Wiley, New York (1983)

  23. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, Volume 87 of Applied Optimization. Kluwer Academic Publishers, Boston (2004)

  24. Nesterov, Y., Nemirovski, A.: Interior-Point Polynomial Algorithms in Convex Programming. SIAM, Philadelphia (1994)

  25. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2nd edn. Springer, New York (2006)

  26. Olsen, P.A., Oztoprak, F., Nocedal, J., Rennie, S.J.: Newton-like methods for sparse inverse covariance estimation. Adv. Neural Inf. Process. Syst. 25, 1–9 (2012)

  27. Ostrovskii, D.M., Bach, F.: Finite-sample analysis of M-estimators using self-concordance. arXiv:1810.06838v1 (2018)

  28. Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 123–231 (2013)

  29. Rockafellar, R.T.: Convex Analysis. Princeton Mathematics Series, vol. 28. Princeton University Press, Princeton (1970)

  30. Shapiro, A., Dentcheva, D., Ruszczynski, A.: Lectures on Stochastic Programming: Modelling and Theory. SIAM, Philadelphia (2009)

  31. Sun, T., Tran-Dinh, Q.: Generalized self-concordant functions: a recipe for Newton-type methods. Math. Program. 178, 145–213 (2018)

  32. Toh, K.-C., Todd, M.J., Tütüncü, R.H.: On the implementation and usage of SDPT3—a Matlab software package for semidefinite-quadratic-linear programming. Technical Report 4, NUS Singapore (2010)

  33. Tran-Dinh, Q., Kyrillidis, A., Cevher, V.: Composite self-concordant minimization. J. Mach. Learn. Res. 15, 374–416 (2015)

  34. Tran-Dinh, Q., Necoara, I., Savorgnan, C., Diehl, M.: An inexact perturbed path-following method for Lagrangian decomposition in large-scale separable convex optimization. SIAM J. Optim. 23(1), 95–125 (2013)

  35. Tran-Dinh, Q., Sun, T., Lu, S.: Self-concordant inclusions: a unified framework for path-following generalized Newton-type algorithms. Math. Program. 177(1–2), 173–223 (2019)

  36. Zhang, R.Y., Fattahi, S., Sojoudi, S.: Linear-time algorithm for learning large-scale sparse graphical models. IEEE Access 7, 12658–12672 (2019)

  37. Zhang, Y., Lin, X.: DiSCO: distributed optimization for self-concordant empirical loss. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 362–370 (2015)

Acknowledgements

The work of Q. Tran-Dinh was partly supported by the National Science Foundation (NSF), Grant: DMS-1619884, and the Office of Naval Research (ONR), Grant: N00014-20-1-2088 (2020–2023). The work of I. Necoara was partly supported by the Executive Agency for Higher Education, Research and Innovation Funding (UEFISCDI), Romania, PNIII-P4-PCE-2016-0731, Project ScaleFreeNet, No. 39/2017.

Corresponding author

Correspondence to Quoc Tran-Dinh.

A The proof of technical results in the main text

This appendix provides the proofs of technical results and missing concepts in the main text.

1.1 A.1 The proof of Lemma 1: properties of global inexact oracle

(a) Substituting \(y = x\) into (5), we obtain (7) directly for all \(x\in \mathrm {dom}(f)\).

(b) Clearly, if \(\langle g(\bar{x}), y - \bar{x}\rangle \ge 0\) for all \(y\in \mathrm {dom}(f)\), then \(\langle g(\bar{x}), x^{\star } - \bar{x}\rangle \ge 0\) for any minimizer \(x^{\star }\) of f. Using this relation in (5), we have

$$\begin{aligned} f^{\star }&= f(x^{\star }) \overset{(5)}{\ge } {\tilde{f}}(\bar{x}) + \langle g(\bar{x}), x^{\star } - \bar{x}\rangle + \omega ((1-\delta _0)\vert\!\Vert x^{\star } - \bar{x} \Vert\!\vert_{\bar{x}}) \\&\ge {\tilde{f}}(\bar{x}) + \omega ((1-\delta _0)\vert\!\Vert x^{\star } - \bar{x} \Vert\!\vert_{\bar{x}}) \qquad \qquad \qquad \text {since}\,\langle g(\bar{x}), x^{\star } - \bar{x}\rangle \ge 0\\&\ge {\tilde{f}}(\bar{x}) \overset{(7)}{\ge } f(\bar{x}) - \delta _1, \end{aligned}$$

which implies \(f^{\star } \le f(\bar{x}) \le f^{\star } + \delta _1\).

(c) Let \(\nabla f(x)\) be a (sub)gradient of f at \(x \in \mathrm {int}\left( \mathrm {dom}(f)\right) \). For \(y\in \mathrm {dom}(f)\), it follows from (5) and (7) that

$$\begin{aligned} f(y)\ge f(x) + \left\langle \nabla f(x),y-x\right\rangle \overset{(7)}{\ge } {\tilde{f}}(x)+\left\langle \nabla f(x),y-x\right\rangle . \end{aligned}$$

Subtracting this estimate from the second inequality of (5), we have

$$\begin{aligned} \left\langle \nabla f(x)-g(x),y-x\right\rangle \le \omega _{*}\left( (1+\delta _0)\vert\!\Vert y - x \Vert\!\vert_x\right) + \delta _1, \end{aligned}$$
(52)

provided that \(\vert\!\Vert y - x \Vert\!\vert_x < \frac{1}{1+\delta _0}\). Let us consider an arbitrary \(z\in {\mathbb{R}}^p\) such that

$$\begin{aligned} \vert\!\Vert \nabla f(x)-g(x) \Vert\!\vert_x^{*}=\left| \left\langle \nabla f(x)-g(x),z\right\rangle \right| ~~~\text {and}~~~\vert\!\Vert z \Vert\!\vert_x=1. \end{aligned}$$

Let us choose \(y = y_{\tau }(x) := x + \tau \text {sign}(\left\langle \nabla f(x)-g(x),z\right\rangle )z\) for some \(\tau > 0\). Since \(x\in \mathrm {int}\left( \mathrm {dom}(f)\right) \), for sufficiently small \(\tau \), \(y\in \mathrm {dom}(f)\). Moreover, (52) becomes \(\tau \vert\!\Vert \nabla f(x)-g(x) \Vert\!\vert_x^{*}\le \omega _{*}\left( (1+\delta _0)\tau \right) +\delta _1\), which is equivalent to

$$\begin{aligned} \vert\!\Vert \nabla f(x)-g(x) \Vert\!\vert_x^{*}\le s(\tau ;\delta _0,\delta _1) := \tfrac{\omega _{*}\left( (1+\delta _0)\tau \right) +\delta _1}{\tau }. \end{aligned}$$
(53)

Let us take \(\tau := \frac{\delta _2}{(1+\delta _0 + \delta _2)(1+\delta _0)}\) for some sufficiently small \(\delta _2 > 0\). Then, we can easily check that \(\vert\!\Vert y - x \Vert\!\vert_x = \tau < \frac{1}{1+\delta _0}\). In this case, the right-hand side of (53) becomes

$$\begin{aligned} s(\tau ;\delta _0,\delta _1) = \tfrac{(1+\delta _0)(1+\delta _0 + \delta _2)}{\delta _2}\left[ \delta _1 + \ln \left( 1 + \tfrac{\delta _2}{1+\delta _0}\right) \right] - (1+\delta _0), \end{aligned}$$
(54)

for any \(\delta _2 >0\). Minimizing the right-hand side of (54) w.r.t. \(\delta _2 > 0\), we can show that the minimum is attained at \(\delta _2 := \delta _2(\delta _0,\delta _1) = (1+\delta _0)\omega ^{-1}(\delta _1) > 0\), the unique positive solution of \(\omega \left( \tfrac{\delta _2}{1+\delta _0} \right) = \delta _1\) in \(\delta _2\), where \(\omega ^{-1}\) is the inverse function of \(\omega \) (recall that \(\omega (\tau ) = \tau - \ln (1 + \tau )\)).

Now, substituting \(\delta _2 = \delta _2(\delta _0,\delta _1)\) back into \(s(\tau ;\delta _0,\delta _1)\), we can verify that the minimum value of (54) is exactly \(\delta _2 = (1+\delta _0)\omega ^{-1}(\delta _1)\). By the definition of \(\omega \), it is clear that if \(\delta _0 \rightarrow 0\) and \(\delta _1\rightarrow 0\), then \(\delta _2 = (1+\delta _0)\omega ^{-1}(\delta _1) \rightarrow 0\).
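As a quick numerical sanity check (not part of the proof), the following Python snippet minimizes \(s(\tau ;\delta _0,\delta _1)\) directly and compares the minimizer and minimum value with the closed forms above; the values \(\delta _0 = 0.05\) and \(\delta _1 = 0.01\) are illustrative choices of ours.

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar

# Check for Lemma 1(c): the minimum of
#   s(tau) = (omega_*((1+d0)*tau) + d1) / tau,  omega_*(t) = -t - ln(1-t),
# over tau in (0, 1/(1+d0)) equals d2 = (1+d0)*omega^{-1}(d1),
# attained at tau = d2 / ((1+d0+d2)*(1+d0)).
omega = lambda t: t - np.log(1.0 + t)           # omega(t) = t - ln(1+t)
omega_star = lambda t: -t - np.log(1.0 - t)     # its conjugate-type counterpart

d0, d1 = 0.05, 0.01
u = brentq(lambda t: omega(t) - d1, 0.0, 10.0)  # u = omega^{-1}(d1)
d2 = (1.0 + d0) * u
tau_star = d2 / ((1.0 + d0 + d2) * (1.0 + d0))

s = lambda tau: (omega_star((1.0 + d0) * tau) + d1) / tau
res = minimize_scalar(s, bounds=(1e-8, (1.0 - 1e-8) / (1.0 + d0)), method='bounded')

print(res.x, tau_star)   # numerical minimizer vs. closed form
print(res.fun, d2)       # minimum value vs. d2 = (1+d0)*omega^{-1}(d1)
```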

(d) Fix \(x^0 \in \mathrm {dom}(f)\) and consider the function \(\varphi (y) := f(y) - \langle \nabla {f}(x^0), y\rangle \). Clearly, \(\nabla {\varphi }(x^0) = 0\), so \(x^0\) is a minimizer of \(\varphi \). Define \(\tilde{\varphi }(x) := {\tilde{f}}(x) - \langle \nabla {f}(x^0), x\rangle \) and \(h(x) := g(x) - \nabla {f}(x^0)\). Then, for any \(x\in \mathrm {int}\left( \mathrm {dom}(f)\right) \) and \(t > 0\) such that \(x - tH(x)^{-1}h(x) \in \mathrm {dom}(f)\), we have \(\varphi (x^0) \le \varphi (x - tH(x)^{-1}h(x))\). Using (5), we can further derive

$$\begin{aligned} \varphi (x^0) \le \varphi (x - tH(x)^{-1}h(x)) \le \tilde{\varphi }(x) - t(\vert\!\Vert h(x) \Vert\!\vert_{x}^{*})^2 + \omega _{*}\left( (1+\delta _0)t \vert\!\Vert h(x) \Vert\!\vert_{x}^{*} \right) + \delta _1. \end{aligned}$$

Minimizing the right-hand side of the last estimate w.r.t. \(t > 0\), we obtain

$$\begin{aligned} \varphi (x^0) \le \tilde{\varphi }(x) - \omega \left( \tfrac{\vert\!\Vert h(x) \Vert\!\vert_{x}^{*}}{1+\delta _0}\right) + \delta _1, \end{aligned}$$

where the minimum is attained at \(t = \frac{1}{(1+\delta _0)(1+\delta _0+\vert\!\Vert h(x) \Vert\!\vert_x^{*})}\). Using the definition of \(\varphi \) and the Cauchy–Schwarz inequality, we have

$$\begin{aligned} \omega \left( \frac{\vert\!\Vert h(x) \Vert\!\vert_{x}^{*}}{1+\delta _0}\right)&\le {\tilde{f}}(x) - f(x^0) - \langle \nabla {f}(x^0), x - x^0\rangle + \delta _1 \\&\overset{(5)}{\le } -\omega ((1-\delta _0)\vert\!\Vert x-x^0 \Vert\!\vert_x)+\langle g(x)-\nabla f(x^0),x-x^0\rangle +\delta _1 \\&\le \vert\!\Vert h(x) \Vert\!\vert_{x}^{*}\vert\!\Vert x-x^0 \Vert\!\vert_x + \delta _1. \end{aligned}$$

Setting \(x^0 = y\) in this inequality, we obtain exactly (9). \(\square \)

1.2 A.2 The proof of Lemma 2: properties of local inexact oracle

From the second line of (6), for any \(u\in {\mathbb{R}}^p\), we have

$$\begin{aligned} (1-\delta _3)^2\Vert u\Vert _x^2 \le \vert\!\Vert u \Vert\!\vert_x^2 \le (1+\delta _3)^2\Vert u\Vert _x^2, \end{aligned}$$

which implies the first expression of (10).

Using again the second line of (6), we have \(\frac{1}{(1+\delta _3)^2}\nabla ^2{f}(x)^{-1} \preceq H(x)^{-1} \preceq \frac{1}{(1-\delta _3)^2}\nabla ^2{f}(x)^{-1}\). Hence, for any \(v\in {\mathbb{R}}^p\), one has

$$\begin{aligned} \frac{1}{(1+\delta _3)^2}(\Vert v\Vert _x^{*})^2 \le (\vert\!\Vert v \Vert\!\vert_x^{*})^2\le \frac{1}{(1-\delta _3)^2}(\Vert v\Vert _x^{*})^2, \end{aligned}$$

which implies the second expression of (10).

Now, we prove (11). For any \(x, y \in {\mathcal{X}}\), using (10) with \(u := y - x\), we have

$$\begin{aligned} \frac{\vert\!\Vert y-x \Vert\!\vert_x}{(1 + \delta _3)} \le \left\| y-x\right\| _x \le \frac{\vert\!\Vert y-x \Vert\!\vert_x}{(1 - \delta _3)}. \end{aligned}$$
(55)

For \(x, y \in {\mathcal{X}}\) such that \(\vert\!\Vert y-x \Vert\!\vert_x < 1-\delta _3\) for \(\delta _3 \in [0, 1)\), the estimate (55) implies that

$$\begin{aligned} (1 - \Vert y-x\Vert _x)^2 \ge \left( 1 - \frac{\vert\!\Vert y-x \Vert\!\vert_x}{1-\delta _3}\right) ^2 = \frac{\left( 1 - \delta _3 - \vert\!\Vert y-x \Vert\!\vert_x\right) ^2}{(1-\delta _3)^2}. \end{aligned}$$
(56)

Since \(\vert\!\Vert y-x \Vert\!\vert_x < 1-\delta _3\), by (55), we have \(\left\| y-x\right\| _x < 1\). Hence, by [23, Theorem 4.1.6], we can show that

$$\begin{aligned} \left( 1 - \left\| y-x\right\| _x\right) ^2\nabla ^2f(x) \preceq \nabla ^2f(y) \preceq \frac{1}{\left( 1 - \left\| y-x\right\| _x\right) ^{2}}\nabla ^2f(x). \end{aligned}$$
(57)

Combining (57) and (56), and using again (6), we can further derive

$$\begin{aligned} H(y)&\overset{(6)}{\succeq } (1-\delta _3)^2\nabla ^2{f}(y) \overset{(57)}{\succeq } (1-\delta _3)^2 \left( 1 - \left\| y-x\right\| _x\right) ^2\nabla ^2{f}(x)\\&\overset{(56)}{\succeq } \left( 1 - \delta _3 - \vert\!\Vert y-x \Vert\!\vert_x\right) ^2\nabla ^2{f}(x) \\&\overset{(6)}{\succeq } \left[ \frac{1 - \delta _3 - \vert\!\Vert y-x \Vert\!\vert_x}{1+\delta _3}\right] ^2H(x), \end{aligned}$$

and

$$\begin{aligned} H(y) \overset{(6)}{\preceq } (1+\delta _3)^2\nabla ^2{f}(y)&\overset{(57)}{\preceq } \tfrac{(1+\delta _3)^2}{\left( 1 - \left\| y-x\right\| _x\right) ^2} \nabla ^2f(x) \overset{(56)}{\preceq } \left[ \tfrac{(1-\delta _3)(1+\delta _3)}{1 - \delta _3 - \vert\!\Vert y-x \Vert\!\vert_{x}}\right] ^2\nabla ^2f(x)\\&\overset{(6)}{\preceq } \left[ \tfrac{1+\delta _3}{1 - \delta _3 - \vert\!\Vert y-x \Vert\!\vert_{x}}\right] ^2H(x). \end{aligned}$$

Therefore, we obtain the first estimate of (11) from these expressions.

From the second line of (6), we have \(-(2\delta _3 - \delta _3^2)\nabla ^2{f}(x) \preceq H(x) - \nabla ^2 f(x) \preceq (2\delta _3 + \delta _3^2)\nabla ^2 f(x)\). If we define \(G_x := [\nabla ^2 f(x)]^{-1/2}(\nabla ^2 f(x) - H(x))[\nabla ^2 f(x)]^{-1/2}\), then the last estimate implies that

$$\begin{aligned} \left\| G_x\right\| \le 2\delta _3+\delta _3^2. \end{aligned}$$
(58)

Moreover, by (55), (57), (58), and the Cauchy–Schwarz inequality in (i), we can further derive

$$\begin{aligned} \vert\!\Vert (\nabla ^2 f(x)-H(x))v \Vert\!\vert_y^{*}&\overset{(55)}{\le } \tfrac{1}{1-\delta _3} \Vert (\nabla ^2 f(x)-H(x))v\Vert _y^{*} \\&\overset{(57)}{\le } \tfrac{1}{1-\delta _3} \left( v^{\top } (\nabla ^2 f(x) - H(x)) \tfrac{1}{(1 - \Vert y-x\Vert _x)^2}\nabla ^2 f(x)^{-1} \right. \\&\qquad \left. (\nabla ^2 f(x) - H(x)) v \right) ^{1/2} \\&= \tfrac{1}{(1-\delta _3)(1 - \left\| y-x\right\| _x)} \Vert G_x [\nabla ^2 f(x)]^{1/2}v\Vert \\&\overset{(i)}{\le } \tfrac{1}{(1-\delta _3)(1 - \left\| y-x\right\| _x)} \left\| G_x\right\| \left\| v\right\| _x \\&\overset{(58), (55)}{\le } \tfrac{2\delta _3 + \delta _3^2}{(1-\delta _3)^2\left( 1 - (1-\delta _3)^{-1} \vert\!\Vert y-x \Vert\!\vert_x\right) } \vert\!\Vert v \Vert\!\vert_x \\&= \tfrac{2\delta _3 + \delta _3^2}{(1-\delta _3)(1 -\delta _3 - \vert\!\Vert y-x \Vert\!\vert_x)} \vert\!\Vert v \Vert\!\vert_x, \end{aligned}$$

which is exactly the second estimate of (11). \(\square \)
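The following small Python sketch (our illustration, not part of the paper's code) builds a random pair \((\nabla ^2 f(x), H(x))\) satisfying the Hessian bound in (6) and verifies the norm equivalence (10) on random vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
p, d3 = 5, 0.1

# Build a "true" Hessian nabla^2 f(x) and an approximation H(x) obeying (6):
# (1-d3)^2 * nabla^2 f(x) <= H(x) <= (1+d3)^2 * nabla^2 f(x) (Loewner order).
B = rng.standard_normal((p, p))
Hess = B @ B.T + np.eye(p)                       # nabla^2 f(x), SPD
w, V = np.linalg.eigh(Hess)
R = V @ np.diag(np.sqrt(w)) @ V.T                # Hess^{1/2}, symmetric
D = np.diag(rng.uniform((1 - d3)**2, (1 + d3)**2, size=p))
H = R @ D @ R

# Verify (10): (1-d3)*||u||_x <= |||u|||_x <= (1+d3)*||u||_x, where
# ||u||_x^2 = u' * Hess * u and |||u|||_x^2 = u' * H * u.
for _ in range(1000):
    u = rng.standard_normal(p)
    nrm, anrm = np.sqrt(u @ Hess @ u), np.sqrt(u @ H @ u)
    assert (1 - d3) * nrm <= anrm * (1 + 1e-10)
    assert anrm <= (1 + d3) * nrm * (1 + 1e-10)
print("norm equivalence (10) verified on random samples")
```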

1.3 A.3 The proof of Lemma 3: computational inexact oracle

(a) We first prove the left-hand side inequality of (5). Since f is standard self-concordant, for any \(x, y\in \mathrm {dom}(f)\) and \(\alpha \in [0, 1]\), we have

$$\begin{aligned} f(y)&\overset{(i)}{\ge } f(x) + \langle \nabla {f}(x), y-x\rangle + \omega (\left\| y-x\right\| _x) \nonumber \\&\overset{(15)}{\ge } {\hat{f}}(x) + \langle g(x), y-x\rangle - \varepsilon + \langle \nabla {f}(x) - g(x), y-x\rangle + \omega (\left\| y-x\right\| _x) \nonumber \\&\overset{(ii),(10)}{\ge } {\hat{f}}(x) + \langle g(x), y-x\rangle - \varepsilon - \vert\!\Vert\nabla {f}(x) - g(x)\Vert\!\vert^{*}_{x}\vert\!\Vert y-x \Vert\!\vert_x \nonumber \\&\quad + \omega ((1-\delta _3)\vert\!\Vert y-x \Vert\!\vert_x)\nonumber \\&\ge {\hat{f}}(x) + \langle g(x), y-x\rangle + \omega (\alpha (1-\delta _3)\vert\!\Vert y-x \Vert\!\vert_x) - \varepsilon \nonumber \\&\quad - {~} \delta _2\vert\!\Vert y-x \Vert\!\vert_x + \omega ((1-\delta _3)\vert\!\Vert y-x \Vert\!\vert_x) - \omega (\alpha (1-\delta _3)\vert\!\Vert y-x \Vert\!\vert_x), \end{aligned}$$
(59)

where (i) follows from [23, Theorem 4.1.7] and (ii) follows from the Cauchy-Schwarz inequality.

Let \(\gamma := 1 - \delta _3 \in (0, 1]\). We consider the function

$$\begin{aligned} \underline{\psi }(t)&:= -\delta _2 t + \omega (\gamma t) - \omega (\alpha \gamma t) \\&= \gamma t - \ln (1 + \gamma t) - \delta _2 t - \alpha \gamma t + \ln (1 + \alpha \gamma t) \\&= \left[ \gamma (1-\alpha ) - \delta _2\right] t - \ln \left( 1 + \frac{\gamma (1-\alpha )t}{1+\alpha \gamma t}\right) . \end{aligned}$$

The first and second derivatives of \(\underline{\psi }\) are given respectively by

$$\begin{aligned} \underline{\psi }'(t)= & {} (1-\alpha )\gamma - \delta _2 - \frac{\gamma }{1+\gamma t} + \frac{\alpha \gamma }{1 + \alpha \gamma t}\;\text {and}\\ \underline{\psi }''(t)= & {} \frac{\gamma ^2}{(1 + \gamma t)^2} - \frac{(\alpha \gamma )^2}{(1 + \alpha \gamma t)^2}. \end{aligned}$$

Since \(\alpha \in [0, 1]\), it is easy to check that \(\underline{\psi }''(t) \ge 0\) for all \(t \ge 0\). Hence, \(\underline{\psi }\) is convex.

If \((1-\alpha )\gamma > \delta _2\), then \(\underline{\psi }\) attains the minimum at \(\underline{t}^{*} > 0\) as the positive solution of \(\underline{\psi }'(t) = (1-\alpha )\gamma - \delta _2 - \frac{\gamma }{1+\gamma t} + \frac{\alpha \gamma }{1 + \alpha \gamma t} = 0\). Solving this equation for a positive solution, we get

$$\begin{aligned} \underline{t}^{*}:= & {} \tfrac{1}{2\alpha \gamma }\left[ \sqrt{(1+\alpha )^2 + \tfrac{4\alpha \delta _2}{(1-\alpha )\gamma - \delta _2}} - (1+\alpha )\right] \\= & {} \frac{2\delta _2}{\gamma \left[ (1-\alpha )\gamma - \delta _2\right] \left[ \sqrt{(1+\alpha )^2 + \tfrac{4\alpha \delta _2}{(1-\alpha )\gamma - \delta _2}} + (1+\alpha )\right] } > 0. \end{aligned}$$

Let us choose \(\alpha := 1 - \frac{2\delta _2}{1-\delta _3} = \frac{1-2\delta _2-\delta _3}{1-\delta _3}\). To guarantee \(\alpha \in [0, 1]\), we impose \(2\delta _2 + \delta _3 \in [0, 1)\). Moreover, \((1-\alpha )\gamma - \delta _2 = \delta _2 \ge 0\). Substituting \(\alpha \) and \(\gamma = 1 - \delta _3\) into \(\underline{t}^{*}\), we eventually obtain

$$\begin{aligned} \underline{t}^{*} = \frac{1}{(1-\delta _2 - \delta _3) + \sqrt{(1-\delta _2 - \delta _3)^2 + (1-\delta _3)(1-2\delta _2 - \delta _3)}}. \end{aligned}$$

As a result, we can directly compute

$$\begin{aligned} \frac{\gamma (1-\alpha )\underline{t}^{*}}{1+\alpha \gamma \underline{t}^{*}} = \frac{2\delta _2}{(2- 3\delta _2 - 2\delta _3) + \sqrt{(1-\delta _2 - \delta _3)^2 + (1-\delta _3)(1-2\delta _2 - \delta _3)}}. \end{aligned}$$

In this case, we can write the minimum value \(\underline{\psi }(\underline{t}^{*})\) of \(\underline{\psi }\) explicitly as

$$\begin{aligned} \underline{\psi }^{*}(\delta _2,\delta _3) := \underline{\psi }(\underline{t}^{*}) = \frac{\delta _2}{ \underline{c}_{23} + (1-\delta _2-\delta _3)} - \ln \left( 1 + \frac{2\delta _2}{ \underline{c}_{23} + (2- 3\delta _2 - 2\delta _3)}\right) \le 0, \end{aligned}$$

where \(\underline{c}_{23} := \left[ (1-\delta _2 - \delta _3)^2 + (1-\delta _3)(1-2\delta _2 - \delta _3)\right] ^{1/2} \ge 0\); note that \(\underline{\psi }^{*}(\delta _2,\delta _3) \le \underline{\psi }(0) = 0\) since \(\underline{t}^{*}\) minimizes the convex function \(\underline{\psi }\). This is exactly the third line of (16).

Now, substituting this lower bound \(\underline{\psi }^{*}(\delta _2,\delta _3)\) of \(\underline{\psi }\) into (59) and noting that \(\alpha (1-\delta _3) = 1 - 2\delta _2 - \delta _3\), we obtain

$$\begin{aligned} f(y) \ge {\hat{f}}(x) + \langle g(x), y-x\rangle + \omega \left( (1-2\delta _2-\delta _3)\vert\!\Vert y - x\Vert\!\vert_x\right) - \varepsilon + \underline{\psi }^{*}(\delta _2,\delta _3). \end{aligned}$$

Clearly, if we define \({\tilde{f}}(x) := {\hat{f}}(x) - \varepsilon ~+~ \underline{\psi }^{*}(\delta _2,\delta _3)\) and \(\delta _0 := 2\delta _2 + \delta _3 \in [0, 1)\), then the last inequality is exactly the left-hand side inequality of (5).
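As a sanity check of part (a) (again purely illustrative, with \(\delta _2 = 0.02\) and \(\delta _3 = 0.05\) chosen arbitrarily), one can compare the grid minimum of \(\underline{\psi }\) with the closed forms for \(\underline{t}^{*}\) and \(\underline{\psi }^{*}\):

```python
import numpy as np

# Numeric check (not part of the proof) of the closed forms in part (a):
# psi_low(t) = -d2*t + omega(g*t) - omega(a*g*t),
# with g = 1-d3 and a = (1-2*d2-d3)/(1-d3) as chosen above.
omega = lambda t: t - np.log(1.0 + t)
d2, d3 = 0.02, 0.05
g, a = 1.0 - d3, (1.0 - 2*d2 - d3) / (1.0 - d3)

t = np.linspace(1e-6, 5.0, 400000)
psi = -d2*t + omega(g*t) - omega(a*g*t)

c = np.sqrt((1 - d2 - d3)**2 + (1 - d3)*(1 - 2*d2 - d3))
t_low = 1.0 / ((1 - d2 - d3) + c)
psi_star = d2/(c + 1 - d2 - d3) - np.log(1 + 2*d2/(c + 2 - 3*d2 - 2*d3))

print(t[np.argmin(psi)], t_low)    # grid minimizer vs. closed-form t_low
print(psi.min(), psi_star)         # grid minimum vs. closed-form psi_low^*
```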

(b) To prove the right-hand side inequality of (5), we first derive

$$\begin{aligned} f(y)&\overset{\textit{(a)}}{\le } f(x) + \langle \nabla {f}(x), y - x\rangle + \omega _{*}(\Vert y-x\Vert _x) \nonumber \\&\overset{(15),(10), \textit{(b)}}{\le } {\hat{f}}(x) + \langle g(x), y - x\rangle + \omega _{*}(\beta (1+\delta _3)\vert\!\Vert y-x \Vert\!\vert_x) \nonumber \\&\quad + {~} \varepsilon + \vert\!\Vert g(x) - \nabla {f}(x) \Vert\!\vert_x^{*}\vert\!\Vert y - x \Vert\!\vert_x + \omega _{*}((1+\delta _3)\vert\!\Vert y-x \Vert\!\vert_x) \nonumber \\&\quad - \omega _{*}(\beta (1+\delta _3)\vert\!\Vert y-x \Vert\!\vert_x), \end{aligned}$$
(60)

where (a) follows from [23, Theorem 4.1.8] and (b) holds due to the Cauchy-Schwarz inequality.

Let \(\beta \ge 1\) and \(\bar{\gamma } := 1 + \delta _3 \ge 1\). We consider the following function

$$\begin{aligned} \bar{\psi }(t):= \,& {} \delta _2t +\omega _{*}(\bar{\gamma } t) - \omega _{*}(\beta \bar{\gamma } t) \\=\, & {} \delta _2 t -\bar{\gamma } t - \ln (1-\bar{\gamma } t)+\beta \bar{\gamma } t + \ln (1-\beta \bar{\gamma } t) \\=\, & {} \left[ (\beta - 1)\bar{\gamma } + \delta _2\right] t - \ln \left( 1 + \frac{\bar{\gamma }(\beta -1)t}{1-\beta \bar{\gamma } t}\right) . \end{aligned}$$

First, we compute the first and second derivatives of \(\bar{\psi }\) respectively as

$$\begin{aligned} \bar{\psi }'(t)= & {} (\beta -1)\bar{\gamma }+\delta _2-\frac{\beta \bar{\gamma }}{1-\beta \bar{\gamma }t}+\frac{\bar{\gamma }}{1-\bar{\gamma }t}~~{\text { and }}~~ \\ \bar{\psi }''(t)= & {} -\frac{(\beta \bar{\gamma })^2}{(1-\beta \bar{\gamma }t)^2}+\frac{\bar{\gamma }^2}{(1-\bar{\gamma }t)^2}. \end{aligned}$$

Clearly, \(\bar{\psi }''(t) \le 0\) for all \(0 \le t < \frac{1}{\bar{\gamma }\beta }\). Hence, \(\bar{\psi }\) is concave in t. To find the maximum value of \(\bar{\psi }\), we need to solve \(\bar{\psi }'(t) = 0\) for \(t > 0\), and obtain

$$\begin{aligned} \bar{t}^{*}= & {} \tfrac{1}{2\beta \bar{\gamma }}\left( 1+\beta - \sqrt{(1+\beta )^2 - \tfrac{4\beta \delta _2}{(\beta -1)\bar{\gamma }+\delta _2}}\right) \\= & {} \frac{2\delta _2}{\bar{\gamma }\left[ (\beta -1)\bar{\gamma }+\delta _2\right] \left[ \sqrt{(1+\beta )^2 - \tfrac{4\beta \delta _2}{(\beta -1)\bar{\gamma }+\delta _2}} + 1+\beta \right] } > 0. \end{aligned}$$

Let us choose \(\beta := 1 + \frac{2\delta _2}{1+\delta _3} \ge 1\). Then we can explicitly compute \(\bar{t}^{*}\) as

$$\begin{aligned} \bar{t}^{*} = \frac{1}{3(1+\delta _2+\delta _3) + \sqrt{3(1+\delta _2+\delta _3)^2 - (1+\delta _3)(1+2\delta _2+\delta _3)}} > 0. \end{aligned}$$

To evaluate \(\bar{\psi }(\bar{t}^{*})\), we first compute

$$\begin{aligned} \frac{\bar{\gamma }(\beta -1)\bar{t}^{*}}{1 - \beta \bar{\gamma }\bar{t}^{*}} = \frac{2\delta _2}{2(1+\delta _2+\delta _3) + \sqrt{3(1+\delta _2+\delta _3)^2 - (1+\delta _3)(1+2\delta _2+\delta _3)}}. \end{aligned}$$

Using this expression, we can explicitly compute the maximum value \(\bar{\psi }(\bar{t}^{*})\) of \(\bar{\psi }\) as

$$\begin{aligned} \bar{\psi }^{*}(\delta _2,\delta _3) := \bar{\psi }(\bar{t}^{*}) = \frac{3\delta _2}{3(1+\delta _2+\delta _3) + \bar{c}_{23}} - \ln \left( 1 + \frac{2\delta _2}{2(1+\delta _2+\delta _3) + \bar{c}_{23}}\right) \ge 0, \end{aligned}$$

where \(\bar{c}_{23} := \sqrt{3(1+\delta _2+\delta _3)^2 - (1+\delta _3)(1+2\delta _2+\delta _3)} \ge 0\). Plugging this expression into (60), and noting that \(\beta (1+\delta _3) = 1 + 2\delta _2 + \delta _3 = 1 + \delta _0\), we can show that

$$\begin{aligned} f(y) \le {\hat{f}}(x) + \langle g(x), y - x\rangle + \omega _{*}\left( (1+\delta _0)\vert\!\Vert y-x \Vert\!\vert_x\right) + \varepsilon + \bar{\psi }^{*}(\delta _2,\delta _3). \end{aligned}$$

Finally, by defining \(\delta _1 := \max \left\{ 0, 2\varepsilon + \bar{\psi }^{*}(\delta _2,\delta _3) - \underline{\psi }^{*}(\delta _2,\delta _3)\right\} \ge 0\) and noting that \({\tilde{f}}(x) = {\hat{f}}(x) - \varepsilon + \underline{\psi }^{*}\), we obtain

$$\begin{aligned} f(y) \le {\tilde{f}}(x) + \langle g(x), y - x\rangle + \omega _{*}\left( (1+\delta _0)\vert\!\Vert y-x\Vert\!\vert_x\right) + \delta _1, \end{aligned}$$

which proves the right-hand side inequality of (5). \(\square \)

1.4 A.4 The proof of Lemma 4: inexact oracle of the dual problem

Since \(\varphi \) is self-concordant, by [23, Theorem 4.1.6] and \(\delta (x) := \left\| {\tilde{u}}^{*}(x) - u^{*}(x)\right\| _{{\tilde{u}}^{*}(x)}\), we have

$$\begin{aligned} (1-\delta (x))^2 [\nabla ^2 \varphi (u^{*}(x))]^{-1} \preceq [\nabla ^2 \varphi ({\tilde{u}}^{*}(x))]^{-1} \preceq (1- \delta (x))^{-2} [\nabla ^2 \varphi (u^{*}(x))]^{-1}. \end{aligned}$$

Multiplying this estimate by A and \(A^{\top }\) on the left and right, respectively, we obtain

$$\begin{aligned} (1-\delta (x))^2 A[\nabla ^2 \varphi (u^{*}(x))]^{-1}A^{\top }\preceq & \, {} A[\nabla ^2 \varphi ({\tilde{u}}^{*}(x))]^{-1}A^{\top }\\\preceq & \, {} (1- \delta (x))^{-2} A[\nabla ^2 \varphi (u^{*}(x))]^{-1}A^{\top }. \end{aligned}$$

Using (19) and (20), this estimate leads to

$$\begin{aligned} (1-\delta (x))^2\nabla ^2f(x) \preceq H(x) \preceq (1- \delta (x))^{-2}\nabla ^2f(x). \end{aligned}$$
(61)

Since \(\delta (x) \le \delta \) and \(\delta _3 := \frac{\delta }{1-\delta } \in [0, 1)\), we have \((1-\delta (x))^2 \ge (1-\delta )^2 \ge (1-\delta _3)^2\) and \(\frac{1}{(1-\delta (x))^2} \le \frac{1}{(1-\delta )^2} = (1+\delta _3)^2\). Using these inequalities in (61), we obtain the second bound of (21).

Next, by the definition of g(x) and \(\nabla {f}(x)\), we can derive that

$$\begin{aligned} \left[ \vert\!\Vert g(x) - \nabla f(x)\Vert\!\vert_x^{*}\right] ^2&= ({\tilde{u}}^{*}(x) \!-\! u^{*}(x))^{\top }{\!\!}A^{\top }\left( A\nabla ^2{\varphi }({\tilde{u}}^{*}(x))^{-1}{\!\!}A^{\top }\right) ^{-1}{\!\!}A({\tilde{u}}^{*}(x) \!-\! u^{*}(x)) \\&\overset{{(i)}}{\le } ({\tilde{u}}^{*}(x) - u^{*}(x))^{\top }\nabla ^2{\varphi }({\tilde{u}}^{*}(x))({\tilde{u}}^{*}(x) - u^{*}(x)) \\&= \left\| {\tilde{u}}^{*}(x) - u^{*}(x)\right\| _{{\tilde{u}}^{*}(x)}^2 \le \delta ^2(x)\le \delta ^2, \end{aligned}$$

where we use \(A^{\top }(AQ^{-1}A^{\top })^{-1}A \preceq Q\) for \(Q = \nabla ^2{\varphi }({\tilde{u}}^{*}(x))\succ 0\) in (i) (see [34] for a detailed proof of this inequality). This expression implies \(\vert\!\Vert g(x) - \nabla f(x) \Vert\!\vert_x^{*} \le \delta \), which is the first estimate of (21).

Now, by the definition of f in (17) and of \({\tilde{f}}\) in (20), respectively, and the optimality condition \(\nabla {\varphi }(u^{*}(x)) = A^{\top }x\) in (17), we have

$$\begin{aligned} f(x) - {{\tilde{f}}}(x)&\overset{(17), (20)}{=} \left[ \langle u^{*}(x), A^{\top }x\rangle - \varphi (u^{*}(x)) \right] - \left[ \langle {\tilde{u}}^{*}(x), A^{\top }x\rangle - \varphi ({\tilde{u}}^{*}(x)) \right] \\&= \varphi ({\tilde{u}}^{*}(x)) - \varphi (u^{*}(x)) - \langle A^{\top }x, {\tilde{u}}^{*}(x) - u^{*}(x)\rangle \\&= \varphi ({\tilde{u}}^{*}(x)) - \varphi (u^{*}(x)) - \left\langle \nabla \varphi (u^{*}(x)), {\tilde{u}}^{*}(x) - u^{*}(x)\right\rangle . \end{aligned}$$

Since \(\varphi \) is standard self-concordant, using [23, Theorem 4.1.7, 4.1.8] we obtain from the last expression that

$$\begin{aligned} \omega (\Vert {\tilde{u}}^{*}(x) - u^{*}(x)\Vert _{u^{*}(x)}) \le f(x) - {{\tilde{f}}}(x) \le \omega _{*}(\Vert {\tilde{u}}^{*}(x) - u^{*}(x)\Vert _{u^{*}(x)}), \end{aligned}$$

which leads to

$$\begin{aligned} 0\le \omega \left( \tfrac{\delta (x)}{1+\delta (x)}\right) \le f(x) - {{\tilde{f}}}(x) \le \omega _{*}\left( \tfrac{\delta (x)}{1-\delta (x)}\right) \le \omega _{*}\left( \tfrac{\delta }{1-\delta }\right) , \end{aligned}$$

provided that \(\delta (x)<1\). Hence, \(\vert f(x) - {\tilde{f}}(x)\vert \le \omega _{*}\left( \tfrac{\delta }{1-\delta }\right) =: \varepsilon \).

Using Lemma 3 with \(\varepsilon :=\omega _{*}\left( \frac{\delta }{1-\delta }\right) \), \(\delta _2 := \delta \), and \(\delta _3\) defined above, we conclude that \(({\tilde{f}}, g, H)\) given by (20) is a \((\delta _0,\delta _1)\)-global inexact oracle of f, where \(\delta _0\) and \(\delta _1\) are computed as in Lemma 3. Since \(2\delta _2+\delta _3<1\) is required in Lemma 3, a direct numerical calculation gives \(\delta \in [0,0.292]\).
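The numerical calculation behind this bound amounts to solving \(2\delta + \frac{\delta }{1-\delta } = 1\) (from \(2\delta _2 + \delta _3 < 1\) with \(\delta _2 = \delta \) and \(\delta _3 = \delta /(1-\delta )\)), whose positive root is \(1 - \frac{\sqrt{2}}{2} \approx 0.2929\); a one-line check:

```python
from scipy.optimize import brentq

# Largest admissible delta: solve 2*delta + delta/(1-delta) = 1 on (0, 0.5).
root = brentq(lambda d: 2*d + d/(1 - d) - 1, 0.0, 0.5)
print(root)   # approximately 0.29289, consistent with delta in [0, 0.292]
```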

From the optimality condition of (18) we have \(\nabla {\varphi }(u^{*}(x)) - A^{\top }x = 0\). Let \(r(x) := \nabla {\varphi }({\tilde{u}}^{*}(x)) - A^{\top }x\). Then, using the self-concordance of \(\varphi \), by [23, Theorem 4.1.7], we have

$$\begin{aligned} \tfrac{\Vert {\tilde{u}}^{*}(x) - u^{*}(x)\Vert _{u^{*}(x)}^2}{1 + \Vert {\tilde{u}}^{*}(x) - u^{*}(x)\Vert _{u^{*}(x)}} \overset{{(a)}}{\le } \langle \nabla {\varphi }({\tilde{u}}^{*}(x)) - \nabla {\varphi }(u^{*}(x)), {\tilde{u}}^{*}(x) - u^{*}(x)\rangle = \langle r(x), {\tilde{u}}^{*}(x) - u^{*}(x)\rangle , \end{aligned}$$

where we use [23, Theorem 4.1.7] in (a). Since \(\delta (x) := \Vert {\tilde{u}}^{*}(x) - u^{*}(x)\Vert _{{\tilde{u}}^{*}(x)}\), by the Cauchy-Schwarz inequality, we can show that \(\frac{\delta (x)^2}{1 + \delta (x)} \le \Vert r(x)\Vert ^{*}_{{\tilde{u}}^{*}(x)}\delta (x)\), which leads to \(\frac{\delta (x)}{1+\delta (x)} \le \Vert r(x)\Vert ^{*}_{{\tilde{u}}^{*}(x)}\).

Finally, we assume that \(\Vert r(x)\Vert ^{*}_{{\tilde{u}}^{*}(x)} \le \frac{\delta }{1+\delta }\) for some \(\delta > 0\) as stated in Lemma 4. Using this condition and the last inequality \(\frac{\delta (x)}{1+\delta (x)} \le \Vert r(x)\Vert ^{*}_{{\tilde{u}}^{*}(x)}\) we have \(\frac{\delta (x)}{1+\delta (x)} \le \frac{\delta }{1+\delta }\), which implies that \(\delta (x) \le \delta \). \(\square \)
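To make this construction concrete, the sketch below uses an illustrative choice of ours (not from the paper's experiments): the self-concordant barrier \(\varphi (u) = -\sum _i \ln u_i\), for which \(u^{*}(x)\) and the Hessians have closed forms. It builds the inexact oracle from a perturbed dual maximizer \({\tilde{u}}^{*}(x)\) and verifies the Hessian sandwich (61) via generalized eigenvalues.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
m, n = 3, 6

# Illustrative setup: phi(u) = -sum(log(u_i)), so nabla phi(u) = -1/u and
# nabla^2 phi(u) = diag(1/u^2). The dual-oracle construction then uses
#   u*(x):  nabla phi(u*) = A' x  =>  u* = -1/(A' x)  (requires A' x < 0),
#   nabla^2 f(x) = A [nabla^2 phi(u*)]^{-1} A' = A diag(u*^2) A',
# and H(x) built the same way from an approximate maximizer u~*(x).
A = rng.uniform(0.1, 1.0, size=(m, n))
x = -np.ones(m)                          # guarantees A' x < 0 componentwise
u_star = -1.0 / (A.T @ x)
u_tilde = u_star * (1.0 + 0.01 * rng.uniform(-1, 1, size=n))   # inexact u~*

Hess = A @ np.diag(u_star**2) @ A.T      # exact nabla^2 f(x)
H = A @ np.diag(u_tilde**2) @ A.T        # inexact Hessian oracle

# delta(x) = ||u~* - u*||_{u~*}, measured in the metric nabla^2 phi(u~*)
dx = np.sqrt(np.sum((u_tilde - u_star)**2 / u_tilde**2))

# Check (61): all generalized eigenvalues of (H, Hess) lie in
# [(1 - delta(x))^2, (1 - delta(x))^{-2}].
lam = eigh(H, Hess, eigvals_only=True)
print(dx, lam.min(), lam.max())
assert (1 - dx)**2 <= lam.min() + 1e-12 and lam.max() <= (1 - dx)**-2 + 1e-12
```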

1.5 A.5 The proof of Lemma 5: key estimate for local convergence analysis

First, recall that \(\nu ^k\in g(x^k)+H(x^k)(z^k - x^k)+\partial R(z^k)\) from (28). Using the definition of \(\mathcal{P}_x\) from (23), this expression leads to

$$\begin{aligned}&H(x^k)x^k + \nu ^k - g(x^k) \in \partial R(z^k)+H(x^k)z^k~~~~{\iff }\\&\quad z^k = \mathcal{P}_{x^k}(x^k + [H(x^k)]^{-1}(\nu ^k-g(x^k))). \end{aligned}$$

Shifting the index from k to \(k+1\), the last expression leads to

$$\begin{aligned} z^{k+1} = \mathcal{P}_{x^{k+1}}(x^{k+1} + [H(x^{k+1})]^{-1}(\nu ^{k+1} - g(x^{k+1}))). \end{aligned}$$
(62)

Next, if we define \(r_{x^k}(z^k):=g(x^k)+H(x^k)(z^k-x^k)\), then again from (28) and (23), we can rewrite

$$\begin{aligned}&\nu ^k-r_{x^k}(z^k) \in \partial R(z^k) \nonumber \\&\quad \iff z^k+[H(x^{k+1})]^{-1}(\nu ^k-r_{x^k}(z^k)) \in z^k+[H(x^{k+1})]^{-1}\partial R(z^k)\nonumber \\&\quad \iff z^k = \mathcal{P}_{x^{k+1}}(z^k+[H(x^{k+1})]^{-1}(\nu ^k-r_{x^k}(z^k))). \end{aligned}$$
(63)

Denote \(H_k := H(x^k)\), \(f_k':=\nabla f(x^k)\), and \(g_k := g(x^k)\) for simplicity. By the triangle inequality, we have

$$\begin{aligned} \lambda _{k+1}=\vert\!\Vert z^{k+1} - x^{k+1}\Vert\!\vert_{x^{k+1}} \le \vert\!\Vert x^{k+1} - z^{k}\Vert\!\vert_{x^{k+1}} + \vert\!\Vert z^{k+1} - z^{k}\Vert\!\vert_{x^{k+1}}. \end{aligned}$$
(64)

To upper bound \(\lambda _{k+1}\), we bound each term on the right-hand side of (64) separately.

(a) For the first term \(\vert\!\Vert x^{k+1}-z^k\Vert\!\vert_{x^{k+1}}\) of (64), since f is standard self-concordant, by (10) and [23, Theorem 4.1.5], we have

$$\begin{aligned} \vert\!\Vert x^{k+1}-z^k\Vert\!\vert_{x^{k+1}}&\overset{(10)}{\le } (1+\delta _3)\Vert x^{k+1}-z^k\Vert _{x^{k+1}} \\&\overset{{[23, \text{ Theorem } 4.1.5]}}{\le } \frac{1}{1-\Vert x^{k+1}-x^k\Vert _{x^k}}\cdot (1+\delta _3)\Vert x^{k+1}-z^k\Vert _{x^{k}} \\&\overset{(10)}{\le } \frac{1}{1-\tfrac{1}{1-\delta _3}\vert\!\Vert x^{k+1}-x^k \Vert\!\vert_{x^k}} \cdot \frac{(1+\delta _3)\vert\!\Vert x^{k+1}-z^k \Vert\!\vert_{x^{k}}}{1-\delta _3} \\&= \frac{(1+\delta _3)\vert\!\Vert x^{k+1}-z^k \Vert\!\vert_{x^{k}}}{1-\delta _3 - \vert\!\Vert x^{k+1}-x^k \Vert\!\vert_{x^k}}. \end{aligned}$$

Since \(\alpha _k \in [0, 1]\), \(\lambda _k := \vert\!\Vert d^k \Vert\!\vert_{x^k}\), and \(x^{k+1} := x^k + \alpha _k(z^k - x^k) = x^k + \alpha _kd^k\) due to (iPNA), we have

$$\begin{aligned} \left\{ \begin{array}{ll} \vert\!\Vert x^{k+1}-z^k \Vert\!\vert_{x^k} &{}= \vert\!\Vert (1-\alpha _k)(z^k - x^k) \Vert\!\vert_{x^k} = (1-\alpha _k)\vert\!\Vert d^k \Vert\!\vert_{x^k} = (1-\alpha _k)\lambda _k, \\ \vert\!\Vert x^{k+1} - x^k \Vert\!\vert_{x^k} &{}= \vert\!\Vert x^k + \alpha _k(z^k - x^k) - x^k \Vert\!\vert_{x^k} = \alpha _k\vert\!\Vert z^k - x^k \Vert\!\vert_{x^k} = \alpha _k\vert\!\Vert d^k \Vert\!\vert_{x^k} = \alpha _k\lambda _k. \end{array}\right. \end{aligned}$$
(65)

Substituting (65) into the last estimate, we obtain

$$\begin{aligned} \vert\!\Vert x^{k+1}-z^k \Vert\!\vert_{x^{k+1}} \le \frac{(1+\delta _3)(1-\alpha _k)\lambda _k}{1-\delta _3 - \alpha _k\lambda _k}. \end{aligned}$$
(66)

(b) For the second term \(\vert\!\Vert z^{k+1} - z^{k} \Vert\!\vert_{x^{k+1}}\) of (64), using (62), (63), the triangle inequality in (i), and the nonexpansiveness of the scaled proximal operator \(\mathcal{P}_{x}\) from (25), we can show that

$$\begin{aligned}&\vert\!\Vert z^{k+1} - z^{k} \Vert\!\vert_{x^{k+1}} \nonumber \\&\quad \overset{(62),(63)}{=} \vert\!\Vert \mathcal{P}_{x^{k+1}}(x^{k+1} + H_{k+1}^{-1}(\nu ^{k+1}-g_{k+1}))-\mathcal{P}_{x^{k+1}}(z^k+H_{k+1}^{-1}(\nu ^k-r_{x^k}(z^k))) \Vert\!\vert_{x^{k+1}}\nonumber \\&\quad \overset{(25)}{\le } \vert\!\Vert x^{k+1} + H_{k+1}^{-1}(\nu ^{k+1}-g_{k+1})-(z^k+H_{k+1}^{-1}(\nu ^k - r_{x^k}(z^k))) \Vert\!\vert_{x^{k+1}}\nonumber \\&\quad \overset{(3)}{=} \vert\!\Vert (H_{k+1}-H_k)(x^{k+1} - z^k)-(g_{k+1}-g_k-H_k(x^{k+1}-x^k)) + (\nu ^{k+1}-\nu ^k) \Vert\!\vert_{x^{k+1}}^{*} \nonumber \\&\quad \overset{(i)}{\le } \underbrace{ \vert\!\Vert (H_{k+1}-H_k)(x^{k+1} - z^k)-(g_{k+1}-g_k-H_k(x^{k+1}-x^k)) \Vert\!\vert_{x^{k+1}}^{*} }_{[\mathcal{T}_1]} \nonumber \\&\qquad + \underbrace{ \vert\!\Vert \nu ^{k+1}-\nu ^k \Vert\!\vert_{x^{k+1}}^{*} }_{[\mathcal{T}_2]}. \end{aligned}$$
(67)

To further estimate the last term \([\mathcal{T}_2]\) of (67), we have

$$\begin{aligned} \vert\!\Vert \nu ^{k} \Vert\!\vert_{x^{k+1}}^{*}&\overset{(10)}{\le } \frac{1}{1-\delta _3}\Vert \nu ^k\Vert _{x^{k+1}}^{*} \overset{{[23, \text { Theorem } 4.1.6]}}{\le } \frac{\Vert \nu ^k\Vert _{x^{k}}^{*} }{(1-\delta _3)(1-\Vert x^{k+1}-x^k\Vert _{x^k})}\\&\overset{(10)}{\le } \frac{(1+\delta _3)\vert\!\Vert \nu ^k \Vert\!\vert_{x^k}^{*}}{(1 - \delta _3)\left( 1 - \frac{1}{1-\delta _3} \vert\!\Vert x^{k+1}-x^k\Vert\!\vert_{x^k}\right) } \\&\overset{(65)}{=} \frac{(1 + \delta _3)}{(1-\delta _3 - \alpha _k\lambda _k)} \vert\!\Vert \nu ^k \Vert\!\vert_{x^k}^{*}. \end{aligned}$$

Utilizing this estimate and the triangle inequality, we can estimate the term \([\mathcal{T}_2]\) of (67) as

$$\begin{aligned}{}[\mathcal{T}_2]&:= \vert\!\Vert \nu ^{k+1} \!-\! \nu ^{k} \Vert\!\vert_{x^{k+1}}^{*} \le \vert\!\Vert \nu ^{k+1} \Vert\!\vert_{x^{k+1}}^{*} + \vert\!\Vert \nu ^{k} \Vert\!\vert_{x^{k+1}}^{*} \nonumber \\&\le \vert\!\Vert \nu ^{k+1} \Vert\!\vert_{x^{k+1}}^{*} + \frac{(1 + \delta _3)}{(1-\delta _3 - \alpha _k\lambda _k)}\vert\!\Vert \nu ^k \Vert\!\vert_{x^k}^{*} \overset{(28)}{\le } \delta _4\lambda _{k+1}+\frac{(1 + \delta _3)\delta _4\lambda _k}{(1-\delta _3 - \alpha _k\lambda _k)}. \end{aligned}$$
(68)

Now, using the triangle inequality, we can split the term \([\mathcal{T}_1]\) of (67) as

$$\begin{aligned}{}[\mathcal{T}_1]:= \;& {} \vert\!\Vert (H_{k+1}-H_k)(x^{k+1} - z^k)-(g_{k+1}-g_k-H_k(x^{k+1}-x^k)) \Vert\!\vert_{x^{k+1}}^{*}\nonumber \\ \le \;& {} \vert\!\Vert H_{k+1}(x^{k+1}-z^k) \Vert\!\vert_{x^{k+1}}^{*} + \vert\!\Vert H_k(x^{k+1} - z^k) \Vert\!\vert_{x^{k+1}}^{*} + \vert\!\Vert f_{k+1}' - g_{k+1} \Vert\!\vert^{*}_{x^{k+1}} + \vert\!\Vert f_k'-g_k \Vert\!\vert^{*}_{x^{k+1}}\nonumber \\&+ \vert\!\Vert f_{k+1}'-f_k'-\nabla ^2 f(x^k)(x^{k+1}-x^k) \Vert\!\vert^{*}_{x^{k+1}} + \vert\!\Vert (H_k-\nabla ^2 f(x^k))(x^{k+1}-x^k) \Vert\!\vert^{*}_{x^{k+1}}. \end{aligned}$$
(69)

In addition, using the left-hand side inequality in the first line of (11) with \(x := x^k\) and \(y := x^{k+1}\), we have

$$\begin{aligned} \vert\!\Vert \cdot \Vert\!\vert_{x^{k+1}}^{*} \overset{(11)}{\le } \frac{(1+\delta _3)}{(1-\delta _3 - \vert\!\Vert x^{k+1}-x^k \Vert\!\vert_{x^k})}\vert\!\Vert \cdot \Vert\!\vert_{x^k}^{*} \overset{(65)}{=} \frac{(1+\delta _3)}{(1-\delta _3 - \alpha _k\lambda _k)}\vert\!\Vert \cdot \Vert\!\vert_{x^k}^{*}. \end{aligned}$$
(70)

To estimate each term of (69), we first note that

$$\begin{aligned} \vert\!\Vert H_{k+1}(x^{k+1}-z^k) \Vert\!\vert_{x^{k+1}}^{*} \overset{(3)}{=} \vert\!\Vert x^{k+1}-z^k \Vert\!\vert_{x^{k+1}} \overset{(66)}{\le } \frac{(1+\delta _3)(1-\alpha _k)\lambda _k}{1-\delta _3 - \alpha _k\lambda _k}. \end{aligned}$$
(71)

Second, using (70) and \(\vert\!\Vert H_kd^k \Vert\!\vert_{x^k}^{*} = \vert\!\Vert d^k \Vert\!\vert_{x^k} = \lambda _k\), we can show that

$$\begin{aligned} \vert\!\Vert H_k(x^{k+1} - z^k)\Vert\!\vert_{x^{k+1}}^{*}&= (1-\alpha _k)\vert\!\Vert H_kd^k \Vert\!\vert_{x^{k+1}}^{*} \nonumber \\&\overset{(70)}{\le } \frac{(1+\delta _3)(1-\alpha _k)}{(1- \delta _3 - \vert\!\Vert x^{k+1} - x^k \Vert\!\vert_{x^k})}\vert\!\Vert H_kd^k \Vert\!\vert_{x^k}^{*} \nonumber \\&\overset{(65)}{\le } \frac{(1+\delta _3)(1-\alpha _k)\lambda _k}{(1- \delta _3 - \alpha _k\lambda _k)}. \end{aligned}$$
(72)

Third, by (6) and (70), we have

$$\begin{aligned} \left\{ \begin{array}{ll} \vert\!\Vert f_{k+1}' - g_{k+1} \Vert\!\vert^{*}_{x^{k+1}} &{}\overset{(6)}{\le } \delta _2\\ \vert\!\Vert {f_k'-g_k} \Vert\!\vert^{*}_{x^{k+1}} &{}\overset{(70)}{\le } \frac{(1+\delta _3)}{(1- \delta _3 - \alpha _k\lambda _k)}\vert\!\Vert {f_k'-g_k} \Vert\!\vert^{*}_{x^{k}} \overset{(6)}{\le } \frac{(1+\delta _3)\delta _2}{(1- \delta _3 - \alpha _k\lambda _k)}. \end{array}\right. \end{aligned}$$
(73)

Fourth, utilizing (10), (65), and [23, Theorem 4.1.14], we can show that

$$\begin{aligned}&\vert\!\Vert f_{k+1}'-f_k'-\nabla ^2 f(x^k)(x^{k+1}-x^k) \Vert\!\vert^{*}_{x^{k+1}}\nonumber \\&\quad \overset{(10)}{\le } \frac{1}{(1-\delta _3)}\Vert f_{k+1}'-f_k'-\nabla ^2 f(x^k)(x^{k+1}-x^k)\Vert ^{*}_{x^{k+1}} \nonumber \\&\quad \overset{{[23, \text { Theorem } 4.1.14]}}{\le } \frac{1}{(1-\delta _3)}\left( \frac{\Vert x^{k+1}-x^k\Vert _{x^k}}{1-\Vert x^{k+1}-x^k\Vert _{x^k}}\right) ^2 \nonumber \\&\quad \overset{(10)}{\le } \frac{1}{(1-\delta _3)}\left( \frac{\vert\!\Vert x^{k+1}-x^k \Vert\!\vert_{x^k}}{1-\delta _3 - \vert\!\Vert x^{k+1}-x^k \Vert\!\vert_{x^k}}\right) ^2 \nonumber \\&\quad \overset{(65)}{=} \frac{1}{(1-\delta _3)}\left( \frac{\alpha _k\lambda _k}{1-\delta _3 - \alpha _k\lambda _k}\right) ^2. \end{aligned}$$
(74)

Fifth, employing the second inequality of (11) with \(x := x^k\), \(y := x^{k+1}\), and \(v := x^{k+1} - x^k\), we can show that

$$\begin{aligned}&\vert\!\Vert(H_k-\nabla ^2 f(x^k))(x^{k+1}-x^k)\Vert\!\vert^{*}_{x^{k+1}}\nonumber \\&\quad \overset{(11)}{\le } \frac{(2\delta _3 + \delta _3^2)}{(1-\delta _3)(1 -\delta _3 - \vert\!\Vert x^{k+1} - x^k \Vert\!\vert_{x^k})} \vert\!\Vert x^{k+1} - x^k \Vert\!\vert_{x^k} \nonumber \\&\quad \overset{(65)}{=} \frac{(2 + \delta _3)\delta _3\alpha _k\lambda _k}{(1-\delta _3)(1 -\delta _3 - \alpha _k\lambda _k)}. \end{aligned}$$
(75)

Finally, substituting (71), (72), (73), (74), and (75) into (69), we can upper bound \([\mathcal{T}_1]\) as

$$\begin{aligned}{}[\mathcal{T}_1]\le & {} \frac{(1+\delta _3)(1-\alpha _k)}{1-\delta _3 - \alpha _k\lambda _k}\lambda _k + \frac{(1 + \delta _3)(1-\alpha _k)}{(1-\delta _3 - \alpha _k\lambda _k)}\lambda _k + \delta _2 + \frac{(1 + \delta _3)\delta _2}{(1-\delta _3 - \alpha _k\lambda _k)}\\&\quad + \frac{1}{1-\delta _3}\cdot \left( \frac{\alpha _k\lambda _k}{1-\delta _3 - \alpha _k\lambda _k}\right) ^2 + \frac{(2+\delta _3)\delta _3}{1-\delta _3}\cdot \frac{\alpha _k\lambda _k}{1-\delta _3 - \alpha _k\lambda _k}. \end{aligned}$$

Plugging this upper bound of \([\mathcal{T}_1]\) and the upper bound of \([\mathcal{T}_2]\) from (68) into (67), we obtain

$$\begin{aligned} \vert\!\Vert z^{k+1} - z^{k} \Vert\!\vert_{x^{k+1}}\le \;& {} \frac{2(1+\delta _3)(1-\alpha _k)}{1-\delta _3 - \alpha _k\lambda _k}\lambda _k + \delta _2 + \frac{(1 + \delta _3)\delta _2}{(1-\delta _3 - \alpha _k\lambda _k)} \\&+\, \frac{1}{1-\delta _3} \cdot \left( \frac{\alpha _k\lambda _k}{1-\delta _3 - \alpha _k\lambda _k}\right) ^2 \\&+\, \frac{(2+\delta _3)\delta _3}{1-\delta _3} \cdot \frac{\alpha _k\lambda _k}{1-\delta _3 - \alpha _k\lambda _k} \\&+\, {~}\delta _4\lambda _{k+1} + \frac{(1 + \delta _3)\delta _4\lambda _k}{(1-\delta _3 - \alpha _k\lambda _k)}. \end{aligned}$$

Substituting this estimate and (66) back into (64), we get

$$\begin{aligned} \lambda _{k+1}&\le \delta _4\lambda _{k+1} + \delta _2 + \frac{3(1+\delta _3)(1-\alpha _k)\lambda _k}{1-\delta _3 - \alpha _k\lambda _k} + \frac{(1 + \delta _3)\delta _2}{(1-\delta _3 - \alpha _k\lambda _k)}\\&\quad + \frac{1}{1-\delta _3} \cdot \left( \frac{\alpha _k\lambda _k}{1-\delta _3 - \alpha _k\lambda _k}\right) ^2 \\&\quad +{~} \frac{(2+\delta _3)\delta _3}{1-\delta _3} \cdot \frac{\alpha _k\lambda _k}{1-\delta _3 - \alpha _k\lambda _k} + \frac{(1 + \delta _3)\delta _4\lambda _k}{(1-\delta _3 - \alpha _k\lambda _k)}. \end{aligned}$$

Since \(0< 1 - \delta _4 < 1\), rearranging this estimate, we obtain (36). \(\square \)

1.6 A.6 Detailed proofs of the missing technical results in the main text

In this subsection, we provide more details of some missing proofs in the main text.

(a) Technical details in the proof of Theorems 3 and 4: Let us denote the right-hand side of (36) by

$$\begin{aligned} H(\alpha _k, \lambda _k, \theta )&:= \frac{\delta _2}{1-\delta _4} + \frac{(1+\delta _3)\left[ \delta _2 + \left( \delta _4 + 3(1-\alpha _k)\right) \lambda _k\right] }{(1-\delta _4)\left( 1 - \delta _3 - \alpha _k\lambda _k\right) }\\&\quad + \frac{\alpha _k(2+\delta _3)\delta _3\lambda _k}{(1-\delta _4)(1-\delta _3)\left( 1 - \delta _3 - \alpha _k\lambda _k\right) } \\&\quad + {~} \frac{\alpha _k^2\lambda _k^2}{(1-\delta _4)(1-\delta _3)\left( 1 - \delta _3 - \alpha _k\lambda _k\right) ^2}, \end{aligned}$$

where \(\lambda _k, \delta _2 \ge 0\), \(\alpha _k \in [0, 1]\), \(\delta _3, \delta _4 \in [0, 1)\), \(\alpha _k\lambda _k + \delta _3 < 1\), and \(\theta := (\delta _2, \delta _3, \delta _4)\).

If \(\alpha _k = 1\), then \(H(\cdot )\) reduces to

$$\begin{aligned} H_1(\lambda _k, \theta )&:= \frac{\delta _2}{1-\delta _4} + \frac{(1+\delta _3)\left( \delta _2 + \delta _4\lambda _k\right) }{(1-\delta _4)\left( 1 - \delta _3 - \lambda _k\right) } + \frac{(2+\delta _3)\delta _3\lambda _k}{(1-\delta _4)(1-\delta _3)\left( 1 - \delta _3 - \lambda _k\right) } \nonumber \\&\quad + \frac{\lambda _k^2}{(1-\delta _4)(1-\delta _3)\left( 1 - \delta _3 - \lambda _k\right) ^2}. \end{aligned}$$
(76)

If \(\alpha _k = \frac{1-\delta _4}{(1+\delta )(1+\delta + (1-\delta )\lambda _k)} \in [0, 1]\), then \(H(\cdot )\) can be rewritten as

$$\begin{aligned} H_2(\alpha _k,\lambda _k, \delta , \theta )&:= \frac{\delta _2}{1-\delta _4} + \frac{(1+\delta _3)\left( \delta _2 + \delta _4\lambda _k\right) }{(1-\delta _4)\left( 1 - \delta _3 - \alpha _k\lambda _k\right) }\nonumber \\&\quad + \frac{3(1+\delta _3)\lambda _k}{\left( 1 - \delta _3 - \alpha _k\lambda _k\right) }\cdot \frac{2\delta + \delta ^2 + \delta _4 + (1-\delta ^2)\lambda _k}{(1+\delta )(1+\delta + (1-\delta )\lambda _k)}\nonumber \\&\quad + \frac{\alpha _k(2+\delta _3)\delta _3\lambda _k}{(1-\delta _4)(1-\delta _3)\left( 1 - \delta _3 - \alpha _k\lambda _k\right) } \nonumber \\&\quad + \frac{\alpha _k^2\lambda _k^2}{(1-\delta _4)(1-\delta _3)\left( 1 - \delta _3 - \alpha _k\lambda _k\right) ^2}. \end{aligned}$$
(77)

The following lemma is used to prove Theorems 3 and 4 in the main text.

Lemma 6

The function \(H_1(\cdot )\) defined by (76) is monotonically increasing w.r.t. each variable \(\lambda _k \ge 0\), \(\delta _2 \ge 0\), \(\delta _3 \in [0, 1)\), and \(\delta _4\in [0, 1)\) such that \(\lambda _k + \delta _3 < 1\).

Similarly, for given \(\lambda _k > 0\) and \(\delta \in [0, 1)\), the function \(H_2(\cdot )\) defined by (77) is monotonically increasing w.r.t. each variable \(\alpha _k \in [0, 1]\), \(\delta _2 \ge 0\), \(\delta _3 \in [0, 1)\), and \(\delta _4\in [0, 1)\) such that \(\alpha _k\lambda _k + \delta _3 < 1\). Moreover, if \(0 \le \delta _3, \delta _4 \le \delta \), then we can upper bound \(H_2\) as \(H_2(\alpha _k,\lambda _k, \delta , \theta ) \le {\widehat{H}}_2(\lambda _k, \delta , \delta _2)\), where

$$\begin{aligned} {\widehat{H}}_2(\lambda _k, \delta , \delta _2)&:= \frac{\delta _2}{1-\delta } + \frac{(1+\delta )\left( \delta _2 + \delta \lambda _k\right) }{(1-\delta )\left( 1 - \delta - \lambda _k\right) } + \frac{3\lambda _k\left[ 3\delta + \delta ^2 + (1-\delta ^2)\lambda _k\right] }{\left( 1 - \delta - \lambda _k\right) \left[ 1+\delta + (1-\delta )\lambda _k\right] } \\&\quad + \frac{(2+\delta )\delta \lambda _k}{(1-\delta )^2\left( 1 - \delta - \lambda _k\right) } + \frac{\lambda _k^2}{(1-\delta )^2\left( 1 - \delta - \lambda _k\right) ^2}. \end{aligned}$$

The function \({\widehat{H}}_2(\cdot )\) is also monotonically increasing w.r.t. each variable \(\delta _2\) and \(\lambda _k\).

Proof

We first consider \(H_1\) defined by (76). For \(\lambda _k \ge 0\), \(\delta _2 \ge 0\), \(\delta _3 \in [0, 1)\), and \(\delta _4\in [0, 1)\) such that \(\lambda _k + \delta _3 < 1\), the first term, \(\frac{\delta _2}{1-\delta _4}\), is monotonically increasing w.r.t. \(\delta _2\) and \(\delta _4\). The second term is monotonically increasing w.r.t. \(\lambda _k\), \(\delta _2\), \(\delta _3\), and \(\delta _4\). The third and fourth terms are monotonically increasing w.r.t. \(\lambda _k\), \(\delta _3\), and \(\delta _4\). Consequently, \(H_1(\cdot )\) is monotonically increasing w.r.t. \(\lambda _k\), \(\delta _2\), \(\delta _3\), and \(\delta _4\).

For fixed \(\lambda _k > 0\) and \(\delta \in [0, 1)\), we consider \(H_2(\cdot )\) defined by (77). Clearly, for \(\alpha _k \in [0, 1]\), \(\delta _2 \ge 0\), \(\delta _3 \in [0, 1)\), and \(\delta _4\in [0, 1)\) such that \(\alpha _k\lambda _k + \delta _3 < 1\), the first term is monotonically increasing w.r.t. \(\delta _2\) and \(\delta _4\). The second term is monotonically increasing w.r.t. \(\lambda _k\), \(\delta _2\), \(\delta _3\), and \(\delta _4\). The remaining terms are monotonically increasing w.r.t. \(\delta _3\) and \(\delta _4\). Consequently, \(H_2(\cdot )\) is monotonically increasing w.r.t. \(\delta _2\), \(\delta _3\), and \(\delta _4\). Using the upper bound \(\delta \) of \(\delta _3\) and \(\delta _4\) in \(H_2\), we easily get \(H_2(\alpha _k,\lambda _k, \delta , \theta ) \le {\widehat{H}}_2(\lambda _k, \delta , \delta _2)\). The monotonic increase of \({\widehat{H}}_2\) w.r.t. \(\delta _2\) and \(\lambda _k\) can be checked directly, term by term. \(\square \)
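A quick grid-based spot check of the monotonicity of \(H_1\) (with illustrative parameter values of our choosing):

```python
import numpy as np

# Spot-check of Lemma 6: H1 from (76) is nondecreasing in each argument.
def H1(lam, d2, d3, d4):
    return (d2/(1-d4)
            + (1+d3)*(d2 + d4*lam)/((1-d4)*(1 - d3 - lam))
            + (2+d3)*d3*lam/((1-d4)*(1-d3)*(1 - d3 - lam))
            + lam**2/((1-d4)*(1-d3)*(1 - d3 - lam)**2))

grid = np.linspace(0.0, 0.2, 21)
base = (0.1, 0.05, 0.05, 0.05)   # (lambda_k, delta2, delta3, delta4)
for i in range(4):
    vals = []
    for t in grid:
        args = list(base); args[i] = t   # vary the i-th argument only
        vals.append(H1(*args))
    assert all(a <= b + 1e-12 for a, b in zip(vals, vals[1:])), i
print("H1 is nondecreasing in each variable on the test grid")
```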

(b) The detailed proof for Example 1(c) in Sect. 3.1: We now prove the estimate (14) stated in Example 1(c) of Sect. 3.1.

Since \(f_1(x) = -\ln (x)\), \(f_2(x) = \max \{\delta _1x, \delta _1\}\), and \(f(x) = f_1(x) + f_2(x)\), we have \(\mathrm {dom}(f) = \{x\in {\mathbb{R}}\mid x > 0\}\). Moreover, since \(\nabla ^2{f_1}(x) = H(x) = \frac{1}{x^2}\), the condition \(\vert\!\Vert y - x \Vert\!\vert_{x} < 1\) (here we use \(\delta _0 = 0\)) reads \(\frac{(y-x)^2}{x^2} < 1\), i.e., \(-x< y-x < x\). Since \(y > 0\), this is equivalent to \(0< y < 2x\). In this case, we have

$$\begin{aligned} g_2(x) = \left\{ \begin{array}{ll} 0 &{}\text {if }~x < 1\\ \delta _1 &{}\text {if }~x > 1\\ {[0, \delta _1]} &{}\text {if }~ x = 1. \end{array}\right. \end{aligned}$$

Using this expression, one can show that

$$\begin{aligned}&f_2(y) - f_2(x) - \langle g_2(x), y-x\rangle \\&\quad = \max \{\delta _1y, \delta _1\} - \max \{\delta _1x, \delta _1\} - \delta _1(y-x) \\&\quad \le {\left\{ \begin{array}{ll} \delta _1 - \delta _1 &{}\text {if }~0< x< 1~{\text { and }}~~0< y< 1\\ 2\delta _1 - \delta _1 &{}\text {if }~0< x< 1~\text { and }~~1< y \le 2x< 2\\ \delta _1 - \delta _1x - \delta _1(y-x) &{}\text {if }~x> 1~\text { and }~~0< y \le 1\\ \delta _1y - \delta _1x - \delta _1(y-x) &{}\text {if }~x> 1~\text { and }~~y> 1\\ \delta _1 - \delta _1 - \xi (y-1) &{}\text {if }~x = 1~\text { and }~~0< y \le 1,~~\xi \in [0, \delta _1]\\ 2\delta _1 - \delta _1 - \xi (y-1) &{}\text {if }~x = 1~\text { and }~~1< y \le 2x = 2. \end{array}\right. }\\&\quad \le {\left\{ \begin{array}{ll} 0 &{}\text {if }~0< x< 1~\text { and }~~0< y< 1\\ \delta _1 &{}\text {if }~0< x< 1~\text { and }~~1< y \le 2x< 2\\ \delta _1 &{}\text {if }~x> 1~\text { and }~~0< y \le 1\\ 0 &{}\text {if }~x> 1~\text { and }~~y > 1\\ \delta _1 &{}\text {if }~x = 1~\text { and }~~0< y \le 1,~~\xi \in [0, \delta _1]\\ \delta _1 &{}\text {if }~x = 1~\text { and }~~1 < y \le 2x = 2. \end{array}\right. } \end{aligned}$$

In summary, we get \(f_2(y) - f_2(x) - \langle g_2(x), y-x\rangle \le \delta _1\), which is exactly (14). \(\square \)
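A simple grid check of (14), with an arbitrary illustrative value \(\delta _1 = 0.3\):

```python
import numpy as np

# Grid check of (14): f2(y) - f2(x) - g2(x)*(y - x) <= delta1 over 0 < y < 2x,
# for f2(t) = max(delta1*t, delta1) and the subgradient choice
# g2(x) = 0 if x < 1, delta1 if x > 1 (x = 1 is covered by the case analysis).
delta1 = 0.3
f2 = lambda t: np.maximum(delta1 * t, delta1)
g2 = lambda t: np.where(t > 1.0, delta1, 0.0)

for x in np.linspace(0.01, 5.0, 500):
    ys = np.linspace(1e-3, 2*x - 1e-3, 400)      # the region 0 < y < 2x
    gap = f2(ys) - f2(x) - g2(x) * (ys - x)
    assert gap.max() <= delta1 + 1e-12
print("estimate (14) holds on the test grid")
```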

1.7 A.7 Implementation details: approximate proximal Newton directions

When solving for \(z^k\) in (iPNA), we use FISTA [1]. At the \(j^{\mathrm {th}}\) iteration of the inner loop, \(d^j\) is computed as

$$\begin{aligned} d^j := \mathrm {prox}_{\alpha R}\left( x^k+w^j-\alpha (g(x^k)+H(x^k)w^j)\right) -x^k, \end{aligned}$$

where \(w^j :=d^{j-1}+\frac{t_{j-1}-1}{t_{j}}(d^{j-1}-d^{j-2})\). By the definition of \(\mathrm {prox}_{\alpha R}\), the following relation holds:

$$\begin{aligned} \tfrac{1}{\alpha }(w^j - d^j)\in g(x^k)+H(x^k)w^j +\partial R(x^k+d^j), \end{aligned}$$

which guarantees that the vector \(\nu ^k :=\frac{w^j - d^j}{\alpha }+H(x^k)(d^j - w^j)=\left( \frac{{\mathbb{I}}_p}{\alpha }-H(x^k)\right) (w^j - d^j)\) satisfies the condition \(\nu ^k \in g(x^k)+H(x^k)d^j+\partial R(x^k+d^j)\). In our implementation, this \(\nu ^k\) is used in (28) to decide whether to accept \(d^k := d^j\) as an inexact proximal Newton direction at iteration k of (iPNA).
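For concreteness, here is a minimal Python sketch of this inner loop, assuming \(R = \Vert \cdot \Vert _1\) (so that \(\mathrm {prox}_{\alpha R}\) is soft-thresholding) and the acceptance test \(\vert\!\Vert \nu ^k \Vert\!\vert_{x^k}^{*} \le \delta _4\lambda _k\) suggested by (28); the step size \(\alpha \) is assumed to satisfy \(\alpha \le 1/\lambda _{\max }(H(x^k))\). This is our reconstruction under these assumptions, not the paper's exact implementation.

```python
import numpy as np

def prox_l1(v, tau):
    # soft-thresholding: the proximal operator of tau*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def inexact_newton_direction(xk, g, H, alpha, delta4, max_iter=500):
    # Sketch of the inner FISTA loop of A.7: minimize
    # <g, d> + 0.5*<H d, d> + R(x^k + d) over d, with R = ||.||_1 assumed.
    p = xk.size
    d_prev = d = np.zeros(p)
    t = 1.0
    Hinv = np.linalg.inv(H)                       # only to evaluate the dual norm
    for _ in range(max_iter):
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        w = d + ((t - 1.0) / t_next) * (d - d_prev)   # w^j from d^{j-1}, d^{j-2}
        d_prev, t = d, t_next
        d = prox_l1(xk + w - alpha * (g + H @ w), alpha) - xk
        nu = (np.eye(p) / alpha - H) @ (w - d)    # nu^k = (I/alpha - H)(w^j - d^j)
        lam = np.sqrt(d @ H @ d)                  # lambda_k = |||d^j|||_{x^k}
        if np.sqrt(nu @ Hinv @ nu) <= delta4 * lam:   # assumed test from (28)
            break
    return d, nu
```

Forming \(H(x^k)^{-1}\) explicitly above is only for clarity; in practice one would reuse a Cholesky factorization of \(H(x^k)\) to evaluate the dual local norm.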

Cite this article

Sun, T., Necoara, I. & Tran-Dinh, Q. Composite convex optimization with global and local inexact oracles. Comput Optim Appl 76, 69–124 (2020). https://doi.org/10.1007/s10589-020-00174-2
