
A note on solving nonlinear optimization problems in variable precision


Abstract

This short note considers an efficient variant of the trust-region algorithm with dynamic accuracy proposed by Carter (SIAM J Sci Stat Comput 14(2):368–388, 1993) and by Conn et al. (Trust-region methods. MPS-SIAM series on optimization, SIAM, Philadelphia, 2000) as a tool for very high-performance computing, an area where it is critical to allow multi-precision computations in order to keep energy dissipation under control. Numerical experiments are presented indicating that the use of the considered method can bring substantial savings in the “energy costs” of objective-function and gradient evaluations by efficiently exploiting multi-precision computations.


Notes

  1. The solution of nonlinear systems of equations is considered rather than unconstrained optimization.

  2. Numerical experiments not reported here suggest that our default choice of remembering 15 secant pairs gives good performance, although keeping a smaller number of pairs is still acceptable.

  3. Carter [10] requires \(\omega _g \le 1-\eta _2\) while we require \(\omega _g\le \kappa _g\) with \(\kappa _g\) satisfying (2.5). A fixed value is also used for \(\omega _f\), whose upper bound depends on \(\omega _g\). The Hessian approximation is computed using an unsafeguarded standard BFGS update.

  4. The collection of [8] and a few other problems, all available in Matlab.

  5. Recall that it is proportional to the square of the number of significant digits; a small numerical illustration is given below.
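To make the quadratic energy model of footnote 5 concrete, here is a small illustration (the precision levels are indicative assumptions, not values taken from the paper): moving from double precision (roughly 16 significant decimal digits) to single precision (roughly 8) divides the evaluation energy cost by about four, and moving to half precision (roughly 4 digits) divides it by about sixteen, since

$$\begin{aligned} \left( \frac{8}{16}\right) ^2 = \frac{1}{4} \;\; \text{ and } \;\; \left( \frac{4}{16}\right) ^2 = \frac{1}{16}. \end{aligned}$$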

References

  1. Baboulin, M., Buttari, A., Dongarra, J., Kurzak, J., Langou, J., Luszczek, P., Tomov, S.: Accelerating scientific computations with mixed precision algorithms. Comput. Phys. Commun. 180, 2526–2533 (2009)

  2. Bellavia, S., Gratton, S., Riccietti, E.: A Levenberg–Marquardt method for large nonlinear least-squares problems with dynamic accuracy in functions and gradients. Numer. Math. 140, 791–825 (2018)

  3. Bellavia, S., Gurioli, G., Morini, B.: Theoretical study of an adaptive cubic regularization method with dynamic inexact Hessian information (2018). arXiv:1808.06239

  4. Bellavia, S., Gurioli, G., Morini, B., Toint, Ph.L.: Adaptive regularization algorithms with inexact evaluations for nonconvex optimization. SIAM J. Optim. 29(4), 2881–2915 (2019)

  5. Bergou, E., Diouane, Y., Kungurtsev, V., Royer, C.W.: A subsampling line-search method with second-order results (2018). arXiv:1810.07211

  6. Blanchet, J., Cartis, C., Menickelly, M., Scheinberg, K.: Convergence rate analysis of a stochastic trust region method via supermartingales. INFORMS J. Optim. 1(2), 92–119 (2019)

  7. Brown, A.A., Bartholomew-Biggs, M.: Some effective methods for unconstrained optimization based on the solution of ordinary differential equations. Technical Report 178, Hatfield Polytechnic, Hatfield, UK (1987)

  8. Buckley, A.G.: Test functions for unconstrained minimization. Technical Report CS-3, Computing Science Division, Dalhousie University, Dalhousie, Canada (1989)

  9. Carter, R.G.: A worst-case example using linesearch methods for numerical optimization with inexact gradient evaluations. Technical Report MCS-P283-1291, Argonne National Laboratory, Argonne, USA (1991)

  10. Carter, R.G.: Numerical experience with a class of algorithms for nonlinear optimization using inexact function and gradient information. SIAM J. Sci. Stat. Comput. 14(2), 368–388 (1993)

  11. Cartis, C., Gould, N.I.M., Toint, Ph.L.: Worst-case evaluation complexity and optimality of second-order methods for nonconvex smooth optimization. In: Proceedings of the 2018 International Congress of Mathematicians (ICM 2018), Rio de Janeiro (2018)

  12. Cartis, C., Scheinberg, K.: Global convergence rate analysis of unconstrained optimization methods based on probabilistic models. Math. Program. Ser. A 159(2), 337–375 (2018)

  13. Chen, X., Jiang, B., Lin, T., Zhang, S.: On adaptive cubic regularization Newton’s methods for convex optimization via random sampling (2018). arXiv:1802.05426

  14. Conn, A.R., Gould, N.I.M., Lescrenier, M., Toint, Ph.L.: Performance of a multifrontal scheme for partially separable optimization. In: Gomez, S., Hennart, J.P. (eds.) Advances in Optimization and Numerical Analysis. Proceedings of the Sixth Workshop on Optimization and Numerical Analysis, Oaxaca, Mexico, vol. 275, pp. 79–96. Kluwer Academic Publishers, Dordrecht (1994)

  15. Conn, A.R., Gould, N.I.M., Toint, Ph.L.: LANCELOT: a Fortran package for large-scale nonlinear optimization (Release A). Number 17 in Springer Series in Computational Mathematics. Springer, Heidelberg (1992)

  16. Conn, A.R., Gould, N.I.M., Toint, Ph.L.: Trust-Region Methods. MPS-SIAM Series on Optimization. SIAM, Philadelphia (2000)

  17. Dixon, L.C.W., Maany, Z.: A family of test problems with sparse Hessian for unconstrained optimization. Technical Report 206, Numerical Optimization Center, Hatfield Polytechnic, Hatfield, UK (1988)

  18. Elster, C., Neumaier, A.: A method of trust region type for minimizing noisy functions. Computing 58(1), 31–46 (1997)

  19. Galal, S., Horowitz, M.: Energy-efficient floating-point unit design. IEEE Trans. Comput. 60(7), 913–922 (2011)

  20. Gould, N.I.M., Orban, D., Toint, Ph.L.: CUTEst: a constrained and unconstrained testing environment with safe threads for mathematical optimization. Comput. Optim. Appl. 60(3), 545–557 (2015)

  21. Griewank, A., Toint, Ph.L.: Partitioned variable metric updates for large structured optimization problems. Numer. Math. 39, 119–137 (1982)

  22. Higham, N.J.: The rise of multiprecision computations. Talk at SAMSI 2017. https://bit.ly/higham-samsi17 (2017)

  23. Kugler, L.: Is “good enough” computing good enough? Commun. ACM 58, 12–14 (2015)

  24. Leyffer, S., Wild, S., Fagan, M., Snir, M., Palem, K., Yoshii, K., Finkel, H.: Moore with less—leapfrogging Moore’s law with inexactness for supercomputing (2016). arXiv:1610.02606v2 (to appear in Proceedings of PMES 2018: 3rd International Workshop on Post Moore’s Era Supercomputing)

  25. Li, G.: The secant/finite difference algorithm for solving sparse nonlinear systems of equations. SIAM J. Numer. Anal. 25(5), 1181–1196 (1988)

  26. Matsuoka, S.: Private communication (2018)

  27. Moré, J.J., Garbow, B.S., Hillstrom, K.E.: Testing unconstrained optimization software. ACM Trans. Math. Softw. 7(1), 17–41 (1981)

  28. Nocedal, J., Wright, S.J.: Numerical Optimization. Series in Operations Research. Springer, Heidelberg (1999)

  29. Palem, K.V.: Inexactness and a future of computing. Philos. Trans. R. Soc. A 372, 20130281 (2014)

  30. Poenisch, G., Schwetlick, H.: Computing turning points of curves implicitly defined by nonlinear equations depending on a parameter. Computing 20, 101–121 (1981)

  31. Pu, J., Galal, S., Yang, X., Shacham, O., Horowitz, M.: FPMax: a 106 GFLOPS/W at 217 GFLOPS/mm² single-precision FPU, and a 43.7 GFLOPS/W at 74.6 GFLOPS/mm² double-precision FPU, in 28 nm UTBB FDSOI. In: Hardware Architecture (2016)

  32. Schmidt, J.W., Vetters, K.: Ableitungsfreie Verfahren für nichtlineare Optimierungsprobleme. Numer. Math. 15, 263–282 (1970)

  33. Spedicato, E.: Computational experience with quasi-Newton algorithms for minimization problems of moderately large size. Technical Report CISE-N-175, CISE Documentation Service, Segrate, Milano (1975)

  34. Wang, N., Choi, J., Brand, D., Chen, C.-Y., Gopalakrishnan, K.: Training deep neural networks with 8-bit floating point numbers. In: 32nd Conference on Neural Information Processing Systems (2018). arXiv:1812.08011

  35. Xu, P., Roosta-Khorasani, F., Mahoney, M.W.: Newton-type methods for non-convex optimization under inexact Hessian information (2017). arXiv:1708.07164v3

Acknowledgements

S. Gratton: Partially supported by 3IA Artificial and Natural Intelligence Toulouse Institute, French “Investing for the Future - PIA3” program under the Grant Agreement ANR-19-PI3A-0004. P. L. Toint: Partially supported by ANR-11-LABX-0040-CIMI within the program ANR-11-IDEX-0002-02.

Author information

Correspondence to S. Gratton.


Appendix: Complexity theory for the TR1DA algorithm

For the sake of accuracy and completeness, we now provide details of the first-order worst-case complexity analysis summarized at the end of Sect. 2. As indicated there, the following development can be seen as a combination of the arguments proposed in [16] for the convergence theory of trust-region methods with inexact gradients (p. 280) and with dynamic accuracy (p. 400).

We assume that

AS.1:

The objective function \(f\) is twice continuously differentiable in \(\mathfrak {R}^n\) and there exists a constant \(\kappa _\nabla \ge 0\) such that \(\Vert \nabla _x^2f(x)\Vert \le \kappa _\nabla\) for all \(x \in \mathfrak {R}^n\).

AS.2:

There exists a constant \(\kappa _H\ge 0\) such that \(\Vert H_k\Vert \le \kappa _H\) for all \(k\ge 0\).

AS.3:

There exists a constant \(\kappa _{\mathrm{low}}\) such that \(f(x)\ge \kappa _{\mathrm{low}}\) for all \(x\in \mathfrak {R}^n\).

Lemma A.1

Suppose AS.1 and AS.2 hold. Then, for each \(k\ge 0,\)

$$\begin{aligned} |f(x_k+s_k)-m(x_k,s_k)| \le |f_k-f(x_k)| +\kappa _g \Vert \overline{g}(x_k,\omega _{g,k})\Vert \Delta _k + \kappa _{H\nabla }\Delta _k^2, \end{aligned}$$
(A.1)

for \(\kappa _{H\nabla } = 1+\max [\kappa _H, \kappa _\nabla ].\)

Proof

(See [16, Theorem 8.4.2].) The definition (2.8), (2.6), the mean-value theorem, the Cauchy–Schwarz inequality and AS.1 give that, for some \(\xi _k\) in the segment \([x_k,x_k+s_k]\),

$$\begin{aligned} |f(x_k+s_k)-m(x_k,s_k)| &\le |f_k-f(x_k)| +|s_k^T(\nabla _x^1f(x_k)-\overline{g}(x_k,\omega _{g,k}))| \\&\quad+ { \frac{1}{2}}|s_k^T\nabla _x^2f(\xi _k)s_k| + { \frac{1}{2}}|s_k^TH_ks_k| \\ &\le |f_k-f(x_k)| +\kappa _g \Vert \overline{g}(x_k,\omega _{g,k})\Vert \, \Vert s_k\Vert + { \frac{1}{2}}(\kappa _H + \kappa _\nabla )\Vert s_k\Vert ^2 \end{aligned}$$

and (A.1) follows from the Cauchy–Schwarz inequality and the inequality \(\Vert s_k\Vert \le \Delta _k\). \(\square\)

Lemma A.2

We have that, for all \(k\ge 0,\)

$$\begin{aligned} \max \left[ |f_k-f(x_k)|,|f_k^+ - f(x_k+s_k)|\right] \le \eta _0\left[ m(x_k,0)-m(x_k,s_k)\right] \end{aligned}$$
(A.2)

and

$$\begin{aligned} \rho _k \ge \eta _1 \;\; \text{ implies } \text{ that } \;\; \frac{f(x_k)-f(x_k+s_k)}{m(x_k,0)-m(x_k,s_k)} \ge \eta _1 - 2\eta _0 >0. \end{aligned}$$
(A.3)

Proof

(See [16, p. 401].) The mechanism of the TR1DA algorithm ensures that (A.2) holds. Hence,

$$\begin{aligned} \frac{\left| f_k-f(x_k)\right| +\left| f_k^+ -f(x_k+s_k)\right| }{m(x_k,0)-m(x_k,s_k)} \le 2 \eta _0. \end{aligned}$$

As a consequence, for iterations where \(\rho _k \ge \eta _1\),

$$\begin{aligned} \rho _k = \frac{f_k-f_k^+}{m(x_k,0)-m(x_k,s_k)} = \frac{f(x_k)-f(x_k+s_k)}{m(x_k,0)-m(x_k,s_k)} + \frac{\left[ f_k-f(x_k)\right] +\left[ f(x_k+s_k)-f_k^+\right] }{m(x_k,0)-m(x_k,s_k)} \end{aligned}$$

and (A.3) must hold. \(\square\)

This result implies, in particular, that the sequence \(\{f(x_k)\}_{k\ge 0}\) is non-increasing, and the TR1DA algorithm is therefore monotone on the exact function f.
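To make the monotonicity argument explicit (a direct consequence of (A.3) and of the positivity of the model decrease \(m(x_k,0)-m(x_k,s_k)\) before termination), on every successful iteration we have

$$\begin{aligned} f(x_{k+1}) = f(x_k+s_k) \le f(x_k) - (\eta _1-2\eta _0)\left[ m(x_k,0)-m(x_k,s_k)\right] \le f(x_k), \end{aligned}$$

while, by the usual trust-region mechanism, \(x_{k+1}=x_k\) on unsuccessful iterations.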

Lemma A.3

Suppose AS.1 and AS.2 hold, and that \(\overline{g}(x_k,\omega _{g,k})\ne 0.\) Then

$$\begin{aligned} \Delta _k \le \frac{\Vert \overline{g}(x_k,\omega _{g,k})\Vert }{2\kappa _{H\nabla }} \Big [ { \frac{1}{2}}(1-\eta _1)-\eta _0-\kappa _g\Big ] \;\; \text{ implies } \text{ that } \;\; \Delta _{k+1} \ge \Delta _k. \end{aligned}$$
(A.4)

Proof

(See [16, Theorem 8.4.3].) Since (2.5) implies that \({ \frac{1}{2}}(1-\eta _1)-\eta _0-\kappa _g \in (0,1)\), the first part of (A.4) gives that \(\Delta _k < \Vert \overline{g}(x_k,\omega _{g,k})\Vert\, /\, \kappa _{H\nabla }\). Hence the inequality \(1+\Vert H_k\Vert \le \kappa _{H\nabla }\) and (2.9) yield that

$$\begin{aligned} m(x_k,0)-m(x_k,s_k) \ge { \frac{1}{2}}\Vert \overline{g}(x_k,\omega _{g,k})\Vert \min \left[ \frac{\Vert \overline{g}(x_k,\omega _{g,k})\Vert }{\kappa _{H\nabla }},\Delta _k\right] = { \frac{1}{2}}\Vert \overline{g}(x_k,\omega _{g,k})\Vert \,\Delta _k. \end{aligned}$$

As a consequence, we may use (2.11), the Cauchy–Schwarz inequality, (A.2) (twice), (A.1), the inequality \(\kappa _{H\nabla }\ge 1\) and the first part of (A.4) to deduce that, for all \(k\ge 0\),

$$\begin{aligned} |\rho _k-1| & = \frac{ |f_k^+- m(x_k,s_k)|}{ m(x_k,0)-m(x_k,s_k)}\\ & \le \frac{ |f_k^+-f(x_k+s_k)|+|f(x_k+s_k)-m(x_k,s_k)|}{ m(x_k,0)-m(x_k,s_k)}\\ & \le 2 \eta _0 +\frac{ \kappa _g\Vert \overline{g}(x_k,\omega _{g,k})\Vert \Delta _k+\kappa _{H\nabla }\Delta _k^2}{ { \frac{1}{2}}\Vert \overline{g}(x_k,\omega _{g,k})\Vert \,\Delta _k}\\ & \le 2\eta _0 +2\kappa _g+ 2\kappa _{H\nabla }\frac{ \Delta _k}{ \Vert \overline{g}(x_k,\omega _{g,k})\Vert }\\ & \le 1-\eta _2. \end{aligned}$$

Thus \(\rho _k\ge \eta _2\) and (2.12) ensures the second part of (A.4). \(\square\)
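For completeness, the last inequality in the chain above can be detailed as follows (assuming, as the chain already uses, that the conditions (2.5) on the constants make the resulting quantity at most \(1-\eta _2\)): the first part of (A.4) gives \(2\kappa _{H\nabla }\Delta _k/\Vert \overline{g}(x_k,\omega _{g,k})\Vert \le { \frac{1}{2}}(1-\eta _1)-\eta _0-\kappa _g\), so that

$$\begin{aligned} 2\eta _0 +2\kappa _g+ 2\kappa _{H\nabla }\frac{ \Delta _k}{ \Vert \overline{g}(x_k,\omega _{g,k})\Vert } \le \eta _0 + \kappa _g + { \frac{1}{2}}(1-\eta _1). \end{aligned}$$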

Lemma A.4

Suppose that AS.1 and AS.2 hold. Then, before termination,

$$\begin{aligned} \Delta _k \ge \min \left[ \Delta _0,\theta \epsilon \right] \;\; \text{ where } \;\; 0< \theta {\mathop {=}\limits ^{\mathrm{def}}}\frac{\gamma _1\big [{ \frac{1}{2}}(1-\eta _1)-\eta _0-\kappa _g\big ]}{\kappa _{H\nabla }(1+\kappa _g)} <\frac{1}{\kappa _{H\nabla }(1+\kappa _g)}. \end{aligned}$$
(A.5)

Proof

(See [16, Theorem 8.4.4].) Before termination, we must have that

$$\begin{aligned} \Vert \overline{g}(x_k,\omega _{g,k})\Vert \ge \frac{\epsilon }{1+\kappa _g}. \end{aligned}$$
(A.6)

Suppose that iteration k is the first iteration such that

$$\begin{aligned} \Delta _{k+1} \le \frac{\gamma _1\epsilon }{\kappa _{H\nabla }(1+\kappa _g)} \big [{ \frac{1}{2}}(1-\eta _1)-\eta _0-\kappa _g\big ]. \end{aligned}$$
(A.7)

Then the update (2.12) implies that

$$\begin{aligned} \Delta _k \le \frac{\epsilon }{\kappa _{H\nabla }(1+\kappa _g)} \big [{ \frac{1}{2}}(1-\eta _1)-\eta _0-\kappa _g\big ] \le \frac{\Vert \overline{g}(x_k,\omega _{g,k})\Vert }{\kappa _{H\nabla }} \big [{ \frac{1}{2}}(1-\eta _1)-\eta _0-\kappa _g\big ] \end{aligned}$$

where we have used (A.6) to deduce the last inequality. But this bound and (A.4) imply that \(\Delta _{k+1}\ge \Delta _k\), which is impossible since \(\Delta _k\) is reduced at iteration k. Hence no k exists such that (A.7) holds and the desired conclusion follows. \(\square\)

Lemma A.5

For each \(k \ge 0,\) define

$$\begin{aligned} \mathcal{S}_k {\mathop {=}\limits ^{\mathrm{def}}}\{ j \in \{ 0, \ldots , k \} \mid \rho _j \ge \eta _1 \} \;\; \text{ and } \;\; \mathcal{U}_k {\mathop {=}\limits ^{\mathrm{def}}}\{ 0, \ldots , k \} \setminus \mathcal{S}_k. \end{aligned}$$
(A.8)

the index sets of “successful” and “unsuccessful” iterations, respectively. Then

$$\begin{aligned} k \le |\mathcal{S}_k| \left( 1-\frac{\log \gamma _3}{\log \gamma _2}\right) +\frac{1}{|\log \gamma _2|}\log \left( \frac{\Delta _0}{\theta \epsilon }\right) . \end{aligned}$$
(A.9)

Proof

Observe that (2.12) implies that

$$\begin{aligned} \Delta _{j+1} \le \gamma _3 \Delta _j \;\; \text{ for } \;\; j \in \mathcal{S}_k \end{aligned}$$

and that

$$\begin{aligned} \Delta _{j+1}\le \gamma _2\Delta _j \;\; \text{ for } \;\; j \in \mathcal{U}_k. \end{aligned}$$

Combining these two inequalities, we obtain from (A.5) that

$$\begin{aligned} \min \left[ \Delta _0,\theta \epsilon \right] \le \Delta _k \le \Delta _0\,\gamma _3^{|\mathcal{S}_k|}\gamma _2^{|\mathcal{U}_k|}. \end{aligned}$$

Dividing by \(\Delta _0\), taking logarithms and recalling that \(\gamma _2\in (0,1)\), we get

$$\begin{aligned} |\mathcal{U}_k| \le -|\mathcal{S}_k| \frac{\log \gamma _3}{\log \gamma _2} - \frac{1}{\log \gamma _2}\log \left( \frac{\Delta _0}{\theta \epsilon }\right) . \end{aligned}$$

Hence (A.9) follows since \(k \le |\mathcal{S}_k|+|\mathcal{U}_k|\). \(\square\)
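As an illustration of (A.9) (with trust-region parameters chosen here purely for the example, not taken from the paper), setting \(\gamma _2 = { \frac{1}{2}}\) and \(\gamma _3 = 2\) gives

$$\begin{aligned} k \le 2\,|\mathcal{S}_k| + \log _2\left( \frac{\Delta _0}{\theta \epsilon }\right) , \end{aligned}$$

so that, up to the logarithmic term, at most half of the iterations can be unsuccessful in this case.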

Theorem A.1

Suppose that AS.1–AS.3 hold. Suppose also that \(\Delta _0\ge \theta \epsilon\), where \(\theta\) is defined in (A.5). Then the TR1DA algorithm produces an iterate \(x_k\) such that \(\Vert \nabla _x^1f(x_k)\Vert \le \epsilon\) in at most

$$\begin{aligned} \tau _{\mathcal{S}} {\mathop {=}\limits ^{\mathrm{def}}}\frac{2(f(x_0)-\kappa _{\mathrm{low}})(1+\kappa _g)}{(\eta _1-2\eta _0)\,\theta }\cdot \frac{1}{\epsilon ^2} \end{aligned}$$
(A.10)

successful iterations, at most

$$\begin{aligned} \tau _{\mathrm{tot}} {\mathop {=}\limits ^{\mathrm{def}}}\tau _{\mathcal{S}} \left( 1-\frac{\log \gamma _3}{\log \gamma _2}\right) +\frac{1}{|\log \gamma _2|}\log \left( \frac{\Delta _0}{\theta \epsilon }\right) \end{aligned}$$
(A.11)

iterations in total, at most \(\tau _{\mathrm{tot}}\) (approximate) evaluations of the gradient satisfying (2.6), and at most \(2\tau _{\mathrm{tot}}\) (approximate) evaluations of the objective function satisfying (2.2).

Proof

As in the previous proof, (A.6) must hold before termination. Using AS.3, (A.8), (A.3), (2.9), (A.6), the assumption that \(\Delta _0\ge \theta \epsilon\) and (A.5), we obtain that, for an arbitrary \(k\ge 0\) before termination,

$$\begin{aligned} f(x_0)-\kappa _{\mathrm{low}}& \ge \sum _{j\in \mathcal{S}_k}\left[ f(x_j)-f(x_{j+1})\right] \\& \ge (\eta _1-2\eta _0) \sum _{j\in \mathcal{S}_k}\left[ m(x_j,0)-m(x_j,s_j)\right] \\& \ge { \frac{1}{2}}(\eta _1-2\eta _0) \sum _{j\in \mathcal{S}_k}\Vert \overline{g}(x_j,\omega _{g,j})\Vert \min \left[ \frac{ \Vert \overline{g}(x_j,\omega _{g,j})\Vert }{ 1+\Vert H_j\Vert },\Delta _j\right] \\& \ge { \frac{1}{2}}|\mathcal{S}_k|(\eta _1-2\eta _0)\frac{ \epsilon }{ 1+\kappa _g} \min \left[ \frac{ \epsilon }{ \kappa _{H\nabla }(1+\kappa _g)},\min \Big [\Delta _0,\theta \epsilon \Big ]\right] \\& = |\mathcal{S}_k| \,\frac{ (\eta _1-2\eta _0)}{ 2(1+\kappa _g)} \min \left[ \frac{ 1}{ \kappa _{H\nabla }(1+\kappa _g)},\theta \right] \,\epsilon ^2 \\& = |\mathcal{S}_k|\,\frac{ (\eta _1-2\eta _0)\theta }{ 2(1+\kappa _g)} \, \epsilon ^2 \end{aligned}$$

and therefore

$$\begin{aligned} |\mathcal{S}_k| \le \frac{2(f(x_0)-\kappa _{\mathrm{low}})(1+\kappa _g)}{(\eta _1-2\eta _0)\,\theta }\cdot \frac{1}{\epsilon ^2} = \tau _{\mathcal{S}}. \end{aligned}$$

As a consequence \(\Vert \overline{g}(x_k,\omega _{g,k})\Vert < \epsilon /(1+\kappa _g)\) after at most \(\tau _{\mathcal{S}}\) successful iterations and the algorithm terminates. The relation (2.13) then ensures that \(\Vert \nabla _x^1f(x_k)\Vert < \epsilon\), yielding (A.10). We may now use (A.9) and the mechanism of the algorithm to complete the proof. \(\square\)
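For clarity, the total iteration bound (A.11) follows by substituting the bound on \(|\mathcal{S}_k|\) just derived into (A.9):

$$\begin{aligned} k \le |\mathcal{S}_k| \left( 1-\frac{\log \gamma _3}{\log \gamma _2}\right) +\frac{1}{|\log \gamma _2|}\log \left( \frac{\Delta _0}{\theta \epsilon }\right) \le \tau _{\mathcal{S}} \left( 1-\frac{\log \gamma _3}{\log \gamma _2}\right) +\frac{1}{|\log \gamma _2|}\log \left( \frac{\Delta _0}{\theta \epsilon }\right) = \tau _{\mathrm{tot}}. \end{aligned}$$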

Given that \(\epsilon \in (0,1]\), we immediately note that

$$\begin{aligned} \tau _{\mathcal{S}} = \mathcal{O}(\epsilon ^{-2}) \;\; \text{ and } \;\; \tau _{\mathrm{tot}} = \mathcal{O}(\epsilon ^{-2}). \end{aligned}$$

Moreover, the proof of Theorem A.1 implies that these complexity bounds improve from \(\mathcal{O}(\epsilon ^{-2})\) to \(\mathcal{O}(\epsilon ^{-1})\) if \(\epsilon\) is so large, or \(\Delta _0\) so small, that \(\Delta _0 < \theta \epsilon\).
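A brief sketch of the reason: if \(\Delta _0 < \theta \epsilon\), Lemma A.4 still gives \(\Delta _j \ge \min [\Delta _0,\theta \epsilon ] = \Delta _0\) for all \(j\), and since \(\theta < 1/(\kappa _{H\nabla }(1+\kappa _g))\) by (A.5), we also have \(\Delta _0 < \epsilon /(\kappa _{H\nabla }(1+\kappa _g))\). The chain in the proof of Theorem A.1 then yields

$$\begin{aligned} f(x_0)-\kappa _{\mathrm{low}} \ge { \frac{1}{2}}|\mathcal{S}_k|(\eta _1-2\eta _0)\frac{ \epsilon }{ 1+\kappa _g}\,\Delta _0, \;\; \text{ so that } \;\; |\mathcal{S}_k| \le \frac{2(f(x_0)-\kappa _{\mathrm{low}})(1+\kappa _g)}{(\eta _1-2\eta _0)\,\Delta _0}\cdot \frac{1}{\epsilon } = \mathcal{O}(\epsilon ^{-1}). \end{aligned}$$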

We conclude this brief complexity theory by noting that the domain in which AS.1 is assumed to hold can be restricted to the “tree of iterates” \(\cup _{k\ge 0}[x_k,x_k+s_k]\) without altering our results. This can be useful if an upper bound \({\bar{\Delta }}\) is imposed on the step length, in which case the monotonicity of the algorithm ensures that the tree of iterates remains in the set

$$\begin{aligned} \{y \in \mathfrak {R}^n \,|\, y = x+s \;\; \text{ with } \;\; f(x)\le f(x_0) \;\; \text{ and } \;\; \Vert s\Vert \le {\bar{\Delta }} \}. \end{aligned}$$

While it can be difficult to verify AS.1 on the (a priori unpredictable) tree of iterates, verifying it on the above set is much easier.

About this article

Cite this article

Gratton, S., Toint, P.L. A note on solving nonlinear optimization problems in variable precision. Comput Optim Appl 76, 917–933 (2020). https://doi.org/10.1007/s10589-020-00190-2
