Abstract
This short note considers an efficient variant of the trust-region algorithm with dynamic accuracy proposed by Carter (SIAM J Sci Stat Comput 14(2):368–388, 1993) and by Conn et al. (Trust-region methods. MPS-SIAM series on optimization, SIAM, Philadelphia, 2000) as a tool for very high-performance computing, an area where allowing multi-precision computations is critical for keeping energy dissipation under control. Numerical experiments are presented indicating that the considered method can bring substantial savings in the “energy costs” of objective-function and gradient evaluations by efficiently exploiting multi-precision computations.
Notes
The solution of nonlinear systems of equations is considered rather than unconstrained optimization.
Numerical experiments not reported here suggest that our default choice of remembering 15 secant pairs gives good performance, although keeping a smaller number of pairs is still acceptable.
Carter [10] requires \(\omega _g \le 1-\eta _2\) while we require \(\omega _g\le \kappa _g\) with \(\kappa _g\) satisfying (2.5). A fixed value is also used for \(\omega _f\), whose upper bound depends on \(\omega _g\). The Hessian approximation is computed using an unsafeguarded standard BFGS update.
The collection of [8] and a few other problems, all available in Matlab.
Recall that this cost is proportional to the square of the number of significant digits.
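This quadratic cost model can be made concrete with a small sketch (ours, purely illustrative: the function name and unit cost are assumptions, not from the paper) comparing the relative energy cost of evaluations carried out at different precisions:

```python
# Hypothetical energy-cost model: one evaluation is assumed to cost
# an amount proportional to the square of its number of significant digits.
def eval_cost(significant_digits, unit_cost=1.0):
    """Energy cost of one function/gradient evaluation at a given precision."""
    return unit_cost * significant_digits**2

# Relative saving of a half-precision evaluation (about 3 significant
# decimal digits) versus a double-precision one (about 15 digits):
saving = 1.0 - eval_cost(3) / eval_cost(15)  # 1 - 9/225 = 0.96
```

Under this model, a half-precision evaluation costs only 4% of a double-precision one, which is why shifting early iterations to low precision can dominate the overall energy budget.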
References
Baboulin, M., Buttari, A., Dongarra, J., Kurzak, J., Langou, J., Luszczek, P., Tomov, S.: Accelerating scientific computations with mixed precision algorithms. Comput. Phys. Commun. 180, 2526–2533 (2009)
Bellavia, S., Gratton, S., Riccietti, E.: A Levenberg–Marquardt method for large nonlinear least-squares problems with dynamic accuracy in functions and gradients. Numer. Math. 140, 791–825 (2018)
Bellavia, S., Gurioli, G., Morini, B.: Theoretical study of an adaptive cubic regularization method with dynamic inexact Hessian information (2018). arXiv:1808.06239
Bellavia, S., Gurioli, G., Morini, B., Toint, Ph.L.: Adaptive regularization algorithms with inexact evaluations for nonconvex optimization. SIAM J. Optim. 29(4), 2881–2915 (2019)
Bergou, E., Diouane, Y., Kungurtsev, V., Royer, C.W.: A subsampling line-search method with second-order results (2018). arXiv:1810.07211
Blanchet, J., Cartis, C., Menickelly, M., Scheinberg, K.: Convergence rate analysis of a stochastic trust region method via supermartingales. INFORMS J. Optim. 1(2), 92–119 (2019)
Brown, A.A., Bartholomew-Biggs, M.: Some effective methods for unconstrained optimization based on the solution of ordinary differential equations. Technical Report 178, Hatfield Polytechnic, Hatfield, UK (1987)
Buckley, A.G.: Test functions for unconstrained minimization. Technical Report CS-3, Computing Science Division, Dalhousie University, Dalhousie, Canada (1989)
Carter, R.G.: A worst-case example using linesearch methods for numerical optimization with inexact gradient evaluations. Technical Report MCS-P283-1291, Argonne National Laboratory, Argonne, USA (1991)
Carter, R.G.: Numerical experience with a class of algorithms for nonlinear optimization using inexact function and gradient information. SIAM J. Sci. Stat. Comput. 14(2), 368–388 (1993)
Cartis, C., Gould, N.I.M., Toint, Ph.L.: Worst-case evaluation complexity and optimality of second-order methods for nonconvex smooth optimization. In: The Proceedings of the 2018 International Conference of Mathematicians (ICM 2018), Rio de Janeiro (2018)
Cartis, C., Scheinberg, K.: Global convergence rate analysis of unconstrained optimization methods based on probabilistic models. Math. Program. Ser. A 159(2), 337–375 (2018)
Chen, X., Jiang, B., Lin, T., Zhang, S.: On adaptive cubic regularization Newton’s methods for convex optimization via random sampling (2018). arXiv:1802.05426
Conn, A.R., Gould, N.I.M., Lescrenier, M., Toint, Ph.L.: Performance of a multifrontal scheme for partially separable optimization. In: Gomez, S., Hennart, J.P. (eds.) Advances in Optimization and Numerical Analysis. Proceedings of the Sixth Workshop on Optimization and Numerical Analysis, Oaxaca, Mexico, vol. 275, pp. 79–96. Kluwer Academic Publishers, Dordrecht (1994)
Conn, A.R., Gould, N.I.M., Toint, Ph.L.: LANCELOT: a Fortran package for large-scale nonlinear optimization (Release A). Number 17 in Springer Series in Computational Mathematics. Springer, Heidelberg (1992)
Conn, A.R., Gould, N.I.M., Toint, Ph.L.: Trust-Region Methods. MPS-SIAM Series on Optimization. SIAM, Philadelphia (2000)
Dixon, L.C.W., Maany, Z.: A family of test problems with sparse Hessian for unconstrained optimization. Technical Report 206, Numerical Optimization Center, Hatfield Polytechnic, Hatfield, UK (1988)
Elster, C., Neumaier, A.: A method of trust region type for minimizing noisy functions. Computing 58(1), 31–46 (1997)
Galal, S., Horowitz, M.: Energy-efficient floating-point unit design. IEEE Trans. Comput. 60(7), 913–922 (2011)
Gould, N.I.M., Orban, D., Toint, Ph.L.: CUTEst: a constrained and unconstrained testing environment with safe threads for mathematical optimization. Comput. Optim. Appl. 60(3), 545–557 (2015)
Griewank, A., Toint, Ph.L.: Partitioned variable metric updates for large structured optimization problems. Numer. Math. 39, 119–137 (1982)
Higham, N.J.: The rise of multiprecision computations. Talk at SAMSI 2017. https://bit.ly/higham-samsi17 (2017)
Kugler, L.: Is “good enough” computing good enough? Commun. ACM 58, 12–14 (2015)
Leyffer, S., Wild, S., Fagan, M., Snir, M., Palem, K., Yoshii, K., Finkel, H.: Moore with less—leapfrogging Moore’s law with inexactness for supercomputing (2016). arXiv:1610.02606v2 (to appear in Proceedings of PMES 2018: 3rd International Workshop on Post Moore’s Era Supercomputing)
Li, G.: The secant/finite difference algorithm for solving sparse nonlinear systems of equations. SIAM J. Numer. Anal. 25(5), 1181–1196 (1988)
Matsuoka, S.: Private communication (2018)
Moré, J.J., Garbow, B.S., Hillstrom, K.E.: Testing unconstrained optimization software. ACM Trans. Math. Softw. 7(1), 17–41 (1981)
Nocedal, J., Wright, S.J.: Numerical Optimization. Series in Operations Research. Springer, Heidelberg (1999)
Palem, K.V.: Inexactness and a future of computing. Philos. Trans. R. Soc. A 372, 20130281 (2014)
Poenisch, G., Schwetlick, H.: Computing turning points of curves implicitly defined by nonlinear equations depending on a parameter. Computing 20, 101–121 (1981)
Pu, J., Galal, S., Yang, X., Shacham, O., Horowitz, M.: FPMax: a 106GFLOPS/W at 217GFLOPS/mm2 single-precision FPU, and a 43.7 GFLOPS/W at 74.6 GFLOPS/mm2 double-precision FPU, in 28nm UTBB FDSOI. In: Hardware Architecture (2016)
Schmidt, J.W., Vetters, K.: Ableitungsfreie Verfahren für nichtlineare Optimierungsprobleme. Numer. Math. 15, 263–282 (1970)
Spedicato, E.: Computational experience with quasi-Newton algorithms for minimization problems of moderately large size. Technical Report CISE-N-175, CISE Documentation Service, Segrate, Milano (1975)
Wang, N., Choi, J., Brand, D., Chen, C.-Y., Gopalakrishnan, K.: Training deep neural networks with 8-bit floating point numbers. In: 32nd Conference on Neural Information Processing Systems (2018). arXiv:1812.08011
Xu, P., Roosta-Khorasani, F., Mahoney, M.W.: Newton-type methods for non-convex optimization under inexact Hessian information (2017). arXiv:1708.07164v3
Acknowledgements
S. Gratton: Partially supported by 3IA Artificial and Natural Intelligence Toulouse Institute, French “Investing for the Future - PIA3” program under the Grant Agreement ANR-19-PI3A-0004. P. L. Toint: Partially supported by ANR-11-LABX-0040-CIMI within the program ANR-11-IDEX-0002-02.
Appendix: Complexity theory for the TR1DA algorithm
For the sake of accuracy and completeness, we now provide details of the first-order worst-case complexity analysis summarized at the end of Sect. 2. As indicated there, the following development can be seen as a combination of the arguments proposed in [16] for the convergence theory of trust-region methods with inexact gradients (p. 280) and dynamic accuracy (p. 400).
We assume that
- AS.1::
The objective function f is twice continuously differentiable in \(\mathfrak {R}^n\) and there exists a constant \(\kappa _\nabla \ge 0\) such that \(\Vert \nabla _x^2f(x)\Vert \le \kappa _\nabla\) for all \(x \in \mathfrak {R}^n\).
- AS.2::
There exists a constant \(\kappa _H\ge 0\) such that \(\Vert H_k\Vert \le \kappa _H\) for all \(k\ge 0\).
- AS.3::
There exists a constant \(\kappa _{\mathrm{low}}\) such that \(f(x)\ge \kappa _{\mathrm{low}}\) for all \(x\in \mathfrak {R}^n\).
Lemma A.1
Suppose AS.1 and AS.2 hold. Then, for each \(k\ge 0,\)
for \(\kappa _{H\nabla } = 1+\max [\kappa _H, \kappa _\nabla ].\)
Proof
(See [16, Theorem 8.4.2].) The definition (2.8), (2.6), the mean-value theorem, the Cauchy–Schwarz inequality and AS.1 give that, for some \(\xi _k\) in the segment \([x_k,x_k+s_k]\),
and (A.1) follows from the Cauchy–Schwarz inequality and the inequality \(\Vert s_k\Vert \le \Delta _k\). \(\square\)
Lemma A.2
We have that, for all \(k\ge 0,\)
and
Proof
(See [16, p. 401].) The mechanism of the TR1DA algorithm ensures that (A.2) holds. Hence,
As a consequence, for iterations where \(\rho _k \ge \eta _1\),
and (A.3) must hold. \(\square\)
This result implies, in particular, that the sequence \(\{f(x_k)\}_{k\ge 0}\) is non-increasing, and the TR1DA algorithm is therefore monotone on the exact function f.
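The monotone accept/shrink/expand mechanism underlying this result can be sketched as follows. This is a simplified illustration, not the paper's TR1DA: it uses exact evaluations, an identity Hessian model, and a Cauchy (steepest-descent) step, and all names and default parameter values are ours.

```python
import numpy as np

def tr_sketch(f, grad, x0, delta0=1.0, eps=1e-6,
              eta1=0.1, eta2=0.75, gamma1=0.5, gamma2=2.0, max_iter=500):
    """Simplified trust-region sketch: Cauchy step for the model
    m(s) = f(x) + g.s + 0.5||s||^2, with the usual radius update."""
    x, delta = np.asarray(x0, dtype=float), delta0
    fvals = [f(x)]
    for _ in range(max_iter):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm <= eps:
            break
        t = min(1.0, delta / gnorm)              # Cauchy step length (H = I)
        s = -t * g
        pred = t * gnorm**2 - 0.5 * t**2 * gnorm**2  # predicted decrease, > 0
        ared = fvals[-1] - f(x + s)                  # actual decrease
        rho = ared / pred
        if rho >= eta1:                          # successful: accept the step
            x = x + s
            fvals.append(f(x))
            if rho >= eta2:                      # very successful: expand radius
                delta *= gamma2
        else:                                    # unsuccessful: shrink radius
            delta *= gamma1
    return x, fvals
```

Since a trial point is accepted only when the actual decrease is a fraction \(\eta_1\) of the predicted one, the recorded values `fvals` are nonincreasing, mirroring the monotonicity just established for the exact function.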
Lemma A.3
Suppose AS.1 and AS.2 hold, and that \(\overline{g}(x_k,\omega _{g,k})\ne 0.\) Then
Proof
(See [16, Theorem 8.4.3].) Since (2.5) implies that \({ \frac{1}{2}}(1-\eta _1)-\eta _0-\kappa _g \in (0,1)\), the first part of (A.4) gives that \(\Delta _k < \Vert \overline{g}(x_k,\omega _{g,k})\Vert\, /\, \kappa _{H\nabla }\). Hence the inequality \(1+\Vert H_k\Vert \le \kappa _{H\nabla }\) and (2.9) yield that
As a consequence, we may use (2.11), the Cauchy–Schwarz inequality, (A.2) (twice), (A.1), the inequality \(\kappa _{H\nabla }\ge 1\) and the first part of (A.4) to deduce that, for all \(k\ge 0\),
Thus \(\rho _k\ge \eta _2\) and (2.12) ensures the second part of (A.4). \(\square\)
Lemma A.4
Suppose that AS.1 and AS.2 hold. Then, before termination,
Proof
(See [16, Theorem 8.4.4].) Before termination, we must have that
Suppose that iteration k is the first iteration such that
Then the update (2.12) implies that
where we have used (A.6) to deduce the last inequality. But this bound and (A.4) imply that \(\Delta _{k+1}\ge \Delta _k\), which is impossible since \(\Delta _k\) is reduced at iteration k. Hence no k exists such that (A.7) holds and the desired conclusion follows. \(\square\)
Lemma A.5
For each \(k \ge 0,\) define
the index sets of “successful” and “unsuccessful” iterations, respectively. Then
Proof
Observe that (2.12) implies that
and that
Combining these two inequalities, we obtain from (A.5) that
Dividing by \(\Delta _0\), taking logarithms and recalling that \(\gamma _2\in (0,1)\), we get
Hence (A.9) follows since \(k = |\mathcal{S}_k|+|\mathcal{U}_k|\). \(\square\)
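The counting argument behind this lemma can be checked numerically (a sketch of ours, not from the paper: the expansion factor name `gamma3` and all values are assumptions). If the radius is multiplied by \(\gamma_3 \ge 1\) on successful iterations, by \(\gamma_2 \in (0,1)\) on unsuccessful ones, and never falls below a floor \(\Delta_{\min}\), then the number of unsuccessful iterations is bounded in terms of the successful ones:

```python
import math, random

random.seed(0)
gamma2, gamma3, delta0, delta_min = 0.5, 2.0, 1.0, 1e-4
delta, nS, nU = delta0, 0, 0
for _ in range(10_000):
    if random.random() < 0.6:            # a "successful" iteration: expand
        nS += 1
        delta *= gamma3
    elif delta * gamma2 >= delta_min:    # "unsuccessful": shrink above the floor
        nU += 1
        delta *= gamma2

# From delta = delta0 * gamma3**nS * gamma2**nU >= delta_min, taking logs:
bound = (nS * math.log(gamma3) + math.log(delta0 / delta_min)) / -math.log(gamma2)
assert nU <= bound
```

The assertion holds by construction here; in the analysis the floor on \(\Delta_k\) is supplied by (A.5), which is what turns the bound on \(|\mathcal{U}_k|\) into (A.9).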
Theorem A.1
Suppose that AS.1–AS.3 hold. Suppose also that \(\Delta _0\ge \theta \epsilon\), where \(\theta\) is defined in (A.5). Then the TR1DA algorithm produces an iterate \(x_k\) such that \(\Vert \nabla _x^1f(x_k)\Vert \le \epsilon\) in at most
successful iterations, at most
iterations in total, at most \(\tau _{\mathrm{tot}}\) (approximate) evaluations of the gradient satisfying (2.6), and at most \(2\tau _{\mathrm{tot}}\) (approximate) evaluations of the objective function satisfying (2.2).
Proof
As in the previous proof, (A.6) must hold before termination. Using AS.3, (A.8), (A.3), (2.9), (A.6), the assumption that \(\Delta _0\ge \theta \epsilon\) and (A.5), we obtain that, for an arbitrary \(k\ge 0\) before termination,
and therefore
As a consequence \(\Vert \overline{g}(x_k,\omega _{g,k})\Vert < \epsilon /(1+\kappa _g)\) after at most \(\tau _S\epsilon ^{-2}\) successful iterations and the algorithm terminates. The relation (2.13) then ensures that \(\Vert \nabla _x^1f(x_k)\Vert < \epsilon\), yielding (A.10). We may now use (A.9) and the mechanism of the algorithm to complete the proof. \(\square\)
Given that \(\epsilon \in (0,1]\), we immediately note that
Moreover, the proof of Theorem A.1 implies that these complexity bounds improve from \(\mathcal{O}(\epsilon ^{-2})\) to \(\mathcal{O}(\epsilon ^{-1})\) if \(\epsilon\) is so large, or \(\Delta _0\) so small, that \(\Delta _0 < \theta \epsilon\).
We conclude this brief complexity theory by noting that the domain in which AS.1 is assumed to hold can be restricted to the “tree of iterates” \(\cup _{k\ge 0}[x_k,x_k+s_k]\) without altering our results. This can be useful if an upper bound \({\bar{\Delta }}\) is imposed on the step length, in which case the monotonicity of the algorithm ensures that the tree of iterates remains in the set
While it can be difficult to verify AS.1 on the (a priori unpredictable) tree of iterates, verifying it on the above set is much easier.
Gratton, S., Toint, P.L. A note on solving nonlinear optimization problems in variable precision. Comput Optim Appl 76, 917–933 (2020). https://doi.org/10.1007/s10589-020-00190-2