
Numerically tractable optimistic bilevel problems

Computational Optimization and Applications

Abstract

We consider a class of optimistic bilevel problems. Specifically, we address bilevel problems in which the lower-level objective function is fully convex and the lower-level feasible set does not depend on the upper-level variables. We show that this nontrivial class of mathematical programs is sufficiently broad to encompass significant real-world applications and is numerically tractable. In this respect, we establish that the stationary points of a relaxation of the original problem can be obtained by addressing a suitable generalized Nash equilibrium problem. The latter game is proven to be convex and to have a nonempty solution set. Leveraging this correspondence, we provide a provably convergent, easily implementable scheme to compute stationary points of the relaxed bilevel program. As witnessed by numerical experiments on an application in economics, this algorithm is numerically viable also for large-scale problems.


References

  1. Aussel, D., Sagratella, S.: Sufficient conditions to compute any solution of a quasivariational inequality via a variational inequality. Math. Methods Oper. Res. 85(1), 3–18 (2017)
  2. Bank, B., Guddat, J., Klatte, D., Kummer, B., Tammer, K.: Non-Linear Parametric Optimization. Akademie-Verlag, Berlin (1982)
  3. Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1992)
  4. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
  5. Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley, Hoboken (1983)
  6. Colson, B., Marcotte, P., Savard, G.: An overview of bilevel optimization. Ann. Oper. Res. 153(1), 235–256 (2007)
  7. Dempe, S.: Foundations of Bilevel Programming. Springer, Berlin (2002)
  8. Dempe, S.: Annotated bibliography on bilevel programming and mathematical programs with equilibrium constraints. Optimization 52(3), 333–359 (2003)
  9. Dempe, S., Dutta, J.: Is bilevel programming a special case of a mathematical program with complementarity constraints? Math. Program. 131(1–2), 37–48 (2012)
  10. Dempe, S., Franke, S.: On the solution of convex bilevel optimization problems. Comput. Optim. Appl. 63(3), 685–703 (2016)
  11. Dempe, S., Zemkoho, A.B.: The bilevel programming problem: reformulations, constraint qualifications and optimality conditions. Math. Program. 138(1–2), 447–473 (2013)
  12. Dreves, A., Facchinei, F., Fischer, A., Herrich, M.: A new error bound result for generalized Nash equilibrium problems and its algorithmic application. Comput. Optim. Appl. 59(1–2), 63–84 (2014)
  13. Dreves, A., Facchinei, F., Kanzow, C., Sagratella, S.: On the solution of the KKT conditions of generalized Nash equilibrium problems. SIAM J. Optim. 21(3), 1082–1108 (2011)
  14. Facchinei, F., Kanzow, C.: Generalized Nash equilibrium problems. Ann. Oper. Res. 175(1), 177–211 (2010)
  15. Facchinei, F., Kanzow, C., Karl, S., Sagratella, S.: The semismooth Newton method for the solution of quasi-variational inequalities. Comput. Optim. Appl. 62(1), 85–109 (2015)
  16. Facchinei, F., Kanzow, C., Sagratella, S.: Solving quasi-variational inequalities via their KKT conditions. Math. Program. 144(1–2), 369–412 (2014)
  17. Facchinei, F., Lampariello, L.: Partial penalization for the solution of generalized Nash equilibrium problems. J. Glob. Optim. 50(1), 39–57 (2011)
  18. Facchinei, F., Lampariello, L., Scutari, G.: Feasible methods for nonconvex nonsmooth problems with applications in green communications. Math. Program. 164(1–2), 55–90 (2017)
  19. Facchinei, F., Pang, J.-S.: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer, Berlin (2003)
  20. Facchinei, F., Pang, J.-S., Scutari, G., Lampariello, L.: VI-constrained hemivariational inequalities: distributed algorithms and power control in ad-hoc networks. Math. Program. 145(1–2), 59–96 (2014)
  21. Facchinei, F., Piccialli, V., Sciandrone, M.: Decomposition algorithms for generalized potential games. Comput. Optim. Appl. 50(2), 237–262 (2011)
  22. Kanzow, C., Steck, D.: Augmented Lagrangian methods for the solution of generalized Nash equilibrium problems. SIAM J. Optim. 26(4), 2034–2058 (2016)
  23. Lampariello, L., Sagratella, S.: A bridge between bilevel programs and Nash games. J. Optim. Theory Appl. 174(2), 613–635 (2017)
  24. Lampariello, L., Sagratella, S., Stein, O.: The standard pessimistic bilevel problem. SIAM J. Optim. 29(2), 1634–1656 (2019)
  25. Latorre, V., Sagratella, S.: A canonical duality approach for the solution of affine quasi-variational inequalities. J. Glob. Optim. 64(3), 433–449 (2016)
  26. Lignola, M.B., Morgan, J.: Stability of regularized bilevel programming problems. J. Optim. Theory Appl. 93(3), 575–596 (1997)
  27. Lin, G.-H., Xu, M., Ye, J.J.: On solving simple bilevel programs with a nonconvex lower level program. Math. Program. 144(1–2), 277–305 (2014)
  28. Luo, Z.-Q., Pang, J.-S., Ralph, D.: Mathematical Programs with Equilibrium Constraints. Cambridge University Press, Cambridge (1996)
  29. Outrata, J.V.: On the numerical solution of a class of Stackelberg problems. Z. Oper. Res. 34(4), 255–277 (1990)
  30. Pang, J.-S., Fukushima, M.: Quasi-variational inequalities, generalized Nash equilibria, and multi-leader-follower games. Comput. Manag. Sci. 2, 21–56 (2005)
  31. Pang, J.-S., Scutari, G.: Nonconvex games with side constraints. SIAM J. Optim. 21(4), 1491–1522 (2011)
  32. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (1998)
  33. Sagratella, S.: Algorithms for generalized potential games with mixed-integer variables. Comput. Optim. Appl. 68(3), 689–717 (2017)
  34. Scutari, G., Facchinei, F., Lampariello, L.: Parallel and distributed methods for constrained nonconvex optimization. Part I: Theory. IEEE Trans. Signal Process. 65(8), 1929–1944 (2017)
  35. Scutari, G., Facchinei, F., Lampariello, L., Sardellitti, S., Song, P.: Parallel and distributed methods for constrained nonconvex optimization. Part II: Applications in communications and machine learning. IEEE Trans. Signal Process. 65(8), 1945–1960 (2017)
  36. Scutari, G., Facchinei, F., Lampariello, L., Song, P.: Parallel and distributed methods for nonconvex optimization. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 840–844. IEEE (2014)
  37. Scutari, G., Facchinei, F., Pang, J.-S., Lampariello, L.: Equilibrium selection in power control games on the interference channel. In: 2012 Proceedings IEEE INFOCOM, pp. 675–683. IEEE (2012)
  38. Song, P., Scutari, G., Facchinei, F., Lampariello, L.: D3M: distributed multi-cell multigroup multicasting. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3741–3745. IEEE (2016)
  39. Tanino, T., Ogawa, T.: An algorithm for solving two-level convex optimization problems. Int. J. Syst. Sci. 15(2), 163–174 (1984)
  40. Von Stackelberg, H.: Marktform und Gleichgewicht. Springer, Berlin (1934)
  41. Ye, J.J.: Constraint qualifications and KKT conditions for bilevel programming problems. Math. Oper. Res. 31(4), 811–824 (2006)
  42. Zemkoho, A.B.: Solving ill-posed bilevel programs. Set-Valued Var. Anal. 24(3), 423–448 (2016)

Author information

Corresponding author: Correspondence to Simone Sagratella.


Appendices

Appendix A: What happens when \(\varepsilon \downarrow 0\)

Although the solution of the perturbed problem (\(\text{SBP}_\varepsilon \)) is meaningful in its own right, from both a theoretical and a practical perspective, the question naturally arises of what happens when the perturbation \(\varepsilon \) goes to zero (see, e.g., [26, 27]).

As previously recalled, the original (SBP) is a nonconvex problem for which standard constraint qualifications are not readily at hand. Thus, the computation of a Fritz John (FJ) point for (SBP), in the sense of the following definition (see [5, Theorem 6.1.1]), may seem a reasonable goal.

Definition 5

(FJ point) Let \((x, y)\) be feasible for (SBP). We say that \((x, y)\) is a FJ point for (SBP) if multipliers \((\lambda _0, \lambda _1) \in {\mathbb {R}}^2_+\), not both zero, exist such that:

$$\begin{aligned} \begin{array}{l} 0 \in \lambda _0 \nabla _{1} F(x, y) + \lambda _1 \nabla _1 f(x, y) - \lambda _1 \nabla \varphi (x) + N_X(x)\\ 0 \in \lambda _0 \nabla _{2} F(x, y) + \lambda _1 \nabla _2 f(x, y) + N_U(y)\\ \lambda _1 \in N_{{\mathbb {R}}_-}(f(x, y) - \varphi (x)). \end{array} \end{aligned}$$
(19)

Unfortunately, by leveraging Proposition 1 and the first-order optimality conditions for the lower-level problem, one can easily show that any feasible point for (SBP) is also a FJ point with \(\lambda _0 = 0\): indeed, any feasible \((x, y)\) satisfies \(y \in S(x)\), so that \(\nabla _1 f(x, y) = \nabla \varphi (x)\) and \(-\nabla _2 f(x, y) \in N_U(y)\), and (19) holds with \((\lambda _0, \lambda _1) = (0, 1)\).

The following proposition gives the theoretical guarantee that Algorithm 1, with \(\varepsilon \downarrow 0\), provides a FJ point for (SBP): hence, in the worst case, we obtain a point that is at least feasible for (SBP).

Proposition 3

Let \(\{\varepsilon ^k\}\) be a sequence such that \(\varepsilon ^k \downarrow 0\), and \(\{(x^k, y^k)\}\) be a corresponding sequence of stationary points for (SBP\(_{\varepsilon ^k}\)). Then, any accumulation point \((\bar{x}, \bar{y})\) of \(\{(x^k, y^k)\}\) is a FJ point for (SBP).

Proof

We assume, without loss of generality, that the whole sequence \(\{(x^k, y^k)\}\) converges to \((\bar{x}, \bar{y})\). Let \(\lambda _1^k \in {\mathbb {R}}_+\) be such that

$$\begin{aligned} \begin{array}{r} - \nabla _{1} F(x^k, y^k) - \lambda _1^k \nabla _1 f(x^k, y^k) - \lambda _1^k \nabla \varphi (x^k) \in N_X(x^k)\\ - \nabla _{2} F(x^k, y^k) - \lambda _1^k \nabla _2 f(x^k, y^k) \in N_U(y^k)\\ \lambda _1^k \in N_{{\mathbb {R}}_-}(f(x^k, y^k) - \varphi (x^k) - \varepsilon ^k), \end{array} \end{aligned}$$
(20)

that is, \(\lambda _1^k\) is a multiplier associated with the stationary point \((x^k, y^k)\).

We recall that the normal cones \(N_X(\bullet )\), \(N_U(\bullet )\) and \(N_{{\mathbb {R}}_-}(\bullet )\), considered as set-valued mappings, are outer semicontinuous at \(\bar{x}\), \(\bar{y}\) and \(f(\bar{x}, \bar{y}) - \varphi (\bar{x})\), relative to X, U and \({\mathbb {R}}_-\), respectively (see [32, Proposition 6.6]). These properties will be invoked freely in what follows.

Preliminarily, we note that, by definition, we have \(x^k \in X\), \(y^k \in U\) and \(f(x^k, y^k) - \varphi (x^k) - \varepsilon ^k \le 0\) for every k. We distinguish two cases.

  1. (i)

Suppose first that \(\{\lambda _1^k\}\) admits a subsequence \(\{\lambda _1^k\}_{\mathcal {K}}\) such that \(\lambda _1^k \underset{\mathcal {K}}{\rightarrow } \bar{\lambda }_1\). Passing to the limit (over \({\mathcal {K}}\)) in (20), we have, by the continuity of the functions involved,

    $$\begin{aligned} \begin{array}{r} - \nabla _1 F(\bar{x}, \bar{y})- \bar{\lambda }_1 \nabla _1 f(\bar{x}, \bar{y}) - \bar{\lambda }_1 \nabla \varphi (\bar{x}) \in N_X(\bar{x})\\ - \nabla _2 F(\bar{x}, \bar{y}) - \bar{\lambda }_1 \nabla _2 f(\bar{x}, \bar{y}) \in N_U(\bar{y})\\ \bar{\lambda }_1 \in N_{{\mathbb {R}}_-}(f(\bar{x}, \bar{y}) - \varphi (\bar{x})). \end{array} \end{aligned}$$
    (21)

    Hence, \((\bar{x}, \bar{y})\) is a FJ point for problem (SBP) with corresponding multipliers \((\lambda _0, \lambda _1) = (1, \bar{\lambda }_1)\).

  2. (ii)

If case (i) does not occur, then, without loss of generality, \(\lambda _1^k \rightarrow \infty \). Dividing both sides of relations (20) by \(\lambda _1^k\) and passing to the limit, we obtain

    $$\begin{aligned} \begin{array}{r} - \nabla _1 f(\bar{x}, \bar{y}) - \nabla \varphi (\bar{x}) \in N_X(\bar{x})\\ - \nabla _2 f(\bar{x}, \bar{y}) \in N_U(\bar{y})\\ 1 \in N_{{\mathbb {R}}_-}(f(\bar{x}, \bar{y}) - \varphi (\bar{x})). \end{array} \end{aligned}$$
    (22)

    Thus, \((\bar{x}, \bar{y})\) is a FJ point for problem (SBP) with corresponding multipliers \((\lambda _0, \lambda _1) = (0, 1)\).

\(\square \)

Clearly, if the sequence of multipliers \(\{\lambda _1^k\}\) associated with the stationary points \((x^k, y^k)\) is bounded, then any corresponding cluster point satisfies condition (19) with \(\lambda _0 \ne 0\).

Corollary 1

Let \(\{\varepsilon ^k\}\) be a sequence such that \(\varepsilon ^k \downarrow 0\), and \(\{(x^k, y^k)\}\) be a corresponding sequence of stationary points for (SBP\(_{\varepsilon ^k}\)). If there exists a bounded sequence of multipliers \(\{\lambda _1^k\}\) satisfying (20), then any accumulation point \((\bar{x}, \bar{y})\) of \(\{(x^k, y^k)\}\) is a FJ point for (SBP) with \(\lambda _0 \ne 0\).

As observed in [27, Theorem 4.1], it can be proven that any accumulation point of a sequence \(\{(x^k, y^k)\}\) of (inexact) global solutions for (SBP\(_{\varepsilon ^k}\)), as \(\varepsilon ^k\) goes to zero, is globally optimal for (SBP). Of course, this is exactly what one would like to find, but (SBP\(_{\varepsilon ^k}\)) is a nonconvex program and, thus, computing one of its (inexact) global solutions may be impractical. In this sense, the result in the following proposition (which is reminiscent of [10, Theorem 4.4]) fits our approach better.
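As a minimal numerical illustration of this convergence behavior, consider the toy instance analyzed later in Example 4 (Appendix C), whose global solution is available in closed form; the following sketch shows the solutions of (\(\text{SBP}_\varepsilon \)) approaching the solution of (SBP) as \(\varepsilon \downarrow 0\):

```python
import math

# Closed-form global solution of the toy (SBP_eps) of Example 4 (Appendix C):
# minimize x^2 + y^2 over x, y in [-1, 1] s.t. (x + y - 1)^2 <= phi(x) + eps,
# whose unique solution is ((1 - sqrt(eps))/2, (1 - sqrt(eps))/2), valid for
# sqrt(eps) < 3 - 2*sqrt(2).
def solution(eps):
    t = math.sqrt(eps)
    return ((1.0 - t) / 2.0, (1.0 - t) / 2.0)

# As eps decreases, the solutions approach (1/2, 1/2), which solves (SBP)
# for this instance.
for eps in (1e-2, 1e-4, 1e-6, 1e-8):
    print(eps, solution(eps))
```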

Proposition 4

Let \(\delta > 0\), let \(\{\varepsilon ^k\}\) be a sequence such that \(\varepsilon ^k \downarrow 0\), and let \(\{(x^k, y^k)\}\) be a corresponding sequence of points belonging to \(W_{\varepsilon ^k}\) such that

$$\begin{aligned} F(x^k, y^k) \le F(x, y), \quad \forall (x, y) \in W_{\varepsilon ^k} \cap {\mathbb {B}}_\delta (x^k, y^k), \end{aligned}$$
(23)

where \({\mathbb {B}}_\delta (x^k, y^k) \subseteq {\mathbb {R}}^{n_0 + n_1}\) is the open ball centered at \((x^k, y^k)\) with radius \(\delta \).

Then, each accumulation point \((\bar{x}, \bar{y})\) of \(\{(x^k, y^k)\}\) is locally optimal for (SBP).

Proof

First, we note that, by the continuity of the functions involved, \(W_\varepsilon \) is outer semicontinuous at any \(\varepsilon \ge 0\), relative to \({\mathbb {R}}_+\); hence, we have \((\bar{x}, \bar{y}) \in W_0 = W\). Suppose by contradiction and without loss of generality that \((\widetilde{x}, \widetilde{y}) \in W \cap {\mathbb {B}}_{\frac{\delta }{2}} (\bar{x}, \bar{y})\) exists such that

$$\begin{aligned} F(\widetilde{x}, \widetilde{y}) < F(\bar{x}, \bar{y}). \end{aligned}$$
(24)

Since, without loss of generality, the whole sequence \(\{(x^k, y^k)\}\) converges to \((\bar{x}, \bar{y})\), we have \((x^k, y^k) \in {\mathbb {B}}_{\frac{\delta }{2}} (\bar{x}, \bar{y})\) for every k sufficiently large. This, in turn, entails \((\widetilde{x}, \widetilde{y}) \in {\mathbb {B}}_{\delta } (x^k, y^k)\); observing that \(W = W_0 \subseteq W_{\varepsilon ^k}\), we also have \((\widetilde{x}, \widetilde{y}) \in W_{\varepsilon ^k}\) and, thus,

$$\begin{aligned} F(x^k, y^k) \le F(\widetilde{x}, \widetilde{y}). \end{aligned}$$

The latter relation, passing to the limit, contradicts (24). \(\square \)

Appendix B: Generalizations

The numerical approach described in Sect. 4, as anticipated in Sect. 2, can be suitably modified (see Algorithm 2) to cope also with a nonconvex objective function F and with a possibly unbounded set X. Moreover, we allow for the possibly inexact (iterative) solution of subproblem (\(\hbox {P1}_\varepsilon \)).

To this end, taking inspiration from [18, 34,35,36, 38], we introduce the following modified version of problem (\(\text{ P1 }_\varepsilon (x^k,y^k,w^k)\)):

$$\begin{aligned} \begin{array}{cl} \underset{x,y}{\text{ minimize }} & \widetilde{F}(x,y; x^k, y^k) + \frac{\tau }{2} \Vert (x, y) - (x^k, y^k)\Vert ^2 \\ \text{ s.t. } & x \in X, \, y \in U\\ & f(x, y) \le f(x^k, w^k) + \nabla _1 f(x^k, w^k)^{\scriptscriptstyle T}(x - x^k) + \varepsilon , \end{array}\qquad \qquad \widetilde{\text {P1}}_\varepsilon (x^k,y^k,w^k) \end{aligned}$$

where \(\tau \) is a positive constant and \(\widetilde{F}: ({\mathbb {R}}^{n_0} \times {\mathbb {R}}^{n_1}) \times ({\mathbb {R}}^{n_0} \times {\mathbb {R}}^{n_1}) \rightarrow {\mathbb {R}}\) is a suitable convex approximation of F at the base point \((x^k, y^k)\) satisfying the following properties:

  1. (I)

    \( \widetilde{F}(\bullet , \bullet ; x^k, y^k)\) is convex for every \((x^k, y^k)\);

  2. (II)

    \(\nabla _{12} \widetilde{F}(\bullet , \bullet ; \bullet , \bullet )\) is continuous;

  3. (III)

    \(\nabla _{12} \widetilde{F}(x^k, y^k; x^k, y^k) = \nabla F(x^k, y^k)\) for every \((x^k, y^k)\);

where we denote by \(\nabla _{12} \widetilde{F}\) the gradient of \(\widetilde{F}\) with respect to the first and the second variable blocks. An easy example of approximating function \(\widetilde{F}\) is the linearization of the continuously differentiable function F at the base point \((x^k, y^k)\), i.e., \(\widetilde{F}(x, y; x^k, y^k) = F(x^k, y^k) + \nabla F(x^k, y^k)^{\scriptscriptstyle T}[(x, y) - (x^k, y^k)]\). Clearly, one can employ any other convex approximant, such as second-order-type approximations, provided the mild conditions (II) and (III) hold, see [18, 34, 35].
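To make conditions (I)–(III) concrete, the following minimal Python sketch builds the linearization \(\widetilde{F}\) and checks the gradient-consistency condition (III) at a base point; the nonconvex toy objective F is our own illustrative choice, not from the paper:

```python
import numpy as np

# Illustrative nonconvex upper-level objective with n0 = n1 = 1 (a toy
# choice of ours): F(x, y) = sin(x) + x * y.
def F(x, y):
    return np.sin(x) + x * y

def grad_F(x, y):
    return np.array([np.cos(x) + y, x])

# Convex approximant at the base point (xk, yk): the linearization of F,
# which satisfies (I) (it is affine, hence convex), (II) and (III).
def F_tilde(x, y, xk, yk):
    return F(xk, yk) + grad_F(xk, yk) @ np.array([x - xk, y - yk])

def grad12_F_tilde(x, y, xk, yk):
    # Gradient with respect to (x, y): constant, equal to grad F(xk, yk).
    return grad_F(xk, yk)

xk, yk = 0.3, -0.7
# Condition (III): the gradients agree at the base point.
assert np.allclose(grad12_F_tilde(xk, yk, xk, yk), grad_F(xk, yk))
```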

Moreover, let \((\widehat{x}^k, \widehat{y}^k)\) be the unique solution of problem (\(\widetilde{\text {P1}}_\varepsilon (x^k,y^k,w^k)\)).

[Algorithm 2 is displayed here as a figure in the original article.]

Note that Algorithm 2 differs from Algorithm 1 only in step (S.1). Here, see step (S.1a), we consider the possibly inaccurate computation of the solution of problem (\(\widetilde{\text {P1}}_\varepsilon (x^k,y^k,w^k)\)). The error \(\delta ^k\) must obey rather standard rules (see the conditions in Theorem B.1 (i)). The condition \(\Vert (v^k, z^k) - (\widehat{x}^k, \widehat{y}^k) \Vert \le \delta ^k\) in (S.1a) requires estimating \(\Vert (v^k, z^k) - (\widehat{x}^k, \widehat{y}^k) \Vert \); this is possible by resorting to appropriate error bounds (see [19]) that are available for the strongly convex problem (\(\widetilde{\text {P1}}_\varepsilon (x^k,y^k,w^k)\)). The convergence properties of Algorithm 2 are summarized in the following theorem.
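For concreteness, the following Python sketch instantiates one possible implementation of steps (S.1a)–(S.1b) on the data of Example 4 (Appendix C). It is a hedged sketch under our own choices: scipy is assumed available, the subproblems are solved exactly (\(\delta ^k = 0\)) with an off-the-shelf solver, and, since F is convex here, we simply take \(\widetilde{F} = F\):

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of Algorithm 2 on the data of Example 4 (Appendix C):
# F(x, y) = x^2 + y^2 (convex, so we may take F~ = F),
# f(x, w) = (x + w - 1)^2, X = U = [-1, 1].
eps, tau, gamma = 1e-2, 2.0, 0.5  # gamma < min{1, tau / L}, with L = 2 here

f = lambda x, w: (x + w - 1.0) ** 2
d1f = lambda x, w: 2.0 * (x + w - 1.0)            # partial gradient of f in x
S = lambda x: float(np.clip(1.0 - x, -1.0, 1.0))  # lower-level solution map

xk, yk = 0.0, 1.0  # starting point, feasible for the first subproblem
wk = S(xk)
for _ in range(60):
    # (S.1a): solve the strongly convex subproblem P1~_eps(xk, yk, wk).
    obj = lambda v: (v[0] ** 2 + v[1] ** 2
                     + tau / 2.0 * ((v[0] - xk) ** 2 + (v[1] - yk) ** 2))
    con = {'type': 'ineq',
           'fun': lambda v: (f(xk, wk) + d1f(xk, wk) * (v[0] - xk) + eps
                             - f(v[0], v[1]))}
    vk, zk = minimize(obj, x0=[xk, yk], bounds=[(-1, 1), (-1, 1)],
                      constraints=[con], method='SLSQP').x
    # (S.1b): convex-combination step, then update the lower-level response.
    xk, yk = xk + gamma * (vk - xk), yk + gamma * (zk - yk)
    wk = S(xk)
```

With \(\varepsilon = 10^{-2}\), the iterates settle near \(((1-\sqrt{\varepsilon })/2, (1-\sqrt{\varepsilon })/2) = (0.45, 0.45)\), consistent with the solution of this (\(\text{SBP}_\varepsilon \)) identified in Example 4.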

Theorem B.1

Assume that \(\nabla F\) is Lipschitz continuous with constant L and that the level set \({\mathcal {L}}_\alpha \triangleq \{(x,y) \in X \times U \, : \, F(x,y) \le \alpha \}\) is bounded for every \(\alpha \in {\mathbb {R}}\). Let \(\{(x^k, y^k, w^k)\}\) be the sequence generated by Algorithm 2.

  1. (i)

If the step-size \(\gamma \) is bounded away from zero and smaller than \(\min \{1, \frac{\tau }{L}\}\), and \(\delta ^k \le a \min \{b^k/\Vert \nabla F(x^k,y^k)\Vert ,c^k\}\) for some nonnegative \(a, b^k, c^k\) with \(\sum _k b^k < +\infty \) and \(\sum _k (c^k)^2 < +\infty \), then any cluster point \((\bar{x}, \bar{y}, \bar{w})\) of the bounded sequence \(\{(x^k, y^k, w^k)\}\) is a KKT point for (\(\text{GNEP}_{\varepsilon }\)) and, in turn, \((\bar{x}, \bar{y})\) is stationary for (\(\text{SBP}_\varepsilon \));

  2. (ii)

In the exact case, i.e. \(\delta ^k = 0\) for every k, Algorithm 2 drives \(\Vert (\widehat{x}^k, \widehat{y}^k) - (x^k, y^k)\Vert \) below a prescribed tolerance \(\rho > 0\) after at most \({\mathcal {O}}(\rho ^{-2})\) iterations.

Proof

  1. (i)

    We prove by induction that \((x^k, y^k)\) is feasible for subproblem (\(\widetilde{\text {P1}}_\varepsilon (x^k, y^k, w^k)\)), for every k. Observe that \((x^0, y^0)\) is feasible for (\(\widetilde{\text {P1}}_\varepsilon (x^0, y^0, w^0)\)) by construction. Then, suppose that \((x^k, y^k)\) is feasible for (\(\widetilde{\text {P1}}_\varepsilon (x^k, y^k, w^k)\)). We now show that \((x^{k+1}, y^{k+1})\) is feasible for (\(\widetilde{\text {P1}}_\varepsilon (x^{k+1}, y^{k+1}, w^{k+1})\)). In view of step (S.1b) with \(\gamma \le 1\) and thanks to the convexity of problem (\(\widetilde{\text {P1}}_\varepsilon (x^k, y^k, w^k)\)), \((x^{k+1}, y^{k+1})\) is feasible for (\(\widetilde{\text {P1}}_\varepsilon (x^k, y^k, w^k)\)), that is

    $$\begin{aligned} f(x^{k+1},y^{k+1}) \le f(x^k,w^k) + \nabla _1 f(x^k,w^k)^{\scriptscriptstyle T}(x^{k+1} - x^k) + \varepsilon . \end{aligned}$$
    (25)

The convexity of \(\varphi \) (see Proposition 1) entails \(\varphi (x^k) + \nabla \varphi (x^k)^{\scriptscriptstyle T}(x^{k+1} - x^k) \le \varphi (x^{k+1})\). Since \(w^k \in S(x^k)\), we have \( \varphi (x^k) = f(x^k, w^k)\) and, by (7), \(\nabla _1 f(x^k, w^k) = \nabla \varphi (x^k)\). Moreover, since \(w^{k+1} \in S(x^{k+1})\), we have \(\varphi (x^{k+1}) = f(x^{k+1}, w^{k+1})\). In turn,

$$\begin{aligned} f(x^k, w^k) + \nabla _1 f(x^k, w^k)^{\scriptscriptstyle T}(x^{k+1} - x^k) \le f(x^{k+1}, w^{k+1}). \end{aligned}$$

Combining the latter inequality with (25), we obtain

$$\begin{aligned} \begin{array}{rcl} f(x^{k+1},y^{k+1}) & \le & f(x^{k+1},w^{k+1}) + \varepsilon \\ & = & f(x^{k+1},w^{k+1}) + \nabla _1 f(x^{k+1},w^{k+1})^{\scriptscriptstyle T}(x^{k+1} - x^{k+1}) + \varepsilon , \end{array} \end{aligned}$$

and thus \((x^{k+1}, y^{k+1})\) is feasible for (\(\widetilde{\text {P1}}_\varepsilon (x^{k+1}, y^{k+1}, w^{k+1})\)).

By the minimum principle, since \((x^k, y^k)\) and \((\widehat{x}^k, \widehat{y}^k)\) are feasible and optimal for problem (\(\widetilde{\text {P1}}_\varepsilon (x^k,y^k,w^k)\)), respectively,

$$\begin{aligned} \left( \nabla _{12} \widetilde{F}(\widehat{x}^k, \widehat{y}^k; x^k, y^k) + \tau \, [(\widehat{x}^k, \widehat{y}^k) - (x^k, y^k)]\right) ^{\scriptscriptstyle T}[(x^k, y^k) - (\widehat{x}^k, \widehat{y}^k)] \ge 0. \end{aligned}$$

Hence,

$$\begin{aligned} \nabla _{12} \widetilde{F}(\widehat{x}^k, \widehat{y}^k; x^k, y^k)^{\scriptscriptstyle T}[(\widehat{x}^k, \widehat{y}^k) - (x^k, y^k)] \le - \tau \, \Vert (\widehat{x}^k, \widehat{y}^k) - (x^k, y^k)\Vert ^2, \end{aligned}$$
(26)

and, in turn,

$$\begin{aligned} \begin{array}{rcl} \nabla F(x^k, y^k)^{\scriptscriptstyle T}[(\widehat{x}^k, \widehat{y}^k) - (x^k, y^k)] & = & \left[ \nabla _{12} \widetilde{F}(\widehat{x}^k, \widehat{y}^k; x^k, y^k) - \nabla _{12} \widetilde{F}(\widehat{x}^k, \widehat{y}^k; x^k, y^k)\right. \\ & & \left. + \nabla _{12} \widetilde{F}(x^k, y^k; x^k, y^k)\right] ^{\scriptscriptstyle T}[(\widehat{x}^k, \widehat{y}^k) - (x^k, y^k)]\\ & \le & - \tau \, \Vert (\widehat{x}^k, \widehat{y}^k) - (x^k, y^k)\Vert ^2, \end{array} \end{aligned}$$

where the equality follows from condition (III), and the inequality is due to assumption (I) and relation (26). As a consequence,

$$\begin{aligned} \begin{array}{l} \nabla F(x^k, y^k)^{\scriptscriptstyle T}[(v^k, z^k) - (\widehat{x}^k, \widehat{y}^k) + (\widehat{x}^k, \widehat{y}^k) - (x^k, y^k)] \le - \tau \, \Vert (\widehat{x}^k, \widehat{y}^k) - (x^k, y^k)\Vert ^2\\ \quad +\,\nabla F(x^k, y^k)^{\scriptscriptstyle T}[(v^k, z^k) - (\widehat{x}^k, \widehat{y}^k)]. \end{array} \end{aligned}$$
(27)

By the descent lemma [3, Proposition A.24] and step (S.1b) of the algorithm, we get

$$\begin{aligned} \begin{array}{rcl} F(x^{k+1}, y^{k+1}) & \le & F(x^{k}, y^{k}) + \gamma \nabla F(x^k, y^k)^{\scriptscriptstyle T}[(v^k, z^k) - (x^k, y^k)]\\ & & + \frac{\gamma ^2 L}{2} \Vert (v^{k}, z^{k}) - (x^k, y^k)\Vert ^2, \end{array} \end{aligned}$$

which, combined with (27), gives

$$\begin{aligned} \begin{array}{rcl} F(x^{k+1}, y^{k+1}) - F(x^{k}, y^{k}) & \le & +\gamma \Vert \nabla F(x^k, y^k)\Vert \delta ^k - \gamma \tau \, \Vert (\widehat{x}^k, \widehat{y}^k) - (x^k, y^k)\Vert ^2\\ & & + \frac{\gamma ^2 L}{2} \Vert (v^{k}, z^{k}) - (x^k, y^k)\Vert ^2.\\ \end{array} \end{aligned}$$
(28)

Since \(\Vert (v^k , z^k) - (x^k, y^k)\Vert ^2 \le 2\Vert (\widehat{x}^k , \widehat{y}^k) - (x^k, y^k)\Vert ^2 + 2\Vert (v^k , z^k) - (\widehat{x}^k , \widehat{y}^k)\Vert ^2 \le 2\Vert (\widehat{x}^k , \widehat{y}^k) - (x^k, y^k)\Vert ^2 + 2 (\delta ^k)^2\), from (28) we obtain

$$\begin{aligned} \begin{array}{rcl} F(x^{k+1}, y^{k+1}) - F(x^{k}, y^{k}) & \le & - \gamma \left( \tau - \gamma L\right) \Vert (\widehat{x}^{k}, \widehat{y}^{k}) - (x^k, y^k)\Vert ^2 + T^k\\ & = & -\gamma \, \eta \Vert (\widehat{x}^{k}, \widehat{y}^{k}) - (x^k, y^k)\Vert ^2 + T^k,\\ \end{array} \end{aligned}$$
(29)

where \(\eta \triangleq \tau - \gamma L > 0\), since the step-size is bounded away from zero and \(\gamma < \min \{1, \frac{\tau }{L}\}\), and \(T^k \triangleq \gamma \Vert \nabla F(x^k, y^k)\Vert \delta ^k + L(\gamma \delta ^k)^2\). By the sufficient decrease condition (29), in view of the assumption on the level sets \(\mathcal {L}_\alpha \), and observing that \(\sum _{k=0}^\infty T^k < \infty \), we have \(\Vert (\widehat{x}^{k}, \widehat{y}^{k}) - (x^k, y^k)\Vert \rightarrow 0\), and \(\{(x^k, y^k, w^k)\}\) turns out to be bounded by [4, Lemma 3.4]. Reasoning as in the proof of Theorem 4.1, the assertion follows readily by leveraging condition (III) in the limit.

  1. (ii)

Summing both sides of (29), where \(T^k = 0\) for every k, over the iterations up to N, and considering the worst case, that is \(\Vert (\widehat{x}^k, \widehat{y}^k) - (x^k, y^k)\Vert > \rho \) for every \(k \in \{0, \ldots , N\}\), we have

    $$\begin{aligned} \rho ^2 (N + 1) < \sum _{k = 0}^N \Vert (\widehat{x}^k, \widehat{y}^k) - (x^k, y^k)\Vert ^2 \le \frac{F(x^0, y^0) - F(x^{N+1}, y^{N+1})}{\gamma \eta } \le \frac{F^0 - F^m}{\gamma \eta }, \end{aligned}$$

where \(F^0 \triangleq F(x^0, y^0)\) and \(F^m\) is the minimum value attained by the continuous function F on the set \(X \times U\). Therefore, in order for the measure \(\Vert (\widehat{x}^k, \widehat{y}^k) - (x^k, y^k)\Vert \) to remain greater than \(\rho \), the number of iterations cannot exceed the following bound:

    $$\begin{aligned} N + 1 < \frac{F^0 - F^m}{\gamma \eta \rho ^2}. \end{aligned}$$

    In turn, the claim in (ii) is proven. \(\square \)

Unlike the convex case (see Algorithm 1), when dealing with a nonconvex objective function F one cannot rely on a unit step-size in general: for this reason, the step-size \(\gamma \) in step (S.1b) of Algorithm 2 is required. We add that, apart from a constant value, other choices of the step-size are legitimate for our method to converge: in fact, one can prove that diminishing or Armijo-like step-sizes (see [18]) can also be employed in step (S.1b) of the algorithm. Finally, the result in (ii) provides a theoretical bound on the maximum number of iterations needed for the algorithm to satisfy a practical stopping criterion up to an accuracy \(\rho \) (see also the comments in Sect. 5.1).

Appendix C: On the connections between bilevel problems and Nash games

Referring to the framework addressed in Sect. 2, here we summarize the distinctive properties of (\(\text{GNEP}_{\varepsilon }\)) compared to those of the Nash model introduced in [23], pointing out their different relations with (\(\text{SBP}_\varepsilon \)).

First, as for (\(\text{GNEP}_{\varepsilon }\)), we recall that:

  • (\(\text{GNEP}_{\varepsilon }\)) is a convex game that certainly has an equilibrium (see Remark 1);

  • equilibria of (\(\text{GNEP}_{\varepsilon }\)) lead to stationary points of (\(\text{SBP}_\varepsilon \)) and vice versa (see Theorem 3.1);

  • any optimal point of (\(\text{SBP}_\varepsilon \)) leads to an equilibrium of (\(\text{GNEP}_{\varepsilon }\)) (consequence of Theorem 3.1).

The Nash model as introduced in [23] and referred to (\(\text{SBP}_\varepsilon \)) reads as:

$$\begin{aligned} \begin{array}{clccl} \underset{x, y}{\text{ minimize }} & \; F(x,y) & & \underset{w}{\text{ minimize }} & f(x,w)\\ \text{ s.t. } & x \in X, \, y \in U & & \text{ s.t. } & w \in U.\\ & f(x, y) \le f(x, w) + \varepsilon & & & \\ \end{array} \end{aligned}$$
(30)

Even at first glance the differences between (\(\text{GNEP}_{\varepsilon }\)) and (30) are apparent. The problems of player 1 in (\(\text{GNEP}_{\varepsilon }\)) and (30) differ in the value function constraint. Furthermore, player 2 in (\(\text{GNEP}_{\varepsilon }\)) controls both u and w, while in (30) only w. Moreover,

  • (30) is a not necessarily convex game with a possibly empty equilibrium set (see Example 4);

  • stationary points of (\(\text{SBP}_\varepsilon \)) may not lead to equilibria of (30) (see Example 4);

  • any equilibrium of (30) leads to an optimal point of (\(\text{SBP}_\varepsilon \)) (see [23, Corollary 3.1]).

In light of the analysis above, the properties of (\(\text{GNEP}_{\varepsilon }\)) are completely different from those of (30): in fact, [23] is intended to provide a theoretical analysis of the relations between (optimistic) bilevel problems and Nash games in terms of their optimal solutions, while the peculiar properties of (\(\text{GNEP}_{\varepsilon }\)), which is tailored to address stationary solutions, pave the way for algorithmic developments for (\(\text{SBP}_\varepsilon \)).

The following example from [23] illustrates the claims above.

Example 4

Consider the following (\(\text{SBP}_\varepsilon \)) with \(\varepsilon >0\) such that \(\sqrt{\varepsilon } < 3 - 2 \sqrt{2}\):

$$\begin{aligned} \begin{array}{cl} \underset{x,y}{\text{ minimize }} & x^2 + y^2\\ \text{ s.t. } & x \in [-1, 1], \, y \in [-1, 1]\\ & (x + y - 1)^2 \le \varphi (x) + \varepsilon , \end{array} \end{aligned}$$
(31)

where \(\varphi (x) \triangleq \min _w \{(x + w - 1)^2 \, : \, w \in [-1, 1]\}\). The unique solution of (31) is \((\bar{x}, \bar{y}) = \left( \frac{1-\sqrt{\varepsilon }}{2}, \frac{1-\sqrt{\varepsilon }}{2}\right) \). In this case GNEP (30) reads as:

$$\begin{aligned} \begin{array}{clccl} \underset{x, y}{\text{ minimize }} & \; x^2 + y^2 & & \underset{w}{\text{ minimize }} & (x + w - 1)^2\\ \text{ s.t. } & x \in [-1, 1], \, y \in [-1, 1] & & \text{ s.t. } & w \in [-1, 1],\\ & (x + y - 1)^2 \le (x + w - 1)^2 + \varepsilon & & & \\ \end{array} \end{aligned}$$
(32)

which is clearly nonconvex due to the presence of the nonconvex value-function constraint in player 1’s problem. The point \((\bar{x}, \bar{y}, \bar{w}) = \left( \frac{1-\sqrt{\varepsilon }}{2}, \frac{1-\sqrt{\varepsilon }}{2}, \frac{1+\sqrt{\varepsilon }}{2}\right) \), where \(\bar{w} = \frac{1+\sqrt{\varepsilon }}{2}\) is the unique solution of player 2’s problem when \(x = \bar{x}\), is not an equilibrium of (32). Indeed, a direct calculation shows that the feasible point \((\widehat{x},\widehat{y}) = \left( 0,\frac{1+\sqrt{\varepsilon }}{2}\right) \) yields a lower value of the objective function of player 1’s problem when \(w=\bar{w}\). Therefore, since the unique solution of problem (\(\text{SBP}_\varepsilon \)) (which is a fortiori a stationary point for (\(\text{SBP}_\varepsilon \))) does not lead to an equilibrium of (32), by [23, Corollary 3.1], the set of equilibria of (32) is empty.
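This calculation is easy to verify numerically; here is a minimal sketch with \(\varepsilon = 10^{-2}\) (so \(\sqrt{\varepsilon } = 0.1 < 3 - 2\sqrt{2}\)):

```python
import math

eps = 1e-2
t = math.sqrt(eps)
assert t < 3 - 2 * math.sqrt(2)      # the standing assumption on eps

x_bar = y_bar = (1 - t) / 2          # unique solution of (SBP_eps)
w_bar = (1 + t) / 2                  # player 2's best response at x_bar

F = lambda x, y: x ** 2 + y ** 2     # player 1's objective
# Player 1's value-function constraint in (32) for fixed w = w_bar
# (a small tolerance guards against floating-point comparisons).
feasible = lambda x, y: (x + y - 1) ** 2 <= (x + w_bar - 1) ** 2 + eps + 1e-12

x_hat, y_hat = 0.0, (1 + t) / 2
assert feasible(x_bar, y_bar) and feasible(x_hat, y_hat)
# (x_hat, y_hat) strictly improves player 1's objective, so
# (x_bar, y_bar, w_bar) is not an equilibrium of (32).
assert F(x_hat, y_hat) < F(x_bar, y_bar)
```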

On the contrary, (\(\text{GNEP}_{\varepsilon }\)), that is,

$$\begin{aligned} \begin{array}{clccl} \underset{x, y}{\text{ minimize }} & \; x^2 + y^2 & & \underset{u,w}{\text{ minimize }} & (x + w - 1)^2 + \frac{(u - x)^2}{2}\\ \text{ s.t. } & x \in [-1, 1], \, y \in [-1, 1] & & \text{ s.t. } & w \in [-1, 1],\\ & (x + y - 1)^2 \le (u + w - 1)^2\\ & \qquad \qquad + 2 (u + w -1) (x - u) + \varepsilon & & & \\ \end{array} \end{aligned}$$

in view of the results in Sect. 3, is convex and the point \((\bar{x}, \bar{y}, \bar{x}, \bar{w})\) is one of its equilibria. \(\square \)

Summing up, as for global solutions, we sketch the problems’ relations in the following scheme.

[A scheme of the problems’ relations is displayed here as a figure in the original article.]

Finally, as a significant byproduct, also concerning the global solution sets, (\(\text{GNEP}_{\varepsilon }\)) nicely completes the picture initiated in [23]. In fact, the (x, y)-part of the equilibrium set of GNEP (30) is a subset of the set of global solutions of (\(\text{SBP}_\varepsilon \)), while the (x, y)-part of the equilibrium set of (\(\text{GNEP}_{\varepsilon }\)) is a superset of the set of global solutions of (\(\text{SBP}_\varepsilon \)).


Cite this article

Lampariello, L., Sagratella, S. Numerically tractable optimistic bilevel problems. Comput Optim Appl 76, 277–303 (2020). https://doi.org/10.1007/s10589-020-00178-y
