Abstract
We consider a class of optimistic bilevel problems. Specifically, we address bilevel problems in which, at the lower level, the objective function is fully convex and the feasible set does not depend on the upper-level variables. We show that this nontrivial class of mathematical programs is sufficiently broad to encompass significant real-world applications and is numerically tractable. In this respect, we establish that stationary points for a relaxation of the original problem can be obtained by addressing a suitable generalized Nash equilibrium problem. The latter game is proven to be convex and to have a nonempty solution set. Leveraging this correspondence, we provide a provably convergent, easily implementable scheme to calculate stationary points of the relaxed bilevel program. As witnessed by numerical experiments on an application in economics, this algorithm turns out to be numerically viable also for large-dimensional problems.
References
Aussel, D., Sagratella, S.: Sufficient conditions to compute any solution of a quasivariational inequality via a variational inequality. Math. Methods Oper. Res. 85(1), 3–18 (2017)
Bank, B., Guddat, J., Klatte, D., Kummer, B., Tammer, K.: Non-Linear Parametric Optimization. Akademie-Verlag, Berlin (1982)
Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1992)
Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley, Hoboken (1983)
Colson, B., Marcotte, P., Savard, G.: An overview of bilevel optimization. Ann. Oper. Res. 153(1), 235–256 (2007)
Dempe, S.: Foundations of Bilevel Programming. Springer, Berlin (2002)
Dempe, S.: Annotated bibliography on bilevel programming and mathematical programs with equilibrium constraints. Optimization 52(3), 333–359 (2003)
Dempe, S., Dutta, J.: Is bilevel programming a special case of a mathematical program with complementarity constraints? Math. Program. 131(1–2), 37–48 (2012)
Dempe, S., Franke, S.: On the solution of convex bilevel optimization problems. Comput. Optim. Appl. 63(3), 685–703 (2016)
Dempe, S., Zemkoho, A.B.: The bilevel programming problem: reformulations, constraint qualifications and optimality conditions. Math. Program. 138(1–2), 447–473 (2013)
Dreves, A., Facchinei, F., Fischer, A., Herrich, M.: A new error bound result for generalized Nash equilibrium problems and its algorithmic application. Comput. Optim. Appl. 59(1–2), 63–84 (2014)
Dreves, A., Facchinei, F., Kanzow, C., Sagratella, S.: On the solution of the KKT conditions of generalized Nash equilibrium problems. SIAM J. Optim. 21(3), 1082–1108 (2011)
Facchinei, F., Kanzow, C.: Generalized Nash equilibrium problems. Ann. Oper. Res. 175(1), 177–211 (2010)
Facchinei, F., Kanzow, C., Karl, S., Sagratella, S.: The semismooth Newton method for the solution of quasi-variational inequalities. Comput. Optim. Appl. 62(1), 85–109 (2015)
Facchinei, F., Kanzow, C., Sagratella, S.: Solving quasi-variational inequalities via their KKT conditions. Math. Program. 144(1–2), 369–412 (2014)
Facchinei, F., Lampariello, L.: Partial penalization for the solution of generalized Nash equilibrium problems. J. Glob. Optim. 50(1), 39–57 (2011)
Facchinei, F., Lampariello, L., Scutari, G.: Feasible methods for nonconvex nonsmooth problems with applications in green communications. Math. Program. 164(1–2), 55–90 (2017)
Facchinei, F., Pang, J.-S.: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer, Berlin (2003)
Facchinei, F., Pang, J.-S., Scutari, G., Lampariello, L.: Vi-constrained hemivariational inequalities: distributed algorithms and power control in ad-hoc networks. Math. Program. 145(1–2), 59–96 (2014)
Facchinei, F., Piccialli, V., Sciandrone, M.: Decomposition algorithms for generalized potential games. Comput. Optim. Appl. 50(2), 237–262 (2011)
Kanzow, C., Steck, D.: Augmented Lagrangian methods for the solution of generalized Nash equilibrium problems. SIAM J. Optim. 26(4), 2034–2058 (2016)
Lampariello, L., Sagratella, S.: A bridge between bilevel programs and Nash games. J. Optim. Theory Appl. 174(2), 613–635 (2017)
Lampariello, L., Sagratella, S., Stein, O.: The standard pessimistic bilevel problem. SIAM J. Optim. 29(2), 1634–1656 (2019)
Latorre, V., Sagratella, S.: A canonical duality approach for the solution of affine quasi-variational inequalities. J. Glob. Optim. 64(3), 433–449 (2016)
Lignola, M.B., Morgan, J.: Stability of regularized bilevel programming problems. J. Optim. Theory Appl. 93(3), 575–596 (1997)
Lin, G.-H., Xu, M., Ye, J.J.: On solving simple bilevel programs with a nonconvex lower level program. Math. Program. 144(1–2), 277–305 (2014)
Luo, Z.-Q., Pang, J.-S., Ralph, D.: Mathematical Programs with Equilibrium Constraints. Cambridge University Press, Cambridge (1996)
Outrata, J.V.: On the numerical solution of a class of Stackelberg problems. Z. Oper. Res. 34(4), 255–277 (1990)
Pang, J.-S., Fukushima, M.: Quasi-variational inequalities, generalized Nash equilibria, and multi-leader-follower games. Comput. Manag. Sci. 2(1), 21–56 (2005)
Pang, J.-S., Scutari, G.: Nonconvex games with side constraints. SIAM J. Optim. 21(4), 1491–1522 (2011)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (1998)
Sagratella, S.: Algorithms for generalized potential games with mixed-integer variables. Comput. Optim. Appl. 68(3), 689–717 (2017)
Scutari, G., Facchinei, F., Lampariello, L.: Parallel and distributed methods for constrained nonconvex optimization. Part I: Theory. IEEE Trans. Signal Process. 65(8), 1929–1944 (2017)
Scutari, G., Facchinei, F., Lampariello, L., Sardellitti, S., Song, P.: Parallel and distributed methods for constrained nonconvex optimization. Part II: Applications in communications and machine learning. IEEE Trans. Signal Process. 65(8), 1945–1960 (2017)
Scutari, G., Facchinei, F., Lampariello, L., Song, P.: Parallel and distributed methods for nonconvex optimization. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 840–844. IEEE (2014)
Scutari, G., Facchinei, F., Pang, J.-S., Lampariello, L.: Equilibrium selection in power control games on the interference channel. In: 2012 Proceedings IEEE INFOCOM, pp. 675–683. IEEE (2012)
Song, P., Scutari, G., Facchinei, F., Lampariello, L.: D3M: distributed multi-cell multigroup multicasting. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3741–3745. IEEE (2016)
Tanino, T., Ogawa, T.: An algorithm for solving two-level convex optimization problems. Int. J. Syst. Sci. 15(2), 163–174 (1984)
Von Stackelberg, H.: Marktform und Gleichgewicht. Springer, Berlin (1934)
Ye, J.J.: Constraint qualifications and KKT conditions for bilevel programming problems. Math. Oper. Res. 31(4), 811–824 (2006)
Zemkoho, A.B.: Solving ill-posed bilevel programs. Set Valued Var. Anal. 24(3), 423–448 (2016)
Appendices
Appendix A: What happens when \(\varepsilon \downarrow 0\)
Although the solution of the perturbed problem (\(\text{SBP}_\varepsilon \)) is significant and meaningful per se, from both a theoretical and a practical perspective, the question naturally arises of what happens when the perturbation \(\varepsilon \) goes to zero (see e.g. [26, 27]).
As previously recalled, the original (SBP) is a nonconvex problem for which standard constraint qualifications are not readily at hand. Thus, the computation of a Fritz-John (FJ) point for (SBP), in the sense of the following definition (see [5, Theorem 6.1.1]), may seem a reasonable goal.
Definition 5
(FJ point) Let (x, y) be feasible for (SBP). We say that (x, y) is a FJ point for (SBP), if multipliers \((\lambda _0, \lambda _1) \in {\mathbb {R}}^2_+\), not all zero, exist such that:
Unfortunately, by leveraging Proposition 1 and the first-order optimality conditions for the lower-level problem, one can easily show that any feasible point for (SBP) is also a FJ point with \(\lambda _0 = 0\).
The following proposition gives the theoretical guarantee that Algorithm 1, with \(\varepsilon \downarrow 0\), provides us with a FJ point for (SBP): hence, in the worst case, we find a point that at least is feasible for (SBP).
Proposition 3
Let \(\{\varepsilon ^k\}\) be a sequence such that \(\varepsilon ^k \downarrow 0\), and \(\{(x^k, y^k)\}\) be a corresponding sequence of stationary points for (SBP\(_{\varepsilon ^k}\)). Then, any accumulation point \((\bar{x}, \bar{y})\) of \(\{(x^k, y^k)\}\) is a FJ point for (SBP).
Proof
We assume, without loss of generality, that the whole sequence \(\{(x^k, y^k)\}\) converges to \((\bar{x}, \bar{y})\). Let \(\lambda _1^k \in {\mathbb {R}}_+\) be such that
that is \(\lambda _1^k\) is a multiplier associated with the stationary point \((x^k, y^k)\).
We recall that the normal cones \(N_X(\bullet )\), \(N_U(\bullet )\) and \(N_{{\mathbb {R}}_-}(\bullet )\), considered as set-valued mappings, are outer semicontinuous at \(\bar{x}\), \(\bar{y}\) and \(f(\bar{x}, \bar{y}) - \varphi (\bar{x})\), relative to X, U and \({\mathbb {R}}_-\), respectively (see [32, Proposition 6.6]). In the subsequent developments, these fundamental properties will be freely invoked.
Preliminarily, we note that, by definition, we have \(x^k \in X\), \(y^k \in U\) and \(f(x^k, y^k) - \varphi (x^k) - \varepsilon ^k \le 0\) for every k. We distinguish two cases.
(i) Consider a subsequence \(\{\lambda _1^k\}_{\mathcal {K}}\) such that \(\lambda _1^k \underset{\mathcal {K}}{\rightarrow } \bar{\lambda }_1\). Passing to the limit (over \({\mathcal {K}}\)) in (20), we have, by the continuity of the functions involved,
$$\begin{aligned} \begin{array}{r} - \nabla _1 F(\bar{x}, \bar{y})- \bar{\lambda }_1 \nabla _1 f(\bar{x}, \bar{y}) - \bar{\lambda }_1 \nabla \varphi (\bar{x}) \in N_X(\bar{x})\\ - \nabla _2 F(\bar{x}, \bar{y}) - \bar{\lambda }_1 \nabla _2 f(\bar{x}, \bar{y}) \in N_U(\bar{y})\\ \bar{\lambda }_1 \in N_{{\mathbb {R}}_-}(f(\bar{x}, \bar{y}) - \varphi (\bar{x})). \end{array} \end{aligned}$$(21)
Hence, \((\bar{x}, \bar{y})\) is a FJ point for problem (SBP) with corresponding multipliers \((\lambda _0, \lambda _1) = (1, \bar{\lambda }_1)\).
(ii) As opposed to case (i), let, without loss of generality, \(\lambda _1^k \rightarrow \infty \). Dividing both sides of relations (20) by \(\lambda _1^k\) and passing to the limit, we obtain
$$\begin{aligned} \begin{array}{r} - \nabla _1 f(\bar{x}, \bar{y}) - \nabla \varphi (\bar{x}) \in N_X(\bar{x})\\ - \nabla _2 f(\bar{x}, \bar{y}) \in N_U(\bar{y})\\ 1 \in N_{{\mathbb {R}}_-}(f(\bar{x}, \bar{y}) - \varphi (\bar{x})). \end{array} \end{aligned}$$(22)
Thus, \((\bar{x}, \bar{y})\) is a FJ point for problem (SBP) with corresponding multipliers \((\lambda _0, \lambda _1) = (0, 1)\).
\(\square \)
Clearly, if the sequence of multipliers \(\{\lambda _1^k\}\) associated with the stationary point \((x^k, y^k)\) is bounded, then the corresponding cluster point satisfies condition (19) with \(\lambda _0 \ne 0\).
Corollary 1
Let \(\{\varepsilon ^k\}\) be a sequence such that \(\varepsilon ^k \downarrow 0\), and \(\{(x^k, y^k)\}\) be a corresponding sequence of stationary points for (SBP\(_{\varepsilon ^k}\)). If there exists a bounded sequence of multipliers \(\{\lambda _1^k\}\) satisfying (20), then any accumulation point \((\bar{x}, \bar{y})\) of \(\{(x^k, y^k)\}\) is a FJ point for (SBP) with \(\lambda _0 \ne 0\).
As observed in [27, Theorem 4.1], it can be proven that any accumulation point of a sequence \(\{(x^k, y^k)\}\) of (inexact) global solutions for (SBP\(_{\varepsilon ^k}\)), as \(\varepsilon ^k\) goes to zero, is globally optimal for (SBP). Of course, this is exactly what one would like to find, but (SBP\(_{\varepsilon ^k}\)) is a nonconvex program and, thus, computing one of its (inexact) global solutions may be impractical. In this sense, the result in the following proposition (which is reminiscent of [10, Theorem 4.4]) fits our approach better.
Proposition 4
Let \(\delta > 0\) and \(\{\varepsilon ^k\}\) be a sequence such that \(\varepsilon ^k \downarrow 0\), and \(\{(x^k, y^k)\}\) be a corresponding sequence of points belonging to \(W_{\varepsilon ^k}\) such that
where \({\mathbb {B}}_\delta (x^k, y^k) \subseteq {\mathbb {R}}^{n_0 + n_1}\) is the open ball centered at \((x^k, y^k)\) with radius \(\delta \).
Then, each accumulation point \((\bar{x}, \bar{y})\) of \(\{(x^k, y^k)\}\) is a local optimum for (SBP).
Proof
First, we note that, by the continuity of the functions involved, \(W_\varepsilon \) is outer semicontinuous at any \(\varepsilon \ge 0\), relative to \({\mathbb {R}}_+\); hence, we have \((\bar{x}, \bar{y}) \in W_0 = W\). Suppose by contradiction and without loss of generality that \((\widetilde{x}, \widetilde{y}) \in W \cap {\mathbb {B}}_{\frac{\delta }{2}} (\bar{x}, \bar{y})\) exists such that
Since, without loss of generality, the whole sequence \(\{(x^k, y^k)\}\) converges to \((\bar{x}, \bar{y})\), we have \((x^k, y^k) \in {\mathbb {B}}_{\frac{\delta }{2}} (\bar{x}, \bar{y})\) for every k sufficiently large. This, in turn, entails \((\widetilde{x}, \widetilde{y}) \in {\mathbb {B}}_{\delta } (x^k, y^k)\); observing that \(W = W_0 \subseteq W_{\varepsilon ^k}\), we also have \((\widetilde{x}, \widetilde{y}) \in W_{\varepsilon ^k}\) and, thus,
The latter relation, passing to the limit, contradicts (24). \(\square \)
Appendix B: Generalizations
The numerical approach described in Sect. 4 can, as anticipated in Sect. 2, be suitably modified (see Algorithm 2) to cope also with a nonconvex objective function F and with a possibly unbounded set X. Also, we allow for the possibly inexact (iterative) solution of subproblem (\(\hbox {P1}_\varepsilon \)).
To this end, taking inspiration from [18, 34,35,36, 38], we introduce the following modified version of problem (\(\text{ P1 }_\varepsilon (x^k,y^k,w^k)\)):
where \(\tau \) is a positive constant and \(\widetilde{F}: ({\mathbb {R}}^{n_0} \times {\mathbb {R}}^{n_1}) \times ({\mathbb {R}}^{n_0} \times {\mathbb {R}}^{n_1}) \rightarrow {\mathbb {R}}\) is a suitable convex approximation of F at the base point \((x^k, y^k)\) satisfying the following properties:
(I) \( \widetilde{F}(\bullet , \bullet ; x^k, y^k)\) is convex for every \((x^k, y^k)\);
(II) \(\nabla _{12} \widetilde{F}(\bullet , \bullet ; \bullet , \bullet )\) is continuous;
(III) \(\nabla _{12} \widetilde{F}(x^k, y^k; x^k, y^k) = \nabla F(x^k, y^k)\) for every \((x^k, y^k)\);
where we denote by \(\nabla _{12} \widetilde{F}\) the gradient of \(\widetilde{F}\) with respect to the first and second variable blocks. An easy example of the approximating function \(\widetilde{F}\) is the linearization of the continuously differentiable function F at the base point \((x^k, y^k)\), i.e. \(\widetilde{F}(x, y; x^k, y^k) = F(x^k, y^k) + \nabla F(x^k, y^k)^{\scriptscriptstyle T}[(x, y) - (x^k, y^k)]\). Clearly, one can employ any other convex approximant, such as second-order-type approximations, provided the mild conditions (II) and (III) hold, see [18, 34, 35].
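As a minimal sketch (with a hypothetical smooth F, not one from the paper), the linearized surrogate above can be implemented and checked against conditions (I) and (III): the surrogate is affine, hence convex, and a finite-difference test confirms that its gradient at the base point coincides with \(\nabla F\).

```python
import numpy as np

# Hypothetical smooth objective F; any C^1 function serves for illustration.
def F(v):                        # v stacks the blocks (x, y)
    return float(np.sum(v**2) + np.sin(v[0]))

def grad_F(v):
    g = 2.0 * v
    g[0] += np.cos(v[0])
    return g

def F_tilde(v, base):
    """Linearized surrogate of F at `base`; affine, hence convex, so (I) holds."""
    return F(base) + grad_F(base) @ (v - base)

base = np.array([0.5, -1.0])

# Check (III) via central finite differences: grad F_tilde(base) == grad F(base).
h, fd = 1e-6, np.zeros_like(base)
for i in range(base.size):
    e = np.zeros_like(base); e[i] = h
    fd[i] = (F_tilde(base + e, base) - F_tilde(base - e, base)) / (2 * h)
assert np.allclose(fd, grad_F(base), atol=1e-6)

# First-order accuracy: the surrogate matches F up to a second-order remainder.
step = 1e-3 * np.ones_like(base)
print(abs(F(base + step) - F_tilde(base + step, base)))  # small, O(||step||^2)
```

Any other surrogate satisfying (I)-(III), e.g. one keeping a convex part of F intact, would pass the same checks.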
Moreover, let \((\widehat{x}^k, \widehat{y}^k)\) be the unique solution of problem (\(\widetilde{\text {P1}}_\varepsilon (x^k,y^k,w^k)\)).
Note that Algorithm 2 differs from Algorithm 1 just in step (S.1). Here, see step (S.1a), we consider the possibly inaccurate computation of the solution of problem (\(\widetilde{\text {P1}}_\varepsilon (x^k,y^k,w^k)\)). The error \(\delta ^k\) must obey rather standard rules (see the conditions in Theorem B.1 (i)). Condition \(\Vert (v^k, z^k) - (\widehat{x}^k, \widehat{y}^k) \Vert \le \delta ^k\) in (S.1a) would require us to estimate \(\Vert (v^k, z^k) - (\widehat{x}^k, \widehat{y}^k) \Vert \). This is possible by resorting to appropriate error bounds (see [19]) which are available for the strongly convex problem (\(\widetilde{\text {P1}}_\varepsilon (x^k,y^k,w^k)\)). The convergence properties of Algorithm 2 are summarized in the following theorem.
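The error-bound idea invoked above can be made concrete in the simplest setting: for an unconstrained \(\mu \)-strongly convex objective \(f\), any point \(z\) satisfies \(\Vert z - z^*\Vert \le \Vert \nabla f(z)\Vert / \mu \), so the gradient norm is a computable surrogate for the distance to the minimizer. The toy quadratic below (not the actual subproblem (\(\widetilde{\text {P1}}_\varepsilon \)); constrained subproblems require the residual-based bounds of [19]) illustrates this.

```python
import numpy as np

# Toy unconstrained strongly convex problem: f(z) = 0.5 z^T A z - b^T z, A positive definite.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

mu = float(np.min(np.linalg.eigvalsh(A)))   # strong convexity modulus (smallest eigenvalue)
z_star = np.linalg.solve(A, b)              # exact minimizer

z = np.array([0.7, 0.1])                    # an inexact iterate (v^k, z^k in the algorithm)
residual = A @ z - b                        # gradient of f at z, fully computable

# Error bound: ||z - z*|| <= ||grad f(z)|| / mu; usable as the delta^k test in (S.1a).
assert np.linalg.norm(z - z_star) <= np.linalg.norm(residual) / mu
```

In practice one stops the inner solver as soon as the right-hand side drops below the prescribed \(\delta ^k\), without ever knowing \(z^*\).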
Theorem B.1
Assume that \(\nabla F\) is Lipschitz continuous with constant L and the level set \({\mathcal {L}}_\alpha \triangleq \{(x,y) \in X \times U \, : \, F(x,y) \le \alpha \}\) is bounded for every \(\alpha \in {\mathbb {R}}\). Let \(\{(x^k, y^k, w^k)\}\) be the sequence generated by Algorithm 2.
(i) If the step-size \(\gamma \) is bounded away from zero and smaller than \(\min \{1, \frac{\tau }{L}\}\), and \(\delta ^k \le a \min \{b^k/\Vert \nabla F(x^k,y^k)\Vert ,c^k\}\) for some nonnegative \(a, b^k, c^k\) with \(\sum _k b^k < +\infty \) and \(\sum _k (c^k)^2 < +\infty \), then any cluster point \((\bar{x}, \bar{y}, \bar{w})\) of the bounded sequence \(\{(x^k, y^k, w^k)\}\) is a KKT point for (\(\text{GNEP}_{\varepsilon }\)) and, in turn, \((\bar{x}, \bar{y})\) is stationary for (\(\text{SBP}_\varepsilon \));
(ii) In the exact case, i.e. \(\delta ^k = 0\) for every k, Algorithm 2 drives \(\Vert (\widehat{x}^k, \widehat{y}^k) - (x^k, y^k)\Vert \) below a prescribed tolerance \(\rho > 0\) after at most \({\mathcal {O}}(\rho ^{-2})\) iterations.
Proof
(i) We prove by induction that \((x^k, y^k)\) is feasible for subproblem (\(\widetilde{\text {P1}}_\varepsilon (x^k, y^k, w^k)\)), for every k. Observe that \((x^0, y^0)\) is feasible for (\(\widetilde{\text {P1}}_\varepsilon (x^0, y^0, w^0)\)) by construction. Then, suppose that \((x^k, y^k)\) is feasible for (\(\widetilde{\text {P1}}_\varepsilon (x^k, y^k, w^k)\)). We now show that \((x^{k+1}, y^{k+1})\) is feasible for (\(\widetilde{\text {P1}}_\varepsilon (x^{k+1}, y^{k+1}, w^{k+1})\)). In view of step (S.1b) with \(\gamma \le 1\) and thanks to the convexity of problem (\(\widetilde{\text {P1}}_\varepsilon (x^k, y^k, w^k)\)), \((x^{k+1}, y^{k+1})\) is feasible for (\(\widetilde{\text {P1}}_\varepsilon (x^k, y^k, w^k)\)), that is
$$\begin{aligned} f(x^{k+1},y^{k+1}) \le f(x^k,w^k) + \nabla _1 f(x^k,w^k)^{\scriptscriptstyle T}(x^{k+1} - x^k) + \varepsilon . \end{aligned}$$(25)
The convexity of \(\varphi \) (see Proposition 1) entails \(\varphi (x^k) + \nabla \varphi (x^k)^{\scriptscriptstyle T}(x^{k+1} - x^k) \le \varphi (x^{k+1})\). Since \(w^k \in S(x^k)\), we have \( \varphi (x^k) = f(x^k, w^k)\) and, by (7), \(\nabla _1 f(x^k, w^k) = \nabla \varphi (x^k)\). Moreover, since \(w^{k+1} \in S(x^{k+1})\), we have \(\varphi (x^{k+1}) = f(x^{k+1}, w^{k+1})\). In turn,
Combining the latter inequality with (25), we obtain
and thus \((x^{k+1}, y^{k+1})\) is feasible for (\(\widetilde{\text {P1}}_\varepsilon (x^{k+1}, y^{k+1}, w^{k+1})\)).
By the minimum principle, since \((x^k, y^k)\) and \((\widehat{x}^k, \widehat{y}^k)\) are feasible and optimal for problem (\(\widetilde{\text {P1}}_\varepsilon (x^k,y^k,w^k)\)), respectively,
Hence,
and, in turn,
where the equality follows from condition (III), and the inequality is due to assumption (I) and relation (26). As a consequence,
By the descent lemma [3, Proposition A.24] and step (S.1b) of the algorithm, we get
which, combined with (27), gives
Since \(\Vert (v^k , z^k) - (x^k, y^k)\Vert ^2 \le 2\Vert (\widehat{x}^k , \widehat{y}^k) - (x^k, y^k)\Vert ^2 + 2\Vert (v^k , z^k) - (\widehat{x}^k , \widehat{y}^k)\Vert ^2 \le 2\Vert (\widehat{x}^k , \widehat{y}^k) - (x^k, y^k)\Vert ^2 + 2 (\delta ^k)^2\), from (28) we obtain
where \(\eta \triangleq \left( \tau - \gamma L\right) > 0\) since the step-size is bounded away from zero and \(\gamma < \min \{1, \frac{\tau }{L}\}\), and \(T^k \triangleq \gamma \Vert \nabla F(x^k, y^k)\Vert \delta ^k + L(\gamma \delta ^k)^2\). By the sufficient decrease condition (29), in view of the assumption on any \(\mathcal {L}_\alpha \), and observing that \(\sum _{k=0}^\infty T^k < \infty \), we have \(\Vert (\widehat{x}^{k}, \widehat{y}^{k}) - (x^k, y^k)\Vert \rightarrow 0\) and \(\{(x^k, y^k, w^k)\}\) turns out to be bounded by [4, Lemma 3.4]. Reasoning as in the proof of Theorem 4.1, the assertion follows readily by leveraging condition (III) in the limit.
(ii) Summing both sides of (29), where \(T^k = 0\) for every k, over iterations up to N, and considering the worst case, that is, \(\Vert (\widehat{x}^k, \widehat{y}^k) - (x^k, y^k)\Vert > \rho \) for every \(k \in \{0, \ldots , N\}\), we have
$$\begin{aligned} \rho ^2 (N + 1) < \sum _{k = 0}^N \Vert (\widehat{x}^k, \widehat{y}^k) - (x^k, y^k)\Vert ^2 \le \frac{F(x^0, y^0) - F(x^{N+1}, y^{N+1})}{\gamma \eta } \le \frac{F^0 - F^m}{\gamma \eta }, \end{aligned}$$where \(F^0 \triangleq F(x^0, y^0)\) and \(F^m\) is the minimum value attained by the continuous function F on the set \(X \times U\). Therefore, in order to maintain the measure \(\Vert (\widehat{x}^k, \widehat{y}^k) - (x^k, y^k)\Vert \) greater than \(\rho \), the number of iterations cannot exceed the following bound:
$$\begin{aligned} N + 1 < \frac{F^0 - F^m}{\gamma \eta \rho ^2}. \end{aligned}$$In turn, the claim in (ii) is proven. \(\square \)
In contrast to the convex case (see Algorithm 1), when dealing with a nonconvex objective function F one cannot, in general, rely on a unit step-size: for this reason, the presence of \(\gamma \) in step (S.1b) of Algorithm 2 is required. We add that, apart from the constant one, other choices of step-size are legitimate for our method to converge: in fact, one can prove that diminishing or Armijo-like step-sizes (see [18]) can also be employed in step (S.1b) of the algorithm. Finally, the result in (ii) provides a theoretical bound on the maximum number of iterations needed for the algorithm to satisfy a practical stopping criterion up to an accuracy \(\rho \) (see also the comments in Sect. 5.1).
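As a hedged illustration of the Armijo-like alternative just mentioned (a generic backtracking rule in the spirit of [18], not the paper's Algorithm 2 itself), the sketch below shrinks \(\gamma \) along a given direction until a sufficient decrease of a hypothetical smooth F is achieved.

```python
import numpy as np

def armijo_step(F, grad_F, v, d, beta=0.5, sigma=1e-4, gamma0=1.0):
    """Backtracking (Armijo-like) step-size along a descent direction d.

    Returns gamma with F(v + gamma d) <= F(v) + sigma * gamma * grad_F(v)^T d.
    Generic sketch; in Algorithm 2, d would be (x_hat, y_hat) - (x, y).
    """
    gamma, slope = gamma0, float(grad_F(v) @ d)
    while F(v + gamma * d) > F(v) + sigma * gamma * slope:
        gamma *= beta
    return gamma

# Toy nonconvex objective and a steepest-descent direction.
F = lambda v: float(v @ v + np.sin(v[0]))
gF = lambda v: 2.0 * v + np.array([np.cos(v[0]), 0.0])

v = np.array([1.0, 2.0])
g = armijo_step(F, gF, v, -gF(v))
assert F(v - g * gF(v)) < F(v)   # sufficient decrease achieved
```

The loop terminates for any continuously differentiable F, since the Armijo condition holds for all sufficiently small \(\gamma \) along a descent direction.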
Appendix C: On the connections between bilevel problems and Nash games
Referring to the framework addressed in Sect. 2, here we summarize the distinctive properties of (\(\text{GNEP}_{\varepsilon }\)) compared to those of the Nash model introduced in [23], pointing out their different relations with (\(\text{SBP}_\varepsilon \)).
First, as for (\(\text{GNEP}_{\varepsilon }\)), we recall that:
- (\(\text{GNEP}_{\varepsilon }\)) is a convex game that certainly has an equilibrium (see Remark 1);
- equilibria of (\(\text{GNEP}_{\varepsilon }\)) lead to stationary points of (\(\text{SBP}_\varepsilon \)) and vice versa (see Theorem 3.1);
- any optimal point of (\(\text{SBP}_\varepsilon \)) leads to an equilibrium of (\(\text{GNEP}_{\varepsilon }\)) (a consequence of Theorem 3.1).
The Nash model as introduced in [23] and referred to (\(\text{SBP}_\varepsilon \)) reads as:
Even at first glance the differences between (\(\text{GNEP}_{\varepsilon }\)) and (30) are apparent. The problems of player 1 in (\(\text{GNEP}_{\varepsilon }\)) and (30) differ in the value function constraint. Furthermore, player 2 in (\(\text{GNEP}_{\varepsilon }\)) controls both u and w, while in (30) only w. Moreover,
- (30) is a not necessarily convex game with a possibly empty equilibrium set (see Example 4);
- stationary points of (\(\text{SBP}_\varepsilon \)) may not lead to equilibria of (30) (see Example 4);
- any equilibrium of (30) leads to an optimal point of (\(\text{SBP}_\varepsilon \)) (see [23, Corollary 3.1]).
In light of the analysis above, the properties of (\(\text{GNEP}_{\varepsilon }\)) are completely different from those of (30): in fact, [23] is intended to provide a theoretical analysis of the relations between (optimistic) bilevel problems and Nash games in terms of their optimal solutions, while the peculiar properties of (\(\text{GNEP}_{\varepsilon }\)), which is tailored to address stationary solutions, pave the way for algorithmic developments for (\(\text{SBP}_\varepsilon \)).
The following example from [23] witnesses the validity of the claims above.
Example 4
Consider the following (\(\text{SBP}_\varepsilon \)) with \(\varepsilon >0\) such that \(\sqrt{\varepsilon } < 3 - 2 \sqrt{2}\):
where \(\varphi (x) \triangleq \min _w \{(x + w - 1)^2 \, : \, w \in [-1, 1]\}\). The unique solution of (31) is \((\bar{x}, \bar{y}) = \left( \frac{1-\sqrt{\varepsilon }}{2}, \frac{1-\sqrt{\varepsilon }}{2}\right) \). In this case GNEP (30) reads as:
which is clearly nonconvex due to the presence of the nonconvex value function constraint in player 1's problem. The point \((\bar{x}, \bar{y}, \bar{w}) = \left( \frac{1-\sqrt{\varepsilon }}{2}, \frac{1-\sqrt{\varepsilon }}{2}, \frac{1+\sqrt{\varepsilon }}{2}\right) \), where \(\bar{w} = \frac{1+\sqrt{\varepsilon }}{2}\) is the unique solution of player 2's problem when \(x = \bar{x}\), is not an equilibrium for (32). It is a matter of straightforward calculation to show that the feasible point \((\widehat{x},\widehat{y}) = \left( 0,\frac{1+\sqrt{\varepsilon }}{2}\right) \) yields a lower value of the objective function of player 1's problem when \(w=\bar{w}\). Therefore, since the unique solution of problem (\(\text{SBP}_\varepsilon \)) (which is a fortiori a stationary point for (\(\text{SBP}_\varepsilon \))) does not lead to an equilibrium of (32), by [23, Corollary 3.1], the set of equilibria of (32) is empty.
On the contrary, (\(\text{GNEP}_{\varepsilon }\)), that is,
in view of the results in Sect. 3, is convex and the point \((\bar{x}, \bar{y}, \bar{x}, \bar{w})\) is one of its equilibria. \(\square \)
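Since the inner problem defining \(\varphi \) in Example 4 is one-dimensional, \(\varphi \) admits a closed form: the unconstrained minimizer of \((x + w - 1)^2\) is \(w = 1 - x\), so the constrained minimizer is its projection onto \([-1, 1]\). A small sketch verifying this against a brute-force grid search:

```python
import numpy as np

def phi(x):
    """phi(x) = min_{w in [-1,1]} (x + w - 1)^2; minimizer w* = clip(1 - x, -1, 1)."""
    w_star = np.clip(1.0 - x, -1.0, 1.0)
    return float((x + w_star - 1.0) ** 2)

# Verify the closed form against a dense grid over [-1, 1].
w_grid = np.linspace(-1.0, 1.0, 200001)
for x in (-0.5, 0.0, 0.25, 1.0, 2.5):
    brute = float(np.min((x + w_grid - 1.0) ** 2))
    assert abs(phi(x) - brute) < 1e-8

print(phi(0.25))  # 0.0: the inner constraint is inactive for x in [0, 2]
```

In particular \(\varphi \) vanishes on [0, 2], which contains the upper-level solution \(\bar{x} = \frac{1-\sqrt{\varepsilon }}{2}\) of the example.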
Summing up, as for global solutions, we sketch the problems’ relations in the following scheme.
Finally, as a significant byproduct, also concerning the global solution sets, (\(\text{GNEP}_{\varepsilon }\)) nicely completes the picture initiated in [23]. In fact, the (x, y)-part of the equilibrium set of GNEP (30) is a subset of the set of global solutions of (\(\text{SBP}_\varepsilon \)), while the (x, y)-part of the equilibrium set of (\(\text{GNEP}_{\varepsilon }\)) is a superset of the set of global solutions of (\(\text{SBP}_\varepsilon \)).
Lampariello, L., Sagratella, S. Numerically tractable optimistic bilevel problems. Comput Optim Appl 76, 277–303 (2020). https://doi.org/10.1007/s10589-020-00178-y