1 Introduction

We study bi-level stochastic linear programs with random right-hand side in the lower-level constraint system. The sequential nature of bi-level programming motivates a setting where the leader decides nonanticipatorily, while the follower can observe the realization of the randomness. A discussion of the related literature is provided in the recent [1]. A central result of [1] states that evaluating the leader’s random outcome by taking the expectation leads to a continuously differentiable functional if the underlying probability measure is absolutely continuous w.r.t. the Lebesgue measure. This allows us to formulate first-order necessary optimality conditions for the risk-neutral model. The main result of the present work provides sufficient conditions, namely boundedness of the support and uniform boundedness of the Lebesgue density of the underlying probability measure, that ensure Lipschitz continuity of the gradient of the expectation functional. Moreover, we show that the assumptions of [1] are too weak to even guarantee local Lipschitz continuity of the gradient. By the main result, second-order necessary and sufficient optimality conditions can be formulated in terms of generalized Hessians. As part of the preparatory work for the proof of the main result, we show in particular that any region of strong stability in the sense of [1, Definition 4.1] is a finite union of polyhedral cones. This representation is of independent interest, as it may facilitate the calculation or estimation of gradients of the expectation functional and thus enhance gradient descent-based approaches.

The paper is organized as follows: The model and related results of [1] are discussed in Sect. 2, while the main result and a variation with weaker assumptions are formulated in Sect. 3. Sections 4 and 5 are dedicated to geometric properties of regions of strong stability and related projections that appear in the representation of the gradient. Results of these sections play an important role in the proof of the main result, which is given in Sect. 6. A second-order sufficient optimality condition is formulated in Sect. 7. The paper concludes with a brief discussion of the results and an outlook in Sect. 8.

2 Model and Notation

Consider the optimistic formulation of a parametric bi-level linear program

$$\begin{aligned} \min _x \left\{ c^\top x + \min _y \{q^\top y \; :\; y \in \varPsi (x,z)\} \; :\; x \in X \right\} , \end{aligned}$$
(1)

where \(z \in {\mathbb {R}}^s\) is a parameter and the data comprise a nonempty polyhedron \(X \subseteq {\mathbb {R}}^n\), vectors \(c \in {\mathbb {R}}^n\), \(q \in {\mathbb {R}}^m\) and the lower-level optimal solution set mapping \(\varPsi : {\mathbb {R}}^n \times {\mathbb {R}}^s \rightrightarrows {\mathbb {R}}^m\) defined by

$$\begin{aligned} \varPsi (x,z) := \underset{y}{\mathrm {Argmin}} \; \{d^\top y \; :\; Ay \le Tx + z\} \end{aligned}$$

with \(A \in {\mathbb {R}}^{s \times m}\), \(T \in {\mathbb {R}}^{s \times n}\) and \(d \in {\mathbb {R}}^m\). By [1, Lemma 2.1], the extended real-valued mapping \(f: {\mathbb {R}}^n \times {\mathbb {R}}^s \rightarrow \overline{{\mathbb {R}}} := {\mathbb {R}} \cup \lbrace \pm \infty \rbrace \) given by

$$\begin{aligned} f(x,z) := c^\top x + \min _y \lbrace q^\top y \; :\; y \in \varPsi (x,z) \rbrace \end{aligned}$$

is real valued and Lipschitz continuous on the polyhedron

$$\begin{aligned} F = \{(x,z) \in {\mathbb {R}}^n \times {\mathbb {R}}^s \; :\; \exists y \in {\mathbb {R}}^m: Ay \le Tx + z\} \end{aligned}$$

if \(\mathrm {dom} \; f\) is nonempty. Let \(Z: \Omega \rightarrow {\mathbb {R}}^s\) be a random vector on some probability space \((\Omega , {\mathcal {F}}, {\mathbb {P}})\) and denote the induced Borel probability measure by \(\mu _Z = {\mathbb {P}} \circ Z^{-1} \in {\mathcal {P}}({\mathbb {R}}^s)\). Furthermore, we introduce the set

$$\begin{aligned} F_Z := \lbrace x \in {\mathbb {R}}^n \; :\; (x,z) \in F \; \forall z \in \mathrm {supp} \; \mu _Z \rbrace . \end{aligned}$$

If \(\mathrm {dom} \; f\) is nonempty and we impose the moment condition

$$\begin{aligned} \mu _Z \in {\mathcal {M}}^1_s := \left\{ \mu \in {\mathcal {P}}({\mathbb {R}}^s) \; :\; \int _{{\mathbb {R}}^s} \Vert z\Vert ~\mu (\mathrm{d}z) < \infty \right\} , \end{aligned}$$

the mapping \({\mathbb {F}}: F_Z \rightarrow L^1(\Omega , {\mathcal {F}}, {\mathbb {P}})\) given by \({\mathbb {F}}(x) := f(x,Z(\cdot ))\) is well defined and Lipschitz continuous by [1, Lemma 2.4]. In a situation where the parameter z in (1) is given by a realization of the random vector Z that the follower can observe while the leader has to decide x nonanticipatorily, the upper-level outcome can be modeled by \({\mathbb {F}}(x)\). If we assume \(X \subseteq F_Z\) and the leader’s decision is based on the expectation, we obtain the risk-neutral stochastic program

$$\begin{aligned} \min _x \left\{ {\mathbb {E}}[{\mathbb {F}}(x)] \; :\; x \in X \right\} . \end{aligned}$$
(2)
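For intuition, the optimistic value \(f(x,z)\) underlying (2) can be evaluated numerically by solving two linear programs in sequence: first the lower-level problem, then the minimization of \(q^\top y\) over its solution set. The following sketch does this with scipy on small hypothetical data (the matrices and vectors below are illustrative and not taken from the text):

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical problem data: one leader variable, two follower variables.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])  # lower-level constraints A y <= T x + z
T = np.array([[0.0], [0.0], [1.0]])
d = np.array([1.0, 0.0])   # lower-level objective
q = np.array([0.0, -1.0])  # upper-level objective acting on y
c = 2.0                    # upper-level cost of the scalar leader variable x

def f(x, z):
    """Evaluate f(x,z) = c x + min { q^T y : y in Psi(x,z) } via two LPs."""
    rhs = T @ np.atleast_1d(x) + z
    free = [(None, None)] * 2  # y is a free variable
    # Step 1: lower-level optimal value v* = min { d^T y : A y <= rhs }.
    lower = linprog(d, A_ub=A, b_ub=rhs, bounds=free, method="highs")
    # Step 2: minimize q^T y over the argmin set { y : A y <= rhs, d^T y <= v* }.
    upper = linprog(q, A_ub=np.vstack([A, d]), b_ub=np.append(rhs, lower.fun),
                    bounds=free, method="highs")
    return c * x + upper.fun

# The lower level minimizes y1 over the simplex {y >= 0, y1 + y2 <= x + z3},
# leaving y2 free in [0, x + z3]; the second LP then drives q^T y = -y2 down to
# -(x + z3), so f(x, z) = c*x - (x + z3).
print(f(1.0, np.array([0.0, 0.0, 3.0])))  # -> -2.0
```

The extra constraint \(d^\top y \le v^*\) in the second LP restricts the feasible set to the lower-level argmin, which mirrors the optimistic tie-breaking in (1).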

The following is shown in [1, Theorem 3.1, Corollary 4.7]:

Theorem 2.1

Assume \(\mathrm {dom} \; f \ne \emptyset \) and that \(\mu _Z \in {\mathcal {M}}^1_s\) is absolutely continuous w.r.t. the Lebesgue measure. Then, the mapping \({\mathcal {Q}}_{\mathbb {E}}: F_Z \rightarrow {\mathbb {R}}\) defined by \({\mathcal {Q}}_{\mathbb {E}}(x) = {\mathbb {E}}[{\mathbb {F}}(x)]\) is well defined, Lipschitz continuous and continuously differentiable at any \(x_0 \in \mathrm {int} \; F_Z\).

We shall discuss some key ideas of the proof and introduce the relevant notation: Set

$$\begin{aligned} {\hat{q}} := \begin{pmatrix} q \\ -q \\ 0_s \end{pmatrix}, \; {\hat{y}} := \begin{pmatrix} y_+ \\ y_- \\ t \end{pmatrix}, \; {\hat{d}} := \begin{pmatrix} d \\ -d \\ 0_s \end{pmatrix}, \; \text {and} \; {\hat{A}} := (A,-A,I_s), \end{aligned}$$

then f admits the representation

$$\begin{aligned} f(x,z) = c^\top x + \min _{{\hat{y}}} \big \{ {\hat{q}}^\top {\hat{y}} \; :\; {\hat{y}} \in \underset{{\hat{y}}'}{\mathrm {Argmin}} \lbrace {\hat{d}}^\top {\hat{y}}' \; :\; {\hat{A}}{\hat{y}}' = Tx+z, \; {\hat{y}}' \ge 0 \rbrace \big \}. \end{aligned}$$
(3)
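The reduction behind (3) is the standard conversion to equality form: split \(y = y_+ - y_-\) and add slack variables t. A minimal sketch of the construction (the data are chosen arbitrarily for illustration):

```python
import numpy as np

def standard_form(A, q, d):
    """Build q_hat, d_hat, A_hat of (3) from inequality-form data.

    The split y = y_plus - y_minus plus slacks t turns A y <= T x + z into
    A_hat y_hat = T x + z, y_hat >= 0 with y_hat = (y_plus, y_minus, t).
    """
    s, m = A.shape
    q_hat = np.concatenate([q, -q, np.zeros(s)])
    d_hat = np.concatenate([d, -d, np.zeros(s)])
    A_hat = np.hstack([A, -A, np.eye(s)])
    return q_hat, d_hat, A_hat

# Illustrative data with s = 2 constraint rows and m = 2 follower variables.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
q_hat, d_hat, A_hat = standard_form(A, q=np.array([1.0, -1.0]), d=np.array([0.5, 0.0]))

# A_hat contains the identity block, so rank A_hat = s.
assert np.linalg.matrix_rank(A_hat) == 2
# Any y with A y <= b maps to a feasible y_hat = (y_plus, y_minus, slack).
y = np.array([1.0, -2.0])
b = np.array([5.0, 6.0])
y_hat = np.concatenate([np.maximum(y, 0), np.maximum(-y, 0), b - A @ y])
assert np.allclose(A_hat @ y_hat, b) and (y_hat >= 0).all()
```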

Remark 2.1

The subsequent analysis does not depend on the specific structure of \({\hat{q}}, {\hat{d}}\) and \({\hat{A}}\) and applies whenever (3) holds with some matrix \({\hat{A}}\) satisfying \(\mathrm {rank} \; {\hat{A}} = s\).

As the rows of \({\hat{A}}\) are linearly independent, the set

$$\begin{aligned} {\mathcal {A}} := \lbrace {\hat{A}}_B \in {\mathbb {R}}^{s \times s} \; :\; {\hat{A}}_B \; \text {is a regular submatrix of } {\hat{A}} \rbrace \end{aligned}$$

of lower-level base matrices is nonempty. A base matrix \({\hat{A}}_B \in {\mathcal {A}}\) is optimal for the lower-level problem for a given \((x,z)\) if it is feasible, i.e.,

\({\hat{A}}_B^{-1}(Tx+z) \ge 0\), and the associated reduced cost vector \({\hat{d}}_N^\top - {\hat{d}}_B^\top {\hat{A}}_B^{-1} {\hat{A}}_N\) is nonnegative. Furthermore, for any optimal base matrix \({\hat{A}}_{B'} \in {\mathcal {A}}\), there exists a feasible base matrix \({\hat{A}}_B \in {\mathcal {A}}\) satisfying

$$\begin{aligned} {\hat{A}}_{B'}^{-1}(Tx+z) = {\hat{A}}_{B}^{-1}(Tx+z) \; \; \text {and} \; \; {\hat{d}}_N^\top - {\hat{d}}_B^\top {\hat{A}}_B^{-1} {\hat{A}}_N \ge 0. \end{aligned}$$

Set

$$\begin{aligned} {\mathcal {A}}^*:= \lbrace {\hat{A}}_B \in {\mathcal {A}} \; :\; {\hat{d}}_N^\top - {\hat{d}}_B^\top {\hat{A}}_B^{-1} {\hat{A}}_N \ge 0 \rbrace \end{aligned}$$

and assume \(\mathrm {dom} \; f \ne \emptyset \); then,

$$\begin{aligned} f(x,z) = c^\top x + \min _{{\hat{A}}_B} \big \{ {\hat{q}}^\top _B {\hat{A}}_B^{-1}(Tx+z) \; :\; {\hat{A}}_B^{-1}(Tx+z) \ge 0, \; {\hat{A}}_B \in {\mathcal {A}}^*\big \} \end{aligned}$$

holds for any \((x,z) \in F\). A key concept is the region of strong stability associated with a base matrix \({\hat{A}}_{B} \in {\mathcal {A}}^*\) given by the set

$$\begin{aligned} {\mathcal {S}}({\hat{A}}_B) := \lbrace (x,z) \in F \; :\; {\hat{A}}_B^{-1}(Tx + z) \ge 0, \; c^\top x + {\hat{q}}^\top _B {\hat{A}}_B^{-1}(Tx+z) = f(x,z) \rbrace , \end{aligned}$$

on which f coincides with the affine linear mapping

$$\begin{aligned} f(x,z) = c^\top x + {\hat{q}}^\top _B {\hat{A}}_B^{-1}(Tx+z). \end{aligned}$$

Under the assumptions of Theorem 2.1, we have

$$\begin{aligned} F = \bigcup _{{\hat{A}}_B \in {\mathcal {A}}^*} {\mathcal {S}}({\hat{A}}_B) \end{aligned}$$

and the gradient of \({\mathcal {Q}}_{\mathbb {E}}\) admits the representation

$$\begin{aligned} \nabla {\mathcal {Q}}_{{\mathbb {E}}}(x) = c^\top + \sum _{\varDelta \in D} \mu _Z[{\mathcal {W}}(x,\varDelta )] \varDelta \; \; \forall x \in \mathrm {int} \; F_Z \end{aligned}$$
(4)

where \(D := \lbrace {\hat{q}}_B^\top {\hat{A}}_B^{-1} T \; :\; {\hat{A}}_B \in {\mathcal {A}}^*\rbrace \), and the set-valued aggregation mappings \({\mathcal {W}}, \overline{{\mathcal {W}}}: {\mathbb {R}}^n \times D \rightrightarrows {\mathbb {R}}^s\) are given by

$$\begin{aligned}&{\mathcal {W}}(x,\varDelta ) :=\left\{ z \in {\mathbb {R}}^s \; :\; (x,z) \in \bigcup _{{\hat{A}}_B \in {\mathcal {A}}^*: \; {\hat{q}}_B^\top {\hat{A}}_B^{-1} T = \varDelta } \mathrm {int} \; {\mathcal {S}}({\hat{A}}_B)\right\} \\ \text {and} \; \;&\overline{{\mathcal {W}}}(x,\varDelta ) := \left\{ z \in {\mathbb {R}}^s \; :\; (x,z) \in \bigcup _{{\hat{A}}_B \in {\mathcal {A}}^*: \; {\hat{q}}_B^\top {\hat{A}}_B^{-1} T = \varDelta } \mathrm {cl} \; \mathrm {int} \; {\mathcal {S}}({\hat{A}}_B)\right\} , \end{aligned}$$

respectively (cf. [1, Theorem 4.3, Corollary 4.7]). Continuity of \(\nabla {\mathcal {Q}}_{{\mathbb {E}}}\) follows from the fact that the outer semicontinuity of \(\overline{{\mathcal {W}}}\) and

$$\begin{aligned} \sum _{\varDelta \in D} \mu _Z[\overline{{\mathcal {W}}}(x,\varDelta )] = 1 \; \; \forall x \in \mathrm {int} \; F_Z \end{aligned}$$

imply continuity of the weight functional \(M_\varDelta : {\mathbb {R}}^n \rightarrow {\mathbb {R}}\),

$$\begin{aligned} M_\varDelta (x) := \mu _Z[{\mathcal {W}}(x,\varDelta )] = \mu _Z[\overline{{\mathcal {W}}}(x,\varDelta )] \end{aligned}$$
(5)

for any \(\varDelta \in D\).

3 Main Result

We shall first show that the assumptions of Theorem 2.1 are too weak to guarantee Lipschitz continuity of \(\nabla {\mathcal {Q}}_{\mathbb {E}}\).

Example 3.1

Consider the case where

$$\begin{aligned} {\hat{d}} = (0,0,0,0)^\top , \; {\hat{A}} = \begin{pmatrix} 1 &{}0 &{}1 &{}1 \\ 0 &{}1 &{}\frac{3}{2} &{} \frac{1}{2} \end{pmatrix} \; \; \text {and} \; \; T = (0,1)^\top . \end{aligned}$$

The feasible set of the lower-level problem is compact for any parameters in the polyhedral cone \(F = \lbrace (x,z) \in {\mathbb {R}} \times {\mathbb {R}}^2 \; :\; z_1 \ge 0, \; x + z_2 \ge 0 \rbrace \), which implies that \(\mathrm {dom} \; f\) coincides with F for any \({\hat{q}} \in {\mathbb {R}}^4\). As the objective function is constant, any feasible base matrix is optimal for the lower-level problem. Denote the elements of \({\mathcal {A}} = {\mathcal {A}}^*\) by \({\hat{A}}_1, \ldots , {\hat{A}}_6\), and let

$$\begin{aligned} \varTheta _i = \lbrace (x,z) \in {\mathbb {R}}^n \times {\mathbb {R}}^s \; :\; {\hat{A}}_i^{-1}(Tx + z) \ge 0 \rbrace \end{aligned}$$

be the set of parameters for which \({\hat{A}}_i\) is feasible for the lower-level problem. A straightforward calculation shows that we have

$$\begin{aligned} {\hat{A}}_1 = \begin{pmatrix} 1 &{}0 \\ 0 &{}1 \end{pmatrix}, \; {\hat{A}}_1^{-1} = \begin{pmatrix} 1 &{}0 \\ 0 &{}1 \end{pmatrix}, \; \varTheta _1 = \lbrace (x,z) \in {\mathbb {R}} \times {\mathbb {R}}^2 \; :\; z_1 \ge 0, \; x + z_2 \ge 0 \rbrace , \\ {\hat{A}}_2 = \begin{pmatrix} 1 &{}1 \\ 0 &{}\frac{3}{2} \end{pmatrix}, \; {\hat{A}}_2^{-1} = \begin{pmatrix} 1 &{}-\frac{2}{3} \\ 0 &{}\frac{2}{3} \end{pmatrix}, \; \varTheta _2 = \lbrace (x,z) \in {\mathbb {R}} \times {\mathbb {R}}^2 \; :\; 0 \le x + z_2 \le \frac{3}{2}z_1 \rbrace , \\ {\hat{A}}_3 = \begin{pmatrix} 1 &{}1 \\ 0 &{}\frac{1}{2} \end{pmatrix}, \; {\hat{A}}_3^{-1} = \begin{pmatrix} 1 &{}-2 \\ 0 &{}2 \end{pmatrix}, \; \varTheta _3 = \lbrace (x,z) \in {\mathbb {R}} \times {\mathbb {R}}^2 \; :\; 0 \le x + z_2 \le \frac{1}{2} z_1 \rbrace , \\ {\hat{A}}_4 = \begin{pmatrix} 0 &{}1 \\ 1 &{}\frac{3}{2} \end{pmatrix}, \; {\hat{A}}_4^{-1} = \begin{pmatrix} -\frac{3}{2} &{}1 \\ 1 &{}0 \end{pmatrix}, \; \varTheta _4 = \lbrace (x,z) \in {\mathbb {R}} \times {\mathbb {R}}^2 \; :\; 0 \le \frac{3}{2} z_1 \le x + z_2 \rbrace \\ {\hat{A}}_5 = \begin{pmatrix} 0 &{}1 \\ 1 &{}\frac{1}{2} \end{pmatrix}, \; {\hat{A}}_5^{-1} = \begin{pmatrix} -\frac{1}{2} &{}1 \\ 1 &{}0 \end{pmatrix}, \; \varTheta _5 = \lbrace (x,z) \in {\mathbb {R}} \times {\mathbb {R}}^2 \; :\; 0 \le \frac{1}{2} z_1 \le x + z_2\rbrace \end{aligned}$$

and

$$\begin{aligned} {\hat{A}}_6 = \begin{pmatrix} 1 &{}1 \\ \frac{3}{2} &{}\frac{1}{2} \end{pmatrix}, {\hat{A}}_6^{-1} = \begin{pmatrix} -\frac{1}{2} &{}1 \\ \frac{3}{2} &{}-1 \end{pmatrix}, \varTheta _6 = \lbrace (x,z) \in {\mathbb {R}} \times {\mathbb {R}}^2 \; :\; \frac{1}{2} z_1 \le x + z_2 \le \frac{3}{2} z_1 \rbrace . \end{aligned}$$

Set \({\hat{q}} = (0,0,-5,-3)^\top \), and let \({\hat{q}}_i\) denote the part of the upper-level objective function that is associated with \({\hat{A}}_i\). We have

$$\begin{aligned}&{\hat{q}}_1^\top {\hat{A}}_1^{-1}T = 0, \; \; \; {\hat{q}}_2^\top {\hat{A}}_2^{-1}T = -\frac{10}{3}, \; \; \; {\hat{q}}_3^\top {\hat{A}}_3^{-1}T = -6, \\&{\hat{q}}_4^\top {\hat{A}}_4^{-1}T = 0, \; \; \; {\hat{q}}_5^\top {\hat{A}}_5^{-1}T = 0, \; \; \; {\hat{q}}_6^\top {\hat{A}}_6^{-1}T = -2 \end{aligned}$$

and a straightforward calculation yields

$$\begin{aligned} {\mathcal {S}}({\hat{A}}_1)&= \lbrace (x,z) \in {\mathbb {R}} \times {\mathbb {R}}^2 \; :\; z_1 = 0, \; x + z_2 \ge 0 \rbrace \\&\cup \lbrace (x,z) \in {\mathbb {R}} \times {\mathbb {R}}^2 \; :\; z_1 \ge 0, \; x + z_2 = 0 \rbrace \\&= \mathrm {bd} \; \varTheta _1, \\ {\mathcal {S}}({\hat{A}}_2)&= \lbrace (x,z) \in {\mathbb {R}} \times {\mathbb {R}}^2 \; :\; 0 = x + z_2 \le \frac{3}{2}z_1 \rbrace \\&\cup \lbrace (x,z) \in {\mathbb {R}} \times {\mathbb {R}}^2 \; :\; 0 \le x + z_2 = \frac{3}{2}z_1 \rbrace \\&= \mathrm {bd} \; \varTheta _2, \\ {\mathcal {S}}({\hat{A}}_3)&= \varTheta _3, \; \; {\mathcal {S}}({\hat{A}}_4) = \varTheta _4, \; \; {\mathcal {S}}({\hat{A}}_6) = \varTheta _6 \; \; \text {and} \\ {\mathcal {S}}({\hat{A}}_5)&= \lbrace (x,z) \in {\mathbb {R}} \times {\mathbb {R}}^2 \; :\; 0 = \frac{1}{2} z_1 \le x + z_2\rbrace \\&\cup \lbrace (x,z) \in {\mathbb {R}} \times {\mathbb {R}}^2 \; :\; 0 \le \frac{1}{2} z_1 = x + z_2\rbrace \\&= \mathrm {bd} \; \varTheta _5. \end{aligned}$$
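The values \({\hat{q}}_i^\top {\hat{A}}_i^{-1}T\) above can be reproduced mechanically by enumerating the \(2 \times 2\) column submatrices of \({\hat{A}}\); a small sketch:

```python
import numpy as np
from itertools import combinations

# Data of Example 3.1.
A_hat = np.array([[1.0, 0.0, 1.0, 1.0], [0.0, 1.0, 1.5, 0.5]])
q_hat = np.array([0.0, 0.0, -5.0, -3.0])
T = np.array([0.0, 1.0])

# Enumerate the regular 2x2 column submatrices A_1, ..., A_6 (column pairs in
# the order (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)) and compute q_i^T A_i^{-1} T.
deltas = []
for cols in combinations(range(4), 2):
    A_i = A_hat[:, cols]
    assert abs(np.linalg.det(A_i)) > 1e-12  # all six submatrices are regular here
    deltas.append(q_hat[list(cols)] @ np.linalg.solve(A_i, T))

assert np.allclose(deltas, [0.0, -10 / 3, -6.0, 0.0, 0.0, -2.0])
```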

Let the density \(\delta _Z: {\mathbb {R}}^2 \rightarrow {\mathbb {R}}\) of Z be given by

$$\begin{aligned} \delta _Z(t_1,t_2) = \left\{ \begin{array}{ll} \frac{1}{2\sqrt{\frac{1}{2}t_1 - t_2}}, &{}\quad \text {if} \; 3 \le t_1 \le 4 \; \text {and} \; \frac{1}{2}t_1 - 1 \le t_2 < \frac{1}{2}t_1\\ 0, &{}\quad \text {else} \end{array} \right. \end{aligned}$$

and set \(c=0\). We have \(\mathrm {supp} \; \mu _Z \subset \overline{{\mathcal {W}}}(0,-6)\), and it is easy to see that

$$\begin{aligned}&\overline{{\mathcal {W}}}(x,-6) \cap \mathrm {supp} \; \mu _Z = \lbrace z \in {\mathbb {R}}^2 \; :\; 3 \le z_1 \le 4, \; \frac{1}{2}z_1 - 1 \le z_2 \le \frac{1}{2}z_1 - x \rbrace \\ \text {and} \;&\overline{{\mathcal {W}}}(x,-2) \cap \mathrm {supp} \; \mu _Z = \lbrace z \in {\mathbb {R}}^2 \; :\; 3 \le z_1 \le 4, \; \frac{1}{2}z_1 - x \le z_2 < \frac{1}{2}z_1 \rbrace \end{aligned}$$

hold true whenever \(x \in ]0,1]\) (Fig. 1).

Fig. 1: The darker square depicts the intersection of \(\mathrm {supp} \; \mu _Z\) and \(\overline{{\mathcal {W}}}(\frac{1}{4}, -2)\), while the lighter square is \(\overline{{\mathcal {W}}}(\frac{1}{4}, -6) \cap \mathrm {supp} \; \mu _Z\). The distance between the dotted lines is \(x = \frac{1}{4}\).

Thus, \(\mathrm {supp} \; \mu _Z \subset \overline{{\mathcal {W}}}(x,-6) \cup \overline{{\mathcal {W}}}(x,-2)\) holds for any \(x \in [0,1]\) and a simple calculation shows that

$$\begin{aligned} \nabla {\mathcal {Q}}_{\mathbb {E}}(x)&= -6\mu _Z[\overline{{\mathcal {W}}}(x,-6)] -2\mu _Z[\overline{{\mathcal {W}}}(x,-2)] \\&= -6(1-\sqrt{x}) - 2\sqrt{x} = 4\sqrt{x} - 6 \end{aligned}$$

is not locally Lipschitz continuous at \(x=0\).
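The weights in the gradient formula can also be checked numerically: after the substitution \(u = \frac{1}{2}z_1 - z_2\), the density integrates to \(\mu _Z[\overline{{\mathcal {W}}}(x,-6)] = \int _x^1 \frac{1}{2\sqrt{u}}\,\mathrm {d}u = 1 - \sqrt{x}\) independently of \(z_1\). A quadrature sketch (the step count is chosen ad hoc):

```python
import numpy as np

def weight_minus6(x, n=200_000):
    """Midpoint-rule approximation of mu_Z[W_bar(x,-6)] = int_x^1 du / (2 sqrt(u))."""
    u = np.linspace(x, 1.0, n + 1)
    mid = 0.5 * (u[:-1] + u[1:])
    return np.sum(0.5 / np.sqrt(mid)) * (1.0 - x) / n

def grad_QE(x):
    # The two weights sum to 1 on supp(mu_Z), so mu_Z[W_bar(x,-2)] = 1 - w6.
    w6 = weight_minus6(x)
    return -6.0 * w6 - 2.0 * (1.0 - w6)

x = 0.25
assert abs(weight_minus6(x) - (1.0 - np.sqrt(x))) < 1e-8
assert abs(grad_QE(x) - (4.0 * np.sqrt(x) - 6.0)) < 1e-7
# The difference quotient (grad_QE(h) - grad_QE(0)) / h = 4 / sqrt(h) blows up
# as h -> 0, which is exactly the failure of local Lipschitz continuity at x = 0.
```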

Our main result establishes the following sufficient conditions for Lipschitz continuity of \(\nabla {\mathcal {Q}}_{\mathbb {E}}\):

Theorem 3.1

Assume \(\mathrm {dom} \; f \ne \emptyset \) and let \(\mu _Z\) be absolutely continuous w.r.t. the Lebesgue measure and have a bounded support as well as a uniformly bounded density. Then, \({\mathcal {Q}}_{\mathbb {E}}\) is differentiable on \(\mathrm {int} \; F_Z\) with Lipschitz continuous gradient.

Note that the density in Example 3.1 is not uniformly bounded. The proof of Theorem 3.1 requires some preliminary work and will be given in Sect. 6. If the support of \(\mu _Z\) is unbounded, we still obtain a weaker estimate for the gradients:

Theorem 3.2

Assume \(\mathrm {dom} \; f \ne \emptyset \) and let \(\mu _Z\) be absolutely continuous w.r.t. the Lebesgue measure and have a uniformly bounded density. Then, \({\mathcal {Q}}_{\mathbb {E}}\) is differentiable on \(\mathrm {int} \; F_Z\) and for any \(\epsilon > 0\) there exists a constant \(L(\epsilon ) > 0\) such that

$$\begin{aligned} \Vert \nabla {\mathcal {Q}}_{{\mathbb {E}}}(x) - \nabla {\mathcal {Q}}_{{\mathbb {E}}}(x')\Vert \le L(\epsilon ) \Vert x-x'\Vert + \epsilon \end{aligned}$$

holds for all \(x,x' \in \mathrm {int} \; F_Z\).

4 On the Geometry of Regions of Strong Stability

In view of (4) and (5), the gradient \(\nabla {\mathcal {Q}}_{{\mathbb {E}}}(x)\) is given by a weighted sum of the probabilities of the sets \({\mathcal {W}}(x,\varDelta )\) or \(\overline{{\mathcal {W}}}(x,\varDelta )\) for \(\varDelta \in D\). As these sets are defined using regions of strong stability, we shall first study properties of the sets \({\mathcal {S}}({\hat{A}}_B)\) with \({\hat{A}}_{B} \in {\mathcal {A}}^*\).

Remark 4.1

Example 3.1 shows that regions of strong stability are not convex in general.

Proposition 4.1

Assume \(\mathrm {dom} \; f \ne \emptyset \), then

$$\begin{aligned} {\mathcal {S}}({\hat{A}}_B) = {\mathcal {S}}({\hat{A}}_B) \; + \; \mathrm {ker} (T,I_s) \end{aligned}$$

holds for any \({\hat{A}}_B \in {\mathcal {A}}^*\).

Proof

The above result immediately follows from the fact that the quantities involved in the definition of \({\mathcal {S}}({\hat{A}}_B)\) only depend on \(Tx + z\). \(\square \)
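A quick numerical sanity check of this invariance, using the matrix T of Example 3.1 (the kernel direction is chosen accordingly):

```python
import numpy as np

# With T = (0, 1)^T as in Example 3.1, ker(T, I_2) = {(t, 0, -t) : t in R}.
# Shifting (x, z) along this kernel leaves T x + z -- and hence every quantity
# in the definition of S(A_B) -- unchanged, as Proposition 4.1 asserts.
T = np.array([[0.0], [1.0]])

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.normal(size=1)
    z = rng.normal(size=2)
    t = rng.normal()
    x_shift = x + t
    z_shift = z + np.array([0.0, -t])
    assert np.allclose(T @ x + z, T @ x_shift + z_shift)
```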

Corollary 4.1

Assume \(\mathrm {dom} \; f \ne \emptyset \) and \(n \ge 1\), then no region of strong stability has any extremal points.

Proof

Let \((x,z)\) be an arbitrary point of some region of strong stability \({\mathcal {S}}({\hat{A}}_B)\). The n-dimensional kernel of \((T,I_s)\) contains some nonzero element \((x_0,z_0)\), and we have \((x-x_0, z-z_0), (x+x_0, z+z_0) \in {\mathcal {S}}({\hat{A}}_B)\) by Proposition 4.1. Thus, \((x,z) = \frac{1}{2}(x-x_0, z-z_0) + \frac{1}{2}(x+x_0, z+z_0)\) is not an extremal point of \({\mathcal {S}}({\hat{A}}_B)\). \(\square \)

Our main result on the structure of \({\mathcal {S}}({\hat{A}}_B)\) is the following:

Theorem 4.1

Assume \(\mathrm {dom} \; f \ne \emptyset \), then any region of strong stability is a union of at most \((s+1)^{|{\mathcal {A}}^*|}\) polyhedral cones and at most \((s+1)^{|{\mathcal {A}}^*| - 1}\) of these cones have a nonempty interior. Moreover, the multifunction \({\mathcal {S}}: {\mathcal {A}}^*\rightrightarrows {\mathbb {R}}^n \times {\mathbb {R}}^s\) is polyhedral, i.e., \(\mathrm {gph} \; {\mathcal {S}}\) is a finite union of polyhedra.

Before we turn to the proof of Theorem 4.1, we will establish the following auxiliary result:

Lemma 4.1

Let \({\mathcal {W}} := \lbrace \xi \in {\mathbb {R}}^k \; :\; V \xi < 0 \rbrace \) with \(V \in {\mathbb {R}}^{l \times k}\) be nonempty, then

$$\begin{aligned} \mathrm {cl} \; {\mathcal {W}} = \overline{{\mathcal {W}}} := \lbrace \xi \in {\mathbb {R}}^k \; :\; V \xi \le 0 \rbrace . \end{aligned}$$

Proof

The inclusion \(\mathrm {cl} \; {\mathcal {W}} \subseteq \overline{{\mathcal {W}}}\) is trivial. Moreover, for any \(\xi _0 \in {\mathcal {W}} = \mathrm {int} \; \overline{{\mathcal {W}}}\) and \(\xi \in \overline{{\mathcal {W}}}\) the line segment principle (cf. [6, Lemma 2.1.6]) implies \([\xi _0, \xi ) \subseteq {\mathcal {W}}\) and thus \(\xi \in \mathrm {cl} \; {\mathcal {W}}\). \(\square \)

We are now ready to prove Theorem 4.1.

Proof

(Proof of Theorem 4.1) Denote the elements of the finite set \({\mathcal {A}}^*\) by \({\hat{A}}_1, \dots , {\hat{A}}_l\) and the associated parts of the objective function by \({\hat{q}}_1, \dots , {\hat{q}}_l\). Fix any index \(i \in \lbrace 1, \ldots , l \rbrace \); then for any \((x,z) \in F\) satisfying \({\hat{A}}_i^{-1}(Tx + z) \ge 0\), the constraint \(c^\top x + {\hat{q}}_i^\top {\hat{A}}_i^{-1}(Tx + z) = f(x,z)\) in the definition of \({\mathcal {S}}({\hat{A}}_i)\) can be reformulated as

$$\begin{aligned} \left( \big ( {\hat{q}}_i^\top {\hat{A}}_i^{-1} - {\hat{q}}_j^\top {\hat{A}}_j^{-1} \big )\big (Tx + z\big ) \le 0 \; \; \vee \; \; {\hat{A}}_j^{-1}(Tx + z) \ngeq 0 \right) \; \; \forall j = 1, \ldots , l. \end{aligned}$$

Introducing the sets

$$\begin{aligned} \varTheta _{i}&:= \lbrace (x,z) \in F \; :\; {\hat{A}}_i^{-1}(Tx + z) \ge 0 \rbrace \\&= \lbrace (x,z) \in {\mathbb {R}}^n \times {\mathbb {R}}^s \; :\; {\hat{A}}_i^{-1}(Tx + z) \ge 0 \rbrace , \\ \Gamma _{ij0}&:= \lbrace (x,z) \in {\mathbb {R}}^n \times {\mathbb {R}}^s \; :\; \big ( {\hat{q}}_i^\top {\hat{A}}_i^{-1} - {\hat{q}}_j^\top {\hat{A}}_j^{-1} \big )\big (Tx + z\big ) \le 0 \rbrace \; \; \text {and} \\ \Gamma _{ijk}&:= \lbrace (x,z) \in {\mathbb {R}}^n \times {\mathbb {R}}^s \; :\; e_k^\top {\hat{A}}_j^{-1}(Tx + z) < 0 \rbrace \end{aligned}$$

with indices \(j =1, \ldots , l\) and \(k =1, \ldots , s\) and using the fact that \({\mathcal {S}}({\hat{A}}_i)\) is closed by the Lipschitz continuity of f, we obtain the representation

$$\begin{aligned} {\mathcal {S}}({\hat{A}}_i) \;&= \; \mathrm {cl} \; \left( \varTheta _i \; \cap \; \bigcap _{j=1, \ldots , l} \; \bigcup _{k=0, \ldots , s} \Gamma _{ijk} \right) \\&= \; \bigcup _{\begin{array}{c} \alpha \in \lbrace 0, \ldots , s \rbrace ^l, \\ \varTheta _i \; \cap \; \bigcap _{j=1,\ldots , l} \Gamma _{ij\alpha _j} \ne \emptyset \end{array}} \mathrm {cl} \; \left( \varTheta _i \; \cap \; \bigcap _{\begin{array}{c} j=1,\ldots , l, \\ \alpha _j = 0 \end{array}} \Gamma _{ij0} \; \cap \; \bigcap _{\begin{array}{c} j=1,\ldots , l, \\ \alpha _j \ne 0 \end{array}} \Gamma _{ij\alpha _j} \right) . \end{aligned}$$

As \(\varTheta _i \; \cap \; \bigcap _{j=1,\ldots , l, \; \alpha _j = 0} \Gamma _{ij0}\) is convex and closed, while \(\bigcap _{j=1,\ldots , l, \; \alpha _j \ne 0} \Gamma _{ij\alpha _j}\) is convex and open, [6, Proposition 2.1.10] yields

$$\begin{aligned} {\mathcal {S}}({\hat{A}}_i) \;&= \; \bigcup _{\begin{array}{c} \alpha \in \lbrace 0, \ldots , s \rbrace ^l, \\ \varTheta _i \; \cap \; \bigcap _{j=1,\ldots , l} \Gamma _{ij\alpha _j} \ne \emptyset \end{array}} \left( \varTheta _i \; \cap \; \bigcap _{\begin{array}{c} j=1,\ldots , l, \\ \alpha _j = 0 \end{array}} \Gamma _{ij0} \; \cap \; \mathrm {cl} \; \bigcap _{\begin{array}{c} j=1,\ldots , l, \\ \alpha _j \ne 0 \end{array}} \Gamma _{ij\alpha _j} \right) . \end{aligned}$$

The sets \(\varTheta _i \; \cap \; \bigcap _{j=1,\ldots , l, \; \alpha _j = 0} \Gamma _{ij0}\) are obviously polyhedral cones, and Lemma 4.1 implies

$$\begin{aligned}&\mathrm {cl} \; \bigcap _{\begin{array}{c} j=1,\ldots , l, \\ \alpha _j \ne 0 \end{array}} \Gamma _{ij\alpha _j} \\&\quad = \; \left\{ (x,z) \in {\mathbb {R}}^n \times {\mathbb {R}}^s \; :\; e_{\alpha _j}^\top {\hat{A}}_j^{-1}(Tx + z) \le 0 \; \forall j = 1, \ldots , l: \alpha _j \ne 0 \right\} \\&\quad = \; \bigcap _{\begin{array}{c} j=1,\ldots , l, \\ \alpha _j \ne 0 \end{array}} \mathrm {cl} \; \Gamma _{ij\alpha _j}. \end{aligned}$$

Moreover, for any \(\alpha _i \in \lbrace 1, \ldots , s \rbrace \) we have \(e_{\alpha _i}^\top {\hat{A}}_i^{-1} \ne 0\) and thus

$$\begin{aligned}&\mathrm {int} \; \left( \varTheta _i \cap \bigcap _{j=1, \ldots , l} \mathrm {cl} \; \Gamma _{ij\alpha _j} \right) \\&\quad \subseteq \; \mathrm {int} \; \left( \varTheta _i \cap \mathrm {cl} \; \Gamma _{ii\alpha _i} \right) \\&\quad \subseteq \; \mathrm {int} \left\{ (x,z) \in {\mathbb {R}}^n \times {\mathbb {R}}^s \; :\; (e_{\alpha _i}^\top {\hat{A}}_i^{-1}T, e_{\alpha _i}^\top {\hat{A}}_i^{-1}) \begin{pmatrix} x \\ z \end{pmatrix} = 0\right\} \\&\quad = \; \emptyset . \end{aligned}$$

The second part of the theorem is an immediate consequence of the finiteness of \({\mathcal {A}}^*\). \(\square \)

Corollary 4.2

Assume \(\mathrm {dom} \; f \ne \emptyset \), then any region of strong stability is star-shaped with respect to the origin and contains the n-dimensional kernel of \((T,I_s)\).

Proof

Star-shapedness with respect to the origin is an immediate consequence of Theorem 4.1: as a union of polyhedral cones, any region of strong stability contains the line segment from the origin to any of its points. The second statement directly follows from Proposition 4.1. \(\square \)

Two-stage stochastic programming can be understood as the special case of bi-level stochastic programming where the objectives of leader and follower coincide. In this case, any region of strong stability is a polyhedral cone and thus convex:

Proposition 4.2

Assume \(\mathrm {dom} \; f \ne \emptyset \) and \({\hat{q}} = \alpha {\hat{d}}\) for some \(\alpha > 0\). Then, any region of strong stability is a polyhedral cone.

Proof

We shall use the notation of the proof of Theorem 4.1 and denote the part of \({\hat{d}}\) associated with \({\hat{A}}_i\) by \({\hat{d}}_i\). Fix any \((x,z) \in F\) and consider any base matrices \({\hat{A}}_i, {\hat{A}}_j \in {\mathcal {A}}^*\) that are feasible and thus optimal for the lower-level problem. As

$$\begin{aligned} {\hat{q}}_i^\top {\hat{A}}_i^{-1}(Tx+z) = \alpha {\hat{d}}_i^\top {\hat{A}}_i^{-1}(Tx+z) = \alpha {\hat{d}}_j^\top {\hat{A}}_j^{-1}(Tx+z) = {\hat{q}}_j^\top {\hat{A}}_j^{-1}(Tx+z), \end{aligned}$$

both base matrices are also optimal with respect to the upper-level objective function. Thus, \({\mathcal {S}}({\hat{A}}_i)\) coincides with the polyhedral cone \(\varTheta _i\). \(\square \)

Remark 4.2

As \({\hat{d}} = (0,0,0,0)^\top \) holds in Example 3.1, we see that the assumption \({\hat{q}} = \alpha {\hat{d}}\) for some \(\alpha > 0\) in Proposition 4.2 cannot be replaced with the weaker condition that \(\lbrace {\hat{q}}, {\hat{d}} \rbrace \) is linearly dependent.

5 Properties of the Aggregation Mappings

We shall now study the aggregation mappings \({\mathcal {W}}\) and \(\overline{{\mathcal {W}}}\) defined in Sect. 2. The following result is the counterpart of Theorem 4.1:

Theorem 5.1

Assume \(\mathrm {dom} \; f \ne \emptyset \), then the multifunction \(\overline{{\mathcal {W}}}\) is polyhedral. Moreover, \(\overline{{\mathcal {W}}}(x,\varDelta )\) is a finite union of polyhedra for any \((x, \varDelta ) \in {\mathbb {R}}^n \times D\).

The proof of Theorem 5.1 will be based on the following auxiliary result:

Lemma 5.1

Let \(C_1, \ldots , C_l \subseteq {\mathbb {R}}^k\) be closed and convex. Then,

$$\begin{aligned} \mathrm {cl} \; \mathrm {int} \; \bigcup _{i=1, \ldots , l} C_i = \bigcup _{i=1, \ldots , l: \; \mathrm {int} \; C_i \ne \emptyset } C_i. \end{aligned}$$

Proof

As the sets \(C_1, \ldots , C_l\) are closed and the interior of a union is contained in the union of the interiors, we have

$$\begin{aligned}&\bigcup _{i=1, \ldots , l: \; \mathrm {int} \; C_i \ne \emptyset } C_i \; \supseteq \; \mathrm {cl} \; \mathrm {int} \; \bigcup _{i=1, \ldots , l: \; \mathrm {int} \; C_i \ne \emptyset } C_i \; \supseteq \; \mathrm {cl} \; \bigcup _{i=1, \ldots , l: \; \mathrm {int} \; C_i \ne \emptyset } \mathrm {int} \; C_i \\&\quad = \; \bigcup _{i=1, \ldots , l: \; \mathrm {int} \; C_i \ne \emptyset } \mathrm {cl} \; \mathrm {int} \; C_i \; = \; \bigcup _{i=1, \ldots , l: \; \mathrm {int} \; C_i \ne \emptyset } C_i, \end{aligned}$$

where the first equality is due to the fact that the closure of a finite union equals the union of the closures and the second equality is a direct consequence of the line segment principle. Thus,

$$\begin{aligned} \mathrm {cl} \; \mathrm {int} \; \bigcup _{i=1, \ldots , l} C_i \; \supseteq \; \mathrm {cl} \; \mathrm {int} \; \bigcup _{i=1, \ldots , l: \; \mathrm {int} \; C_i \ne \emptyset } C_i \; = \; \bigcup _{i=1, \ldots , l: \; \mathrm {int} \; C_i \ne \emptyset } C_i. \end{aligned}$$

For the reverse inclusion, suppose that there is some

$$\begin{aligned} x \in \left( \mathrm {cl} \; \mathrm {int} \bigcup _{i=1, \ldots , l} C_i \right) \Bigg \backslash \bigcup _{i=1, \ldots , l: \; \mathrm {int} \; C_i \ne \emptyset } C_i. \end{aligned}$$

By definition, there are sequences \(\lbrace x_n \rbrace _{n \in {\mathbb {N}}} \subset {\mathbb {R}}^k\) and \(\lbrace \epsilon _n \rbrace _{n \in {\mathbb {N}}} \subset {\mathbb {R}}_{>0}\) satisfying \(x_n \rightarrow x\) and \(B_{\epsilon _n}(x_n) \subseteq \bigcup _{i=1, \ldots , l} C_i\) for all \(n \in {\mathbb {N}}\). As \(\bigcup _{i=1, \ldots , l: \; \mathrm {int} \; C_i \ne \emptyset } C_i\) is closed, there exists some \(N \in {\mathbb {N}}\) such that \(x_n \notin \bigcup _{i=1, \ldots , l: \; \mathrm {int} \; C_i \ne \emptyset } C_i\) for all \(n \ge N\). Together with the previous considerations, the strong separation theorem (cf. [9, Theorem 11.4]) yields the existence of some \(\delta _N \in (0,\epsilon _N]\) such that

$$\begin{aligned} B_{\delta _N}(x_N) \subseteq \left( \bigcup _{i=1, \ldots , l} C_i \right) \Bigg \backslash \bigcup _{i=1, \ldots , l: \; \mathrm {int} \; C_i \ne \emptyset } C_i \; \subseteq \bigcup _{i=1, \ldots , l: \; \mathrm {int} \; C_i = \emptyset } C_i. \end{aligned}$$

As any \(C_i\) with \(\mathrm {int} \; C_i = \emptyset \) is contained in an affine subspace of dimension strictly smaller than k (cf. [2, Section 2.5.2]), we obtain the contradiction

$$\begin{aligned} \emptyset \; \ne \; \mathrm {int} \; B_{\delta _N}(x_N) \; \subseteq \; \mathrm {int} \bigcup _{i=1, \ldots , l: \; \mathrm {int} \; C_i = \emptyset } C_i \; = \; \emptyset . \end{aligned}$$

Thus,

$$\begin{aligned} \left( \mathrm {cl} \; \mathrm {int} \bigcup _{i=1, \ldots , l} C_i \right) \Bigg \backslash \bigcup _{i=1, \ldots , l: \; \mathrm {int} \; C_i \ne \emptyset } C_i \; = \; \emptyset , \end{aligned}$$

which completes the proof. \(\square \)
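A discrete one-dimensional illustration of Lemma 5.1, purely for intuition: with \(C_1 = [0,1]\) and the singleton \(C_2 = \lbrace 2 \rbrace \), the lemma gives \(\mathrm {cl} \; \mathrm {int} \; (C_1 \cup C_2) = C_1\). On a grid, interior and closure can be approximated by erosion and dilation:

```python
import numpy as np

# Represent subsets of [-1, 4] on a uniform grid.
grid = np.round(np.arange(-1.0, 4.0 + 1e-9, 0.01), 2)

def erode(mask):   # grid analogue of taking the interior
    out = mask.copy()
    out[1:] &= mask[:-1]
    out[:-1] &= mask[1:]
    return out

def dilate(mask):  # grid analogue of taking the closure
    out = mask.copy()
    out[1:] |= mask[:-1]
    out[:-1] |= mask[1:]
    return out

C1 = (grid >= 0.0) & (grid <= 1.0)   # interval, nonempty interior
C2 = grid == 2.0                     # singleton, empty interior
cl_int_union = dilate(erode(C1 | C2))

# The lower-dimensional piece C2 is erased, as Lemma 5.1 predicts.
assert np.array_equal(cl_int_union, C1)
```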

Corollary 5.1

Let \(C \subseteq {\mathbb {R}}^k\) be a finite union of polyhedra (polyhedral cones). Then, \(\mathrm {cl} \; \mathrm {int} \; C\) is a finite union of polyhedra (polyhedral cones).

Proof

The above statement is an immediate consequence of Lemma 5.1. \(\square \)

Proof

(Proof of Theorem 5.1) As D is finite, it is sufficient to consider the multifunctions \(\overline{{\mathcal {W}}}(\cdot , \varDelta ): {\mathbb {R}}^n \rightrightarrows {\mathbb {R}}^s\) for fixed \(\varDelta \in D\). We have

$$\begin{aligned} \mathrm {gph} \; \overline{{\mathcal {W}}}(\cdot , \varDelta ) = \bigcup _{{\hat{A}}_B \in {\mathcal {A}}^*: \; {\hat{q}}_B^\top {\hat{A}}_B^{-1}T = \varDelta } \mathrm {cl} \; \mathrm {int} \; {\mathcal {S}}({\hat{A}}_B), \end{aligned}$$

which is a finite union of polyhedra by Corollary 5.1. Similarly, \(\overline{{\mathcal {W}}}(x,\varDelta )\) admits the representation

$$\begin{aligned} \overline{{\mathcal {W}}}(x,\varDelta ) = \bigcup _{{\hat{A}}_B \in {\mathcal {A}}^*: \; {\hat{q}}_B^\top {\hat{A}}_B^{-1}T = \varDelta } \lbrace z \in {\mathbb {R}}^s \; :\; (x,z) \in \mathrm {cl} \; \mathrm {int} \; {\mathcal {S}}({\hat{A}}_B) \rbrace . \end{aligned}$$

By Theorem 4.1 and Corollary 5.1, the set

$$\begin{aligned} \lbrace z \in {\mathbb {R}}^s \; :\; (x,z) \in \mathrm {cl} \; \mathrm {int} \; {\mathcal {S}}({\hat{A}}_B) \rbrace \end{aligned}$$

is the intersection of a finite union of polyhedral cones and the affine subspace \(\lbrace (x',z') \in {\mathbb {R}}^n \times {\mathbb {R}}^s \; :\; x' = x \rbrace \) and thus a finite union of polyhedral cones for any \(x \in {\mathbb {R}}^n\) and any \({\hat{A}}_B \in {\mathcal {A}}^*\). \(\square \)

The following result on \({\mathcal {W}}\) is a simple consequence of the fact that the constraint system describing a region of strong stability only imposes conditions on \((Tx+z)\).

Proposition 5.1

Assume \(\mathrm {dom} \; f \ne \emptyset \). Then,

$$\begin{aligned} {\mathcal {W}}(x,\varDelta ) = {\mathcal {W}}(x',\varDelta ) + \lbrace T(x'-x) \rbrace \end{aligned}$$

holds for any \(x,x' \in {\mathbb {R}}^n\) and \(\varDelta \in D\).

Proof

Fix any \(x, x' \in {\mathbb {R}}^n\), \(z \in {\mathbb {R}}^s\) and set \(z' = z + T(x-x')\), then \(Tx + z = Tx' + z'\) and thus

$$\begin{aligned} f(x,z) - c^\top x&= \min _y \big \{ q^\top y \; :\; y \in \underset{y'}{\mathrm {Argmin}} \lbrace d^\top y' \; :\; Ay' \le Tx + z \rbrace \big \} \\&= f(x',z') - c^\top x'. \end{aligned}$$

Similarly, for any \({\hat{A}}_B \in {\mathcal {A}}^*\), \((x,z) \in {\mathcal {S}}({\hat{A}}_B)\) holds if and only if

  1. there exists some \(y \in {\mathbb {R}}^m\) such that \(Ay \le Tx + z = Tx' + z'\),

  2. \({\hat{A}}_B^{-1}(Tx' + z') = {\hat{A}}_B^{-1}(Tx + z) \ge 0\) and

  3. \({\hat{q}}_B^\top {\hat{A}}_B^{-1}(Tx' + z') = {\hat{q}}_B^\top {\hat{A}}_B^{-1}(Tx + z) = f(x,z) - c^\top x = f(x',z') - c^\top x',\)

i.e., if and only if \((x',z') \in {\mathcal {S}}({\hat{A}}_B)\). We conclude that

$$\begin{aligned}&z \in {\mathcal {W}}(x, \varDelta ) \; \Leftrightarrow \; \exists {\hat{A}}_B \in {\mathcal {A}}^*: \; \varDelta = {\hat{q}}_B^\top {\hat{A}}_B^{-1}T, \; (x,z) \in \mathrm {int} \; {\mathcal {S}}({\hat{A}}_B) \\&\quad \Leftrightarrow \; \exists {\hat{A}}_B \in {\mathcal {A}}^*: \; \varDelta = {\hat{q}}_B^\top {\hat{A}}_B^{-1}T, \; (x',z') \in \mathrm {int} \; {\mathcal {S}}({\hat{A}}_B) \\&\quad \Leftrightarrow \; z' \in {\mathcal {W}}(x', \varDelta ) \end{aligned}$$

holds for any \(\varDelta \in D\), which completes the proof. \(\square \)
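The shift identity can be sanity-checked numerically: since membership of \(z\) in \({\mathcal {W}}(x,\varDelta )\) depends on \((x,z)\) only through \(Tx+z\), the substitution \(z' = z + T(x-x')\) used in the proof must preserve membership. A minimal sketch with a hypothetical membership test (the matrix \(T\) and the region are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((3, 2))          # hypothetical matrix T : R^2 -> R^3

def in_W(x, z):
    # toy stand-in for W(x, Delta): membership depends on (x, z)
    # only through the aggregate Tx + z, as in the proof above
    w = T @ x + z
    return bool(np.all(w >= 0) and w[0] + 2 * w[1] <= 5 + w[2])

for _ in range(1000):
    x, xp = rng.standard_normal(2), rng.standard_normal(2)
    z = rng.uniform(-5.0, 5.0, size=3)
    zp = z + T @ (x - xp)                # the substitution z' = z + T(x - x')
    assert in_W(x, z) == in_W(xp, zp)    # membership is preserved
print("shift identity verified on 1000 random samples")
```

Any membership rule that factors through \(Tx+z\) passes this check, which is exactly the structural property the proposition exploits.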

6 Proof of the Main Result

We are finally ready to prove Theorem 3.1 based on the results of Sects. 4 and 5 as well as the following two auxiliary results:

Lemma 6.1

Assume \(\mathrm {dom} \; f \ne \emptyset \), and let \(\mu _Z \in {\mathcal {P}}({\mathbb {R}}^s)\) be absolutely continuous w.r.t. the Lebesgue measure. Then,

$$\begin{aligned} \mu _Z\left[ {\mathcal {W}}(x,\varDelta ) \setminus \big ( {\mathcal {W}}(x,\varDelta ) + \lbrace t \rbrace \big ) \right] \le \mu _Z\left[ \overline{{\mathcal {W}}}(x,\varDelta ) \setminus \big ( \overline{{\mathcal {W}}}(x,\varDelta ) + \lbrace t \rbrace \big ) \right] \end{aligned}$$

holds for any \(x\in {\mathbb {R}}^n\), \(\varDelta \in D\) and \(t \in {\mathbb {R}}^s\).

Proof

By the arguments used in the proof of [1, Lemma 4.2], we have

$$\begin{aligned} {\mathcal {W}}(x,\varDelta ) \subseteq \overline{{\mathcal {W}}}(x,\varDelta ) \subseteq {\mathcal {W}}(x,\varDelta ) \cup {\mathcal {N}}_x, \end{aligned}$$

where \({\mathcal {N}}_x \subset {\mathbb {R}}^s\) is contained in a finite union of hyperplanes. Consequently,

$$\begin{aligned}&{\mathcal {W}}(x,\varDelta ) \setminus \big ( {\mathcal {W}}(x,\varDelta ) + \lbrace t \rbrace \big ) \\&\quad \subseteq \; \Big [ \overline{{\mathcal {W}}}(x,\varDelta ) \setminus \big ( \overline{{\mathcal {W}}}(x,\varDelta ) + \lbrace t \rbrace \big ) \Big ] \cup \Big [ \big ( \overline{{\mathcal {W}}}(x,\varDelta ) + \lbrace t \rbrace \big ) \setminus \big ( {\mathcal {W}}(x,\varDelta ) + \lbrace t \rbrace \big ) \Big ] \\&\quad = \; \Big [ \overline{{\mathcal {W}}}(x,\varDelta ) \setminus \big ( \overline{{\mathcal {W}}}(x,\varDelta ) + \lbrace t \rbrace \big ) \Big ] \cup \Big [ \big ( \overline{{\mathcal {W}}}(x,\varDelta ) \setminus {\mathcal {W}}(x,\varDelta ) \big ) + \lbrace t \rbrace \Big ] \\&\quad \subseteq \; \Big [ \overline{{\mathcal {W}}}(x,\varDelta ) \setminus \big ( \overline{{\mathcal {W}}}(x,\varDelta ) + \lbrace t \rbrace \big ) \Big ] \cup \Big [ {\mathcal {N}}_x + \lbrace t \rbrace \Big ] \end{aligned}$$

and the above statement is a direct consequence of the fact that the Lebesgue measure of \({\mathcal {N}}_x\) equals zero. \(\square \)
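The set-algebraic step in the display above holds for arbitrary sets: for \(A \subseteq B\) and any shift \(t\), the difference \(A \setminus (A + t)\) is covered by \([B \setminus (B+t)] \cup [(B \setminus A) + t]\). A quick sketch verifying this inclusion on random finite integer sets:

```python
import random

random.seed(1)

def shift(S, t):
    return {s + t for s in S}

for trial in range(500):
    B = {random.randrange(-20, 20) for _ in range(15)}
    A = {b for b in B if random.random() < 0.6}   # guarantees A is a subset of B
    t = random.randrange(-5, 6)
    lhs = A - shift(A, t)                         # A \ (A + t)
    rhs = (B - shift(B, t)) | shift(B - A, t)     # [B \ (B+t)] u [(B \ A) + t]
    assert lhs <= rhs                             # the covering inclusion
print("inclusion verified on 500 random (A, B, t) triples")
```

In the lemma, \(A = {\mathcal {W}}(x,\varDelta )\), \(B = \overline{{\mathcal {W}}}(x,\varDelta )\) and \(B \setminus A \subseteq {\mathcal {N}}_x\) has Lebesgue measure zero, which yields the claim.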

Lemma 6.2

Assume \(\mathrm {dom} \; f \ne \emptyset \) and let \(\mu _Z\) be absolutely continuous w.r.t. the Lebesgue measure and have a bounded support as well as a uniformly bounded density. Then, the weight functional \(M_\varDelta \) is Lipschitz continuous on \(\mathrm {int} \; F_Z\) for any \(\varDelta \in D\).

Proof

By definition of \({\mathcal {W}}(x,\varDelta )\), Proposition 5.1 and Lemma 6.1,

$$\begin{aligned}&|\mu _Z\big [{\mathcal {W}}(x,\varDelta )\big ] - \mu _Z\big [{\mathcal {W}}(x',\varDelta )\big ]| \\&\quad = \; |\mu _Z\Big [{\mathcal {W}}(x,\varDelta )\Big ] - \mu _Z\Big [{\mathcal {W}}(x,\varDelta ) + \big \{ T(x-x') \big \}\Big ]| \\&\quad \le \; \mu _Z \left[ {\mathcal {W}}(x,\varDelta ) \setminus \Big ( {\mathcal {W}}(x,\varDelta ) + \big \{ T(x-x') \big \} \Big ) \right] \\&\qquad + \; \mu _Z \left[ \Big ( {\mathcal {W}}(x,\varDelta ) + \big \{ T(x-x') \big \} \Big ) \setminus {\mathcal {W}}(x,\varDelta ) \right] \\&\quad \le \; \mu _Z \left[ \overline{{\mathcal {W}}}(x,\varDelta ) \setminus \Big ( \overline{{\mathcal {W}}}(x,\varDelta ) + \big \{ T(x-x') \big \} \Big ) \right] \\&\qquad + \; \mu _Z \left[ \Big ( \overline{{\mathcal {W}}}(x,\varDelta ) + \big \{ T(x-x') \big \} \Big ) \setminus \overline{{\mathcal {W}}}(x,\varDelta ) \right] \end{aligned}$$

holds for any fixed \(\varDelta \in D\). As both

$$\begin{aligned} \overline{{\mathcal {W}}}(x,\varDelta ) \setminus \Big ( \overline{{\mathcal {W}}}(x,\varDelta ) + \big \{ T(x-x') \big \} \Big ) \; \; \text {and} \; \; \Big ( \overline{{\mathcal {W}}}(x,\varDelta ) + \big \{ T(x-x') \big \} \Big ) \setminus \overline{{\mathcal {W}}}(x,\varDelta ) \end{aligned}$$

are contained in

$$\begin{aligned} {\mathcal {H}}_{x, x'} := \big \{ v + l \cdot T(x'-x) \; :\; v \in \mathrm {bd} \; \overline{{\mathcal {W}}}(x,\varDelta ), \; l \in [-1,1] \big \} \end{aligned}$$

and there exists a finite upper bound \(\alpha \in {\mathbb {R}}\) for the Lebesgue density of \(\mu _Z\), we have

$$\begin{aligned} |\mu _Z\big [{\mathcal {W}}(x,\varDelta )\big ] - \mu _Z\big [{\mathcal {W}}(x',\varDelta )\big ]| \; \le \; 2\alpha \lambda ^s\big [ {\mathcal {H}}_{x,x'} \cap \mathrm {supp} \; \mu _Z \big ], \end{aligned}$$

where \(\lambda ^s\) denotes the s-dimensional Lebesgue measure. By Theorem 5.1, the boundary of \(\overline{{\mathcal {W}}}(x,\varDelta )\) is contained in a finite union of lower-dimensional polyhedral cones. Let \({\mathbb {H}}_x\) denote a collection of such cones of minimal cardinality. It is a straightforward conclusion from the proofs of Theorem 4.1, Theorem 5.1 and Lemma 5.1 that the cardinality of \({\mathbb {H}}_x\) can be bounded by a constant \(K \in {\mathbb {N}}\) that does not depend on x. Moreover, as any \(H \in {\mathbb {H}}_x\) is contained in some hyperplane, the \((s-1)\)-dimensional Lebesgue measure of \(H \cap \mathrm {supp} \; \mu _Z\) is at most \(\mathrm {diam}(\mathrm {supp} \; \mu _Z)^{s-1}\). Thus,

$$\begin{aligned}&|\mu _Z\big [{\mathcal {W}}(x,\varDelta )\big ] - \mu _Z\big [{\mathcal {W}}(x',\varDelta )\big ]| \\&\quad \le \; 2\alpha \sum _{H \in {\mathbb {H}}_x} \lambda ^s\big [ \lbrace v + l \cdot T(x'-x) \; :\; v \in H, \; l \in [-1,1] \rbrace \cap \mathrm {supp} \; \mu _Z \big ] \\&\quad \le \; 4 \alpha K \cdot \mathrm {diam}(\mathrm {supp} \; \mu _Z )^{s-1} \cdot \Vert T \Vert _{{\mathcal {L}}({\mathbb {R}}^n, {\mathbb {R}}^s)} \Vert x'-x\Vert , \end{aligned}$$

by Cavalieri’s principle, which completes the proof. \(\square \)
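The Cavalieri-type estimate can be made concrete in \({\mathbb {R}}^2\): for a segment \(H\) of length \(\ell \) inside a line and a shift vector \(t\), the swept set \(\lbrace v + l \cdot t \; :\; v \in H, \; l \in [-1,1] \rbrace \) is a parallelogram whose area \(2\ell |t_\perp |\) never exceeds \(2\ell \Vert t\Vert \). A small numeric sketch (the segment and shifts are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

ell = 3.0                                  # length of the segment H on the x-axis
for _ in range(200):
    t = rng.standard_normal(2)             # stands in for the shift T(x' - x)
    side1 = np.array([ell, 0.0])           # H = {(u, 0) : 0 <= u <= ell}
    side2 = 2.0 * t                        # l runs over [-1, 1], so the side is 2t
    # area of the swept parallelogram via the determinant of its edge vectors
    area = abs(np.linalg.det(np.column_stack([side1, side2])))
    assert area <= 2.0 * ell * np.linalg.norm(t) + 1e-12
print("Cavalieri bound verified for 200 random shifts")
```

Summing this bound over the at most \(K\) cones in \({\mathbb {H}}_x\) is precisely how the Lipschitz constant in the proof arises.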

Proof

(Proof of Theorem 3.1) Continuous differentiability on \(\mathrm {int} \; F_Z\) is a direct consequence of [1, Corollary 4.7]. Fix any \(x, x' \in \mathrm {int} \; F_Z\); then, (4) and Lemma 6.2 yield

$$\begin{aligned}&\Vert \nabla {\mathcal {Q}}_{\mathbb {E}}(x) - \nabla {\mathcal {Q}}_{\mathbb {E}}(x')\Vert \\&\quad \le \; \sum _{\varDelta \in D} |\mu _Z\big [{\mathcal {W}}(x,\varDelta )\big ] - \mu _Z\big [{\mathcal {W}}(x',\varDelta )\big ]| \cdot \Vert \varDelta \Vert \\&\quad \le \; 4 \alpha K |D| \cdot \mathrm {diam}(\mathrm {supp} \; \mu _Z )^{s-1} \cdot \max _{\varDelta \in D} \; \Vert \varDelta \Vert \cdot \Vert T \Vert _{{\mathcal {L}}({\mathbb {R}}^n, {\mathbb {R}}^s)} \Vert x'-x\Vert \end{aligned}$$

and thus the desired Lipschitz continuity. \(\square \)

Proof

(Proof of Theorem 3.2) Fix any \(\kappa > 0\). As \(\mu _Z\) is tight by [3, Theorem 1.3], there exists a compact set \(C(\kappa ) \subset {\mathbb {R}}^s\) such that \(\mu _Z[{\mathbb {R}}^s \setminus C(\kappa )] < \kappa \). Combining this with the estimate from the first part of the proof of Lemma 6.2 and using the same notation established therein, we see that

$$\begin{aligned} |\mu _Z[{\mathcal {W}}(x,\varDelta )] - \mu _Z[{\mathcal {W}}(x',\varDelta )]|&\le 2 \mu _Z[{\mathcal {H}}_{x,x'}] \\&= \; 2 \mu _Z[{\mathcal {H}}_{x,x'} \cap C(\kappa )] + 2 \mu _Z[{\mathcal {H}}_{x,x'} \setminus C(\kappa )] \\&\le \; 2 \alpha \lambda ^s[{\mathcal {H}}_{x,x'} \cap C(\kappa ) \cap \mathrm {supp} \; \mu _Z] + 2\kappa \end{aligned}$$

holds for any \(\varDelta \in D\). Thus,

$$\begin{aligned}&|\mu _Z\big [{\mathcal {W}}(x,\varDelta )\big ] - \mu _Z\big [{\mathcal {W}}(x',\varDelta )\big ]| \\&\quad \le \; 2\kappa + 2\alpha \sum _{H \in {\mathbb {H}}_x} \lambda ^s\big [ \lbrace v + l \cdot T(x'-x) \; :\; v \in H, \; l \in [-1,1] \rbrace \cap C(\kappa ) \cap \mathrm {supp} \; \mu _Z \big ] \\&\quad \le \; 2 \kappa + 4 \alpha K \cdot \mathrm {diam}(C(\kappa ) \cap \mathrm {supp} \; \mu _Z )^{s-1} \cdot \Vert T \Vert _{{\mathcal {L}}({\mathbb {R}}^n, {\mathbb {R}}^s)} \Vert x'-x\Vert . \end{aligned}$$

We therefore have

$$\begin{aligned}&\Vert \nabla {\mathcal {Q}}_{\mathbb {E}}(x) - \nabla {\mathcal {Q}}_{\mathbb {E}}(x')\Vert \\&\quad \le \; \sum _{\varDelta \in D} |\mu _Z\big [{\mathcal {W}}(x,\varDelta )\big ] - \mu _Z\big [{\mathcal {W}}(x',\varDelta )\big ]| \cdot \Vert \varDelta \Vert \\&\quad \le \; 4 \alpha K |D| \mathrm {diam}(C(\kappa ) \cap \mathrm {supp} \; \mu _Z )^{s-1} \max _{\varDelta \in D} \; \Vert \varDelta \Vert \cdot \Vert T \Vert _{{\mathcal {L}}({\mathbb {R}}^n, {\mathbb {R}}^s)} \Vert x'-x\Vert + 2|D|\kappa \end{aligned}$$

and choosing \(\kappa = \frac{\epsilon }{2|D|}\) yields the desired estimate. \(\square \)

Remark 6.1

The constant \(L(\epsilon )\) derived in the proof of Theorem 3.2 depends on \(\epsilon \). If the support of \(\mu _Z\) is unbounded, we have \(L(\epsilon ) \rightarrow \infty \) as \(\epsilon \downarrow 0\).

7 A Sufficient Second-Order Optimality Condition

Under the conditions of Theorem 3.1, \(\nabla {\mathcal {Q}}_{{\mathbb {E}}}\) is Lipschitz continuous on \(\mathrm {int} \; F_Z\) and thus differentiable almost everywhere on \(\mathrm {int} \; F_Z\) by Rademacher’s theorem. Let \({\mathcal {D}} \subseteq \mathrm {int} \; F_Z\) denote the set of points at which \(\nabla {\mathcal {Q}}_{{\mathbb {E}}}\) is differentiable. Then, the generalized Hessian of \({\mathcal {Q}}_{\mathbb {E}}\) in the sense of Clarke at \(x \in \mathrm {int} \; F_Z\) is the nonempty, convex and compact set

$$\begin{aligned} \partial ^2 {\mathcal {Q}}_{\mathbb {E}}(x) = \mathrm {conv} \left\{ H \in {\mathbb {R}}^{n \times n} \; :\; \exists \lbrace x_k \rbrace _{k \in {\mathbb {N}}} \subseteq {\mathcal {D}}: \; x_k \rightarrow x, \; \nabla ^2 {\mathcal {Q}}_{\mathbb {E}}(x_k) \rightarrow H \right\} . \end{aligned}$$

We have \(\partial ^2 {\mathcal {Q}}_{\mathbb {E}}(x) = \lbrace \nabla ^2 {\mathcal {Q}}_{\mathbb {E}}(x) \rbrace \) whenever \(x \in {\mathcal {D}}\).
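For intuition, consider the scalar \(C^{1,1}\) function \(g(x) = x|x|/2\), an illustrative stand-in for \({\mathcal {Q}}_{\mathbb {E}}\): its derivative \(|x|\) is Lipschitz, \(g'' = \mathrm {sign}(x)\) away from the origin, and the limiting second derivatives along sequences \(x_k \rightarrow 0\) are \(\pm 1\), so \(\partial ^2 g(0) = [-1,1]\). A sketch computing the limiting Hessians numerically:

```python
def grad_g(x):
    # gradient of g(x) = x*|x|/2, a C^{1,1} function that is not C^2 at 0
    return abs(x)

def hess_at(x, h=1e-7):
    # central difference for g'' at a point where grad_g is differentiable
    return (grad_g(x + h) - grad_g(x - h)) / (2 * h)

# limiting Hessians along sequences approaching 0 from either side
right = [hess_at(10.0 ** (-k)) for k in range(1, 5)]   # values near +1
left = [hess_at(-(10.0 ** (-k))) for k in range(1, 5)] # values near -1
print(right, left)
# the generalized Hessian at 0 is the convex hull of all such limits: [-1, 1]
```

The same recipe, applied componentwise to \(\nabla {\mathcal {Q}}_{\mathbb {E}}\) at points of \({\mathcal {D}}\) near \(x\), produces matrices approaching elements of \(\partial ^2 {\mathcal {Q}}_{\mathbb {E}}(x)\).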

Let the feasible set of (2) be given by \(X = \lbrace x \in {\mathbb {R}}^n \; :\; Bx \le b \rbrace \) with some \(B \in {\mathbb {R}}^{k \times n}\) and \(b \in {\mathbb {R}}^k\). The following second-order sufficient condition is based on [7]:

Theorem 7.1

Assume \(\mathrm {dom} \; f \ne \emptyset \), \(X \subseteq \mathrm {int} \; F_Z\) and let \(\mu _Z\) be absolutely continuous w.r.t. the Lebesgue measure and have a bounded support as well as a uniformly bounded density. Moreover, let \(({\bar{x}},{\bar{u}})\) be a KKT point of (2), i.e.,

$$\begin{aligned} \nabla {\mathcal {Q}}_{\mathbb {E}}({\bar{x}}) + B^\top {\bar{u}} = 0, \; B{\bar{x}} \le b, \; {\bar{u}}^\top (B{\bar{x}} - b) = 0, \; {\bar{u}} \ge 0 \end{aligned}$$

and assume that any \(H \in \partial ^2 {\mathcal {Q}}_{\mathbb {E}}({\bar{x}})\) is positive definite on

$$\begin{aligned} \left\{ h \in {\mathbb {R}}^n \; :\; \begin{matrix} e_i^\top B h = 0 \; \forall i: \; {\bar{u}}_i > 0 \\ e_j^\top B h \le 0 \; \forall j: \; {\bar{u}}_j = e_j^\top B {\bar{x}} = 0 \end{matrix} \right\} . \end{aligned}$$

Then, \({\bar{x}}\) is a strict local minimizer of order 2 of (2), i.e., there exist a neighborhood U of \({\bar{x}}\) and a constant \(L > 0\) such that

$$\begin{aligned} {\mathcal {Q}}_{\mathbb {E}}(x) > {\mathcal {Q}}_{\mathbb {E}}({\bar{x}}) + L \Vert x-{\bar{x}}\Vert ^2 \end{aligned}$$

holds for any \(x \in X \cap U\).

Proof

This is a straightforward conclusion from [7, Theorem 1]. \(\square \)
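The verification required by Theorem 7.1 can be sketched on a toy instance (here a smooth quadratic stand-in for \({\mathcal {Q}}_{\mathbb {E}}\), so the generalized Hessian is a singleton; all data below are illustrative): check the KKT system at a candidate point, then positive definiteness of each Hessian on the critical cone.

```python
import numpy as np

# toy data: Q(x) = 0.5*||x||^2 + x[0], feasible set {x : Bx <= b}
H = np.eye(2)                          # the (here unique) generalized Hessian
grad_Q = lambda x: x + np.array([1.0, 0.0])
B = np.array([[-1.0, 0.0]])            # constraint -x[0] <= 0, i.e. x[0] >= 0
b = np.array([0.0])

x_bar = np.array([0.0, 0.0])
u_bar = np.array([1.0])

# KKT system: stationarity, feasibility, complementarity, dual feasibility
assert np.allclose(grad_Q(x_bar) + B.T @ u_bar, 0)
assert np.all(B @ x_bar <= b)
assert np.isclose(u_bar @ (B @ x_bar - b), 0)
assert np.all(u_bar >= 0)

# critical cone: e_i^T B h = 0 for all i with u_bar[i] > 0, so h = (0, h2);
# positive definiteness of H on that cone reduces to H[1, 1] > 0
h = np.array([0.0, 1.0])
assert h @ H @ h > 0
print("KKT point with second-order sufficiency verified (toy instance)")
```

In the bi-level setting, the Hessian check must run over every \(H \in \partial ^2 {\mathcal {Q}}_{\mathbb {E}}({\bar{x}})\) rather than a single matrix.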

Remark 7.1

There are various other approaches for optimization problems with data in the class \(C^{1,1}\), which consists of differentiable functions with locally Lipschitzian gradients. For instance, second-order optimality conditions can also be formulated based on Dini (cf. [5, Section 4.4]) or Riemann (cf. [8]) derivatives.

8 Conclusions

We have derived sufficient conditions for Lipschitz continuity of the gradient of the expectation functional arising from a bi-level stochastic linear program with random right-hand side in the lower-level constraint system. Invoking the structure of the upper-level constraints, we used this result to formulate a second-order sufficient optimality condition for the risk-neutral bi-level stochastic program in terms of the generalized Hessian of \({\mathcal {Q}}_{\mathbb {E}}\). Moreover, the main result on the geometry of regions of strong stability and its counterpart for the aggregation mapping \(\overline{{\mathcal {W}}}\) may facilitate the computation or sample-based estimation of gradients of the expectation functional, which may enhance gradient descent-based methods. As any region of strong stability is a finite union of polyhedral cones, a promising approach is to employ spherical radial decomposition techniques to calculate \(\nabla {\mathcal {Q}}_{\mathbb {E}}\) (cf. [4, Chapter 4]). The details are beyond the scope of this paper but shall be addressed in future research.
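To illustrate why the conic structure helps: for a standard Gaussian measure, the spherical radial decomposition reduces the probability of a polyhedral cone to a purely spherical quantity, since the radial set \(\lbrace r \ge 0 \; :\; rv \in C \rbrace \) is either all of \([0,\infty )\) or trivial for each direction v. A Monte Carlo sketch for the nonnegative quadrant in \({\mathbb {R}}^2\), whose exact Gaussian probability is 1/4 (the cone is illustrative, not one of the regions studied above):

```python
import numpy as np

rng = np.random.default_rng(42)

def in_cone(z):
    # polyhedral cone C = {z : z >= 0}; membership is scale-invariant
    return np.all(z >= 0, axis=-1)

# spherical radial decomposition: for a cone, {r >= 0 : r*v in C} is either
# [0, inf) or {0}, so the Gaussian probability of C equals the uniform
# spherical measure of the directions lying in C
v = rng.standard_normal((200_000, 2))
v /= np.linalg.norm(v, axis=1, keepdims=True)   # uniform directions on S^1
estimate = in_cone(v).mean()

exact = 0.25                                    # quadrant spans a quarter circle
assert abs(estimate - exact) < 0.01
print(f"spherical estimate {estimate:.4f} vs exact {exact}")
```

For a finite union of cones, the same direction samples can be reused across all pieces, which is one reason this decomposition pairs well with the polyhedral-cone representation derived above.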