1 Introduction and outline

In this text we explore the connections between robust optimization and quadratically constrained quadratic optimization. For the sake of this introduction we briefly review the most important concepts in these areas and then give an outline of the aims of this text.

1.1 Quadratically constrained quadratic optimization

A quadratically constrained quadratic optimization problem (QCQP) consists of minimizing a quadratic function subject to quadratic constraints, formally given by \(\inf _{\mathbf {x}\in {\mathcal {K}}}\left\{ \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {x}+{\mathbf {q}}^{{\textsf {T} }}\mathbf {x}+ \omega \,{:}\, \mathbf {x}^{{\textsf {T} }}{\textsf {A} }_i\mathbf {x}+\mathbf {a}_i^{{\textsf {T} }}\mathbf {x}\leqslant b_i,\ i\in [1\!:\!m] \right\} \), where \({\textsf {Q} },{\textsf {A} }_i\) are real, symmetric matrices of order n, \({\mathcal {K}}\subseteq \mathbb {R}^n\) is a cone, \({\mathbf {q}},\mathbf {a}_i\) are real vectors, \(b_i\) are real numbers and \([1\!:\!k]\,{:}{=}\, \left\{ 1,\dots ,k \right\} \). General QCQPs are NP-hard and form a large and quite versatile class of optimization problems. Neither the objective function nor the feasible set of a QCQP needs to be convex; the latter may even be disconnected. One way to (approximately) solve QCQPs is to look for convex relaxations, which in the best case yield exact reformulations. An important strategy for achieving such reformulations is to lift the space of variables, thereby linearizing the quadratic terms and convexifying the feasible set so that its extreme points correspond to rank-one matrices, which decompose into vectors feasible for the original problem:

Theorem 1

Let \({\mathcal {F}}\,{:}{=}\,\left\{ \mathbf {x}\in {\mathcal {K}}\,{:}\, \mathbf {x}^{{\textsf {T} }}{\textsf {A} }_i\mathbf {x}+\mathbf {a}_i^{{\textsf {T} }}\mathbf {x}\leqslant b_i,\ i\in [1\!:\!m] \right\} \subseteq \mathbb {R}^n\) be the feasible set of a QCQP and define

$$\begin{aligned} {\mathcal {G}}({\mathcal {F}})\,{:}{=}\, {\mathrm {clconv}} \left\{ (\mathbf {x},\mathbf {x}\mathbf {x}^{{\textsf {T} }}): \mathbf {x}\in {\mathcal {F}}\right\} \, , \end{aligned}$$

where \({\mathrm {clconv}} ({\mathcal {A}})\) stands for the closure of the convex hull of a set \({\mathcal {A}}\).

Then for any \({\textsf {Q} }\in {\mathcal {S}}^n\) and \({\mathbf {q}}\in \mathbb {R}^n\) we have

$$\begin{aligned} \inf _{\mathbf {x}\in {\mathcal {F}}} \left( \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {x}+{\mathbf {q}}^{{\textsf {T} }}\mathbf {x}+\omega \right) = \inf _{(\mathbf {x},{\textsf {X} })\in {\mathcal {G}}({\mathcal {F}})}\left( {\mathrm {trace}}({\textsf {Q} }{\textsf {X} })+ {\mathbf {q}}^{{\textsf {T} }}\mathbf {x}+\omega \right) \, . \end{aligned}$$

Proof

See, e.g. (Burer and Anstreicher 2013; Eichfelder and Povh 2013). \(\square \)
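The mechanism behind Theorem 1 is that the lifted objective is linear in \((\mathbf {x},{\textsf {X} })\) and agrees with the quadratic objective on rank-one points \((\mathbf {x},\mathbf {x}\mathbf {x}^{{\textsf {T} }})\). A quick numeric sanity check of this identity (ours; random data, plain list-of-lists matrices):

```python
# Verify x^T Q x + q^T x + w == trace(Q X) + q^T x + w for the lift X = x x^T,
# i.e. the lifted objective is linear in (x, X) and exact on rank-one points.
import random

random.seed(0)
n = 3
x = [random.uniform(-1, 1) for _ in range(n)]
Q = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
Q = [[(Q[i][j] + Q[j][i]) / 2 for j in range(n)] for i in range(n)]  # symmetrize
q = [random.uniform(-1, 1) for _ in range(n)]
w = 0.7

X = [[x[i] * x[j] for j in range(n)] for i in range(n)]              # the lift
quadratic = sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n)) \
            + sum(q[i] * x[i] for i in range(n)) + w
trace_QX = sum(Q[i][j] * X[j][i] for i in range(n) for j in range(n))
lifted = trace_QX + sum(q[i] * x[i] for i in range(n)) + w
assert abs(quadratic - lifted) < 1e-12
```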

The central object in the above theorem is \({\mathcal {G}}({\mathcal {F}})\). Its characterization is the major challenge when employing the reformulation strategy described in Theorem 1, and a general workable description of \({\mathcal {G}}({\mathcal {F}})\) is not known. There are, however, characterizations for specific instances of \({\mathcal {F}}\). In practice such characterizations are regularly given by a conic intersection involving some matrix cone \({\mathcal {C}}\) and appropriate linear constraints parametrized by matrices \(\bar{{\textsf {A} }}_i\) and real numbers \(\bar{b}_i, i\in [1\!:\!\bar{m}]\). Thus, typically we have reformulations of the form:

$$\begin{aligned} \inf _{(\mathbf {x},{\textsf {X} })\in {\mathcal {G}}({\mathcal {F}})}\left( {\mathrm {trace}}({\textsf {Q} }{\textsf {X} })+ {\mathbf {q}}^{{\textsf {T} }}\mathbf {x}+\omega \right) = \inf _{{\textsf {Y} }\in {\mathcal {C}}} \{ {\mathrm {trace}}({\textsf {M} }{\textsf {Y} }) \,{:}\, {\mathrm {trace}}(\bar{{\textsf {A} }}_i{\textsf {Y} }) \leqslant \bar{b}_i\, , \,i \in [1\!:\!\bar{m}]\}\, , \end{aligned}$$

where \({\textsf {M} }\), \(\bar{{\textsf {A} }}_i\) and \(\bar{b}_i\) depend on the problem data. Important examples of such reformulations can be found in Anstreicher and Burer (2010), Burer (2009), Burer and Dong (2012), Burer and Yang (2015), Eichfelder and Povh (2013), Yang et al. (2016), some of which we will discuss in more detail later in the text.
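To make the earlier remark about nonconvex, possibly disconnected feasible sets concrete, here is a minimal sketch (ours; the helper `feasible` and the one-dimensional instance are illustrative assumptions, not from the text) that checks membership in a QCQP feasible set:

```python
# Membership test for F = {x in K : x^T A_i x + a_i^T x <= b_i}.  The single
# constraint -x^2 <= -1 (i.e. x^2 >= 1) over K = R already yields the
# disconnected feasible set (-inf, -1] U [1, inf).

def feasible(x, constraints):
    """constraints: list of (A, a, b) with A a matrix, a a vector, b a scalar."""
    def quad(A, a, x):
        n = len(x)
        return (sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
                + sum(a[i] * x[i] for i in range(n)))
    return all(quad(A, a, x) <= b for (A, a, b) in constraints)

cons = [([[-1.0]], [0.0], -1.0)]          # encodes -x^2 <= -1, i.e. x^2 >= 1
print(feasible([1.5], cons))    # True   (right component)
print(feasible([-2.0], cons))   # True   (left component)
print(feasible([0.0], cons))    # False  (the gap in between)
```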

1.2 The S-procedure

Another strand of literature, concerned with a dual perspective on QCQPs, is the theory of the S-procedure. The central question is when, for a given set of matrices \(\{{\textsf {Q} },{\textsf {A} }_1,\ldots ,{\textsf {A} }_m\}\), the following two statements are equivalent:

$$\begin{aligned}&{ i)}\,\hbox {the following system of inequalities has no solution} \, \mathbf {x}\in \mathbb {R}^n \,{:}\, \\&\quad \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {x}< 0 \quad \text{ and }\quad \mathbf {x}^{{\textsf {T} }}{\textsf {A} }_i\mathbf {x}\leqslant 0 \; \forall i \in [1\!:\!m]\, ;\\&{ ii)}\,\hbox {the following conic inequality has a solution }\,\varvec{\lambda }\in \mathbb {R}^m_+ \,{:}\, \\&\quad {\textsf {Q} }+ \sum _{i=1}^{m}\lambda _i {\textsf {A} }_i \in {\mathcal {S}}^n_+ \, . \end{aligned}$$

If equivalence between (i) and (ii) can be established, we say that the S-procedure is exact for this set of matrices. In fact, if equivalence holds for arbitrary \({\textsf {Q} }\), then copositivity over \({\mathcal {K}}\,{:}{=}\, \{\mathbf {x}\in \mathbb {R}^n \,{:}\, \mathbf {x}^{{\textsf {T} }}{\textsf {A} }_i\mathbf {x}\leqslant 0, \ i \in [1\!:\!m] \}\) (see the notations and preliminaries section for the definition) is characterized by \(\mathcal {COP}({\mathcal {K}}) = \{{\textsf {Q} }\,{:}\, {\textsf {Q} }+ \sum _{i=1}^{m}\lambda _i {\textsf {A} }_i \in {\mathcal {S}}^n_+ \text{ for } \text{ some } \varvec{\lambda }\in \mathbb {R}^m_+ \}\). Note that ’\(\supseteq \)’ always holds, since (ii) obviously implies (i). The first such exactness result, known as the S-Lemma, was established in Yakubovich (1971) for the case \(m=1\) under the condition that \(\mathbf {x}_0^{{\textsf {T} }}{\textsf {A} }_1\mathbf {x}_0<0\) holds for some \(\mathbf {x}_0\in \mathbb {R}^n\). The result is obtained by invoking a convexity result derived in Dines (1941) concerning the so-called joint numerical range of quadratic functions, defined as \(\mathcal {J}({\textsf {Q} },{\textsf {A} }_1,\dots ,{\textsf {A} }_m)\,{:}{=}\,\left\{ (\mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {x},\mathbf {x}^{{\textsf {T} }}{\textsf {A} }_1\mathbf {x},\dots ,\mathbf {x}^{{\textsf {T} }}{\textsf {A} }_m\mathbf {x}) \,{:}\, \mathbf {x}\in \mathbb {R}^n \right\} \). Dines showed that \({\mathcal {J}}({\textsf {Q} },{\textsf {A} }_1)\) is a convex cone. From this convexity result it is deduced that whenever \({\mathcal {J}}\) is disjoint from the negative quadrant, i.e., (i) holds, these two sets can be separated by a hyperplane. The coefficients of that hyperplane can further be shown to be two nonnegative numbers, the first of which is actually positive by the assumption on \(\mathbf {x}_0\), which eventually implies (ii).
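A concrete numeric instance of an exact S-procedure with \(m=1\) (ours; the matrices, the multiplier \(\lambda \) and the sampling check are illustrative assumptions): with \({\textsf {A} }_1={{\,\mathrm{Diag}\,}}(1,-1)\) and \({\textsf {Q} }={{\,\mathrm{Diag}\,}}(-1,2)\), statement (i) holds, and \(\lambda =1.5\) certifies (ii).

```python
# S-Lemma check for m = 1: (ii) holds since Q + 1.5*A1 = diag(0.5, 0.5) is PSD,
# and sampling finds no x with x^T Q x < 0 and x^T A1 x <= 0, consistent with (i).
import random

def qform(M, x):
    return sum(x[i] * M[i][j] * x[j] for i in range(2) for j in range(2))

Q  = [[-1.0, 0.0], [0.0, 2.0]]
A1 = [[ 1.0, 0.0], [0.0, -1.0]]
lam = 1.5
S = [[Q[i][j] + lam * A1[i][j] for j in range(2)] for i in range(2)]

# (ii): a 2x2 symmetric matrix is PSD iff both diagonal entries and the
# determinant are nonnegative.
psd = S[0][0] >= 0 and S[1][1] >= 0 and S[0][0] * S[1][1] - S[0][1] ** 2 >= 0
print(psd)  # True

# (i): no sampled x violates the implication x^T A1 x <= 0  =>  x^T Q x >= 0
# (indeed x1^2 <= x2^2 gives -x1^2 + 2*x2^2 >= x2^2 >= 0).
random.seed(1)
ok = all(qform(Q, x) >= 0
         for x in ([random.uniform(-2, 2), random.uniform(-2, 2)] for _ in range(10000))
         if qform(A1, x) <= 0)
print(ok)   # True
```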

1.3 Robust and adjustable robust optimization

Robust optimization aims at making a decision under uncertainty by looking for the decision with the best worst-case performance among those decisions that will be feasible for all realizations of the uncertain data (see Ben-Tal et al. 2009; Gorissen et al. 2015 and references therein). Formally the robust counterpart of an uncertain optimization problem is

$$\begin{aligned} \inf _{\mathbf {x}\in {\mathcal {X}}} \left\{ \sup _{\mathbf {u}\in {\mathcal {U}}}\left\{ {\bar{f}_0}\left( \mathbf {x},\mathbf {u}\right) \right\} \,{:}\, {\bar{f}_i}(\mathbf {x},\mathbf {u})\geqslant 0 \ \forall \mathbf {u}\in {\mathcal {U}}, \ i \in [1\!:\!m] \right\} . \end{aligned}$$
(1)

The parameters of the functions \({\bar{f}_i}\) are uncertain and governed by the uncertainty parameter \(\mathbf {u}\) that lives in an uncertainty set \({\mathcal {U}}\). This set encompasses all possible realizations of \(\mathbf {u}\). The set \({\mathcal {X}}\subseteq \mathbb {R}^n\) is some feasible set for the decision vector that is not affected by uncertainty. The solutions obtained by this approach can be overly conservative, a fact that motivated substantial research into redeeming this shortcoming. One remedy is known as adjustable robust optimization (ARO). In this setting, the decision variables are grouped into two categories. The first stage decision variables represent decisions to be made “here and now”, i.e. at a point in time when there is uncertainty in some of the relevant problem data. The second stage decision variables represent decisions that can be delayed until the uncertainty is resolved. Of course, one could in the first stage make a decision on all of the variables and require the solution to be feasible for any realization of the uncertainty parameter. But then potential flexibility is not harnessed, since this is equivalent to the classical robust approach, i.e. the solution will be unnecessarily conservative. The idea is that, rather than requiring that the decision on all variables be feasible in any case, we merely require our first stage decision to allow for a second stage adjustment that renders the overall solution feasible in any case. Said differently, we look for the best among those first stage decisions on \(\mathbf {x}\in {\mathcal {X}}\) for which there exists a function \(\mathbf {y}(\mathbf {u})\) that, given any realization of the uncertainty parameter \(\mathbf {u}\), maps to a vector for the second stage decision that in total gives a feasible solution to the optimization problem. ARO was introduced in Ben-Tal et al. (2004) and has received much attention.
For a detailed survey see (Yanikoglu et al. 2019). Generically ARO can be written, in a way slightly differing from (1), as

$$\begin{aligned} \inf _{\mathbf {x}\in {\mathcal {X}},\mathbf {y}(\mathbf {u})} \left\{ \sup _{\mathbf {u}\in {\mathcal {U}}}\left\{ f_0\left( \mathbf {x},\mathbf {y}(\mathbf {u}),\mathbf {u}\right) \right\} \,{:}\, f_i(\mathbf {x},\mathbf {y}(\mathbf {u}),\mathbf {u})\geqslant 0 \ \forall \mathbf {u}\in {\mathcal {U}}, \ i \in [1\!:\!m] \right\} . \end{aligned}$$
(2)

The first stage decision \(\mathbf {x}\in {\mathcal {X}}\) is again vector valued, but the second stage variable \(\mathbf {y}(\mathbf {u})\) is allowed to adapt to the uncertainty and is thus a function of \(\mathbf {u}\). Since optimizing over a space of functions is intractable, so is (2), which makes it much harder to solve in practice than (1). However, there are many powerful approaches to (approximately) solve it; see (Yanikoglu et al. 2019) and references therein. Recently, Zhen et al. (2019) proposed an adjustable robust approach to disjoint bilinear optimization, where the problem \(\inf _{(\mathbf {x},\mathbf {y})\in {\mathcal {X}}\times {\mathcal {Y}}} \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {y}\) was reformulated into an instance of ARO. The resulting problem can then be tackled by the numerous techniques that are usually applied in the adjustable robust framework.
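The gain from adjustability in (2) can be made concrete with a toy min-max-min computation. The following sketch is ours and purely illustrative (the cost function, the finite uncertainty set and the grids are assumptions, not from the text): it compares the static robust value, where both decisions are fixed before \(\mathbf {u}\) is revealed, with the adjustable value, where the second stage may react to \(\mathbf {u}\).

```python
# Cost (x + y - u)^2, uncertainty u in {-1, 1}, recourse y restricted to
# [-0.5, 0.5].  Static robust fixes (x, y) upfront; adjustable lets y = y(u).
U = [-1.0, 1.0]
Y = [i / 100 - 0.5 for i in range(101)]          # grid for y in [-0.5, 0.5]
X = [i / 100 - 1.0 for i in range(201)]          # grid for x in [-1, 1]

def cost(x, y, u):
    return (x + y - u) ** 2

# Static robust: both x and y are fixed before u is revealed.
static = min(max(cost(x, y, u) for u in U) for x in X for y in Y)

# Adjustable: y = y(u) may react to the realized u.
adjustable = min(max(min(cost(x, y, u) for y in Y) for u in U) for x in X)

print(static)      # 1.0   (x + y = 0 is the best static hedge)
print(adjustable)  # 0.25  (x = 0, y(u) = 0.5*sign(u))
```

Adjustability cuts the worst-case cost from 1.0 to 0.25 here, illustrating why a single static decision can be unnecessarily conservative.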

1.4 Contribution

  • In Sect. 3 we discuss a strategy by which convex reformulations of QCQPs have been used in the literature for the purpose of reformulating semi-infinite constraints with quadratic index. The strategy is not new, but it has so far not been discussed in a general manner. We think this generic perspective is instructive, helps familiarize the two fields with each other, and allows for a discussion of the critical elements of the approach. We give small illustrative examples, also to highlight the important role of dual attainability in the reformulation.

  • In Sect. 4 we give an example of a class of homogeneous QCQPs for which we can close the relaxation gap of the so-called Shor relaxation (see Theorem 3). We show that the result can be transferred to a class of non-homogeneous QCQPs (see Theorem 5), which are special cases of QCQPs for which the tightness of the Shor relaxation is known.

  • Also in Sect. 4 we show that our reformulation result implies a sufficient condition on quadratic forms for the S-procedure to be exact (see Theorem 7). Our sufficient condition can be shown to relate to another known sufficient condition in a way that allows us to infer some information on the geometry of the joint numerical range (see Proposition 11).

  • In Sect. 5 we give an alternative, copositive perspective on the general reformulation strategy for semi-infinite constraints. The reformulation obtained in this way can be used to characterize the gap that occurs if the underlying QCQP-reformulation fails to exhibit dual attainability (see Theorem 15).

  • Finally, in Sect. 6, we show that a class of QCQPs can be reformulated as adjustable robust optimization problems (see Theorem 18), which themselves allow for a bound based on the general reformulation strategy for semi-infinite constraints (see Theorem 19). The resulting lower bound is tested against known lower bounds from the literature. Empirical evidence suggests that for some range of the problem parameters our lower bound can be beneficial in terms of computation time and solution quality.

2 Notation and preliminaries

Throughout the paper matrices are denoted by sans-serif capital letters (e.g. the \(n\times n\)-identity matrix will be denoted by \({\textsf {E} }_n\), while \({\textsf {O} }\) will denote the zero matrix), vectors by boldface lower case letters (e.g. \(\mathbf {e}_i\) will denote the i-th column of \({\textsf {E} }_n\), and \(\mathbf {o}\) will denote the zero vector) and scalars (real numbers) by simple lower case letters. We use \(x_i\) to denote the i-th entry of a vector \(\mathbf {x}\). Analogous conventions hold between matrices and vectors, and between matrices and scalars with double indices. For example, for a matrix \({\textsf {X} }\), the i-th column or row vector (which will be pointed out as we go along) is given by \(\mathbf {x}_i\), and the j-th entry in the i-th row will be denoted \(x_{ij}\). Sets will be denoted using calligraphic letters, e.g., cones will often be denoted by \({\mathcal {K}}\). We use \({\mathcal {S}}^n\) to indicate the set of symmetric matrices and \({\mathcal {S}}_{++}^n\)/\({\mathcal {S}}_{--}^n\), \({\mathcal {S}}^n_+\)/\({\mathcal {S}}^n_-\) for the sets of positive-/negative-definite and positive-/negative-semidefinite symmetric matrices, respectively. Moreover, we use \({\mathcal {N}}_n\) to denote the set of entrywise nonnegative, symmetric matrices. For notational convenience, we will define the block matrix

$$\begin{aligned} {\textsf {M} }({\textsf {A} },\, {\textsf {B} },\,{\textsf {C} })\,{:}{=}\, \begin{pmatrix} {\textsf {A} }&{} \frac{1}{2}{\textsf {B} }\\ \frac{1}{2}{\textsf {B} }^{{\textsf {T} }}&{} {\textsf {C} }\end{pmatrix} \end{aligned}$$
(3)

where \({\textsf {A} },{\textsf {B} },{\textsf {C} }\) are matrices of appropriate size. Also, for a vector \(\mathbf {v}\in \mathbb {R}^n\) we will denote by \({{\,\mathrm{Diag}\,}}(\mathbf {v})\in {\mathcal {S}}^n\) the diagonal matrix whose diagonal entries are the respective entries of \(\mathbf {v}\).
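The following small helper (ours; plain list-of-lists matrices, no external libraries) builds the block matrix from (3). For scalar blocks it reproduces the 2x2 matrices such as M(1, 2, 1) = [[1, 1], [1, 1]] appearing in Example 1; for M(Q, q, w) with a vector q, pass q as a one-column matrix and w as a 1x1 matrix.

```python
# Build M(A, B, C) = [[A, B/2], [B^T/2, C]] from list-of-lists blocks.
def block_M(A, B, C):
    top = [row_a + [b / 2 for b in row_b] for row_a, row_b in zip(A, B)]
    Bt = [list(col) for col in zip(*B)]              # B^T
    bottom = [[b / 2 for b in row_bt] + row_c for row_bt, row_c in zip(Bt, C)]
    return top + bottom

print(block_M([[1.0]], [[2.0]], [[1.0]]))
# [[1.0, 1.0], [1.0, 1.0]]  -- the matrix M(1, 2, 1)
print(block_M([[1.0, 0.0], [0.0, 1.0]], [[2.0], [4.0]], [[3.0]]))
# [[1.0, 0.0, 1.0], [0.0, 1.0, 2.0], [1.0, 2.0, 3.0]]
```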

For a given set \({\mathcal {A}}\) we denote its interior, closure, relative interior, relative boundary, convex hull and conic hull by \({\mathrm {int}}({\mathcal {A}})\), \({\mathrm {cl}}({\mathcal {A}})\), \({\mathrm {ri}}({\mathcal {A}})\), \({\mathrm {bd}}({\mathcal {A}})\), \({\mathrm {conv}}({\mathcal {A}})\), \({\mathrm {cone}}({\mathcal {A}})\), where for the latter operation we stick to the convention that \({\mathrm {cone}}({\mathcal {A}}) \,{:}{=}\, \{\lambda \mathbf {x}\,{:}\, \mathbf {x}\in {\mathcal {A}}\, , \,\ \lambda \geqslant 0 \}\) for notational ease (note that by this definition \({\mathrm {cone}}({\mathcal {A}})\) is the union of \(\{\mathbf {o}\}\) and the smallest cone containing \({\mathcal {A}}\)).

Let \({\mathcal {V}}\) be either \(\mathbb {R}^n\) or \({\mathcal {S}}^n\), whereby we use the standard inner product in \(\mathbb {R}^n\), while in \({\mathcal {S}}^n\) the inner product is given by the Frobenius product \({\textsf {A} }\bullet {\textsf {B} }= {\mathrm {trace}}({\textsf {A} }{\textsf {B} })\). For an arbitrary cone \({\mathcal {K}}\subseteq {\mathcal {V}}\) we denote the dual cone by \({\mathcal {K}}^*\subseteq {\mathcal {V}}\) where

$$\begin{aligned} {\mathcal {K}}^* \,{:}{=}\, \left\{ \mathbf {y}\in {\mathcal {V}}\,{:}\, \langle \mathbf {x},\mathbf {y}\rangle \geqslant 0 \quad \ \text{ for } \text{ all } \mathbf {x}\in {\mathcal {K}}\right\} = [{\mathrm {cl}}\, {\mathcal {K}}]^*\, . \end{aligned}$$

It is well known that \({\mathcal {K}}^{**} = {\mathrm {clconv}}({\mathcal {K}})\). The closure and the convex-hull operations can be omitted if \({\mathcal {K}}\) is closed and/or convex, respectively. We also have

$$\begin{aligned} {\mathrm {int}}({\mathcal {K}}^*) = \left\{ \mathbf {y}\in {\mathcal {V}}\,{:}\, \langle \mathbf {x},\mathbf {y}\rangle > 0 \quad \ \text{ for } \text{ all } \mathbf {x}\in {\mathcal {K}}\setminus \{\mathbf {o}\} \right\} . \end{aligned}$$

We say a convex cone \({\mathcal {K}}\) is pointed if it does not contain a line or, equivalently, if \({\mathrm {int}}({\mathcal {K}}^*) \ne \varnothing \).

Throughout the text, we will make use of conic linear optimization tools. A conic linear optimization problem is given by

$$\begin{aligned} p^*=\inf _{\mathbf {x}\in {\mathcal {K}}}\left\{ \langle \mathbf {c},\mathbf {x}\rangle \,{:}\, \langle \mathbf {a}_i,\mathbf {x}\rangle = b_i\, , \, i \in [1\!:\!m]\right\} \, , \end{aligned}$$
(P)

which is just a linear optimization problem with an extra constraint that restricts the decision variable \(\mathbf {x}\) to lie in a closed, convex cone \({\mathcal {K}}\). The dual problem is given by

$$\begin{aligned} d^*=\sup _{\varvec{\lambda }\in \mathbb {R}^m} \left\{ \sum _{i=1}^{m}-\lambda _i b_i \,{:}\, \mathbf {c}+ \sum _{i=1}^{m}\lambda _i\mathbf {a}_i \in {\mathcal {K}}^* \right\} \, . \end{aligned}$$
(D)

Slater’s condition says that the existence of a relative interior feasible point of (P) guarantees \(p^*=d^*\) as well as attainment of the optimal value in (D) whenever (D) is feasible, and likewise with the roles of (P) and (D) interchanged.
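To make the primal-dual pair (P)/(D) concrete, here is a toy instance (ours; the data c, a, b are assumptions for illustration) with \({\mathcal {K}}={\mathcal {K}}^*=\mathbb {R}^2_+\), where (P) is the linear program \(\min \{x_1+2x_2 : x_1+x_2=1,\ \mathbf {x}\geqslant \mathbf {o}\}\). Weak duality \(\langle \mathbf {c},\mathbf {x}\rangle \geqslant -\varvec{\lambda }^{{\textsf {T} }}\mathbf {b}\) is checked on sampled feasible points, and the optimal values coincide, as Slater's condition predicts here.

```python
# (P): min { x1 + 2*x2 : x1 + x2 = 1, x in R^2_+ }
# (D): sup { -lambda*b : c + lambda*a in R^2_+ }      (K* = R^2_+)
c, a, b = [1.0, 2.0], [1.0, 1.0], 1.0

def primal_feasible(x):
    return all(xi >= 0 for xi in x) and abs(x[0] + x[1] - b) < 1e-9

def dual_feasible(lam):
    return all(c[i] + lam * a[i] >= 0 for i in range(2))

xs = [[t / 100, 1 - t / 100] for t in range(101)]   # primal feasible segment
lams = [-1 + t / 50 for t in range(201)]            # dual feasible lam in [-1, 3]
assert all(primal_feasible(x) for x in xs)
assert all(dual_feasible(l) for l in lams)
# weak duality on every sampled primal/dual pair
assert all(c[0]*x[0] + c[1]*x[1] >= -l*b - 1e-9 for x in xs for l in lams)

print(min(c[0]*x[0] + c[1]*x[1] for x in xs))   # 1.0  (p*, at x = (1, 0))
print(max(-l*b for l in lams))                  # 1.0  (d*, at lambda = -1)
```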

A large portion of our discussion will revolve around set-copositive and set-completely positive matrix cones, which for arbitrary cones \({\mathcal {K}}\subseteq \mathbb {R}^n\) are defined in the following way, putting \(k={n+1\atopwithdelims ()2}= \dim {\mathcal {S}}^n\):

$$\begin{aligned} \mathcal {COP}_n({\mathcal {K}})&\,{:}{=}\, \left\{ {\textsf {Q} }\in {\mathcal {S}}^n \,{:}\, \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {x}\geqslant 0 \ \text{ for } \text{ all } \mathbf {x}\in {\mathcal {K}}\right\} \, ,\\ \mathcal {CPP}_n({\mathcal {K}})&\,{:}{=}\, \left\{ \sum _{i=1}^{k}\mathbf {x}_i\mathbf {x}_i^{{\textsf {T} }}\,{:}\, \mathbf {x}_i \in {\mathcal {K}},\ i\in [1\!:\!k]\right\} \\&= {\mathrm {conv}}\left\{ \mathbf {x}\mathbf {x}^{{\textsf {T} }}\,{:}\, \mathbf {x}\in {\mathcal {K}}\right\} = \{{\textsf {X} }{\textsf {X} }^{{\textsf {T} }}\,{:}\, {\textsf {X} }\in \mathbb {R}^{n\times k}\, , \, \mathbf {x}_i \in {\mathcal {K}}, \, i \in [1\!:\!k] \}\\&\subseteq {\mathcal {S}}^n. \end{aligned}$$

The index n will henceforth be suppressed if the dimension is clear from the context. The above matrix cones are convex cones that are dual to each other if \({\mathcal {K}}\) is closed since in general \(\mathcal {COP}({\mathcal {K}})^* = {\mathrm {cl}}\, [\mathcal {CPP}({\mathcal {K}})] = \mathcal {CPP}({\mathrm {cl}}\,{\mathcal {K}})\) holds (Sturm and Zhang 2003, Proposition 1 and Lemma 1). In fact, an easy continuity argument shows that \(\mathcal {COP}({\mathcal {K}})\) is always closed even if \({\mathcal {K}}\) is not. Since \({\mathcal {S}}^n_+ = \mathcal {COP}(\mathbb {R}^n) = \mathcal {CPP}(\mathbb {R}^n)\), this also shows that the cone of positive-semidefinite matrices is self-dual, i.e. \(({\mathcal {S}}^n_+)^* = {\mathcal {S}}^n_+\). The concept of copositivity was first introduced in Motzkin (1952) for the case \({\mathcal {K}}= \mathbb {R}^n_+\). Determining whether a given matrix is an element of either \(\mathcal {COP}(\mathbb {R}^n_+)\) or \(\mathcal {CPP}(\mathbb {R}^n_+)\) is NP-hard. This may leave the impression that using these cones in optimization introduces a complication rather than solving a problem. However, both cones can be approximated by tractable matrix cones, and recent literature shows that even simple approximations yield good bounds for several kinds of optimization problems. We will not go into further detail but instead refer to Bomze (2012), Dür (2010), Hiriart-Urruty and Seeger (2010). Here, let us just mention the most elementary examples of tractable approximations given by

$$\begin{aligned} \begin{aligned} \mathcal {COP}(\mathbb {R}_+^n)&\supseteq {\mathcal {S}}^n_+ + {\mathcal {N}}_n \,{=}{:}\, \mathcal {NND}_n, \\ \mathcal {CPP}(\mathbb {R}_+^n)&\subseteq {\mathcal {S}}^n_+\cap {\mathcal {N}}_n \,{=}{:}\, \mathcal {DNN}_n \end{aligned} \end{aligned}$$
(4)

where the mnemonics \(\mathcal {NND}\), \(\mathcal {DNN}\) abbreviate non-negative-decomposable and doubly-non-negative respectively. In fact, the inclusions in (4) hold with equality in case \(n<5\). We also have the useful equality

$$\begin{aligned} \mathcal {COP}(\mathbb {R}^n\times \mathbb {R}_+) = \mathcal {CPP}(\mathbb {R}^n\times \mathbb {R}_+) = {\mathcal {S}}^{n+1}_+ \end{aligned}$$

which holds since \(\mathcal {COP}(\mathbb {R}^n\times \mathbb {R}_+)\supseteq {\mathcal {S}}^{n+1}_+\) while on the other hand \((\mathbf {x}^{{\textsf {T} }},x_0){\textsf {A} }(\mathbf {x};x_0) \geqslant 0\ \forall (\mathbf {x},x_0)\in \mathbb {R}^n\times \mathbb {R}_+\) implies \((\mathbf {x}^{{\textsf {T} }},x_0){\textsf {A} }(\mathbf {x};x_0) = (-1)^2(-\mathbf {x}^{{\textsf {T} }},-x_0){\textsf {A} }(-\mathbf {x};-x_0) \geqslant 0 \ \forall (\mathbf {x},x_0)\in \mathbb {R}^n\times \mathbb {R}_-\). The equality \(\mathcal {CPP}(\mathbb {R}^n\times \mathbb {R}_+) = {\mathcal {S}}^{n+1}_+\) then follows from selfduality of \({\mathcal {S}}^{n+1}_+\) and the fact that \(\mathbb {R}^n\times \mathbb {R}_+\) is a closed, convex cone and hence \(\mathcal {COP}(\mathbb {R}^n\times \mathbb {R}_+) = \mathcal {CPP}(\mathbb {R}^n\times \mathbb {R}_+)^*\) .
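The inclusion in (4) can be probed numerically. The sketch below (ours; the particular matrices are illustrative assumptions) takes \({\textsf {S} }={\textsf {O} }\in {\mathcal {S}}^2_+\) and a nonnegative \({\textsf {N} }\), so that \({\textsf {M} }={\textsf {S} }+{\textsf {N} }\in \mathcal {NND}_2\), and checks by sampling that the sum is copositive over \(\mathbb {R}^2_+\) although it is not positive semidefinite, showing that copositivity is strictly weaker than positive semidefiniteness.

```python
# M = O + [[0,1],[1,0]] lies in NND_2; x^T M x = 2*x1*x2 >= 0 for x >= 0,
# yet M has eigenvalues +1 and -1, so it is not PSD.
import random

M = [[0.0, 1.0], [1.0, 0.0]]

def qform(M, x):
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

random.seed(2)
copositive_on_samples = all(
    qform(M, [random.uniform(0, 1), random.uniform(0, 1)]) >= 0
    for _ in range(10000))
not_psd = qform(M, [1.0, -1.0]) < 0         # witness: (1, -1) gives -2

print(copositive_on_samples)  # True
print(not_psd)                # True
```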

3 Convex reformulations of QCQPs and robust optimization

We will now describe the general strategy by which these reformulation results can be leveraged to cope with cases where the semi-infinite constraints present in robust optimization depend quadratically on the uncertainty vector \(\mathbf {u}\). Assume that \(f(\mathbf {x},\mathbf {u}) = \mathbf {u}^{{\textsf {T} }}{\textsf {Q} }(\mathbf {x})\mathbf {u}+{\mathbf {q}}(\mathbf {x})^{{\textsf {T} }}\mathbf {u}+\omega (\mathbf {x})\) where \({\textsf {Q} }(\mathbf {x}),{\mathbf {q}}(\mathbf {x}),\omega (\mathbf {x})\) are matrix-, vector- and scalar-valued functions of the decision vector \(\mathbf {x}\). For ease of notation we refer to them as \({\textsf {Q} },{\mathbf {q}}\) and \(\omega \), suppressing the dependence on \(\mathbf {x}\). We thus consider the semi-infinite constraint and its (possibly non-smooth) single-constraint equivalent

$$\begin{aligned} \mathbf {u}^{{\textsf {T} }}{\textsf {Q} }\mathbf {u}+{\mathbf {q}}^{{\textsf {T} }}\mathbf {u}+\omega \geqslant 0 \ \forall \mathbf {u}\in {\mathcal {U}}\quad \Longleftrightarrow \quad \inf _{\mathbf {u}\in {\mathcal {U}}}\left[ \mathbf {u}^{{\textsf {T} }}{\textsf {Q} }\mathbf {u}+ {\mathbf {q}}^{{\textsf {T} }}\mathbf {u}+\omega \right] \geqslant 0\, . \end{aligned}$$
(5)

Now suppose that for generic \({\textsf {Q} },{\mathbf {q}}\) and \(\omega \) the minimization problem in (5) has an exact convex, conic reformulation of the following form, using an appropriate matrix cone \({\mathcal {C}}\) and appropriate matrices \({\textsf {A} }_i\in {\mathcal {S}}^{n+1}\), real numbers \(b_i\in \mathbb {R}, \ i \in [1\!:\!m]\), so that, using notation (3),

$$\begin{aligned} \inf _{\mathbf {u}\in {\mathcal {U}}}\left[ \mathbf {u}^{{\textsf {T} }}{\textsf {Q} }\mathbf {u}+ {\mathbf {q}}^{{\textsf {T} }}\mathbf {u}+\omega \right] = \inf _{{\textsf {Y} }\in {\mathcal {C}}} \{{\textsf {M} }({\textsf {Q} },{\mathbf {q}},\omega )\bullet {\textsf {Y} }\,{:}\, {\textsf {A} }_i\bullet {\textsf {Y} }\leqslant b_i\, , \,i \in [1\!:\!m]\} \,. \end{aligned}$$

Further, assume that for this reformulation and its dual we can establish zero duality gap and dual attainability (e.g., if the primal satisfies Slater’s condition), so that

$$\begin{aligned}&\inf _{{\textsf {Y} }\in {\mathcal {C}}} \{{\textsf {M} }({\textsf {Q} },{\mathbf {q}},\omega )\bullet {\textsf {Y} }\,{:}\, {\textsf {A} }_i\bullet {\textsf {Y} }\leqslant b_i\, , \, i \in [1\!:\!m]\}\\&=\sup _{\varvec{\lambda }\in \mathbb {R}_+^m} \left\{ -\mathbf {b}^{{\textsf {T} }}\varvec{\lambda }\,{:}\, {\textsf {M} }({\textsf {Q} },{\mathbf {q}},\omega ) +\sum _{i=1}^m\lambda _i{\textsf {A} }_i \in {\mathcal {C}}^*\right\} \, . \end{aligned}$$

Since dual attainability guarantees the existence of the dual maximizers, we can enforce the semi-infinite constraint in (5) by demanding that

$$\begin{aligned} \mathbf {b}^{{\textsf {T} }}\varvec{\lambda }\leqslant 0\;\; \text{ and } \;\; {\textsf {M} }({\textsf {Q} },{\mathbf {q}},\omega ) +\sum _{i=1}^m\lambda _i{\textsf {A} }_i \in {\mathcal {C}}^*\quad \text{ for } \text{ some } \varvec{\lambda }\in \mathbb {R}_+^{m} \, . \end{aligned}$$
(6)

In summary, the strategy is to bypass the need to directly dualize the implicit QCQP in (5) by providing a linear conic reformulation, whose dual can be formed by invoking linear conic duality, a very well developed and understood subject. Many results in recent literature have harnessed this general strategy to cope with cases where quadratic terms in the uncertainty vector appear (see e.g. Mittal et al. 2019; Xu and Hanasusanto 2019). We want to highlight that the critical ingredients of the above strategy are:

  • closing the relaxation gap,

  • closing the duality gap for the conic reformulation and

  • guaranteeing dual attainability.

While the first obstacle seems to be the most challenging, duality also comes with some subtleties attached. In Sect. 5 we will make an effort to understand the reformulation gap that may occur in case dual attainability fails, while in Sect. 4 we will discuss relaxation and duality gaps and their relation to each other. For now, we give some examples that illustrate the above strategy and also demonstrate that the gap can be infinite if dual attainability does not hold.

Example 1

Consider the robust optimization problem

$$\begin{aligned} \begin{aligned} \min&\, x+y\\ {{\mathrm {s.t.:}}}\,&xu_1^2+yu_2^2 \geqslant 1 \quad \forall (u_1,u_2) \in \mathbb {R}^2_+ \,{:}\, u_1+u_2 = 1. \end{aligned} \end{aligned}$$
(7)

If, out of the infinitely many constraints, we enforced merely \(0.5^2x+0.5^2y \geqslant 1\), we would obtain a relaxed problem with an optimal value of 4, attained at \(x=y=2\). This solution is also feasible for the original problem since \(\min \big \{2u_1^2+2u_2^2\,{:}\, u_1+u_2 = 1, (u_1,u_2) \in \mathbb {R}^2_+\big \} = 1\), and thus it is also optimal. We will now show that the general reformulation strategy yields an equivalent problem with finitely many constraints. It is well known that \(\min \left\{ au_1^2+bu_2^2+cu_1u_2 \,{:}\, u_1+u_2 =1, \ (u_1,u_2)\in \mathbb {R}_+^2 \right\} = \min \left\{ au_{11}+bu_{22}+cu_{21} \,{:}\, u_{11}+2u_{21}+u_{22} = 1,\ {\textsf {U} }\in {\mathcal {S}}^2_+\cap {\mathcal {N}}_2 \right\} \) (see Bomze et al. 2002). We have \({\textsf {M} }(1/3,2/6,1/3)\in {\mathrm {int}}\left( {\mathcal {S}}^2_+\cap {\mathcal {N}}_2\right) \), so by Slater’s condition the latter optimization problem is equivalent to \( \max \left\{ \lambda \,{:}\, {{\textsf {M} }(a,c,b)}-\lambda {\textsf {M} }(1,2,1) \in {\mathcal {S}}^2_++{\mathcal {N}}_2 \right\} \), and the maximum is attained. We thus have the equivalences

$$\begin{aligned}&xu_1^2+yu_2^2 \geqslant 1 \quad \forall (u_1,u_2) \in \mathbb {R}^2_+ \,{:}\, u_1+u_2 = 1 \\&\quad \Leftrightarrow \min \left\{ xu_1^2+yu_2^2 \,{:}\, (u_1,u_2) \in \mathbb {R}^2_+\, , \, u_1+u_2 = 1 \right\} \geqslant 1\\&\quad \Leftrightarrow \min \left\{ xu_{11}+yu_{22} \,{:}\, u_{11}+2u_{21}+u_{22} = 1,\ {\textsf {U} }\in {\mathcal {S}}^2_+\cap {\mathcal {N}}_2 \right\} \geqslant 1\\&\quad \Leftrightarrow \max \left\{ \lambda \,{:}\, {\textsf {M} }(x,0,y)-\lambda {\textsf {M} }(1,2,1)\in {\mathcal {S}}^2_++{\mathcal {N}}_2 \right\} \geqslant 1\\&\quad \Leftrightarrow \lambda \geqslant 1,\ {\textsf {M} }(x,0,y)-\lambda {\textsf {M} }(1,2,1)\in {\mathcal {S}}^2_++{\mathcal {N}}_2 \text{ for } \text{ some } \lambda \in \mathbb {R}\, . \end{aligned}$$

Thus we can replace the constraints in (7) in order to obtain

$$\begin{aligned} \begin{aligned} \min&\, x+y\\ {{\mathrm {s.t.:}}}\,&\lambda \geqslant 1,\ {\textsf {M} }(x,0,y)-\lambda {\textsf {M} }(1,2,1) \in {\mathcal {S}}^2_++{\mathcal {N}}_2, \end{aligned} \end{aligned}$$
(8)

and one can verify by computation that the minimum is attained at \(x=y=2\) and \(\lambda = 1\).
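The claims of Example 1 are easy to verify numerically. The following sketch (ours) checks the certificate at \(x=y=2\), \(\lambda =1\), where \({\textsf {M} }(2,0,2)-{\textsf {M} }(1,2,1)\) is PSD and hence lies in \({\mathcal {S}}^2_++{\mathcal {N}}_2\) with \({\textsf {N} }={\textsf {O} }\), as well as the worst-case value of the robust constraint.

```python
# Certificate check for (8) at x = y = 2, lambda = 1:
# M(2,0,2) - 1*M(1,2,1) = [[1,-1],[-1,1]], which is PSD.
x, y, lam = 2.0, 2.0, 1.0
D = [[x - lam * 1, 0 - lam * 1], [0 - lam * 1, y - lam * 1]]   # [[1,-1],[-1,1]]

# 2x2 PSD test: nonnegative diagonal entries and determinant
is_psd = D[0][0] >= 0 and D[1][1] >= 0 and D[0][0]*D[1][1] - D[0][1]*D[1][0] >= 0
print(is_psd)   # True

# Worst case of the robust constraint: min_u 2u^2 + 2(1-u)^2 over [0,1]
worst_case = min(x * u**2 + y * (1 - u)**2 for u in (t / 1000 for t in range(1001)))
print(worst_case)  # 1.0, attained at u1 = u2 = 0.5
```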

Example 2

To illustrate the issues of a failure of dual attainability for the conic reformulation we consider the robust optimization problem

$$\begin{aligned} \begin{aligned} \min&\, x+y\\ {{\mathrm {s.t.:}}}\,&(x+y)u_1^2 \geqslant u_2u_3 \quad \forall (u_1,u_2,u_3)\in [0,1]\times [0,1]\times \left\{ 0 \right\} ,\\&x+y\leqslant 0. \end{aligned} \end{aligned}$$
(9)

Note that the semi-infinite constraint is equivalent to \(x+y\geqslant 0\), so that the constraints overall imply \(x+y=0\) and the minimum is 0. We claim that for any \(a\in \mathbb {R}\) we have \(q(a) \,{:}{=}\, \min \left\{ au_1^2-u_2u_3 \,{:}\,(u_1,u_2,u_3)\in [0,1]\times [0,1]\times \left\{ 0 \right\} \right\} = \min \left\{ au_{11} - u_{23} \,{:}\, u_{11}\leqslant 1, u_{22}\leqslant 1, u_{33}= 0, {\textsf {U} }\in {\mathcal {S}}^3_+ \right\} \). The claim follows easily after considering that \({\textsf {U} }\in {\mathcal {S}}^3_+\) and \(u_{33} =0\) imply \(u_{23} = 0\). Note that this reformulation is not based on a characterization of \({\mathcal {G}}({\mathcal {F}})\), but is coincidental, and chosen here in order to illustrate a point about dual attainability. Also, we see that \(q(a) = a \) if \(a<0\) and \(q(a)=0\) otherwise. The conic reformulation does not have a Slater point; however, its dual, given by

$$\begin{aligned} \sup \,&\lambda _1+\lambda _2 \\ {{\mathrm {s.t.:}}}\,&{\textsf {M} }_D(a,\lambda _1,\lambda _2,\lambda _3) \,{:}{=}\, \begin{pmatrix} a-\lambda _1 &{} 0 &{} 0\\ 0 &{}-\lambda _2 &{} -1/2\\ 0 &{} -1/2 &{} -\lambda _3 \end{pmatrix} \in {\mathcal {S}}^3_+,\quad \lambda _1 \leqslant 0 \end{aligned}$$

does have a Slater point, obtained by choosing \(\lambda _i\), \(i=1,2,3\), negative and of sufficiently large magnitude. Thus the conic reformulation and its dual have the same optimal value, but dual attainability is not guaranteed. In fact, some elementary analysis reveals that the dual does not attain its optimum. We thus merely have the following implications

$$\begin{aligned}&(x+y)u_1^2 \geqslant u_2u_3 \quad \forall (u_1,u_2,u_3)\in [0,1]\times [0,1]\times \left\{ 0 \right\} \\&\quad \Leftrightarrow \min \left\{ (x+y)u_1^2-u_2u_3 \,{:}\, (u_1,u_2,u_3)\in [0,1]\times [0,1]\times \left\{ 0 \right\} \right\} \geqslant 0 \\&\quad \Leftrightarrow \min \left\{ (x+y)u_{11} - u_{23} \,{:}\, u_{11}\leqslant 1, u_{22}\leqslant 1, u_{33}= 0, {\textsf {U} }\in {\mathcal {S}}^3_+ \right\} \geqslant 0 \\&\quad \Leftrightarrow \sup \left\{ \lambda _1+\lambda _2 \,{:}\, {\textsf {M} }_D((x+y),\lambda _1,\lambda _2,\lambda _3) \in {\mathcal {S}}^3_+, \lambda _1\leqslant 0 \right\} \geqslant 0\\&\quad \Leftarrow \lambda _1+\lambda _2 \geqslant 0, \ {\textsf {M} }_D((x+y),\lambda _1,\lambda _2,\lambda _3) \in {\mathcal {S}}^3_+,\ \lambda _1\leqslant 0 \text{ for } \text{ some } \varvec{\lambda }\in \mathbb {R}^3 \end{aligned}$$

The final constraint implies \(x+y>0\) since in the case \(x+y=0\) we would have \(\lambda _1=0\), while \(\lambda _2\) can be neither zero nor positive. Thus, if we replaced the semi-infinite constraint in (9) by the latter constraint, the problem would become infeasible. This exemplifies that closing the relaxation gap as well as the duality gap is in general not enough to guarantee an exact reformulation.
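The computations behind Example 2 can likewise be checked numerically. The sketch below (ours; the grid and the tolerance are illustrative choices) confirms \(q(a)=\min (a,0)\) by brute force and exhibits a sequence of feasible dual points whose objective tends to the supremum 0 without attaining it; the PSD test for \({\textsf {M} }_D\) uses that the matrix is block diagonal.

```python
# q(a) = min a*u1^2 - u2*u3 over [0,1] x [0,1] x {0}; u3 = 0 kills the
# bilinear term, so q(a) = min(a, 0).
def q(a, grid=1000):
    return min(a * (t / grid) ** 2 for t in range(grid + 1))

print(q(-2.0))  # -2.0  (= a, since a < 0)
print(q(3.0))   #  0.0

def dual_feasible(a, l1, l2, l3, tol=1e-12):
    # M_D(a,l1,l2,l3) is block diagonal, so PSD holds iff
    # a - l1 >= 0, -l2 >= 0, -l3 >= 0 and l2*l3 >= 1/4.
    return (a - l1 >= -tol and -l2 >= -tol and -l3 >= -tol
            and l2 * l3 - 0.25 >= -tol and l1 <= tol)

a = 0.0
for eps in (1.0, 0.1, 0.01, 0.001):
    # feasible points with objective l1 + l2 = -eps climbing toward sup = 0
    assert dual_feasible(a, 0.0, -eps, -1 / (4 * eps))
print(dual_feasible(a, 0.0, 0.0, 0.0))  # False: the supremum 0 is not attained
```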

4 Exactness of the Shor relaxation and the S-Lemma

As stressed in Burer (2015), the key problem when employing Theorem 1 is to characterize the set \({\mathcal {G}}({\mathcal {F}})\), which of course depends heavily on the choice of \({\mathcal {F}}\) (note that no convexity assumptions are imposed on \({\mathcal {F}}\)). A natural starting point is given by the so-called Shor relaxation, which is constructed in the following manner. For notational convenience we define \({\textsf {Y} }({\textsf {X} },\mathbf {x})\,{:}{=}\, {\textsf {M} }({\textsf {X} },2\mathbf {x},1)\) as well as \({\mathcal {E}}_{xt}({\mathcal {K}}\times \mathbb {R}_+) \,{:}{=}\, \left\{ \mathbf {x}\mathbf {x}^{{\textsf {T} }}\,{:}\, \mathbf {x}\in {\mathcal {K}}\times \mathbb {R}_+ \right\} \) where \({\mathcal {K}}\) is a closed cone, so that \(\mathcal {CPP}({\mathcal {K}}\times \mathbb {R}_+) = {\mathrm {clconv}}({\mathcal {E}}_{xt}({\mathcal {K}}\times \mathbb {R}_+))\) (in fact one can show that \({\mathcal {E}}_{xt}({\mathcal {K}}\times \mathbb {R}_+) \) is identical to the set of extreme rays of \(\mathcal {CPP}({\mathcal {K}}\times \mathbb {R}_+) \)). We have

$$\begin{aligned}&\inf _{\mathbf {x}\in {\mathcal {K}}}\left\{ \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {x}+{\mathbf {q}}^{{\textsf {T} }}\mathbf {x}+\omega \,{:}\, \mathbf {x}^{{\textsf {T} }}{\textsf {A} }_i\mathbf {x}+\mathbf {a}_i^{{\textsf {T} }}\mathbf {x}\leqslant b_i,\ i\in [1\!:\!m] \right\} \\&\quad \geqslant \inf _{{\textsf {X} },\mathbf {x}}\left\{ {\textsf {Q} }\bullet {\textsf {X} }+{\mathbf {q}}^{{\textsf {T} }}\mathbf {x}+\omega \,{:}\, {\textsf {A} }_i\bullet {\textsf {X} }+\mathbf {a}_i^{{\textsf {T} }}\mathbf {x}\leqslant b_i,\ i\in [1\!:\!m],\ {\textsf {Y} }({\textsf {X} },\mathbf {x})\in \mathcal {CPP}({\mathcal {K}}\times \mathbb {R}_+) \right\} \end{aligned}$$
(10)

We will refer to the latter problem as the Shor relaxation, since the core idea was introduced in Shor (1987), albeit for the special case of \({\mathcal {K}}= \mathbb {R}^n\), in which case \(\mathcal {CPP}\left( \mathbb {R}^n\times \mathbb {R}_+\right) = {\mathcal {S}}_+^{n+1}\) (see the notations and preliminaries section). The above derivation makes it apparent that the Shor relaxation is not necessarily tight, since its feasible set \({\mathcal {F}}_{Shor}\) can have extreme points that are not rank-one matrices. However, if we find that the optimal solution of the Shor relaxation is a rank-one matrix, then we have solved the original QCQP. If we can show that a rank-one solution exists for any choice of coefficients \(({\textsf {Q} },{\mathbf {q}},\omega )\), we have indeed shown that \({\mathcal {F}}_{Shor} = {\mathcal {G}}({\mathcal {F}})\) (see e.g. Yang et al. 2016). But in general \({\mathcal {F}}_{Shor} \supseteq {\mathcal {G}}({\mathcal {F}})\), so that one has to find a strengthening of the Shor relaxation if the relaxation gap is to be closed (see e.g. Burer 2009, 2012; Eichfelder and Povh 2013).

However, another way of establishing exactness of the Shor relaxation stems from the fact that the conic dual of the Shor relaxation and the Semi-Lagrangian dual of the underlying QCQP (see Bomze 2015; Faye and Roupin 2007) take the same form. To see this consider the following derivation of the dual of a QCQP, using the abbreviations \(\bar{\mathbf {b}} = [1,b_1,\ldots , b_m]^{{\textsf {T} }}\in \mathbb {R}^{m+1}\) and, for given \(\varvec{\lambda }= [\lambda _0,\lambda _1, \ldots , \lambda _m]^{{\textsf {T} }}\in \mathbb {R}^{m+1}_+\), the quadratic form \(q(\mathbf {x},x_0;\varvec{\lambda })= \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {x}+x_0{\mathbf {q}}^{{\textsf {T} }}\mathbf {x}+x_0^2\omega +\lambda _0 x_0^2 +\sum _{i=1}^{m} \lambda _i(\mathbf {x}^{{\textsf {T} }}{\textsf {A} }_i\mathbf {x}+ x_0\mathbf {a}_i^{{\textsf {T} }}\mathbf {x})\):

$$\begin{aligned}&\qquad \quad \inf _{\mathbf {x}\in {\mathcal {K}}}\{\mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {x}+{\mathbf {q}}^{{\textsf {T} }}\mathbf {x}+\omega \,{:}\, \mathbf {x}^{{\textsf {T} }}{\textsf {A} }_i\mathbf {x}+ \mathbf {a}_i^{{\textsf {T} }}\mathbf {x}\leqslant b_i, \ i \in [1\!:\!m] \}\\&\quad =\inf _{(\mathbf {x},x_0) \in {\mathcal {K}}\times \mathbb {R}_+}\{\mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {x}+x_0{\mathbf {q}}^{{\textsf {T} }}\mathbf {x}+\omega x_0^2 \,{:}\, \mathbf {x}^{{\textsf {T} }}{\textsf {A} }_i\mathbf {x}+ x_0\mathbf {a}_i^{{\textsf {T} }}\mathbf {x}\leqslant b_i, \ i \in [1\!:\!m],\ x_0^2 = 1 \}\\&\quad \geqslant \sup _{\varvec{\lambda }\in \mathbb {R}_+^{m+1}}\inf _{(\mathbf {x},x_0) \in {\mathcal {K}}\times \mathbb {R}_+}\left\{ \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {x}+x_0{\mathbf {q}}^{{\textsf {T} }}\mathbf {x}+x_0^2\omega +\lambda _0(x_0^2-1)+\sum _{i=1}^{m} \lambda _i(\mathbf {x}^{{\textsf {T} }}{\textsf {A} }_i\mathbf {x}+ x_0\mathbf {a}_i^{{\textsf {T} }}\mathbf {x}-b_i) \right\} \\&\quad =\sup _{\varvec{\lambda }\in \mathbb {R}_+^{m+1}}\left\{ -\varvec{\lambda }^{{\textsf {T} }}\bar{\mathbf {b}}\, + \inf _{(\mathbf {x},x_0) \in {\mathcal {K}}\times \mathbb {R}_+} {q(\mathbf {x},x_0;\varvec{\lambda })} \right\} \\&\quad =\sup _{\varvec{\lambda }\in \mathbb {R}_+^{m+1}} \left\{ -\varvec{\lambda }^{{\textsf {T} }}\bar{\mathbf {b}} \,{:}\, {q(\mathbf {x},x_0;\varvec{\lambda })} \geqslant 0 \quad \forall (\mathbf {x},x_0) \in {\mathcal {K}}\times \mathbb {R}_+ \right\} \\&\quad =\sup _{\varvec{\lambda }\in \mathbb {R}_+^{m+1}} \left\{ -\varvec{\lambda }^{{\textsf {T} }}\bar{\mathbf {b}} \,{:}\, {\textsf {M} }({\textsf {Q} },{\mathbf {q}},\omega )+\lambda _0{\textsf {M} }({\textsf {O} },\mathbf {o},1) + \sum _{i=1}^{m} \lambda _i{\textsf {M} }({\textsf {A} }_i,\mathbf {a}_i,0) \in \mathcal {COP}({\mathcal {K}}\times \mathbb {R}_+) \right\} \, . \end{aligned}$$

In the second to last equality we used the fact that for any quadratic form \(q(\mathbf {x})\) and any cone \(\bar{{\mathcal {K}}}\) we have

$$\begin{aligned} \inf _{\mathbf {x}\in \bar{{\mathcal {K}}}} q(\mathbf {x}) = {\left\{ \begin{array}{ll} \ 0 &{} \text{ if } q(\mathbf {x}) \geqslant 0 \ \forall \ \mathbf {x}\in \bar{{\mathcal {K}}} \\ -\infty &{} \text{ otherwise }. \end{array}\right. } \end{aligned}$$
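This dichotomy is a direct consequence of the degree-2 homogeneity of a quadratic form. The following minimal Python sketch (our own illustration, with ad-hoc function names) makes it concrete by scanning a geometric range of scalings \(t\) along a ray of the cone:

```python
# Illustration (ours, not from the paper): a quadratic form satisfies
# q(t*x) = t^2 * q(x), so on a cone its infimum is 0 whenever q is
# nonnegative there (let t -> 0), and -infinity otherwise (scale a point
# with q(x) < 0 by t -> infinity).

def inf_on_ray(q, x, scales=tuple(10.0 ** k for k in range(-6, 7))):
    """Approximate the infimum of q along the ray {t*x : t > 0}."""
    return min(q(t * x) for t in scales)

q_nonneg = lambda x: x * x   # nonnegative on R_+: infimum 0
q_neg = lambda x: -x * x     # negative on R_+ \ {0}: infimum -infinity

print(inf_on_ray(q_nonneg, 1.0))  # tends to 0 from above as t -> 0
print(inf_on_ray(q_neg, 1.0))     # unboundedly negative as t grows
```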

Here, we speak of a Semi-Lagrangian dual, since the constraints \((\mathbf {x},x_0)\in {\mathcal {K}}\times \mathbb {R}_+\) are not dualized, that is, no dual variables are introduced for them. The final equality follows from the definition of \(\mathcal {COP}(\cdot )\) and in fact yields the conic dual of the Shor relaxation after a small reformulation,

$$\begin{aligned}&\quad \qquad \inf _{{\textsf {X} },\mathbf {x}} \{{\textsf {Q} }\bullet {\textsf {X} }+ {\mathbf {q}}^{{\textsf {T} }}\mathbf {x}+\omega \,{:}\, {\textsf {A} }_i\bullet {\textsf {X} }+ \mathbf {a}_i^{{\textsf {T} }}\mathbf {x}\leqslant b_i, \ i \in [1\!:\!m], {\textsf {Y} }({\textsf {X} },\mathbf {x}) \in \mathcal {CPP}({\mathcal {K}}\times \mathbb {R}_+)\}\\&\quad =\inf _{{\textsf {Y} }\in \mathcal {CPP}({\mathcal {K}}\times \mathbb {R}_+)} \{{\textsf {M} }({\textsf {Q} },{\mathbf {q}},\omega )\bullet {\textsf {Y} }\,{:}\, {\textsf {M} }({\textsf {A} }_i,\mathbf {a}_i,0)\bullet {\textsf {Y} }\leqslant b_i, \ i \in [1\!:\!m],\ {\textsf {M} }({\textsf {O} },\mathbf {o},1)\bullet {\textsf {Y} }= 1\}\, . \end{aligned}$$

Duality results for QCQPs are regularly obtained by proving that the joint numerical range of the quadratic forms involved has some convexity property (either being convex itself or becoming convex after adding the positive orthant in Minkowski’s set-sum sense). For examples, see (Chieu et al. 2019; Jeyakumar and Li 2014) and references therein. If for a fixed feasible set \({\mathcal {F}}\) the duality result holds for an arbitrary choice of the objective function coefficients, the feasible set of the Shor relaxation is in fact \({\mathcal {G}}({\mathcal {F}})\).

We will now proceed to give some examples that illustrate the interplay between relaxation gaps, duality gaps and exact strengthenings of QCQPs and their relaxations.

Example 3

(Zero duality gap for the QCQP, hence the Shor relaxation is tight) Consider the optimization problem \(\min _{x\in \mathbb {R}_+}\left\{ -x^2 \,{:}\, x^2\leqslant 1 \right\} \). We will now give an argument for full strong duality based on the joint numerical range. In this simple special case the difficulties of such an approach are avoided while the core idea of the argument is maintained. The following equalities are easily checked:

$$\begin{aligned}&\quad \qquad \min _{x\in \mathbb {R}_+}\left\{ -x^2 \,{:}\, x^2\leqslant 1 \right\} = \sup _{t\in \mathbb {R}}\left\{ t\,{:}\, -x^2\geqslant t \ \text{ for } \text{ all } \ x\in \mathbb {R}_+ \text{ with } x^2\leqslant 1 \right\} \\&\quad =\sup _{t\in \mathbb {R}}\left\{ t\,{:}\, \left\{ x\in \mathbb {R}_+\,{:}\, -x^2<t,\ x^2\leqslant 1 \right\} = \varnothing \right\} \, . \end{aligned}$$

In order to close the duality gap we have to show that the condition

$$\begin{aligned} \left\{ x\in \mathbb {R}_+\,{:}\, -x^2<t,\ x^2\leqslant 1 \right\} = \varnothing \end{aligned}$$
(11)

is equivalent to

$$\begin{aligned} \exists \lambda \geqslant 0 \,{:}\, \ (-x^2-t)+\lambda (x^2-1) \geqslant 0\ \forall x\in \mathbb {R}_+ \end{aligned}$$
(12)

Clearly, (12) implies (11), since in case (11) fails, any member of the respective set is an \(x\in \mathbb {R}_+\) that invalidates (12). The reverse implication can be demonstrated by appealing to the geometry of the joint numerical range \({\mathcal {J}}(-1,1,\mathbb {R}_+) = \left\{ (-x^2,x^2)^{{\textsf {T} }}\,{:}\, x\in \mathbb {R}_+ \right\} \). In fact, in this simple case we have

$$\begin{aligned} {\mathcal {J}}(-1,1,\mathbb {R}_+)= \left\{ \delta \begin{pmatrix} -1\\ 1 \end{pmatrix} \,{:}\, \delta \geqslant 0 \right\} , \end{aligned}$$

which is a half line and thus convex (which also illustrates the classical result in Dines (1941)). Now assume that (11) holds, then \({\mathcal {J}}(-1,1,\mathbb {R}_+)-t\mathbf {e}_1-\mathbf {e}_2\) does not meet \({\mathrm {int}}(\mathbb {R}^2_-)\). Since both sets are convex, they can be separated by a hyperplane, i.e. it holds that

$$\begin{aligned} \exists (\alpha ,\beta )^{{\textsf {T} }}\in \mathbb {R}^2_{+}\setminus \left\{ \mathbf {o} \right\} \,{:}\, \ \alpha (-x^2-t)+\beta (x^2-1) \geqslant 0\ \forall x\in {\mathrm {int}}(\mathbb {R}_+) \end{aligned}$$

where the nonnegativity of the multipliers follows from the fact that we separate from the interior of the negative orthant, and the \({\mathrm {int}}(\cdot )\)-operator can be dropped by continuity. For the latter condition to hold, it must be the case that \(\alpha > 0\): otherwise we would have \(\beta >0\), and \(x=0\) would make the remaining term equal to \(-\beta <0\). We thus arrive at (12) with \(\lambda = \beta /\alpha \). Finally, we have that for any fixed \(t\in \mathbb {R}\) (12) is equivalent to

$$\begin{aligned} \exists \lambda&\geqslant 0 \,{:}\, \ (-x^2-ty^2)+\lambda (x^2-y^2) \geqslant 0\ \forall (x,y)\in \mathbb {R}_+\times \left\{ 1 \right\} \\ \Leftrightarrow \exists \lambda&\geqslant 0 \,{:}\, \ (-x^2-ty^2)+\lambda (x^2-y^2) \geqslant 0\ \forall (x,y)\in \mathbb {R}_+^2\\ \Leftrightarrow \exists \lambda&\geqslant 0 \,{:}\,\ {\textsf {M} }(-1,0,-t) + \lambda {\textsf {M} }(1,0,-1) \in \mathcal {COP}(\mathbb {R}_+^2), \end{aligned}$$

so that \(\min _{x\in \mathbb {R}_+}\left\{ -x^2 \,{:}\, x^2\leqslant 1 \right\} = \sup _{t,\lambda }\big \{t \,{:}\, {\textsf {M} }(-1,0,-t) + \lambda {\textsf {M} }(1,0,-1) \in \mathcal {COP}(\mathbb {R}_+^2), \ \lambda \geqslant 0\big \}\). The latter problem is equivalent to \(\sup _{\lambda }\left\{ -\lambda \,{:}\, 1\leqslant \lambda ,\ \lambda \in \mathbb {R}_+ \right\} \) which is the dual of the Shor relaxation \(\min _{x\in {\mathcal {S}}_+^1}\left\{ -x\,{:}\, x\leqslant 1 \right\} \). Thus it follows that the Shor relaxation is tight.
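The numbers in this example are easy to verify computationally. The short Python sketch below is our own check (not part of the source): it evaluates the primal on a grid and exploits that a diagonal matrix is copositive over \(\mathbb {R}^2_+\) exactly when its diagonal is nonnegative.

```python
# Sanity check of Example 3 (our own sketch): the primal QCQP and the
# copositive dual both have optimal value -1.

# primal: min{-x^2 : x in R_+, x^2 <= 1}, evaluated on a grid of [0, 1]
primal = min(-(k / 1000.0) ** 2 for k in range(1001))

# dual: M(-1,0,-t) + lam * M(1,0,-1) = diag(lam - 1, -t - lam); a diagonal
# matrix is copositive over R^2_+ iff its diagonal is nonnegative, so
# feasibility means lam >= 1 and t <= -lam; the best choice is t = -lam = -1
dual = max(-lam for lam in (1.0 + k / 100.0 for k in range(100)))

print(primal, dual)  # both equal -1.0
```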

Example 4

(Positive QCQP-duality gap, inexact Shor relaxation with exact tightening, both with zero duality gap) Consider the dual of \(\min \big \{x^2-y^2+x:\ x+y=1,\ x,y\geqslant 0\big \}\) given by

$$\begin{aligned} \sup&\, \lambda _1+\lambda _2 \\ {\mathrm {s.t.:}}&\ \begin{pmatrix} 1 &{} 0 &{} \tfrac{1-\lambda _1}{2}\\ 0 &{}-1 &{} \tfrac{\lambda _1}{2}\\ \tfrac{1-\lambda _1}{2}&{} \tfrac{\lambda _1}{2} &{} \lambda _2 \end{pmatrix} \in \mathcal {COP}(\mathbb {R}^3_+) = {\mathcal {S}}^3_++{\mathcal {N}}_3. \end{aligned}$$

The primal QCQP has an optimal value of \(-1\) while the Lagrangian dual is infeasible since any matrix in \(\mathcal {COP}(\mathbb {R}^3_+)\) has nonnegative diagonal entries. The Shor relaxation given by

$$\begin{aligned} \min \, X-Y&+x\\ \mathrm {s.t.:}\, x+y&=1,\\ \begin{pmatrix} X &{}\quad Z &{}\quad x\\ Z &{}\quad Y &{}\quad y\\ x &{}\quad y &{}\quad 1 \end{pmatrix}&\in \mathcal {CPP}(\mathbb {R}^3_+) = {\mathcal {S}}^3_+\cap {\mathcal {N}}_3. \end{aligned}$$

is unbounded: along the improving ray we have \(x=X=Z=0\), \(y=1\) and \(Y\geqslant 1\). Nevertheless, the duality gap for the Shor relaxation is zero, since the QCQP and its Shor relaxation share the same Lagrangian dual. We can strengthen the Shor relaxation by introducing the additional constraint \(X+2Z +Y=1\), which is the relaxation of the redundant constraint \((x+y)^2 =1\). This yields an exact reformulation by the main results in Burer (2009), Burer (2012). After applying the simplification steps outlined there, we obtain another exact reformulation, namely \(\min \left\{ 2X-Y \,{:}\, X+2Z+Y=1,\ {\textsf {M} }(X,2Z,Y)\in \mathcal {CPP}(\mathbb {R}^2_+) \right\} \), which has a Slater point at \(X=Y= 0.4\), \(Z=0.1\), so that it exhibits no duality gap.
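Both claims, the improving ray of the Shor relaxation and the Slater point of the strengthened reformulation, can be replayed numerically. The sketch below is our own (not the paper's code); it tests positive semidefiniteness via nonnegativity of all principal minors.

```python
# Sketch verifying two claims of Example 4 (ours, not from the paper).

def psd2(a, b, c):
    """[[a, b], [b, c]] is PSD iff a >= 0, c >= 0 and ac - b^2 >= 0."""
    return a >= 0 and c >= 0 and a * c - b * b >= 0

def psd3(m):
    """3x3 symmetric PSD iff all principal minors are nonnegative."""
    (a, b, c), (_, d, e), (_, _, f) = m
    minors2 = [a * d - b * b, a * f - c * c, d * f - e * e]
    det = a * (d * f - e * e) - b * (b * f - c * e) + c * (b * e - c * d)
    return all(x >= 0 for x in [a, d, f] + minors2 + [det])

# improving ray of the Shor relaxation: x = X = Z = 0, y = 1, Y >= 1
for Y in [1.0, 10.0, 100.0]:
    m = [[0.0, 0.0, 0.0], [0.0, Y, 1.0], [0.0, 1.0, 1.0]]
    nonneg = all(v >= 0 for row in m for v in row)
    assert psd3(m) and nonneg   # doubly nonnegative = CPP for order <= 4
    print("objective X - Y + x =", -Y)  # unbounded below as Y grows

# Slater point of the exact reformulation: X = Y = 0.4, Z = 0.1
X, Y, Z = 0.4, 0.4, 0.1
assert abs(X + 2 * Z + Y - 1.0) < 1e-12 and psd2(X, Z, Y) and min(X, Y, Z) > 0
```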

Example 5

(Exact Shor reformulation with finite duality gap) Let us consider \(\min \left\{ x_1^2+x_2^2 \,{:}\, x_1^2 = 0,\ x_1x_3-x_2^2 =-1,\ x_2\geqslant 0 \right\} \), whose optimal value of 1 is attained at \(x_1 = 0,\ x_2=1\). The Shor relaxation \(\min \big \{{\textsf {X} }_{11}+{\textsf {X} }_{22}\,{:}\, {\textsf {X} }_{11}=0,\ {\textsf {X} }_{13}-{\textsf {X} }_{22} = - 1,\ {\textsf {X} }\in \mathcal {CPP}(\mathbb {R}\times \mathbb {R}_+\times \mathbb {R}) = {\mathcal {S}}^3_+\big \}\) has the same optimal value, since \({\textsf {X} }_{11} = 0\) together with \({\textsf {X} }\in {\mathcal {S}}^3_+\) forces \({\textsf {X} }_{1i}=0, \ i = 2,3\). Now the Shor relaxation is in fact a known example from (Pataki 2018) for an SDP with finite positive duality gap.
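The forcing argument in this example is a two-line computation; the following sketch (our own, not from the source) makes it concrete via the relevant \(2\times 2\) principal minor.

```python
# Sketch for Example 5 (ours): a PSD matrix with X11 = 0 cannot carry a
# nonzero entry X13, because the principal submatrix [[X11, X13], [X13, X33]]
# would have negative determinant.

def psd2(a, b, c):
    """[[a, b], [b, c]] is PSD iff a >= 0, c >= 0 and ac - b^2 >= 0."""
    return a >= 0 and c >= 0 and a * c - b * b >= 0

assert psd2(0.0, 0.0, 1.0)                                   # X13 = 0 is fine
assert not any(psd2(0.0, v, 1.0) for v in [0.1, -0.1, 1.0])  # X13 != 0 is not

# hence X13 = 0, so the constraint X13 - X22 = -1 forces X22 = 1 and the
# relaxed objective X11 + X22 = 1 matches the QCQP optimum
print(0.0 + 1.0)
```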

In summary we have discussed two routes for achieving a convex reformulation of a QCQP, either via closing a relaxation gap or via closing a duality gap, and we have illustrated the following facts:

  • If a QCQP enjoys strong duality (e.g. if the joint numerical range is a convex cone), we also have an exact Shor reformulation.

  • If the Shor relaxation is exact and we can close its duality gap we also close the duality gap between the underlying QCQP and its dual.

  • If the Shor relaxation is not exact but we can find an exact tightening, the dual of that reformulation can be used as an alternative to the Lagrangian dual of the QCQP.

For some background on these facts see for example (Bomze 2015; Chieu et al. 2019) who discuss the case \({\mathcal {K}}= \mathbb {R}^n_+\). To the best of our knowledge, these facts have so far not been discussed for general convex \({\mathcal {K}}\), but the generalization is immediate, as shown above. Typically, in the literature, one of these paths (closing the relaxation gap or closing the duality gap) is chosen with little regard for what a result obtained along one route implies for the other. We will now proceed with a demonstration of the former path, which is instructive in two regards. First, the derivation is simple and accessible via geometric intuition. Second, the conditions under which we close the relaxation gap will not only give rise to new conditions for a generalized version of the S-Lemma to hold, but also allow some insight into the joint numerical range of the quadratic forms which fulfil these conditions.

4.1 Harnessing Pataki’s rank result

In this section we provide the proof of the exactness of the Shor relaxation under a geometric condition by iterating an application of Pataki’s rank theorem. We follow a proof strategy originally considered in Burer and Anstreicher (2013), where the claim of Theorem 3 was proved for the special case of \(m=2\).

Theorem 2

Consider a feasible set of a semidefinite optimization problem in block-standard form

$$\begin{aligned} {\mathcal {T}}\,{:}{=}\, \left\{ \left[ {\textsf {X} }_j \right] _{j\in \left[ 1:p\right] }\in \times _{j=1}^p {\mathcal {S}}^{n_j}_+ \,{:}\, \sum _{j=1}^{p}{\textsf {A} }_{ji}\bullet {\textsf {X} }_j = b_i, i \in [1\!:\!k]\right\} . \end{aligned}$$

Let \(\left[ {\textsf {X} }_1,\dots ,{\textsf {X} }_p\right] \) be an extreme point of \({\mathcal {T}}\) and let \(r_j\,{:}{=}\, {\mathrm {rank}}({\textsf {X} }_j)\). Then it holds that \(\sum _{j=1}^{p}r_j(r_j+1)\leqslant 2k\).

Proof

See (Pataki 1998, Theorem 2.2), which is more general than stated here. The required specialization is, however, immediate. \(\square \)

We consider a further specialisation, namely

Corollary 1

Let \({\textsf {X} }\) be an extreme point of \({\mathcal {T}}\,{:}{=}\, \{{\textsf {X} }\in {\mathcal {S}}^n_+\,{:}\, {\textsf {A} }\bullet {\textsf {X} }\leqslant a, {\textsf {B} }\bullet {\textsf {X} }\leqslant b\}\). Then \({\mathrm {rank}}({\textsf {X} })\leqslant 1.\)

Proof

After introducing slack variables, \({\mathcal {T}}\) is a projection of the set

$$\begin{aligned} {\overline{{\mathcal {T}}}}\,{:}{=}\,\{({\textsf {X} },s,t)\in {\mathcal {S}}^n_+\times \mathbb {R}^2_+ \,{:}\, {\textsf {A} }\bullet {\textsf {X} }+s = a\, , \, {\textsf {B} }\bullet {\textsf {X} }+t = b\} \end{aligned}$$

onto \({\mathcal {S}}^n\). From Theorem 2 it follows that

$$\begin{aligned} {\mathrm {rank}}({\textsf {X} })({\mathrm {rank}}({\textsf {X} })+1)+{\mathrm {rank}}(s)({\mathrm {rank}}(s)+1)+{\mathrm {rank}}(t)({\mathrm {rank}}(t)+1) \leqslant 4 \end{aligned}$$

and thus \({\mathrm {rank}}({\textsf {X} })\leqslant 1\). \(\square \)
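For quick reference, the rank cap implied by Theorem 2 can be tabulated. The helper below is our own illustration (the name `pataki_max_rank` is ours); it returns the largest rank compatible with \(k\) linear equality constraints.

```python
# Minimal sketch (ours): largest rank r compatible with Pataki's bound
# r(r+1) <= 2k at an extreme point of an SDP with k linear equalities.

def pataki_max_rank(k: int) -> int:
    r = 0
    while (r + 1) * (r + 2) <= 2 * k:
        r += 1
    return r

# with the two slack variables of Corollary 1 (k = 2 equalities), even if
# both slacks have rank 0, the matrix block must satisfy r(r+1) <= 4, i.e.
# r <= 1, which is the statement of the corollary
print([pataki_max_rank(k) for k in [1, 2, 3, 6, 10]])  # → [1, 1, 2, 3, 4]
```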

The following geometric condition will allow us to leverage Corollary 1 to cases where more than two linear inequalities are present.

Condition 1

For a collection of matrices \({\textsf {A} }_i\in {\mathcal {S}}^n\) and real numbers \(b_i,\ i\in [1\!:\!m]\) we say that Condition 1 holds if for any \({\textsf {X} }\in {\mathcal {S}}^n_+ \) with \({\textsf {A} }_i~\bullet ~{\textsf {X} }~\leqslant ~b_i\) for all \(i\in [1\!:\!m]\),

$$\begin{aligned}&{\textsf {A} }_k\bullet {\textsf {X} }<b_k \quad \forall k\in [1\!:\!m]\setminus \{i,j\}\, \text{ whenever } {\textsf {A} }_i\bullet {\textsf {X} }=b_i \text{ and } {\textsf {A} }_j\bullet {\textsf {X} }= b_j \text{ for } i\ne j\, . \end{aligned}$$

The condition requires that, for any \({\textsf {X} }\in {\mathcal {S}}^n_+\) feasible in the above sense, at most two constraints can be binding at the same time. Note that, if \({\mathcal {F}}\,{:}{=}\, \big \{{\textsf {X} }\in {\mathcal {S}}^n_+ \,{:}\, {\textsf {A} }_i~\bullet ~{\textsf {X} }~\leqslant ~b_i, i\in [1\!:\!m]\big \}\) is bounded (as assumed in Theorem 3), one can check Condition 1 by solving \((m^3-3m^2+2m)/6\) semidefinite optimization problems of the form \(\sup _{{\textsf {X} }\in {\mathcal {F}}} \left\{ {\textsf {A} }_k\bullet {\textsf {X} }-b_k \,{:}\, {\textsf {A} }_i\bullet {\textsf {X} }= b_i, {\textsf {A} }_j\bullet {\textsf {X} }= b_j \right\} \). For Condition 1 to hold, all the optimal values must be strictly smaller than 0.
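As a small arithmetic aside (ours, not from the source), the quoted count \((m^3-3m^2+2m)/6\) is precisely the binomial coefficient \(\binom{m}{3}\), the number of index triples drawn from the \(m\) constraints:

```python
import math

# Sketch (ours): the count of verification SDPs quoted above,
# (m^3 - 3m^2 + 2m)/6 = m(m-1)(m-2)/6, equals binom(m, 3).

def sdp_count(m: int) -> int:
    return (m**3 - 3 * m**2 + 2 * m) // 6  # exact: product of 3 consecutive
                                           # integers is divisible by 6

for m in range(3, 20):
    assert sdp_count(m) == math.comb(m, 3)
print([sdp_count(m) for m in [3, 4, 5, 10]])  # → [1, 4, 10, 120]
```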

We are now able to prove the following result for the homogeneous problem:

Theorem 3

Suppose that Condition 1 holds for the matrices \({\textsf {A} }_i\in {\mathcal {S}}^n\) and real numbers \(b_i\in \mathbb {R},\ i\in [1\!:\!m]\). Further, suppose that the set \({\mathcal {F}}\,{:}{=}\,\left\{ {\textsf {X} }{\in } {\mathcal {S}}^n_+\,{:}\, {\textsf {A} }_i\bullet {\textsf {X} }{\leqslant } b_i,\ i\in [1\!:\!m] \right\} \) is bounded. Then

$$\begin{aligned} \inf _{\mathbf {x}\in \mathbb {R}^n} \left\{ \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {x}\,{:}\, \mathbf {x}^{{\textsf {T} }}{\textsf {A} }_i\mathbf {x}\leqslant b_i, \ i\in [1\!:\!m] \right\} = \inf _{{\textsf {X} }\in {\mathcal {S}}^n_+}\left\{ {\textsf {Q} }\bullet {\textsf {X} }\,{:}\, {\textsf {A} }_i\bullet {\textsf {X} }\leqslant b_i, \ i\in [1\!:\!m] \right\} . \end{aligned}$$

Proof

It is clear that "\(\geqslant \)" has to hold, as the SDP is a relaxation of the QCQP. Since the SDP has a linear objective and \({\mathcal {F}}\) is bounded, its optimal value is attained at an extreme point of \({\mathcal {F}}\). Let \({\textsf {X} }^*\) be one such point. By Condition 1 at most two inequalities, say i and j, are binding at \({\textsf {X} }^*\). If fewer are binding, then one or both of these indices can be chosen arbitrarily. It follows that \({\textsf {X} }^*\) is also extremal in the set \({\mathcal {F}}_{i,j}\,{:}{=}\,\left\{ {\textsf {X} }\in {\mathcal {S}}^n_+ \,{:}\, {\textsf {A} }_i\bullet {\textsf {X} }\leqslant b_i,\ {\textsf {A} }_j\bullet {\textsf {X} }\leqslant b_j \right\} \supseteq {\mathcal {F}}\). To see this, note that by the strict inequalities in Condition 1 and continuity there is a ball \({\mathcal {B}}_{\epsilon }({\textsf {X} }^*)\) centered at \({\textsf {X} }^*\) with radius \(\epsilon > 0\) such that \({\textsf {A} }_k\bullet {\textsf {X} }<b_k, \ k \in [1\!:\!m]\setminus \{i,j\}\) whenever \({\textsf {X} }\in {\mathcal {B}}_{\epsilon }({\textsf {X} }^*)\), so that \({\mathcal {B}}_{\epsilon }({\textsf {X} }^*)\cap {\mathcal {F}}= {\mathcal {B}}_{\epsilon }({\textsf {X} }^*)\cap {\mathcal {F}}_{i,j}\). If there were \({\textsf {X} }_1,{\textsf {X} }_2 \in {\mathcal {F}}_{i,j}\) such that \({\textsf {X} }^*\) were a proper convex combination of them, such points would also have to exist in \({\mathcal {F}}\), since the segment between \({\textsf {X} }_1\) and \({\textsf {X} }_2\) passes through \({\mathcal {B}}_{\epsilon }({\textsf {X} }^*)\cap {\mathcal {F}}\).
By Corollary 1 and extremality of \({\textsf {X} }^*\) in \({\mathcal {F}}_{i,j}\) we see that \({\textsf {X} }^* = \mathbf {x}^*(\mathbf {x}^*)^{{\textsf {T} }}\) so that \(\mathbf {x}^*\) is feasible for the QCQP, and \((\mathbf {x}^*)^{{\textsf {T} }}{\textsf {Q} }(\mathbf {x}^*) = {\textsf {Q} }\bullet \mathbf {x}^*(\mathbf {x}^*)^{{\textsf {T} }}= {\textsf {Q} }\bullet {\textsf {X} }^*\). \(\square \)

Example 6

Consider the following quadratic program and its Shor relaxation

The feasible sets of these problems are depicted in Fig. 1. For the lifted feasible set one can see that no more than two inequalities can be binding at the same time within the semidefinite cone. Also, since all nonzero matrices at the boundary of \({\mathcal {S}}^2 _+\) are rank-one matrices, the extreme points of the lifted feasible set have rank one, so that the conclusion of Corollary 1 is quite obvious in this case. By inspection one sees that for \(q_{12} =0, \ q_{11} = q_{22} > 0\) the optimal set of the QCQP consists of the innermost corners, where the third and fourth inequalities are binding. These are the points \((x_1,x_2) = (\pm 2/\sqrt{5},\pm 2/\sqrt{5})\). For the Shor relaxation one sees just as easily that the third and fourth inequalities are binding at the optimum, and so is the semidefiniteness constraint, so that \(x_{11}x_{22} = x_{12}^2\) holds for the optimal solution. From \( 4x_{11} + x_{22} = 4\) and \(x_{11} + 4x_{22} = 4\) we get \(x_{11} = x_{22} = 4/5\), so that \(x_{12} = \pm 4/5\). We have

$$\begin{aligned} \begin{pmatrix} \frac{4}{5} &{} \pm \frac{4}{5}\\ \pm \frac{4}{5} &{}\frac{4}{5} \end{pmatrix}= \begin{pmatrix} \pm \frac{2}{\sqrt{5}} \\ \pm \frac{2}{\sqrt{5}} \end{pmatrix} \begin{pmatrix} \pm \frac{2}{\sqrt{5}}\\ \pm \frac{2}{\sqrt{5}} \end{pmatrix}^{{\textsf {T} }}, \end{aligned}$$

as claimed in the theorem. Of course, there are more solutions to the Shor relaxation, but these are convex combinations of the two presented here. They can be seen in Fig. 1 as the line connecting the two lower corners in the lifted feasible set.
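The arithmetic of this example is easy to replay. The sketch below (our own code, not from the source) solves the two binding constraints, checks the rank-one condition and recovers the claimed factorization.

```python
# Numerical check of Example 6 (ours): solve the binding constraints
# 4*x11 + x22 = 4 and x11 + 4*x22 = 4 and factor the resulting matrix.

# 2x2 linear system by Cramer's rule
det = 4 * 4 - 1 * 1
x11 = (4 * 4 - 4 * 1) / det
x22 = (4 * 4 - 1 * 4) / det
assert abs(x11 - 0.8) < 1e-12 and abs(x22 - 0.8) < 1e-12

# rank-one condition x11 * x22 = x12^2 on the boundary of S^2_+
x12 = (x11 * x22) ** 0.5
assert abs(x12 - 0.8) < 1e-12

# the lifted solution factors as x x^T with x = (2/sqrt(5), 2/sqrt(5))
x = 2 / 5 ** 0.5
assert abs(x * x - x11) < 1e-12 and abs(x * x - x12) < 1e-12
print(x11, x12, x22)  # all equal to 4/5 up to rounding
```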

Fig. 1

The feasible set \({\mathcal {F}}\) of the QCQP in Example 6 and the feasible set of its Shor relaxation (which by Theorem 3 coincides with \({\mathcal {G}}({\mathcal {F}})\)) side by side. The latter figure is based on the projection \({\textsf {M} }(x,2y,z) \mapsto (x,\sqrt{2}y,z)\), and we see the intersection between the cone of positive-semidefinite matrices and four half-spaces

The optimization problem in Theorem 3 does not involve linear terms, which raises the question whether the result has implications for that case as well. In fact, the theorem can be transferred to the case where the quadratic functions are non-homogeneous by applying the following simple lemma:

Lemma 4

The extreme points of \({\mathcal {T}}\,{:}{=}\, \{{\textsf {X} }\in {\mathcal {S}}^n_+ \,{:}\, {\textsf {A} }_1\bullet {\textsf {X} }= b_1, \ {\textsf {A} }_i\bullet {\textsf {X} }\leqslant b_i, \ i\in [2\!:\!m] \} \) are those extreme points of \({\overline{{\mathcal {T}}}}\,{:}{=}\, \{{\textsf {X} }\in {\mathcal {S}}^n_+ \,{:}\, {\textsf {A} }_i\bullet {\textsf {X} }\leqslant b_i, \ i\in [1\!:\!m] \}\) for which \({\textsf {A} }_1\bullet {\textsf {X} }= b_1\) holds.

Proof

Assume \({\textsf {X} }\in {\mathrm {ext}}({\mathcal {T}})\), the set of extreme points of \({\mathcal {T}}\), and assume further that there are \({\textsf {X} }_1,{\textsf {X} }_2\in {\overline{{\mathcal {T}}}}\) such that \({\textsf {X} }= \lambda {\textsf {X} }_1+(1-\lambda ){\textsf {X} }_2\) for some \(\lambda \in \left( 0,1\right) \). We have \({\textsf {A} }_1\bullet {\textsf {X} }_j \leqslant b_1\) for \(j=1,2\). If \({\textsf {A} }_1\bullet {\textsf {X} }_j < b_1\) for at least one \(j\in \{1,2\}\), then \({\textsf {A} }_1\bullet {\textsf {X} }= \lambda {\textsf {A} }_1\bullet {\textsf {X} }_1+(1-\lambda ){\textsf {A} }_1\bullet {\textsf {X} }_2<b_1\), so actually \({\textsf {A} }_1\bullet {\textsf {X} }_j = b_1\) for \(j=1,2\), implying that \({\textsf {X} }_1,{\textsf {X} }_2\in {\mathcal {T}}\) so that \({\textsf {X} }\notin {\mathrm {ext}}({\mathcal {T}})\), a contradiction. Now, assume \({\textsf {X} }\in {\mathrm {ext}}\left( {\overline{{\mathcal {T}}}}\right) \) and that \({\textsf {A} }_1\bullet {\textsf {X} }=b_1\), so clearly \({\textsf {X} }\in {\mathcal {T}}\). Again, if there were a pair \({\textsf {X} }_1,{\textsf {X} }_2\in {\mathcal {T}}\) such that \({\textsf {X} }= \lambda {\textsf {X} }_1+(1-\lambda ){\textsf {X} }_2\) for some \(\lambda \in \left( 0,1\right) \), there would be such a pair in \({\overline{{\mathcal {T}}}}\supseteq {\mathcal {T}}\), so that we arrive at an analogous contradiction. \(\square \)

Theorem 5

Assume \({\textsf {A} }_i={\textsf {M} }({\textsf {W} }_i,\mathbf {w}_i, \omega _i)\) and \(b_i = 0\), \(i \in [1\!:\!m]\), together with \({\textsf {A} }_{m+1}\,{:}{=}\, {\textsf {M} }({\textsf {O} },\mathbf {o},1)\) and \(b_{m+1} = 1\), fulfill Condition 1 and that the set \({\mathcal {F}}_{Shor}~ \,{:}{=}\,~\left\{ {\textsf {M} }({\textsf {X} },2\mathbf {x},1)\in {\mathcal {S}}^{n+1}_+ \,{:}\, {\textsf {W} }_i\bullet {\textsf {X} }+ \mathbf {w}_i^{{\textsf {T} }}\mathbf {x}+ \omega _i \leqslant 0,\ i \in [1\!:\!m] \right\} \) is bounded. Then

$$\begin{aligned}&\inf _{\mathbf {x}\in \mathbb {R}^n}\left\{ \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {x}+{\mathbf {q}}^{{\textsf {T} }}\mathbf {x}\,{:}\, \mathbf {x}^{{\textsf {T} }}{\textsf {W} }_i\mathbf {x}+\mathbf {w}_i^{{\textsf {T} }}\mathbf {x}+\omega _i \leqslant 0 ,\ i \in [1\!:\!m] \right\} \\&\quad =\inf _{{\textsf {X} },\mathbf {x}}\left\{ {\textsf {Q} }\bullet {\textsf {X} }+ {\mathbf {q}}^{{\textsf {T} }}\mathbf {x}\,{:}\, {\textsf {W} }_i\bullet {\textsf {X} }+ \mathbf {w}_i^{{\textsf {T} }}\mathbf {x}+ \omega _i \leqslant 0, \ i \in [1\!:\!m],\ {\textsf {M} }({\textsf {X} },2\mathbf {x},1)\in {\mathcal {S}}^{n+1}_+ \right\} . \end{aligned}$$

Proof

After reformulating the original problem as

$$\begin{aligned} \inf _{(\mathbf {x},x_0)\in \mathbb {R}^n\times \mathbb {R}_+}\left\{ \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {x}+x_0{\mathbf {q}}^{{\textsf {T} }}\mathbf {x}\,{:}\, \mathbf {x}^{{\textsf {T} }}{\textsf {W} }_i\mathbf {x}+x_0\mathbf {w}_i^{{\textsf {T} }}\mathbf {x}+\omega _i x_0^2 \leqslant 0,\ i \in [1\!:\!m],\ x_0^2 = 1 \right\} \end{aligned}$$

we see that its Shor relaxation is given by

$$\begin{aligned} \inf _{{\textsf {Y} }\in \mathcal {CPP}(\mathbb {R}^n\times \mathbb {R}_+)}\left\{ {\textsf {M} }({\textsf {Q} },{\mathbf {q}},0)\bullet {\textsf {Y} }\,{:}\, {\textsf {M} }({\textsf {W} }_i,\mathbf {w}_i,\omega _i)\bullet {\textsf {Y} }\leqslant 0,\ i \in [1\!:\!m],\ {\textsf {M} }({\textsf {O} },\mathbf {o},1)\bullet {\textsf {Y} }= 1 \right\} \end{aligned}$$
where we again make use of the fact that \(\mathcal {CPP}(\mathbb {R}^n\times \mathbb {R}_+) = {\mathcal {S}}^{n+1}_+\). Note that after resolving the equality we see that the feasible set of the latter problem is exactly \({\mathcal {F}}_{Shor}\). Proving that all extreme points are rank one proceeds in a manner analogous to Theorem 3, where the fact that we have one equality instead of an inequality does not interfere, due to Lemma 4. In the representation of the reformulation we combined the constraints \({\textsf {A} }_{m+1}\bullet {\textsf {M} }({\textsf {X} },2\mathbf {x},x_0) = 1\) and \({\textsf {M} }({\textsf {X} },2\mathbf {x},x_0) \in {\mathcal {S}}^{n+1}_+\) into \({\textsf {M} }({\textsf {X} },2\mathbf {x},1)\in {\mathcal {S}}^{n+1}_+\), thereby resolving the equality. \(\square \)

Theorem 5 is related to the main result in Yang et al. (2016) where the authors proved the following result.

Theorem 6

Assume \({\mathcal {F}}\subseteq \mathbb {R}^n\) is the feasible set of a QCQP and assume that for the set \(\bar{{\mathcal {F}}} \,{:}{=}\, \left\{ \mathbf {x}\in {\mathcal {F}}\,{:}\, \mathbf {x}^{{\textsf {T} }}{\textsf {W} }_i\mathbf {x}+\mathbf {w}_i^{{\textsf {T} }}\mathbf {x}+\omega _i \leqslant 0,\ i\in [1\!:\!m] \right\} \) the additional inequalities introduce non-intersecting hollows in \({\mathcal {F}}\). Then \({\mathcal {G}}(\bar{{\mathcal {F}}}) = \big \{ ({\textsf {X} },\mathbf {x})\in {\mathcal {G}}({\mathcal {F}}) \,{:}\, {\textsf {W} }_i\bullet {\textsf {X} }+ \mathbf {w}_i^{{\textsf {T} }}\mathbf {x}+ \omega _i \leqslant 0, \ i \in [1\!:\!m]\big \}\).

The non-intersection condition is similar to the condition of Theorem 5. To see this, note that in the Shor relaxation in Theorem 5 the equality associated with \({\textsf {A} }_{m+1}\) consumes one of the two inequalities that can be binding at any time. Thus the remaining inequalities cannot intersect within the semidefinite cone, or else Condition 1 fails. Also note that under Condition 1 the set \({\mathcal {F}}_{Shor}\) is bounded if and only if it is bounded by merely one of the m inequalities in its description. From this we see that Theorem 5 can be interpreted as a special case of Theorem 6 where \({\mathcal {F}}\) is given by a single quadratic equality and the non-intersection condition is imposed in the lifted space, i.e. a strictly stronger condition than in Theorem 6. It is an open question whether there is a direct way to deduce Theorem 6 from Theorem 3. However, Theorem 3 cannot be deduced from Theorem 6, and only the former will allow us to derive a generalization of the S-Lemma under a geometric condition derived from Condition 1, as well as some insight into the joint numerical range of the quadratic forms that jointly fulfil said condition.

4.2 Generalized S-lemma and geometry of the joint numerical range

We are now ready to use Theorem 3 in order to establish a generalization of the S-Lemma in the form of a characterization of set-copositivity over a cone given by homogeneous quadratic inequalities. The key to this result will be the fact that a strictly feasible point of a quadratic program guarantees a Slater point in its Shor relaxation, as proved in (Tunçel 2001, Theorem 5.1).

Theorem 7

Let \({\mathcal {K}}\,{:}{=}\, \{\mathbf {x}\in \mathbb {R}^n\,{:}\, \mathbf {x}^{{\textsf {T} }}{\textsf {A} }_i\mathbf {x}\leqslant 0,\ i\in [1\!:\!m]\}\) with \({\textsf {A} }_i \in {\mathcal {S}}^n\). Assume that there is some \(\mathbf {x}_0\) with \(\mathbf {x}_0^{{\textsf {T} }}{\textsf {A} }_i\mathbf {x}_0 < 0\) for all \(i\in [1\!:\!m]\). Further, suppose that for all \(i\in [1\!:\!m]\)

$$\begin{aligned} {\textsf {X} }\in {\mathcal {S}}^n_+ \setminus \{ {\textsf {O} }\}\quad \text{ and }\quad {\textsf {A} }_i\bullet {\textsf {X} }=0 \implies {\textsf {A} }_j\bullet {\textsf {X} }<0 \quad \forall j\in [1\!:\!m]\setminus \{ i\}\,. \end{aligned}$$
(13)

Then

$$\begin{aligned} \mathcal {COP}({\mathcal {K}}) = \left\{ {\textsf {Q} }\,{:}\, {\textsf {Q} }+\sum _{i=1}^{m}\lambda _i {\textsf {A} }_i \in {\mathcal {S}}^n_+ \text{ for } \text{ some } \varvec{\lambda }\in \mathbb {R}^m_+\right\} \, . \end{aligned}$$

Proof

Observe that \({\textsf {Q} }{\in } \mathcal {COP}({\mathcal {K}})\) if and only if \(q^* \,{:}{=}\, \inf _{\mathbf {x}}\left\{ \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {x}\,{:}\, \mathbf {x}^{{\textsf {T} }}{\textsf {E} }_n\mathbf {x}{\leqslant } 1,\ \mathbf {x}{\in } {\mathcal {K}} \right\} {\geqslant }\, 0\), where the latter QCQP has a strictly feasible point \(\mathbf {x}_0/(2\Vert \mathbf {x}_0\Vert )\). The matrices \({\textsf {A} }_i, \ i\in [1\!:\!m]\) fulfil (13) and thus also fulfil Condition 1 together with the pair \({\textsf {E} }_n\) and 1. Thus, by Theorem 3 we have \(q^* = \inf _{{\textsf {X} }\in {\mathcal {S}}^{n}_+}\left\{ {\textsf {Q} }\bullet {\textsf {X} }\,{:}\, {\textsf {E} }_n\bullet {\textsf {X} }\leqslant 1, \ {\textsf {A} }_i\bullet {\textsf {X} }\leqslant 0,\ i \in [1\!:\!m] \right\} \), and the latter set is bounded (due to the constraint \({\textsf {E} }_n\bullet {\textsf {X} }\leqslant 1 \)) and has a Slater point by (Tunçel 2001, Theorem 5.1) since the QCQP has a strictly feasible point. Thus by strong duality \(q^* = \sup _{(\varvec{\lambda },\mu )\in \mathbb {R}^{m+1}_+}\left\{ -\mu \,{:}\, {\textsf {Q} }+\sum _{i=1}^{m}\lambda _i {\textsf {A} }_i +\mu {\textsf {E} }_n \in {\mathcal {S}}^n_+ \right\} \), which is nonnegative if and only if \({\textsf {Q} }+\sum _{i=1}^{m}\lambda _i {\textsf {A} }_i \in {\mathcal {S}}^n_+ \text{ for } \text{ some } \varvec{\lambda }\in \mathbb {R}^m_+\). \(\square \)

Note that the above theorem is stated as a characterization of copositivity over a cone described by homogeneous quadratic functions. Another way of stating the result would be a theorem of the alternative, as is usual when discussing the S-procedure (cf. the introduction). Such theorems are often derived from results on the joint numerical range of the quadratic forms involved. Specifically, if the joint numerical range can be shown to be a convex cone, an S-Lemma type result can be derived in a straightforward manner (see e.g. Pólik and Terlaky 2007). An important example of such a result was proved in Polyak (1998), where it is shown that the S-procedure is exact for the case \(m=2\), \(n\geqslant 3\) (under a condition discussed shortly) by invoking a convexity result regarding the joint numerical range of three quadratic functions, also provided in Polyak (1998), namely:

Theorem 8

Let \({\textsf {A} }_i \in {\mathcal {S}}^n\), \(i= 0,1,2\). For \(n\geqslant 3\) the following assertions are equivalent.

  1.

    There exists \(\varvec{\mu }\in \mathbb {R}^3\) such that

    $$\begin{aligned} \mu _0{\textsf {A} }_0+\mu _1{\textsf {A} }_1+\mu _2{\textsf {A} }_2 \in {\mathcal {S}}^n_{++}. \end{aligned}$$
  2.

    The joint numerical range

    $$\begin{aligned} \mathcal {J} \,{:}{=}\, \left\{ \left( \mathbf {x}^{{\textsf {T} }}{\textsf {A} }_0\mathbf {x},\mathbf {x}^{{\textsf {T} }}{\textsf {A} }_1\mathbf {x},\mathbf {x}^{{\textsf {T} }}{\textsf {A} }_2\mathbf {x}\right) \,{:}\, \mathbf {x}\in \mathbb {R}^n \right\} \end{aligned}$$

    is a convex and pointed cone.

The theorem guarantees that whenever 1. holds and \(\mathcal {J}\) is disjoint from the interior of the negative orthant, it can be separated from the latter set via a hyperplane. Thus, given that the first condition in Theorem 8 and Slater’s condition hold, one can prove exactness of the S-procedure, that is, one can prove the following theorem as shown in Polyak (1998):
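The halfspace separation underlying this argument can be probed numerically: if \(\mu _0{\textsf {A} }_0+\mu _1{\textsf {A} }_1+\mu _2{\textsf {A} }_2\in {\mathcal {S}}^n_{++}\), then every point \(p\) of \(\mathcal {J}\) generated by \(\mathbf {x}\ne \mathbf {o}\) satisfies \(\varvec{\mu }^{{\textsf {T} }}p>0\), so \(\mathcal {J}\) lies in a halfspace. A Python sketch with illustrative matrices (here simply \({\textsf {A} }_0={\textsf {E} }\) and \(\varvec{\mu }=(1,0,0)\)):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Three symmetric matrices whose mu-combination is positive definite:
# illustrative choice A0 = I with mu = (1, 0, 0); A1, A2 random symmetric.
A0 = np.eye(n)
A1 = rng.standard_normal((n, n)); A1 = (A1 + A1.T) / 2
A2 = rng.standard_normal((n, n)); A2 = (A2 + A2.T) / 2
mu = np.array([1.0, 0.0, 0.0])

# Sample the joint numerical range J on the unit sphere.
xs = rng.standard_normal((2000, n))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
J = np.stack([(xs * (xs @ A)).sum(axis=1) for A in (A0, A1, A2)], axis=1)

# Every sample satisfies mu . p = x^T (mu0 A0 + mu1 A1 + mu2 A2) x > 0,
# i.e. the sampled part of J lies in an open halfspace.
min_margin = float(np.min(J @ mu))
```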

Theorem 9

Let \(n\geqslant 3\) and \({\textsf {A} }_i\in {\mathcal {S}}^n,\ i = 0,1,2\). Assume that there exists \(\varvec{\mu }\in \mathbb {R}^3\) such that \(\mu _0{\textsf {A} }_0+\mu _1{\textsf {A} }_1+\mu _2{\textsf {A} }_2 \in {\mathcal {S}}^n_{++}\) and that there is an \(\mathbf {x}_0\in \mathbb {R}^n\) such that \(\mathbf {x}_0^{{\textsf {T} }}{\textsf {A} }_i\mathbf {x}_0<0, \ i=1,2\). Then the following two statements are equivalent:

$$\begin{aligned}&{ i)}\,\hbox {the following system of inequalities has no solution}\, \mathbf {x}\in \mathbb {R}^n \,{:}\, \\&\quad \mathbf {x}^{{\textsf {T} }}{\textsf {A} }_0\mathbf {x}< 0 \quad \text{ and }\quad \mathbf {x}^{{\textsf {T} }}{\textsf {A} }_i\mathbf {x}\leqslant 0, \ i=1,2;\\&{ ii)}\,\hbox {the following conic inequality has a solution }\,\varvec{\lambda }\in \mathbb {R}^2_+ \,{:}\, \\&\quad {\textsf {A} }_0 + \lambda _1 {\textsf {A} }_1+ \lambda _2 {\textsf {A} }_2 \in {\mathcal {S}}^n_+ \, . \end{aligned}$$

Obviously, the condition \(\exists \varvec{\mu }\in \mathbb {R}^3\,{:}\, \mu _0{\textsf {A} }_0+\mu _1{\textsf {A} }_1+\mu _2{\textsf {A} }_2 \in {\mathcal {S}}^n_{++}\) is readily fulfilled if for at least one pair of \(i\ne j\) there exists \(\varvec{\mu }\in \mathbb {R}^2\) with \( \mu _1 {\textsf {A} }_i+\mu _2 {\textsf {A} }_j \in {\mathcal {S}}^n_{++}\), and we will refer to the latter condition as Condition 1’. Thus, if we fix \({\textsf {A} }_1\) and \({\textsf {A} }_2\) such that Condition 1’ is fulfilled, we can characterize copositivity over the cone \({\mathcal {K}}\,{:}{=}\,\{\mathbf {x}\in \mathbb {R}^n \,{:}\, \mathbf {x}^{{\textsf {T} }}{\textsf {A} }_1\mathbf {x}\leqslant 0,\mathbf {x}^{{\textsf {T} }}{\textsf {A} }_2\mathbf {x}\leqslant 0\} \) for \(n\geqslant 3\) as

$$\begin{aligned} \mathcal {COP}({\mathcal {K}}) = \left\{ {\textsf {Q} }\,{:}\, {\textsf {Q} }+ \lambda _1{\textsf {A} }_1 + \lambda _2{\textsf {A} }_2 \in {\mathcal {S}}^n_+ \text{ for } \text{ some } \varvec{\lambda }\in \mathbb {R}^2 _+ \right\} \, . \end{aligned}$$

The copositivity condition derived in Theorem 7 is, however, not obtained courtesy of any argument involving the joint numerical range, but rather as a consequence of the exactness of the Shor relaxation and the existence of a Slater point, in conjunction with strong conic duality, under the assumptions of the theorem. An immediate question is therefore whether we can learn anything about the joint numerical range of quadratic forms that fulfil said assumptions. This question is interesting in its own right, since the joint numerical range is in general a very ill-behaved object and geometrical results are typically hard to obtain. We will gain at least some insight by proving that Condition 1’ is implied by (13) and that under additional assumptions both coincide.

Lemma 10

Let \({\mathcal {K}}_p \,{:}{=}\, \left\{ ({\textsf {A} }_1\bullet {\textsf {X} }, {\textsf {A} }_2 \bullet {\textsf {X} }) \,{:}\, {\textsf {X} }\in {\mathcal {S}}^n_+ \right\} \) and \({\mathcal {K}}_d \,{:}{=}\, \big \lbrace \mathbf {y}\in \mathbb {R}^2 \,{:}\, y_1{\textsf {A} }_1+y_2 {\textsf {A} }_2 \in {\mathcal {S}}^n_+ \big \rbrace \). Then \({\mathcal {K}}_d\) is a closed convex cone, \({\mathcal {K}}_p\) is a convex cone, and \({\mathcal {K}}_p^* = {\mathcal {K}}_d\) so that \({\mathcal {K}}_d^* = {\mathcal {K}}_p^{**} \supseteq {\mathcal {K}}_p\).

Proof

The claims are immediate from the fact that \({\mathcal {S}}_+^n\) is a self-dual cone (therefore closed and convex). \(\square \)
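The duality \({\mathcal {K}}_p^* = {\mathcal {K}}_d\) can be illustrated numerically: for any \(\mathbf {y}\in {\mathcal {K}}_d\) and any \({\textsf {X} }\in {\mathcal {S}}^n_+\) we have \(y_1({\textsf {A} }_1\bullet {\textsf {X} })+y_2({\textsf {A} }_2\bullet {\textsf {X} }) = (y_1{\textsf {A} }_1+y_2{\textsf {A} }_2)\bullet {\textsf {X} }\geqslant 0\). A Python sketch with illustrative \(2\times 2\) data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative pair A1, A2 and a point y in K_d: y1*A1 + y2*A2 is PSD.
A1 = np.diag([1.0, -1.0])
A2 = np.diag([-1.0, 2.0])
y = np.array([1.0, 1.0])
assert np.min(np.linalg.eigvalsh(y[0] * A1 + y[1] * A2)) >= -1e-12

# K_p^* = K_d: every sampled point of K_p has nonnegative inner product
# with y, since y . (A1•X, A2•X) = (y1 A1 + y2 A2) • X >= 0 for PSD X.
min_ip = min(
    y @ np.array([np.trace(A1 @ X), np.trace(A2 @ X)])
    for B in rng.standard_normal((500, 2, 2))
    for X in [B @ B.T]                     # X = B B^T is PSD
)
```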

Proposition 11

For matrices \({\textsf {A} }_1,{\textsf {A} }_2 \in {\mathcal {S}}^n\), assume that \(\exists \mathbf {x}_0 \in \mathbb {R}^n\) such that \( \mathbf {x}_0^{{\textsf {T} }}{\textsf {A} }_i\mathbf {x}_0 < 0\) for \(i\in \{ 1,2\}\). Consider the following two statements:

  1.

    There exists \(\varvec{\mu }\in \mathbb {R}^2\) such that

    $$\begin{aligned} \mu _1{\textsf {A} }_1+\mu _2{\textsf {A} }_2 \in {\mathcal {S}}^n_{++}. \end{aligned}$$
  2.

    For any matrix \( {\textsf {X} }\in {\mathcal {S}}^n_+\setminus \{{\textsf {O} }\}\) and \(i,j \in \{1,2\}\) with \(i\ne j\) we have

    $$\begin{aligned} {\textsf {A} }_i\bullet {\textsf {X} }=0 \implies {\textsf {A} }_j\bullet {\textsf {X} }<0\, . \end{aligned}$$

Then \(2. \implies 1.\), and if in addition we assume that in the description of the set \({\mathcal {S}}\,{:}{=}\, \{{\textsf {X} }\in {\mathcal {S}}^n_+ : {\textsf {A} }_i\bullet {\textsf {X} }\leqslant 0\,,\, i = 1,2\}\) none of the linear inequalities is redundant, then also \(1. \implies 2.\)

Proof

For the first claim, assume that property 1. does not hold; we will show that condition 2. then has to fail. Under this hypothesis the linear subspace \({\mathcal {M}}\,{:}{=}\, \{\mu _1{\textsf {A} }_1+\mu _2{\textsf {A} }_2 \,{:}\, \varvec{\mu }\in \mathbb {R}^2\}\) is disjoint from \({\mathcal {S}}_{++}^n\). Thus, by (Rockafellar 2015, Theorem 11.2) there is a hyperplane containing \({\mathcal {M}}\) such that \({\mathcal {S}}_{++}^n\) lies entirely in one of its associated open halfspaces. Since \({\mathcal {M}}\) contains the origin, so does said hyperplane; it actually supports \({\mathcal {S}}_{+}^n\). Thus, one of its normal vectors, say \({\textsf {X} }\), is in \({\mathcal {S}}_+^n\) by self-duality of the latter matrix cone. But then for this \({\textsf {X} }\in {\mathcal {S}}^n_+\) we have \({\textsf {A} }_1\bullet {\textsf {X} }= {\textsf {A} }_2 \bullet {\textsf {X} }= 0\), i.e. property 2. fails to hold. For the other direction, assume that property 1. and the additional assumption about \({\mathcal {S}}\) hold. Consider the cones \({\mathcal {K}}_p \,{:}{=}\,\{({\textsf {A} }_1\bullet {\textsf {X} },{\textsf {A} }_2 \bullet {\textsf {X} }) \,{:}\, {\textsf {X} }\in {\mathcal {S}}^n_+\}\) and \({\mathcal {K}}_d\,{:}{=}\, \{\mathbf {y}\in \mathbb {R}^2 \,{:}\, y_1{\textsf {A} }_1+y_2{\textsf {A} }_2 \in {\mathcal {S}}^n_+ \}\). By Lemma 10 both are convex, \({\mathcal {K}}_d\) is closed and \({\mathcal {K}}_p^* = {\mathcal {K}}_d\). Since \({\mathcal {S}}^n_{++}\) is the (nonempty) interior of \({\mathcal {S}}^n_+\), property 1. guarantees that \({\mathcal {K}}_d\) has nonempty interior. Note that \({\mathcal {K}}_p\) has to meet the interior of the negative orthant, because of \(\mathbf {x}_0^{{\textsf {T} }}{\textsf {A} }_i\mathbf {x}_0<0,\ i=1,2\).
Also, \({\mathcal {K}}_p\) has to meet the interior of the second orthant, since otherwise there is no \({\textsf {X} }\in {\mathcal {S}}^n_+ \) such that \( {\textsf {A} }_1\bullet {\textsf {X} }< 0, {\textsf {A} }_2\bullet {\textsf {X} }> 0\), or in other words \(\{{\textsf {X} }\in {\mathcal {S}}^n_+ \,{:}\, {\textsf {A} }_1\bullet {\textsf {X} }< 0 \} = \{{\textsf {X} }\in {\mathcal {S}}^n_+ \,{:}\, {\textsf {A} }_1\bullet {\textsf {X} }< 0, {\textsf {A} }_2\bullet {\textsf {X} }\leqslant 0 \}\). The closures of these two sets would then coincide, and by (Rockafellar 2015, Theorem 7.6) we would have \(\{{\textsf {X} }\in {\mathcal {S}}^n_+ \,{:}\, {\textsf {A} }_1\bullet {\textsf {X} }\leqslant 0 \} = \{{\textsf {X} }\in {\mathcal {S}}^n_+ \,{:}\, {\textsf {A} }_1\bullet {\textsf {X} }\leqslant 0, {\textsf {A} }_2\bullet {\textsf {X} }\leqslant 0\}\), contradicting the assumption that no constraint is redundant. By an analogous argument, \({\mathcal {K}}_p\) has nonempty intersection with the interior of the fourth orthant. Now assume that condition 2. fails, i.e. there is an \( {\textsf {X} }\in {\mathcal {S}}^n_{+}\setminus \{{\textsf {O} }\} \) such that \( {\textsf {A} }_i\bullet {\textsf {X} }= 0\) and \({\textsf {A} }_j \bullet {\textsf {X} }\geqslant 0\) for \(i\ne j\). If in fact \({\textsf {A} }_j\bullet {\textsf {X} }= 0\), then \({\textsf {X} }\) is a normal vector to a hyperplane that contains the linear combinations of \({\textsf {A} }_1,{\textsf {A} }_2\) and which supports \({\mathcal {S}}^n_+\) by self-duality. This clearly contradicts 1., as such a hyperplane is disjoint from \({\mathcal {S}}_{++}^n\). Thus assume \({\textsf {A} }_j\bullet {\textsf {X} }> 0\). Then \(({\textsf {A} }_1\bullet {\textsf {X} },{\textsf {A} }_2\bullet {\textsf {X} })\), together with three more points from the intersection of \({\mathcal {K}}_p\) with the interiors of the second, third and fourth orthant respectively, gives four vectors whose positive linear combinations span \(\mathbb {R}^2\).
This span is contained in \({\mathcal {K}}_p\), so \({\mathcal {K}}_d = {\mathcal {K}}_p^* = (\mathbb {R}^2)^* = \{\mathbf {o}\}\), which also contradicts property 1. \(\square \)

By this proposition, we could replace (13) with Condition 1’ in Theorem 7, provided there are no redundant constraints. This gives some insight into the geometry of the joint numerical range \(\mathcal {J}({\textsf {Q} },{\textsf {A} }_1,\dots ,{\textsf {A} }_m)\) of \(m+1\) matrices that satisfy the assumptions of Theorem 7. If \(n\geqslant 3\), every projection of \(\mathcal {J}({\textsf {Q} },{\textsf {A} }_1,\dots ,{\textsf {A} }_m)\) onto three coordinates is in fact a convex, pointed cone by Theorem 8. We do not know whether these geometric properties of \(\mathcal {J}({\textsf {Q} },{\textsf {A} }_1,\dots ,{\textsf {A} }_m)\) are enough to ensure linear separability. However, Theorem 7 is not based on a convexity result concerning \({\mathcal {J}}\), but is a consequence of Pataki’s theorem.

5 An alternative perspective via set-copositivity

In the following section we want to discuss the fact that the semi-infinite constraints which arise in robust and adjustable robust optimization can be interpreted as set-copositivity constraints. Careful analysis of these constraints will reveal that they can be decomposed into linear and copositivity constraints of simpler structure. This decomposition will allow for some interesting insights in the following sense. We have seen so far that we can use quadratic optimization tools in conjunction with conic duality in order to reformulate semi-infinite constraints that depend quadratically on an index. However, if we can merely close the duality gap without at the same time guaranteeing dual attainability, the feasible set described by the reformulation shrinks. The decomposition will reveal more details on the nature of this shrinkage. We will discuss this phenomenon in the context of using results in Burer (2009), Eichfelder and Povh (2013) for the reformulation of semi-infinite constraints. Let us reconsider the semi-infinite constraints we have encountered in (5):

$$\begin{aligned} \mathbf {u}^{{\textsf {T} }}{\textsf {Q} }\mathbf {u}+ {\mathbf {q}}^{{\textsf {T} }}\mathbf {u}+ \omega \geqslant 0 \quad \text{ for } \text{ all } \mathbf {u}\in {\mathcal {U}}\subseteq \mathbb {R}^n. \end{aligned}$$
(14)

The quadratic inequality can be homogenized in order to obtain a more compact formulation, i.e.,

$$\begin{aligned} \begin{pmatrix} \mathbf {u}\\ t \end{pmatrix}^{{\textsf {T} }}\begin{pmatrix} {\textsf {Q} }&{} \tfrac{1}{2} {\mathbf {q}}\\ \tfrac{1}{2} {\mathbf {q}}^{{\textsf {T} }}&{} \omega \end{pmatrix} \begin{pmatrix} \mathbf {u}\\ t \end{pmatrix} \geqslant 0 \quad \text{ for } \text{ all } \,(\mathbf {u},t)^{{\textsf {T} }}\in {\mathcal {U}}\times \{1\} \subseteq \mathbb {R}^{n+1} \, . \end{aligned}$$

By the definition of set-copositivity the above constraint is equivalent to

$$\begin{aligned} {\textsf {M} }({\textsf {Q} },{\mathbf {q}},\omega )\in \mathcal {COP}({\mathrm {cone}}[{\mathcal {U}}\times \{1\}])\subseteq {\mathcal {S}}^{n+1}. \end{aligned}$$
(15)
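The homogenization step is a purely algebraic identity: writing \({\textsf {M} }({\textsf {Q} },{\mathbf {q}},\omega )\) for the block matrix above, \((\mathbf {u};1)^{{\textsf {T} }}{\textsf {M} }({\textsf {Q} },{\mathbf {q}},\omega )(\mathbf {u};1) = \mathbf {u}^{{\textsf {T} }}{\textsf {Q} }\mathbf {u}+{\mathbf {q}}^{{\textsf {T} }}\mathbf {u}+\omega \). A Python sketch with random illustrative data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3

# Build M(Q, q, omega) as in the display above; data are random.
Q = rng.standard_normal((n, n)); Q = (Q + Q.T) / 2
q = rng.standard_normal(n)
omega = rng.standard_normal()
M = np.block([[Q, q[:, None] / 2],
              [q[None, :] / 2, np.array([[omega]])]])

# (u; 1)^T M (u; 1) reproduces the inhomogeneous quadratic exactly.
u = rng.standard_normal(n)
v = np.append(u, 1.0)
gap = abs(v @ M @ v - (u @ Q @ u + q @ u + omega))
```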

We thus see that the copositivity constraint (15) can also be enforced by (6). Further, from \({\mathrm {cone}}\left[ \left( \bigcup _i{\mathcal {U}}_i\right) \times \{1\}\right] = \bigcup _i{\mathrm {cone}}\left[ {\mathcal {U}}_i\times \{1\}\right] \) together with \(\mathcal {COP}(\bigcup _i{\mathcal {U}}_i) = \bigcap _i\mathcal {COP}({\mathcal {U}}_i)\) we see that

$$\begin{aligned} \mathcal {COP}\left( {\mathrm {cone}}\left[ \left( \cup _i{\mathcal {U}}_i\right) \times \{1\}\right] \right) = \cap _i\mathcal {COP}\left( {\mathrm {cone}}\left[ {\mathcal {U}}_i\times \{1\}\right] \right) . \end{aligned}$$
(16)

Thus, as a side note, we can handle unions of sets \({\mathcal {U}}= \cup _i{\mathcal {U}}_i\) in (5) whenever we can characterize \(\mathcal {COP}\left( {\mathrm {cone}}\left[ {\mathcal {U}}_i\times \{1\}\right] \right) \) for each individual \({\mathcal {U}}_i\). However, the complicated structure of the set over which we require the quadratic form to be non-negative seems to be prohibitive. Nevertheless, this issue can be resolved quite easily. Assume that \({\mathcal {U}}\) is the compact intersection of a (not necessarily convex) cone \({\mathcal {K}}\) with a hyperplane:

$$\begin{aligned} {\mathcal {U}}\,{:}{=}\, \{\mathbf {u}\in {\mathcal {K}}\,{:}\, \mathbf {b}^{{\textsf {T} }}\mathbf {u}= c \}\, . \end{aligned}$$

We will prove a result which states that the seemingly complicated copositivity constraint in (15) can be replaced by a linear constraint and a copositivity constraint over \({\mathcal {K}}\). The key idea is that \({\mathrm {cone}}[({\mathcal {K}}\cap {\mathcal {H}})\times \{\mu \}]\), where \({\mathcal {H}}\) denotes the hyperplane above, is actually \({\mathcal {K}}\times \{0\}\) under a linear transformation. To this end, we need some auxiliary results.

Lemma 12

Let \({\mathcal {M}}\,{:}{=}\, \{\mathbf {u}\in \mathbb {R}^n \,{:}\, {\textsf {A} }\mathbf {u}= \mathbf {b}\} \) be an affine subspace where \({\textsf {A} }\in \mathbb {R}^{m\times n}\) and \(\mathbf {b}\in \mathbb {R}^m\setminus \{\mathbf {o}\}\). Let \({\mathcal {K}}\subset \mathbb {R}^n\) be a closed cone with pointed convex hull \({\overline{{\mathcal {K}}}}\,{:}{=}\, {\mathrm {conv}}({\mathcal {K}})\) such that \({\overline{{\mathcal {K}}}}\cap {\mathcal {M}}\) is compact. Then the following two statements hold.

  a)

    There are \(\varvec{\alpha }\in {\mathrm {int}}({\mathcal {K}}^*)\) and \( b_0 >0\) such that \({\mathcal {K}}\cap {\mathcal {M}}= \{\mathbf {u}\in {\mathcal {K}}\,{:}\, {\textsf {A} }\mathbf {u}= \mathbf {b},\ \varvec{\alpha }^{{\textsf {T} }}\mathbf {u}= b_0 \}.\)

  b)

    If \(m=1\), then \({\mathrm {cone}}({\mathcal {M}}\cap {\mathcal {K}}) = {\mathcal {K}}\).

Proof

Let \({\mathcal {M}}_0\,{:}{=}\, \{\mathbf {u}\in \mathbb {R}^n \,{:}\, {\textsf {A} }\mathbf {u}= \mathbf {o}\} = {\mathrm {ker}}~{\textsf {A} }\) denote the recession cone of \({\mathcal {M}}\). Then, by compactness, we have \({\mathcal {M}}_0 \cap {\overline{{\mathcal {K}}}}= \{\mathbf {o}\}\). We look for a supporting hyperplane of \({\overline{{\mathcal {K}}}}\) which meets this cone only at \(\mathbf {o}\) (i.e., whose normal vector is interior to \({\overline{{\mathcal {K}}}}^*\)) and which contains \({\mathcal {M}}_0\) (i.e., it is also a supporting hyperplane of \({\mathcal {M}}_0\)). Such a hyperplane exists exactly if \({\mathrm {int}}({\overline{{\mathcal {K}}}}^*) \cap {\mathcal {M}}_0^* \ne \varnothing \), which we will prove to be the case. Assume that \({\mathrm {int}}({\overline{{\mathcal {K}}}}^*) \cap {\mathcal {M}}_0^* = \varnothing \); then by (Rockafellar 2015, Theorem 11.2) there is a hyperplane \({\mathcal {H}}\) which contains \({\mathcal {M}}_0^*= \{ {\textsf {A} }^{{\textsf {T} }}\varvec{\lambda }: \varvec{\lambda }\in \mathbb {R}^m\}\) while \({\mathrm {int}}({\overline{{\mathcal {K}}}}^*) \) is contained in one of the open halfspaces associated with \({\mathcal {H}}\). We have \(\mathbf {o}\in {\mathcal {M}}^*_0 \subset {\mathcal {H}}\) so that \({\mathcal {H}}\) is a supporting hyperplane of \({\overline{{\mathcal {K}}}}^*\). This implies that one of the normal vectors of \({\mathcal {H}}\), say \({\overline{\mathbf {a}}}\), is an element of \({\overline{{\mathcal {K}}}}\). Note that \({\mathcal {M}}_0 = ({\mathcal {M}}_0)^{**} = ({\mathcal {M}}_0^*)^{\perp }\) (recall that \({\mathcal {M}}_0\) is a linear subspace) and hence \({\overline{\mathbf {a}}} \in {\mathcal {M}}_0\). Thus, \({\mathcal {M}}\) and \({\overline{{\mathcal {K}}}}\) have a common direction of recession, which contradicts the compactness of \({\overline{{\mathcal {K}}}}\cap {\mathcal {M}}\).
Since \({\mathcal {K}}^* = {\overline{{\mathcal {K}}}}^*\), indeed \(\varvec{\alpha }= {\textsf {A} }^{{\textsf {T} }}\varvec{\lambda }\in {\mathrm {int}}({\mathcal {K}}^*)\) for some appropriate \(\varvec{\lambda }\in \mathbb {R}^m\), and \(\varvec{\alpha }^{{\textsf {T} }}\mathbf {u}= \varvec{\lambda }^{{\textsf {T} }}{\textsf {A} }\mathbf {u}= \varvec{\lambda }^{{\textsf {T} }}\mathbf {b}\) is redundant for \(\mathbf {u}\in {\mathcal {M}}\). Assume w.l.o.g. that \(b_1 > 0\). Clearly, if \(m=1\) we have \(\mathbf {a}_1 \in {\mathrm {int}}({\mathcal {K}}^*)\), so that for all \(\mathbf {u}\in {\mathcal {K}}\setminus \{\mathbf {o}\}\) we have \(\mathbf {a}_1^{{\textsf {T} }}\mathbf {u}> 0\) and thus there is a \(\lambda > 0\) such that \(\mathbf {a}_1^{{\textsf {T} }}(\lambda \mathbf {u}) = b_1\). This proves \({\mathrm {cone}}({\mathcal {M}}\cap {\mathcal {K}}) \supseteq {\mathcal {K}}\), since \(\mathbf {o}\) is in both cones. The converse is trivial. \(\square \)

Remark

The following procedure can be followed for explicitly determining \(\varvec{\alpha }\) and \(b_0\), given \({\textsf {A} }\in \mathbb {R}^{m\times n},\ \mathbf {b}\in \mathbb {R}^m\setminus \{\mathbf {o}\}\) and \({\mathcal {K}}\subset \mathbb {R}^n\) as in the existence result in Lemma 12. We first calculate \((\varvec{\lambda }^*,\varepsilon ^*) = \arg \max _{(\varvec{\lambda },\varepsilon )\in \mathbb {R}^{m+1}}\left\{ \varepsilon \,{:}\, ({\textsf {A} }^{{\textsf {T} }}\varvec{\lambda })^{{\textsf {T} }}\mathbf {v}\geqslant \varepsilon \ \forall \mathbf {v}\in {\mathcal {K}} \text{ with } \mathbf {a}^{{\textsf {T} }}\mathbf {v}= 1 \right\} \) where \(\mathbf {a}\) is a known element of \({\mathrm {int}}({\mathcal {K}}^*)\) (e.g. \(\mathbf {a}=\mathbf {e}\) if \({\mathcal {K}}\) is the non-negative orthant or \(\mathbf {a}=\mathbf {e}_1\) if \({\mathcal {K}}\) is the second order cone). Note that Lemma 12 guarantees the existence of a feasible solution \((\varvec{\lambda },\varepsilon )\) such that \({\textsf {A} }^{{\textsf {T} }}\varvec{\lambda }\in {\mathrm {int}}( {\mathcal {K}}^*) = \left\{ \mathbf {y}\,{:}\, \mathbf {v}^{{\textsf {T} }}\mathbf {y}> 0 \ \forall \mathbf {v}\in {\mathcal {K}}\setminus \{\mathbf {o}\} \right\} \) and \(\varepsilon >0\). The maximization problem is equivalent to \(\max _{(\varvec{\lambda },\varepsilon ,w)\in \mathbb {R}^{m+2}} \left\{ \varepsilon \,{:}\, w \geqslant \varepsilon ,\ {\textsf {A} }^{{\textsf {T} }}\varvec{\lambda }- w\mathbf {a}\in {\mathcal {K}}^* \right\} \) by a standard duality argument (the dual variable w can be eliminated, which we avoided in order to preserve clarity). Finally we can set \(\varvec{\alpha }= {\textsf {A} }^{{\textsf {T} }}\varvec{\lambda }^*\) and \(b_0 = \varvec{\alpha }^{{\textsf {T} }}\mathbf {x}\) for any \(\mathbf {x}\in {\mathcal {M}}\).
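A minimal concrete instance of Lemma 12 (with \({\mathcal {K}}= \mathbb {R}^3_+\), \({\textsf {A} }= (1,2,3)\), \(\mathbf {b}= 1\); purely illustrative data) can be checked in a few lines of Python: \(\varvec{\lambda }=1\) already yields \(\varvec{\alpha }={\textsf {A} }^{{\textsf {T} }}\varvec{\lambda }\in {\mathrm {int}}({\mathcal {K}}^*)\), and part b) is verified by scaling sampled points of \({\mathcal {K}}\) onto \({\mathcal {M}}\):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy instance of Lemma 12 with K = R^3_+, A = (1, 2, 3), b = 1 (all
# illustrative).  lambda = 1 gives alpha = A^T lambda with all entries
# positive, i.e. alpha in int((R^3_+)^*) = int(R^3_+), and b0 = 1.
a_row = np.array([1.0, 2.0, 3.0])
b = 1.0
alpha, b0 = a_row, b
assert np.all(alpha > 0)

# Part b): every k in K \ {o} can be scaled onto M = {u : a^T u = b},
# so cone(M ∩ K) = K.
max_err = max(
    abs(a_row @ ((b0 / (a_row @ k)) * k) - b0)
    for k in rng.uniform(0.01, 1.0, size=(200, 3))
)
```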

Lemma 13

Let \(\mu \in \mathbb {R}\) and \({\mathcal {K}}\) be a closed cone with pointed convex hull and \({\mathcal {H}}= \{\mathbf {u}\in \mathbb {R}^n \,{:}\, \mathbf {a}^{{\textsf {T} }}\mathbf {u}= b \} \) be a hyperplane such that \({\mathcal {H}}\cap {\mathrm {conv}}({\mathcal {K}})\) is compact and contains a point \(\mathbf {u}\ne \mathbf {o}\). Then, there exists an invertible matrix \( {\textsf {T} }\) of order \(n+1\) such that

$$\begin{aligned} {\mathrm {cone}}[({\mathcal {K}}\cap {\mathcal {H}})\times \{\mu \}] = {\textsf {T} }({\mathcal {K}}\times \{0\}) \,{:}{=}\, \left\{ {\textsf {T} }\begin{pmatrix} {\mathbf {k}}\\ 0 \end{pmatrix}\,{:}\, {\mathbf {k}}\in {\mathcal {K}} \right\} \, . \end{aligned}$$

Proof

Without loss of generality we can assume that \(\mathbf {a}= \mathbf {e}_n\) and \(b=1\) (so \(\mathbf {u}\in {\mathcal {H}}\) if and only if \(u_n=1\)). Choose \({\textsf {T} }\) as the following square matrix of order \(n+1\):

$$\begin{aligned} {\textsf {T} }\,{:}{=}\, {\textsf {E} }_{n+1}+\mathbf {v}\mathbf {e}_n^{{\textsf {T} }}= \begin{pmatrix} {\textsf {E} }_{n-1} &{} \mathbf {o}&{} \mathbf {o}\\ \mathbf {o}^{{\textsf {T} }}&{} 1 &{} 0 \\ \mathbf {o}^{{\textsf {T} }}&{} \mu &{} 1 \end{pmatrix} \quad \text{ with }\quad \mathbf {v}= \begin{pmatrix} \mathbf {o}\\ \mu \end{pmatrix}\in \mathbb {R}^{n+1}\, . \end{aligned}$$

Then, for any \(\mathbf {u}\in \mathbb {R}^{n+1}\) we have \({\textsf {T} }\mathbf {u}= \mathbf {u}+u_n\mathbf {v}\). Further, \({\textsf {T} }^{-1} = {\textsf {E} }_{n+1} - \mathbf {v}\mathbf {e}_n^{{\textsf {T} }}\) exists. We will now show that \( {\mathrm {cone}}[({\mathcal {K}}\cap {\mathcal {H}})\times \{\mu \}] = {\textsf {T} }({\mathcal {K}}\times \{0\})\). Now \(\mathbf {o}\) is contained in both cones, so let \(\mathbf {u}\ne \mathbf {o}\) henceforth. First, assume \(\mathbf {u}\in {\textsf {T} }(\mathcal {K}\times \{0\}) \), so that there is \(\mathbf {u}_0 \in {\mathcal {K}}\times \{0\}\) with \({\textsf {T} }\mathbf {u}_0 = \mathbf {u}\). By Lemma 12 we have a \(\lambda > 0\) such that \( \lambda \mathbf {u}_0 \in ({\mathcal {K}}\cap {\mathcal {H}})\times \{0\} \) and \(\lambda \mathbf {u}= \lambda \mathbf {u}_0+\mathbf {v}\in ({\mathcal {K}}\cap {\mathcal {H}})\times \{\mu \}\), and so \(\mathbf {u}\in {\mathrm {cone}}[({\mathcal {K}}\cap {\mathcal {H}})\times \{\mu \}]\). On the other hand, assume \(\mathbf {u}\in {\mathrm {cone}}[({\mathcal {K}}\cap {\mathcal {H}})\times \{\mu \}]\); then \(\lambda \mathbf {u}\in ({\mathcal {K}}\cap {\mathcal {H}})\times \{\mu \}\) for some \(\lambda > 0\), implying \(\lambda {\textsf {T} }^{-1} \mathbf {u}=\lambda \mathbf {u}- \mathbf {v}\in ({\mathcal {K}}\cap {\mathcal {H}})\times \{0\}\) and so \({\textsf {T} }^{-1}\mathbf {u}\in {\mathcal {K}}\times \{0\}\), which gives \(\mathbf {u}= {\textsf {T} }{\textsf {T} }^{-1} \mathbf {u}\in {\textsf {T} }({\mathcal {K}}\times \{0\})\). \(\square \)
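The transformation can be verified numerically. A Python sketch (with the illustrative choices \({\mathcal {K}}= \mathbb {R}^n_+\), \(\mathbf {a}= \mathbf {e}_n\), \(b=1\), \(n=3\), \(\mu =2.5\)):

```python
import numpy as np

rng = np.random.default_rng(5)
n, mu = 3, 2.5

# T = E_{n+1} + v e_n^T from the proof, specialized to a = e_n, b = 1
# (so H = {u : u_n = 1}); v = (o; mu).  Data are illustrative.
T = np.eye(n + 1)
T[n, n - 1] = mu                              # the single off-diagonal entry
Tinv = np.eye(n + 1)
Tinv[n, n - 1] = -mu
assert np.allclose(T @ Tinv, np.eye(n + 1))   # T^{-1} = E_{n+1} - v e_n^T

# With K = R^n_+: a point of (K ∩ H) × {mu} equals T applied to the same
# point of K × {0}, matching cone[(K ∩ H) × {mu}] = T(K × {0}).
max_err = 0.0
for _ in range(100):
    u = rng.uniform(0.1, 1.0, n)
    u[n - 1] = 1.0                            # u lies in K ∩ H
    lhs = np.append(u, mu)                    # element of (K ∩ H) × {mu}
    rhs = T @ np.append(u, 0.0)               # T applied to (u; 0)
    max_err = max(max_err, float(np.max(np.abs(lhs - rhs))))
```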

Finally, for any cone \({\mathcal {K}}\subseteq \mathbb {R}^n\) observe that

$$\begin{aligned} \mathcal {COP}({\mathcal {K}}\times \{0\} ) = \left\{ {\textsf {M} }({\textsf {R} },\mathbf {f},b) \in {\mathcal {S}}^{n+1} \,{:}\, {{\textsf {R} }}\in \mathcal {COP}({\mathcal {K}})\,,\, \mathbf {f}\in \mathbb {R}^n\,,\, b \in \mathbb {R}\right\} , \end{aligned}$$
(17)

and for any nonsingular \({\textsf {A} }\in \mathbb {R}^{n\times n}\)

$$\begin{aligned} \mathcal {COP}({\textsf {A} }{\mathcal {K}}) = \{{\textsf {A} }^{-\textsf {T} }{{\textsf {R} }} {\textsf {A} }^{-1} \in {\mathcal {S}}^n \,{:}\, {{\textsf {R} }}\in \mathcal {COP}({\mathcal {K}}) \}\, . \end{aligned}$$
(18)
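Relation (18) rests on the identity \(({\textsf {A} }\mathbf {x})^{{\textsf {T} }}{\textsf {A} }^{-\textsf {T} }{\textsf {R} }{\textsf {A} }^{-1}({\textsf {A} }\mathbf {x}) = \mathbf {x}^{{\textsf {T} }}{\textsf {R} }\mathbf {x}\), so \({\textsf {R} }\) is copositive on \({\mathcal {K}}\) exactly when \({\textsf {A} }^{-\textsf {T} }{\textsf {R} }{\textsf {A} }^{-1}\) is copositive on \({\textsf {A} }{\mathcal {K}}\). A Python sketch with random illustrative data:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3

# Identity behind (18): with y = A x, y^T (A^{-T} R A^{-1}) y = x^T R x.
# A is random plus 3I (generically nonsingular); R is random symmetric.
A = rng.standard_normal((n, n)) + 3 * np.eye(n)
R = rng.standard_normal((n, n)); R = (R + R.T) / 2
Ainv = np.linalg.inv(A)
Rt = Ainv.T @ R @ Ainv

gap = max(
    abs((A @ x) @ Rt @ (A @ x) - x @ R @ x)
    for x in rng.standard_normal((200, n))
)
```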

Now we are ready to prove the main result in this section.

Theorem 14

Let \(\mu \in \mathbb {R}\), let \({\mathcal {K}}\subseteq \mathbb {R}^n\) be a closed cone with pointed convex hull and let \({\mathcal {H}}= \{\mathbf {u}\,{:}\, \mathbf {a}^{{\textsf {T} }}\mathbf {u}= b \} \subset \mathbb {R}^n\) be a hyperplane, such that \({\mathrm {conv}}({\mathcal {K}})\cap {\mathcal {H}}\) is compact and nontrivial (neither empty nor a singleton). Then there is a nonsingular matrix \({\textsf {A} }\) of order \(n+1\) such that \({\mathrm {cone}}[({\mathcal {K}}\cap {\mathcal {H}})\times \{\mu \}] = {\textsf {A} }({\mathcal {K}}\times \{0\})\) and with

$$\begin{aligned} {\mathcal {C}}\,{:}{=}\, \left\{ {\textsf {A} }^{-\textsf {T} }{\textsf {M} }({\textsf {R} },\mathbf {f}, b) {\textsf {A} }^{-1} \in {\mathcal {S}}^{n+1} \,{:}\, {{\textsf {R} }} \in \mathcal {COP}({\mathcal {K}})\, ,\, \mathbf {f}\in \mathbb {R}^ n \, ,\, b \in \mathbb {R}\right\} \end{aligned}$$

we have

$$\begin{aligned} \mathcal {COP}({\mathrm {cone}}[({\mathcal {K}}\cap {\mathcal {H}})\times \{\mu \}] ) = {\mathcal {C}}\, . \end{aligned}$$

Proof

We have

$$\begin{aligned} \mathcal {COP}(\mathrm {cone}[{\mathcal {K}}\cap {\mathcal {H}}\times \{\mu \}])&= \mathcal {COP}( {\textsf {A} }[{\mathcal {K}}\times \{0\}])&\text {by Lemma 13} \\&= {\textsf {A} }^{-\textsf {T} }\mathcal {COP}({\mathcal {K}}\times \{0\}){\textsf {A} }^{-1}&\text {by Equation (18)} \\&= {\mathcal {C}}\, .&\text {by Equation (17)} \end{aligned}$$

\(\square \)

The above theorem allows us to harness other results in quadratic optimization in order to characterize, or at least approximate, set-copositivity constraints such as (15). As an example, we use the main result in Eichfelder and Povh (2013), which is a generalization of the celebrated result in Burer (2009), in the following theorem.

Theorem 15

Let \({\mathcal {U}}_0 \,{:}{=}\, \{\mathbf {u}_0\in {\mathcal {K}}_0 \,{:}\, {\textsf {A} }_0\mathbf {u}_0 = \mathbf {b}_0\,,\, (\mathbf {u}_0)_j \in \{0,1\}~ \text{ for } \text{ all } ~j\in {\mathcal {I}}\}\ne \{\mathbf {o}\}\) be a closed, bounded and nonempty set, where \({\mathcal {K}}_0\subseteq \mathbb {R}^{n_0}\) is a closed, convex and pointed cone with nonempty interior, \({\textsf {A} }_0 \in \mathbb {R}^{ k_0\times n_0}\), \(\mathbf {b}_0 \in \mathbb {R}^{k_0}\) and \({\mathcal {I}}\subseteq [1\!:\! n_0]\). Further, let \({\mathcal {K}}\,{:}{=}\, {\mathrm {cone}}({\mathcal {U}}_0)\). Define

for some appropriate matrices \({\textsf {A} }\), \({\textsf {E} }\) and \({\textsf {Q} }\), vectors \(\mathbf {b}\), \(\mathbf {b}^2\), \(\varvec{\alpha }\) and numbers \(n_y\), \(n_z\) and k (which can be formed with minimal effort). Then, the following inclusions hold:

$$\begin{aligned} \mathcal {COP}({\mathcal {K}}) \supseteq \mathcal {COP}_{prx}({\mathcal {K}}) \supseteq {\mathrm {int}}~\mathcal {COP}({\mathcal {K}})\, . \end{aligned}$$

Proof

In order to apply (Eichfelder and Povh 2013, Theorem 10) we have to guarantee that a version of Burer’s key condition holds for \({\mathcal {U}}_0\), namely

$$\begin{aligned} {\textsf {A} }_0\mathbf {u}_0 = \mathbf {b}_0 \text{ and } \mathbf {u}_0 \in {\mathcal {K}}_0\, \implies 0\leqslant (\mathbf {u}_0)_j\leqslant 1 \text{ for } \text{ all } j \in {\mathcal {I}}. \end{aligned}$$

Assume \((\mathbf {u}_0)_j\leqslant 1\) is not implied for \(n_y\) different \(j\in {\mathcal {I}}\) and \(0\leqslant (\mathbf {u}_0)_j\) is not implied for \(n_z\) different \(j\in {\mathcal {I}}\) (i.e. in the worst case \(n_y=n_z = |{\mathcal {I}}|\), in the best case both are zero). This forces us to introduce \(n_y\) constraints \((\mathbf {u}_0)_j+y_j = 1\) and \(n_z\) constraints \((\mathbf {u}_0)_j-z_j = 0\), where \(\mathbf {y}\in \mathbb {R}^{n_y}_+\) and \(\mathbf {z}\in \mathbb {R}^{n_z}_+\) are slack variables, which are necessary since all linear constraints have to be equality constraints. Note that all these additional constraints are redundant. Let \(n\,{:}{=}\, n_0+ n_y+n_z\) and \(k \,{:}{=}\, k_0+n_y+n_z\). We define \(\mathbf {b}\,{:}{=}\, (\mathbf {b}_0;\mathbf {e};\mathbf {o})\in \mathbb {R}^{k}\), where \(\mathbf {e}\in \mathbb {R}^{n_y}\) is the all-ones vector and \(\mathbf {o}\in \mathbb {R}^{n_z}\) the zero vector. Further, \(\mathbf {u}\,{:}{=}\, (\mathbf {u}_0;\mathbf {y};\mathbf {z})\in {\mathcal {K}}_0\times \mathbb {R}_+^{n_y+n_z}\) and \({\textsf {Q} }\,{:}{=}\, ({\textsf {Q} }_0,{\textsf {O} };{\textsf {O} },{\textsf {O} })\in {\mathcal {S}}^{n}\). Also, we abbreviate by \(\mathbf {b}^2\) the vector whose i-th entry is given by \(b_i^2\). Now let \({\textsf {A} }\in \mathbb {R}^{k\times n}\) be such that \({\textsf {A} }\mathbf {u}= \mathbf {b}\) encodes, row by row, the equalities \({\textsf {A} }_0\mathbf {u}_0 = \mathbf {b}_0\), \((\mathbf {u}_0)_j+y_j = 1\) and \((\mathbf {u}_0)_j-z_j = 0\).
It is easy to check that for \({\mathcal {U}}\,{:}{=}\,~\{\mathbf {u}\in ~ {\mathcal {K}}_0\times \mathbb {R}_+^{n_y+n_z}~\,{:}\,~{\textsf {A} }\mathbf {u}= \mathbf {b}\, ,\, u_j \in \{0,1\} \, \text{ for } \text{ all } \, j\in {\mathcal {I}}\}\) we have \(\inf \limits _{\mathbf {u}\in {\mathcal {U}}} \mathbf {u}^{{\textsf {T} }}{\textsf {Q} }\mathbf {u}= \inf \limits _{\mathbf {u}_0\in {\mathcal {U}}_0} \mathbf {u}_0^{{\textsf {T} }}{\textsf {Q} }_0\mathbf {u}_0\), that \({\mathcal {U}}\) is a compact set, and that \({\mathcal {K}}_0\times \mathbb {R}_+^{n_y+n_z}\) is a closed, convex, pointed cone. Thus, by Lemma 12, we can add a redundant constraint \(\varvec{\alpha }^{{\textsf {T} }}\mathbf {u}= \beta \), where \(\varvec{\alpha }\in {\mathrm {int}}(({\mathcal {K}}_0\times \mathbb {R}_+^{n_y+n_z})^*)\), to the definition of \({\mathcal {U}}\). By redundancy, and since \(\mathbf {y}^{{\textsf {T} }}\varvec{\alpha }> 0 \ \text{ for } \text{ all } \mathbf {y}\in ({\mathcal {K}}_0\times \mathbb {R}_+^{n_y+n_z})\setminus \{\mathbf {o}\}\), we see that \(\beta > 0 \), so that after a scaling the constraint can be written as \(\varvec{\alpha }^{{\textsf {T} }}\mathbf {u}= 1\). Then (Eichfelder and Povh 2013, Theorem 10), along with a straightforward adaptation of the reduction step outlined in (Burer 2012, Theorem 2), gives us the equivalence between the following two optimization problems

(19)

where \({\mathcal {C}}\,{:}{=}\, \mathcal {CPP}({\mathcal {K}}_0\times \mathbb {R}_+^{n_y+n_z})\), so that the latter problem has the following dual, abbreviating \({\textsf {E} }={\textsf {E} }_{|{\mathcal {I}}|}\):

where the dual variables are \((\varvec{\lambda }, \varvec{\mu },\varvec{\nu },\delta )\in \mathbb {R}^{2k+|{\mathcal {I}}|+1}\), and where \(\varvec{\lambda }\) and \(\varvec{\nu }\) were rescaled (\(\varvec{\lambda }\) to \(2\varvec{\lambda }\) and \(\varvec{\nu }\) to \(2\varvec{\nu }\)) to avoid fractions in the expression. Now, since \(\varvec{\alpha }\in {\mathrm {int}}~({\mathcal {K}}_0\times \mathbb {R}_+^{n_y+n_z})^*\) we have \(\varvec{\alpha }\varvec{\alpha }^{{\textsf {T} }}\in {\mathrm {int}}~\mathcal {COP}({\mathcal {K}}_0\times \mathbb {R}_+^{n_y+n_z})\), so that the dual problem has an obvious Slater point and indeed \(d^*=q^*\). We are now ready to prove the inclusions in the theorem. Note that \(\mathbf {o}\notin {\mathcal {U}}\) under the hypothesis of the theorem; hence \(p^*>0\) if and only if \({\textsf {Q} }\in {\mathrm {int}}~\mathcal {COP}({\mathcal {K}})\). In this case \(d^*=p^*>0\) implies that \({\textsf {Q} }\in \mathcal {COP}_{prx}({\mathcal {K}})\). If \({\textsf {Q} }\in \mathcal {COP}_{prx}({\mathcal {K}})\), then \(p^* = d^* \geqslant 0\), which implies that \({\textsf {Q} }\in \mathcal {COP}({\mathcal {K}})\). \(\square \)
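The bookkeeping in the construction of \({\textsf {A} }\) and \(\mathbf {b}\) above can be sketched in Python; the small instance \({\textsf {A} }_0=(1,1,1)\), \(\mathbf {b}_0=1\), \({\mathcal {I}}=\{1,2\}\) below is purely illustrative:

```python
import numpy as np

# Sketch of the slack construction in the proof: A_0 u_0 = b_0 together
# with (u_0)_j + y_j = 1 and (u_0)_j - z_j = 0 for j in I; here the
# worst case n_y = n_z = |I| is assumed.  All data are illustrative.
A0 = np.array([[1.0, 1.0, 1.0]])
b0 = np.array([1.0])
I = [0, 1]                              # binary-constrained coordinates
k0, n0 = A0.shape
ny = nz = len(I)

E_I = np.zeros((len(I), n0)); E_I[range(len(I)), I] = 1.0
A = np.block([
    [A0,  np.zeros((k0, ny)), np.zeros((k0, nz))],
    [E_I, np.eye(ny),         np.zeros((ny, nz))],
    [E_I, np.zeros((nz, ny)), -np.eye(nz)],
])                                      # A has k = k0+ny+nz rows, n = n0+ny+nz cols
b = np.concatenate([b0, np.ones(ny), np.zeros(nz)])

# A feasible u_0 with binary entries on I, lifted with its slacks y, z:
u0 = np.array([1.0, 0.0, 0.0])
u = np.concatenate([u0, 1.0 - u0[I], u0[I]])
residual = float(np.max(np.abs(A @ u - b)))
```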

The argument in the proof is analogous to the general strategy outlined in Sect. 3, the difference being that the objective in the minimization problem is homogeneous. Also, we do not assume dual attainability, which is why the reverse inclusions cannot be established. The theorem highlights the fact that in the absence of dual attainability the feasible set shrinks, and it locates the points that are lost at the boundary of the set-copositive cone we sought to characterize. The reformulation results in Burer (2009), Eichfelder and Povh (2013) have been used in Mittal et al. (2019), Xu and Hanasusanto (2019) for the sake of reformulating semi-infinite constraints with quadratic dependency on the uncertainty vector. We want to point out that in these papers the problem of dual attainability was not sufficiently considered when taking uncertain constraints into account. In addition, these papers do not harness the reduction step, which by Lemma 12 is always applicable if the uncertainty set \({\mathcal {U}}\) is compact. The reduction step reduces the dimension of the set-copositive constraint in the copositive reformulation and at least opens the possibility for a primal Slater point to exist, which would never be the case if that reduction step were omitted, as was pointed out in (Burer 2012, Sect. 3). To better understand the situation, consider the example given there more carefully. It states that

$$\begin{aligned}&\quad \qquad \min \{\mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {x}\,{:}\, \mathbf {e}^{{\textsf {T} }}\mathbf {x}= 1, \mathbf {x}\in \mathbb {R}^n_+ \} \\&\quad =\min \{{\textsf {Q} }\bullet {\textsf {X} }\,{:}\, \mathbf {e}^{{\textsf {T} }}\mathbf {x}= 1,\ \mathbf {e}^{{\textsf {T} }}{\textsf {X} }\mathbf {e}= 1,\ {\textsf {Y} }({\textsf {X} },\mathbf {x}) \in \mathcal {CPP}(\mathbb {R}^{n+1}_+)\}\\&\quad =\min \{{\textsf {Q} }\bullet {\textsf {X} }\,{:}\, \mathbf {e}^{{\textsf {T} }}{\textsf {X} }\mathbf {e}= 1,\ {\textsf {X} }\in \mathcal {CPP}(\mathbb {R}^{n}_+)\} \end{aligned}$$

The first equality is due to the main result in Burer (2012) and the second one is the reduction step outlined in Sect. 3 of that paper. In fact, equality of the first and the last problem was already established at the origins of copositive programming in Bomze et al. (2002). Take any interior point \({\textsf {X} }\in {\mathrm {int}}\mathcal {CPP}(\mathbb {R}^{n}_+)\). Since \(\mathbf {e}\mathbf {e}^{{\textsf {T} }}\in {\mathrm {int}}\mathcal {COP}(\mathbb {R}^{n}_+)\), one can rescale \({\textsf {X} }\) such that \(\mathbf {e}^{{\textsf {T} }}{\textsf {X} }\mathbf {e}= 1\) holds, and we see that the last minimization problem indeed has a primal Slater point. In fact, the same line of reasoning remains valid if we replace \(\mathbb {R}_+^n\) by any convex cone \({\mathcal {K}}_0\), \(\mathbf {e}\) by any \(\varvec{\alpha }\in {\mathcal {K}}_0^*\), and 1 by any positive number b. The only difference is that we then have to appeal to the results in Burer (2012) or Eichfelder and Povh (2013) for the reformulation step. In such cases we would have \({\mathcal {K}}= {\mathrm {cone}}\{\mathbf {x}\in {\mathcal {K}}_0\,{:}\, \varvec{\alpha }^{{\textsf {T} }}\mathbf {x}= b \} = {\mathcal {K}}_0\), i.e. the set \(\{\mathbf {x}\in {\mathcal {K}}_0\,{:}\, \varvec{\alpha }^{{\textsf {T} }}\mathbf {x}= b \}\) would be a base of \({\mathcal {K}}_0\). The question is whether there are any other cases where we end up with a primal Slater point after applying the reduction step. Proposition 16 unfortunately gives a negative answer to this question: it states that a primal Slater point occurs exactly in the case where there is just a single equality constraint. Therefore, Theorem 15 is only an approximate result and does not give rise to an exact copositivity characterization in general. This unfortunate circumstance is at least partly redeemed by the fact that we lose at most some points at the boundary of the set.
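The rescaling argument can be illustrated numerically. A completely positive matrix \({\textsf {X} }= {\textsf {B} }{\textsf {B} }^{{\textsf {T} }}\) with \({\textsf {B} }\) entrywise positive (generically an interior point of \(\mathcal {CPP}(\mathbb {R}^n_+)\)) satisfies \(\mathbf {e}^{{\textsf {T} }}{\textsf {X} }\mathbf {e}> 0\), so dividing by that quantity produces a feasible point of the last problem. A minimal sketch with synthetic data (the dimension and the factor `B` are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# A (generically interior) completely positive matrix: X = B B^T with B > 0 entrywise.
B = rng.random((n, n)) + 0.1
X = B @ B.T

e = np.ones(n)
scale = e @ X @ e            # e^T X e > 0 since e e^T is strictly copositive
X_scaled = X / scale         # nonnegative scaling keeps X completely positive

assert scale > 0
assert abs(e @ X_scaled @ e - 1.0) < 1e-12   # feasibility: e^T X e = 1
assert (X_scaled >= 0).all()                 # entrywise nonnegativity is preserved
```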

Proposition 16

Consider the conic reformulation in (19), i.e.

(20)

This problem has a strictly feasible point if and only if \(\mathbf {a}_i = b_i \varvec{\alpha }\) for all \(i \in [1\!:\!m]\).

Proof

The “if” part is clear since then all constraints but \(\varvec{\alpha }^{{\textsf {T} }}{\textsf {U} }\varvec{\alpha }= 1\) are redundant, \(\varvec{\alpha }\varvec{\alpha }^{{\textsf {T} }}\in {\mathrm {int}}~\mathcal {CPP}({\mathcal {K}}_0)^*\) and \(1 > 0\), so that any interior point of \(\mathcal {CPP}({\mathcal {K}}_0)\) can be scaled to fulfil the one remaining constraint. So assume there is a Slater point \({\overline{{\textsf {U} }}}\in {\mathrm {int}}~\mathcal {CPP}({\mathcal {K}}_0) \subseteq {\mathrm {int}}~{\mathcal {S}}_+^n\). Then \({\overline{{\textsf {U} }}}^{1/2}\) exists and is invertible. Set \({\overline{\mathbf {a}}}_i = {\overline{{\textsf {U} }}}^{1/2} \mathbf {a}_i\) and \({\overline{\varvec{\alpha }}}= {\overline{{\textsf {U} }}}^{1/2} \varvec{\alpha }\); then the constraints read \({\overline{\varvec{\alpha }}}^{{\textsf {T} }}{\overline{\varvec{\alpha }}}= 1\), \({\overline{\mathbf {a}}}_i^{{\textsf {T} }}{\overline{\mathbf {a}}}_i = b_i^2\) and \({\overline{\varvec{\alpha }}}^{{\textsf {T} }}{\overline{\mathbf {a}}}_i = b_i\). The first and the second imply \(\Vert {\overline{\varvec{\alpha }}}\Vert = 1\) and \(\Vert {\overline{\mathbf {a}}}_i\Vert = b_i\), but then, by the equality case of Cauchy–Schwarz, the third constraint implies \({\overline{\mathbf {a}}}_i = \beta _i {\overline{\varvec{\alpha }}}\) for some \(\beta _i\); we actually have \(b_i = \beta _i\Vert {\overline{\varvec{\alpha }}}\Vert ^2 = \beta _i\) and finally, by invertibility of \({\overline{{\textsf {U} }}}^{1/2}\), \(\mathbf {a}_i = b_i \varvec{\alpha }\). \(\square \)
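The Cauchy–Schwarz equality step can be sanity-checked numerically: with \(\Vert {\overline{\varvec{\alpha }}}\Vert = 1\), \(\Vert {\overline{\mathbf {a}}}\Vert = b\) and \({\overline{\varvec{\alpha }}}^{{\textsf {T} }}{\overline{\mathbf {a}}}= b\), the inner product attains its upper bound, which forces \({\overline{\mathbf {a}}}= b\,{\overline{\varvec{\alpha }}}\). A minimal sketch (all vectors random, `b` arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5

alpha_bar = rng.standard_normal(n)
alpha_bar /= np.linalg.norm(alpha_bar)   # ||alpha_bar|| = 1
b = 2.5

# Cauchy-Schwarz: alpha^T a <= ||alpha|| ||a|| = b, with equality iff a is a
# nonnegative multiple of alpha; so ||a|| = b and alpha^T a = b force a = b*alpha.
a_bar = b * alpha_bar
assert abs(np.linalg.norm(a_bar) - b) < 1e-12
assert abs(alpha_bar @ a_bar - b) < 1e-12

# Any other vector of norm b has a strictly smaller inner product with alpha_bar:
other = rng.standard_normal(n)
other = b * other / np.linalg.norm(other)
if np.linalg.norm(other - a_bar) > 1e-6:
    assert alpha_bar @ other < b
```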

6 Adjustable robust reformulation of disjoint convex-non-convex quadratic optimization

So far, we have discussed how to use reformulations of QCQPs for the purpose of reformulating robust optimization problems. However, Zhen et al. (2019) outlined a way to use an adjustable robust approach in order to tackle disjoint bilinear optimization problems, which are special cases of QCQPs, thus demonstrating that the usefulness runs both ways. In this section we will show that the techniques we discussed in this text can be used to further their approach. The problem considered in Zhen et al. (2019) is given by:

$$\begin{aligned} \inf \limits _{(\mathbf {x},\mathbf {y})\in {\mathcal {X}}\times {\mathcal {Y}}} \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {y}. \end{aligned}$$
(21)

The following theorem proved in that article establishes a connection between QCQPs and adjustable robust optimization.

Theorem 17

Let \({\mathcal {Y}}\,{:}{=}\, \left\{ \mathbf {y}\in \mathbb {R}^{n_y}_+ \,{:}\, {\textsf {A} }_y\mathbf {y}\geqslant \mathbf {b}_y \right\} \), where \({\textsf {A} }_y\in \mathbb {R}^{m_y\times n_y}\) and \(\mathbf {b}_y\in \mathbb {R}^{m_y}\). Then (21) has the same optimal value as

$$\begin{aligned} \sup _{\tau } \left\{ \tau \,{:}\, \text{ there } \text{ is } \text{ a } \text{ nonnegative } \text{ function } \mathbf {z}(\mathbf {x}) \,{:}\, \tau \leqslant \mathbf {b}_y^{{\textsf {T} }}\mathbf {z}(\mathbf {x}),\ {\textsf {A} }_y^{{\textsf {T} }}\mathbf {z}(\mathbf {x})\leqslant {\textsf {Q} }^{{\textsf {T} }}\mathbf {x}\ \text{ for } \text{ all } \mathbf {x}\in {\mathcal {X}} \right\} \end{aligned}$$
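The mechanics of Theorem 17 can be illustrated on a toy instance where both sets are simplices and the inner LP is solvable by hand; everything below (dimensions, the positivity assumption on \({\textsf {Q} }\), and the choice \({\mathcal {Y}}= \{\mathbf {y}\geqslant \mathbf {o}: \mathbf {e}^{{\textsf {T} }}\mathbf {y}\geqslant 1\}\), i.e. \({\textsf {A} }_y = \mathbf {e}^{{\textsf {T} }}\), \(\mathbf {b}_y = 1\)) is an assumption made for the sketch, not data from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
nx, ny = 3, 5
Q = rng.random((nx, ny)) + 0.1   # positive entries keep the inner LP bounded below

# X = standard simplex (its vertices suffice: the objective is linear in x);
# Y = {y >= 0 : e^T y >= 1}, so A_y = e^T and b_y = 1 in the theorem's notation.
def inner_min(x):
    # min over Y of (Q^T x)^T y: for c = Q^T x > 0 the optimum sits at a unit vector e_j
    return (Q.T @ x).min()

def dual_opt_z(x):
    # LP dual: max{ z >= 0 : z*e <= Q^T x } = min_j (Q^T x)_j -- the optimal,
    # genuinely nonlinear, decision rule z(x) of the adjustable reformulation
    return (Q.T @ x).min()

for x in np.eye(nx):
    assert abs(inner_min(x) - dual_opt_z(x)) < 1e-12   # strong LP duality

# Optimal value of the bilinear problem = worst case over x of b_y^T z(x):
tau_star = min(dual_opt_z(x) for x in np.eye(nx))
assert abs(tau_star - Q.min()) < 1e-12
```

The example also shows why \(\mathbf {z}(\mathbf {x})\) must be function-valued: the optimal rule \(\min _j ({\textsf {Q} }^{{\textsf {T} }}\mathbf {x})_j\) is piecewise linear, not affine.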

We will expand this approach to the case where, in addition, a possibly nonconvex quadratic term in \(\mathbf {x}\) and a convex quadratic term in \(\mathbf {y}\) are present in the objective. We will also slightly deviate from the definition of \({\mathcal {Y}}\) in that we will assume equality constraints. As a consequence, the function \(\mathbf {z}(\mathbf {x})\) will not have a sign restriction, which will be useful in the proof of Theorem 19.

The following argument is a straightforward but tedious generalization of Theorem 17. For the readers’ convenience we provide a detailed derivation.

Theorem 18

Let \({\textsf {Q} }_x \in {\mathcal {S}}^{n_1}\), \({\textsf {Q} }_{xy} \in \mathbb {R}^{n_1\times n_2}\), \({\textsf {F} }\in \mathbb {R}^{k\times n_2}\) and \({\textsf {G} }\in \mathbb {R}^{r\times n_2}\). Further, assume \({\mathcal {X}}\subseteq \mathbb {R}^{n_1}\) is a compact set and \({\mathcal {Y}}\,{:}{=}\,\{\mathbf {y}\in \mathbb {R}^{n_2}_+\,{:}\, {\textsf {F} }\mathbf {y}= \mathbf {d}\} \subseteq \mathbb {R}^{n_2}\) has a Slater point and let \({\mathcal {Z}}(\mathbf {x})~\,{:}{=}\,~ \{(\mathbf {z},\mathbf {w})\,{:}\, {\textsf {F} }^{{\textsf {T} }}\mathbf {z}+{\textsf {G} }^{{\textsf {T} }}\mathbf {w}\leqslant {\textsf {Q} }_{xy}^{{\textsf {T} }}\mathbf {x}\}\). Then

$$\begin{aligned}&\inf _{\mathbf {x}\in {\mathcal {X}}, \mathbf {y}\in {\mathcal {Y}}} \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }_x\mathbf {x}+ \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }_{xy}\mathbf {y}+ \Vert {\textsf {G} }\mathbf {y}\Vert ^2 \end{aligned}$$
(22)
$$\begin{aligned}&=\sup _{\tau }\{\tau \,{:}\, \text{ for } \text{ some } \mathbf {z}(\mathbf {x}),\mathbf {w}(\mathbf {x}) \,{:}\, \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }_x\mathbf {x}+ \mathbf {d}^{{\textsf {T} }}\mathbf {z}(\mathbf {x})-\tfrac{1}{4}\Vert \mathbf {w}(\mathbf {x})\Vert ^2 \geqslant \tau , \,\nonumber \\&\qquad \qquad (\mathbf {z}(\mathbf {x}),\mathbf {w}(\mathbf {x})) \in {\mathcal {Z}}(\mathbf {x}) \, \text{, } \text{ for } \text{ all } \mathbf {x}\in {\mathcal {X}}\}, \end{aligned}$$
(23)

where \(\mathbf {z}\,{:}\, \mathbb {R}^{n_1} \rightarrow \mathbb {R}^{k}\) and \(\mathbf {w}\,{:}\, \mathbb {R}^{n_1} \rightarrow \mathbb {R}^{r}\) are function-valued decision variables.

Proof

Observe that

$$\begin{aligned}&\quad \qquad \inf _{(\mathbf {x},\mathbf {y})\in {\mathcal {X}}\times {\mathcal {Y}}} \left\{ \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }_x\mathbf {x}+ \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }_{xy}\mathbf {y}+ \Vert {\textsf {G} }\mathbf {y}\Vert ^2 \right\} \\&\quad = \sup _{\tau }\left\{ \tau \,{:}\, \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }_x\mathbf {x}+ \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }_{xy}\mathbf {y}+ \Vert {\textsf {G} }\mathbf {y}\Vert ^2 \geqslant \tau , \, \text{ for } \text{ all } \mathbf {x}\in {\mathcal {X}}\,\text{ and } \text{ for } \text{ all } \mathbf {y}\in {\mathcal {Y}} \right\} . \end{aligned}$$

Note that the semi-infinite constraint in the last problem involves, for every fixed \(\mathbf {x}\), the minimization of the constraint function with respect to \(\mathbf {y}\), explicitly given by

$$\begin{aligned} \inf _{\mathbf {y}\in \mathbb {R}^{n_2}_+} \left\{ \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }_{xy}\mathbf {y}+ \Vert {\textsf {G} }\mathbf {y}\Vert ^2 \,{:}\, {\textsf {F} }\mathbf {y}= \mathbf {d}\, \right\} \end{aligned}$$
(24)

By assumption there is a \(\mathbf {y}\in {\mathrm {int}}(\mathbb {R}^{n_2}_+)\) such that \({\textsf {F} }\mathbf {y}= \mathbf {d}\); thus (24) fulfils Slater’s condition, so that zero duality gap and dual attainability are guaranteed for the dual given by

$$\begin{aligned} \sup _{(\mathbf {z},\mathbf {w})\in \mathbb {R}^{k}\times \mathbb {R}^{r}} \left\{ \mathbf {d}^{{\textsf {T} }}\mathbf {z}-\tfrac{1}{4}\Vert \mathbf {w}\Vert ^2 \,{:}\, {\textsf {F} }^{{\textsf {T} }}\mathbf {z}+{\textsf {G} }^{{\textsf {T} }}\mathbf {w}\leqslant {\textsf {Q} }_{xy}^{{\textsf {T} }}\mathbf {x} \right\} . \end{aligned}$$
(25)

The dual feasible set is given by \({\mathcal {Z}}(\mathbf {x})\). Replacing the inner minimization (24) by its dual (25) inside the semi-infinite constraint, and encoding the choice of the dual optimizers by the function-valued variables \(\mathbf {z}(\mathbf {x})\) and \(\mathbf {w}(\mathbf {x})\), yields the claimed equality with (23).

\(\square \)
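Weak duality between (24) and its dual (25) can be verified numerically: for any primal-feasible \(\mathbf {y}\) and dual-feasible \((\mathbf {z},\mathbf {w})\), the gap decomposes as \(\mathbf {y}^{{\textsf {T} }}({\mathbf {q}}-{\textsf {F} }^{{\textsf {T} }}\mathbf {z}-{\textsf {G} }^{{\textsf {T} }}\mathbf {w}) + \Vert {\textsf {G} }\mathbf {y}+\mathbf {w}/2\Vert ^2 \geqslant 0\), where \({\mathbf {q}}= {\textsf {Q} }_{xy}^{{\textsf {T} }}\mathbf {x}\). A sketch with random data, where `d` and `q` are constructed so that feasibility holds by design:

```python
import numpy as np

rng = np.random.default_rng(2)
n2, k, r = 6, 3, 4

F = rng.standard_normal((k, n2))
G = rng.standard_normal((r, n2))

# Primal-feasible y >= 0; define d := F y so the equality constraint holds.
y = rng.random(n2)
d = F @ y

# Dual-feasible (z, w): pick them freely, then set q := F^T z + G^T w + slack.
z = rng.standard_normal(k)
w = rng.standard_normal(r)
q = F.T @ z + G.T @ w + rng.random(n2)   # componentwise slack

primal = q @ y + np.linalg.norm(G @ y) ** 2
dual = d @ z - 0.25 * np.linalg.norm(w) ** 2

# Gap identity: primal - dual = y^T(q - F^T z - G^T w) + ||G y + w/2||^2 >= 0
gap_identity = y @ (q - F.T @ z - G.T @ w) + np.linalg.norm(G @ y + w / 2) ** 2
assert primal >= dual - 1e-10
assert abs((primal - dual) - gap_identity) < 1e-9
```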

We end up with an adjustable robust optimization problem with second stage variables \(\mathbf {z}(\mathbf {x})\) and \(\mathbf {w}(\mathbf {x})\) and uncertainty set \({\mathcal {X}}\), i.e. the decision vector \(\mathbf {x}\) takes the role of the uncertainty parameter.

The fact that \(\mathbf {z}(\mathbf {x})\) and \(\mathbf {w}(\mathbf {x})\) are function-valued variables is a major complication, since the space of all functions is an intractable search space. The standard way to deal with this issue is to contract the search space so that it encompasses only affine functions, i.e. \(\mathbf {z}(\mathbf {x}) = {\textsf {Z} }\mathbf {x}+\mathbf {z}\) and \(\mathbf {w}(\mathbf {x}) = {\textsf {W} }\mathbf {x}+\mathbf {w}\). The new decision variables are then the coefficients that identify the affine functions. If, after this contraction step, the robust constraints are linear in the uncertainty parameter, one can employ standard reformulation techniques based on linear conic duality in order to obtain a tractable reformulation of the semi-infinite constraints. However, at least one of the adjustable robust constraints in (23) depends quadratically on \(\mathbf {x}\) rather than linearly. Thus, approximations based on linear duality cannot be employed here.

However, the general reformulation strategy not only allows us to deal with the term \(\mathbf {x}^{{\textsf {T} }}{\textsf {Q} }_{x}\mathbf {x}\) in the adjustable robust reformulation (23); it also allows us to employ a quadratic decision rule for the adjustable variable \(\mathbf {z}(\mathbf {x})\), i.e. \((\mathbf {z}(\mathbf {x}))_j = \mathbf {x}^{{\textsf {T} }}{\textsf {Z} }_j\mathbf {x}+\mathbf {x}^{{\textsf {T} }}\mathbf {z}_j+z_j,\ j\in [1\!:\!k]\), a class of decision rules that contains affine decision rules as a special case. Thus, it can potentially yield tighter approximations than those obtainable with affine decision rules. On the other hand, \(\mathbf {w}(\mathbf {x})\) is subject to a (convex) quadratic constraint. Consequently, an affine policy \(\mathbf {w}(\mathbf {x}) = {\textsf {W} }\mathbf {x}+\mathbf {w}\) has to be employed, lest terms of order larger than two emerge. Since employing such decision rules restricts the maximization problem, we obtain a lower bound on the original minimization problem.
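Substituting the quadratic rule into the term \(\mathbf {d}^{{\textsf {T} }}\mathbf {z}(\mathbf {x})\) collapses the decision-rule coefficients into a single quadratic form in \(\mathbf {x}\), which is the bookkeeping that the proof of Theorem 19 relies on. A minimal numerical check of this identity (all data random, names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n1, k = 4, 3

Zmats = [rng.standard_normal((n1, n1)) for _ in range(k)]
zvecs = [rng.standard_normal(n1) for _ in range(k)]
zscal = rng.standard_normal(k)
d = rng.standard_normal(k)
x = rng.standard_normal(n1)

# Quadratic decision rule: (z(x))_j = x^T Z_j x + z_j^T x + z_j
z_of_x = np.array([x @ Zmats[j] @ x + zvecs[j] @ x + zscal[j] for j in range(k)])

# Collapsed coefficients: d^T z(x) = x^T (sum_j d_j Z_j) x + (sum_j d_j z_j)^T x + sum_j d_j z_j
Q = sum(d[j] * Zmats[j] for j in range(k))
q = sum(d[j] * zvecs[j] for j in range(k))
omega = d @ zscal

assert abs(d @ z_of_x - (x @ Q @ x + q @ x + omega)) < 1e-9
```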

Theorem 19

Let the problem data be given as in Theorem 18. Assume that for any \({\textsf {Q} }\in {\mathcal {S}}^{n_1}\), \({\mathbf {q}}\in \mathbb {R}^{n_1}\) and \(\omega \in \mathbb {R}\) it holds that

$$\begin{aligned} \inf _{\mathbf {x}\in {\mathcal {X}}} \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {x}+{\mathbf {q}}^{{\textsf {T} }}\mathbf {x}+\omega = \inf _{{\textsf {X} }\in {\mathcal {C}}}\left\{ {\textsf {M} }({\textsf {Q} },{\mathbf {q}},\omega )\bullet {\textsf {X} }\,{:}\, {\textsf {A} }_j\bullet {\textsf {X} }\leqslant b_j, \ j \in [1\!:\!m] \right\} \end{aligned}$$

for some appropriate matrix cone \({\mathcal {C}}\subseteq {\mathcal {S}}^{n_1+1}\) and matrices \({\textsf {A} }_j\in {\mathcal {S}}^{n_1+1}, \ j \in [1\!:\!m]\), such that there is no duality gap between the conic reformulation and its dual. Then, the following optimization problem gives a nontrivial lower bound for (23):

where \(({\textsf {Q} }_{xy})_l = {\mathrm {Column}}_l({\textsf {Q} }_{xy})\), \(\mathbf {w}_i = {\mathrm {Row}}_i({\textsf {W} })\), \(f_{il} = ({\textsf {F} })_{li}\), \(g_{il} = ({\textsf {G} })_{li}\), \(w_i = (\mathbf {w})_i\) and \(d_i = (\mathbf {d})_i\).

Proof

Under the assumptions of the theorem, given any \({\textsf {Q} }\in {\mathcal {S}}^{n_1}\), \({\mathbf {q}}\in \mathbb {R}^{n_1}\) and \(\omega \in \mathbb {R}\), we can enforce the semi-infinite constraint

$$\begin{aligned} \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {x}+ {\mathbf {q}}^{{\textsf {T} }}\mathbf {x}+\omega \geqslant 0 \ \forall \mathbf {x}\in {\mathcal {X}}\ \end{aligned}$$
(26)

by demanding that

$$\begin{aligned} \mathbf {b}^{{\textsf {T} }}\varvec{\lambda }\leqslant 0, \ {\textsf {M} }\left( {\textsf {Q} },{\mathbf {q}},\omega \right) + \sum _{j=1}^m\lambda _j {\textsf {A} }_j \in {\mathcal {C}}^*,\ \varvec{\lambda }\in \mathbb {R}^m_+, \end{aligned}$$
(27)

as long as full strong duality holds (the consequences of a failure of full strong duality will be discussed at the end of the proof). Setting \((\mathbf {z}(\mathbf {x}))_i = \mathbf {x}^{{\textsf {T} }}{\textsf {Z} }_i\mathbf {x}+\mathbf {x}^{{\textsf {T} }}\mathbf {z}_i+z_i,\ i\in [1\!:\!k]\) and \(\mathbf {w}(\mathbf {x}) = {\textsf {W} }\mathbf {x}+\mathbf {w}\) we see that the constraint \(\mathbf {x}^{{\textsf {T} }}{\textsf {Q} }_x\mathbf {x}+ \mathbf {d}^{{\textsf {T} }}\mathbf {z}(\mathbf {x})- \tfrac{1}{4}\Vert {\textsf {W} }\mathbf {x}+\mathbf {w}\Vert ^2 - \tau \geqslant 0\ \forall \mathbf {x}\in {\mathcal {X}}\) is equivalent to \(\mathbf {x}^{{\textsf {T} }}{\textsf {Q} }_x\mathbf {x}+ \mathbf {d}^{{\textsf {T} }}\mathbf {z}(\mathbf {x}) -\tfrac{1}{4}\left( \mathbf {x}^{{\textsf {T} }},1\right) {\textsf {H} }\left( \mathbf {x}^{{\textsf {T} }},1\right) ^{{\textsf {T} }}- \tau \geqslant 0\ \forall \mathbf {x}\in {\mathcal {X}}, \ {\textsf {M} }\left( {\textsf {E} }_r,2\left( {\textsf {W} },\mathbf {w}\right) ,{\textsf {H} }\right) \in {\mathcal {S}}^{n_1+r+1}_+\) by Lemma 20 below. The semi-infinite constraint is of the form (26) with \({\textsf {Q} }= {\textsf {Q} }_x-\tfrac{1}{4}\left( {\textsf {H} }\right) _{1:n_1,1:n_1}+\sum _{i=1}^{k}d_i{\textsf {Z} }_i\), \({\mathbf {q}}= \sum _{i=1}^{k}d_i\mathbf {z}_i-\tfrac{1}{2}\left( {\textsf {H} }\right) _{1:n_1,n_1+1}\) and \(\omega = \sum _{i=1}^{k}d_iz_i-\tau -\tfrac{1}{4}\left( {\textsf {H} }\right) _{n_1+1,n_1+1}\). 
Similarly, the constraints \(({\textsf {Q} }^{{\textsf {T} }}_{xy}\mathbf {x}-{\textsf {F} }^{{\textsf {T} }}\mathbf {z}(\mathbf {x})-{\textsf {G} }^{{\textsf {T} }}\mathbf {w}(\mathbf {x}))_l\geqslant 0 \ \forall \mathbf {x}\in {\mathcal {X}}, \ l \in [1\!:\!n_2]\) are of the form (26) with \({\textsf {Q} }= \sum _{i=1}^{k}f_{il} {\textsf {Z} }_i\), \({\mathbf {q}}= ({\textsf {Q} }_{xy})_l^{{\textsf {T} }}- \sum _{i=1}^{k}f_{il}\mathbf {z}_i - \sum _{i=1}^{r}g_{il}\mathbf {w}_i^{{\textsf {T} }}\) and \(\omega = \sum _{i=1}^{k}f_{il}z_i\). Now, since we do not assume full strong duality, but merely zero duality gap for the conic reformulation of (26), we need to argue that the conclusion of the theorem still holds in the absence of dual attainability. In fact, for any \(\left( {\textsf {Q} },{\mathbf {q}},\omega \right) \) for which dual attainability fails, constraints (27) may become infeasible. This can only be the case if there is an \(\mathbf {x}\in {\mathcal {X}}\) such that the inequality (26) is actually tight. To see this, consider the case where \(p^*=\inf _{\mathbf {x}\in {\mathcal {X}}} \left\{ \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {x}+ {\mathbf {q}}^{{\textsf {T} }}\mathbf {x}+\omega \right\} >0\). Since we assume zero duality gap, there is a dually feasible sequence \(\varvec{\lambda }_i\) such that \(\lim _{i \rightarrow \infty } -\mathbf {b}^{{\textsf {T} }}\varvec{\lambda }_i = p^*\), and since the latter quantity strictly exceeds zero, there has to be an \(i^*\) such that \(-\mathbf {b}^{{\textsf {T} }}\varvec{\lambda }_{i^*} \geqslant 0\), so that \(\left( {\textsf {Q} },{\mathbf {q}},\omega \right) \) is feasible for (26) and (27) simultaneously. However, if \(p^* = 0\), there may be no member of the sequence that takes the dual objective to zero, so that any choice of the decision variables that yields such instances of \(\left( {\textsf {Q} },{\mathbf {q}},\omega \right) \) may be infeasible.
Such a contraction of the feasible set can only lower the optimal value of the optimization problem, so the conclusion of the theorem remains valid if we can show that the feasible set never shrinks to the empty set. Let us again consider (23) under a quadratic policy. We need to show that there is a choice of the coefficients of the quadratic policy and of \(\tau \) such that none of the inequalities is ever binding, i.e. they are fulfilled with strict inequality for all \(\mathbf {x}\in {\mathcal {X}}\). It suffices to show that there is a static policy, where \(\mathbf {z}(\mathbf {x})\) and \(\mathbf {w}(\mathbf {x})\) are constant for all \(\mathbf {x}\), which accomplishes this goal. The inequality \(\mathbf {x}^{{\textsf {T} }}{\textsf {Q} }_x\mathbf {x}+ \mathbf {d}^{{\textsf {T} }}\mathbf {z}-\tfrac{1}{4}\Vert \mathbf {w}\Vert ^2\geqslant \tau \) is safe, since we can make \(\tau \) arbitrarily small and the left-hand side is bounded over \({\mathcal {X}}\), which is a compact set. Since \({\mathcal {Y}}\,{:}{=}\,\{\mathbf {y}\in \mathbb {R}^{n_2}_+\,{:}\, {\textsf {F} }\mathbf {y}= \mathbf {d}\} \subseteq \mathbb {R}^{n_2}\) is a bounded conic intersection, we invoke Lemma 12 to show that there is a \(\bar{\mathbf {z}}\in \mathbb {R}^k\) such that \({\textsf {F} }^{{\textsf {T} }}\bar{\mathbf {z}} \in {\mathrm {int}}\mathbb {R}^{n_2}_{+}\). Now set \(\mathbf {z}= -\mu \bar{\mathbf {z}}\) to see that \( {\textsf {F} }^{{\textsf {T} }}\mathbf {z}+{\textsf {G} }^{{\textsf {T} }}\mathbf {w}\leqslant {\textsf {Q} }_{xy}^{{\textsf {T} }}\mathbf {x}\) can be fulfilled for all \(\mathbf {x}\in {\mathcal {X}}\) with strict inequality by choosing \(\mu >0\) large enough. This concludes the proof. \(\square \)

The argument for the linearization of the constraint \({\textsf {H} }= ({\textsf {W} },\mathbf {w})^{{\textsf {T} }}({\textsf {W} },\mathbf {w})\) in the form \( {\textsf {M} }\left( {\textsf {E} }_{r},2({\textsf {W} },\mathbf {w}),{\textsf {H} }\right) \in {\mathcal {S}}^{n_1+r+1}_+\) is given in the following lemma, which is a straightforward generalization of (Mittal et al. 2019, Lemma 4). It is slightly more general than needed for Theorem 19 as it might be useful in other contexts such as the ones discussed in Mittal et al. (2019).

Lemma 20

Let \({\textsf {Q} }(\mathbf {y})\,{:}\, \mathbb {R}^k\rightarrow \mathbb {R}^{m\times n}\) be a matrix-valued function and let \({\textsf {D} }\in {\mathcal {S}}^n\), \({\mathbf {q}}\in \mathbb {R}^n\) and \(\omega \in \mathbb {R}\). A vector \(\mathbf {y}\in \mathbb {R}^k\) fulfils

$$\begin{aligned} \mathbf {x}^{{\textsf {T} }}({\textsf {D} }-{\textsf {Q} }^{{\textsf {T} }}(\mathbf {y}){\textsf {Q} }(\mathbf {y}))\mathbf {x}+ {\mathbf {q}}^{{\textsf {T} }}\mathbf {x}+\omega \geqslant 0 \ \forall \mathbf {x}\in {\mathcal {X}}\subseteq \mathbb {R}^n \end{aligned}$$
(28)

if and only if there exists a matrix \({\textsf {H} }\in {\mathcal {S}}^n\) such that

$$\begin{aligned} \mathbf {x}^{{\textsf {T} }}({\textsf {D} }-{\textsf {H} })\mathbf {x}+ {\mathbf {q}}^{{\textsf {T} }}\mathbf {x}+\omega&\geqslant 0 \ \forall \mathbf {x}\in {\mathcal {X}}\quad {\mathrm {and}} \end{aligned}$$
(29)
$$\begin{aligned} {\textsf {M} }\left( {\textsf {E} }_m,\, 2{\textsf {Q} }(\mathbf {y}),\, {\textsf {H} }\right)&\in {\mathcal {S}}^{n+m}_+. \end{aligned}$$
(30)

Proof

Assume that \(\mathbf {y}\) fulfils (28). Then it fulfils (29) and (30) with \({\textsf {H} }= {\textsf {Q} }^{{\textsf {T} }}(\mathbf {y}){\textsf {Q} }(\mathbf {y})\): this choice trivially gives \({\textsf {H} }- {\textsf {Q} }^{{\textsf {T} }}(\mathbf {y}){\textsf {Q} }(\mathbf {y})\in {\mathcal {S}}^n_+\), which by Schur complementation implies (30). Conversely, assume \(\mathbf {y}\) and \({\textsf {H} }\) fulfil (29) and (30); then

$$\begin{aligned}&\quad \qquad \begin{pmatrix} \mathbf {x}\\ 1 \end{pmatrix}^{{\textsf {T} }}\left[ \begin{pmatrix} {\textsf {D} }-{\textsf {Q} }(\mathbf {y})^{{\textsf {T} }}{\textsf {Q} }(\mathbf {y}) &{} \frac{1}{2} {\mathbf {q}}\\ \frac{1}{2} {\mathbf {q}}^{{\textsf {T} }}&{} \omega \end{pmatrix} -\begin{pmatrix} {\textsf {D} }-{\textsf {H} }&{} \frac{1}{2} {\mathbf {q}}\\ \frac{1}{2} {\mathbf {q}}^{{\textsf {T} }}&{} \omega \end{pmatrix}\right] \begin{pmatrix} \mathbf {x}\\ 1 \end{pmatrix} \\&\quad = \mathbf {x}^{{\textsf {T} }}\left[ {\textsf {H} }-{\textsf {Q} }(\mathbf {y})^{{\textsf {T} }}{\textsf {Q} }(\mathbf {y}) \right] \mathbf {x}\geqslant 0 \quad \text{ for } \text{ all } \mathbf {x}\in {\mathcal {X}}\, , \end{aligned}$$

where the inequality holds due to (30). But since for any pair of matrices \({\textsf {A} },{\textsf {B} }\in {\mathcal {S}}^n\) we have that \(\mathbf {x}^{{\textsf {T} }}{\textsf {B} }\mathbf {x}\geqslant 0 \ \forall \mathbf {x}\in {\mathcal {X}}\) and \({\textsf {A} }-{\textsf {B} }\in {\mathcal {S}}^n_+\) imply \(\mathbf {x}^{{\textsf {T} }}{\textsf {A} }\mathbf {x}\geqslant 0 \ \forall \mathbf {x}\in {\mathcal {X}}\) for any set \({\mathcal {X}}\), we have shown that (28) holds as well. \(\square \)
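The Schur-complement equivalence underlying the proof, namely that \({\textsf {M} }\left( {\textsf {E} }_m, 2{\textsf {Q} }(\mathbf {y}), {\textsf {H} }\right) \in {\mathcal {S}}^{n+m}_+\) holds exactly when \({\textsf {H} }-{\textsf {Q} }^{{\textsf {T} }}(\mathbf {y}){\textsf {Q} }(\mathbf {y})\in {\mathcal {S}}^n_+\), can be checked numerically. We assume here the convention that \({\textsf {M} }({\textsf {C} }_1, 2{\textsf {C} }_2, {\textsf {C} }_3)\) denotes the symmetric block matrix with diagonal blocks \({\textsf {C} }_1, {\textsf {C} }_3\) and off-diagonal block \({\textsf {C} }_2\):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 4
Qm = rng.standard_normal((m, n))

def block_M(C1, C2, C3):
    # Symmetric block matrix [[C1, C2], [C2^T, C3]]
    return np.block([[C1, C2], [C2.T, C3]])

def is_psd(A, tol=1e-9):
    return np.linalg.eigvalsh(A).min() >= -tol

# H = Q^T Q + PSD perturbation => Schur complement H - Q^T Q is PSD, block is PSD
P = rng.standard_normal((n, n))
H_ok = Qm.T @ Qm + P @ P.T
assert is_psd(H_ok - Qm.T @ Qm)
assert is_psd(block_M(np.eye(m), Qm, H_ok))

# H = Q^T Q - eps*I => Schur complement fails, and so does the block condition
H_bad = Qm.T @ Qm - 0.1 * np.eye(n)
assert not is_psd(H_bad - Qm.T @ Qm)
assert not is_psd(block_M(np.eye(m), Qm, H_bad))
```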

7 Experimental evidence

In the following we provide results from extensive numerical experiments in which we assessed the quality of the lower bound from Theorem 19 for the QCQP (22). We pursue two main types of experiments. First, we compare the quality of our proposed lower bound with lower bounds obtained from typical relaxations of the (set-)copositive reformulations established in Burer (2009), Burer (2012), Eichfelder and Povh (2013). Second, for the case where only the bilinear term is present in the objective function, we compare our lower bound to the one derived in Zhen et al. (2019). The main difference between the two approaches is that we, by means of the strategy outlined in Sect. 3, are able to employ quadratic decision rules, while Zhen et al. (2019) employ affine policies. It should be noted that Xu and Hanasusanto (2019) provided empirical evidence for the advantage of quadratic over affine decision rules. However, we believe that the disjoint bilinear optimization problem, with its applications in game theory, is an interesting special case that differs significantly from the problems considered in Xu and Hanasusanto (2019), so that particular consideration is warranted. All experiments were implemented using the YALMIP interface. The semidefinite optimization problems were solved using SDPT3, while linear problems were solved using Gurobi. For the purpose of generating feasible solutions, and thus upper bounds, we employed fmincon as a global solver. The experiments were run on a system with an Intel Core i7-4510U CPU and 8GB RAM.

7.1 Comparison with lower bounds from copositive programming

For the problem

$$\begin{aligned} \inf _{(\mathbf {x},\mathbf {y})\in {\mathcal {X}}\times {\mathcal {Y}}} \left\{ \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }_x\mathbf {x}+ \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }_{xy}\mathbf {y}+ \Vert {\textsf {G} }\mathbf {y}\Vert ^2 \right\} , \end{aligned}$$
(31)

when specifying \({\mathcal {X}}\,{:}{=}\, \left\{ \mathbf {x}\in {\mathcal {K}}\,{:}\, {\textsf {B} }\mathbf {x}= \mathbf {c} \right\} \subseteq \mathbb {R}^{n_1}\) for some convex cone \({\mathcal {K}}\subseteq \mathbb {R}^{n_1},\ {\textsf {B} }\in \mathbb {R}^{k_1\times n_1},\ \mathbf {c}\in \mathbb {R}^{k_1}\) and \({\mathcal {Y}}\,{:}{=}\, \left\{ \mathbf {y}\in \mathbb {R}^{n_2}_+\,{:}\, {\textsf {F} }\mathbf {y}= \mathbf {d} \right\} \) as in Theorem 18, we obtain a QCQP that is amenable to copositive reformulations as demonstrated in Burer (2009). We only consider the case where \({\mathcal {X}}\) and \({\mathcal {Y}}\) are bounded, so that by Lemma 12 we can always add a redundant linear constraint \(\varvec{\alpha }^{{\textsf {T} }}(\mathbf {x}^{{\textsf {T} }},\mathbf {y}^{{\textsf {T} }})^{{\textsf {T} }}= 1\) with \(\varvec{\alpha }= (\varvec{\alpha }_1^{{\textsf {T} }},\varvec{\alpha }_2^{{\textsf {T} }})^{{\textsf {T} }}\in {\mathrm {int}}\left( {\mathcal {K}}^*\times \mathbb {R}^{n_2}_+\right) \), which enables us to apply the simplification step outlined in (Burer 2012, Sect. 2.3). The resulting completely positive reformulation involves a conic constraint that restricts the decision matrix to \(\mathcal {CPP}({\mathcal {K}}\times \mathbb {R}^{n_2}_+)\). This constraint is generally intractable (unless, for example, \(n_2=0\) and \({\mathcal {K}}\) is the second order cone). For the purpose of our experiments we resort to approximations that are standard in the literature. In case \({\mathcal {K}}= \mathbb {R}^{n_1}_+\) we replace \(\mathcal {CPP}\left( \mathbb {R}^{n_1+n_2}_+\right) \) by \({\mathcal {D}}{\mathcal {N}}{\mathcal {N}}_{n_1+n_2} \,{:}{=}\, {\mathcal {S}}^{n_1+n_2}_+\cap {\mathcal {N}}_{n_1+n_2}\). In case \({\mathcal {K}}\) is the second order cone we resort to

$$\begin{aligned} {\mathcal {O}}_{uter} \,{:}{=}\, \left\{ {\textsf {M} }\left( {\textsf {C} }_1, 2{\textsf {C} }_2, {\textsf {C} }_3 \right) \,{:}\, {\textsf {C} }_1 \in \mathcal {CPP}\left( {\mathcal {K}}\right) , {\mathrm {Columns}}\left( {\textsf {C} }_2\right) \in {\mathcal {K}}, {\textsf {C} }_3 \in {\mathcal {D}}{\mathcal {N}}{\mathcal {N}}_{n_2} \right\} , \end{aligned}$$

which is an approximation described in Xu and Burer (2018), where the dual of \({\mathcal {O}}_{uter} \) was used to approximate \(\mathcal {COP}\left( {\mathcal {K}}\times \mathbb {R}^{n}_+\right) \). Note that by the S-Lemma, \(\mathcal {CPP}\left( {\mathcal {K}}\right) = \left\{ {\textsf {C} }\in {\mathcal {S}}^n_+ \,{:}\, {\textsf {C} }\bullet {\textsf {J} }\leqslant 0 \right\} \), where \({\textsf {J} }={\mathrm {Diag}}(-1,1,\dots ,1) \in {\mathcal {S}}^{n}\), and thus is a tractable matrix cone.
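The S-Lemma characterization gives a cheap membership test: for \(\mathbf {x}\) in the second-order cone we have \(x_1 \geqslant \Vert (x_2,\dots ,x_n)\Vert \), so \(\mathbf {x}\mathbf {x}^{{\textsf {T} }}\bullet {\textsf {J} }= -x_1^2+\sum _{i\geqslant 2}x_i^2 \leqslant 0\), and sums of such rank-one matrices inherit both positive semidefiniteness and \({\textsf {C} }\bullet {\textsf {J} }\leqslant 0\). A numerical sanity check with randomly generated cone points:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
J = np.diag([-1.0] + [1.0] * (n - 1))

def soc_point():
    # Random point of the second-order cone: x_1 >= ||(x_2, ..., x_n)||
    tail = rng.standard_normal(n - 1)
    head = np.linalg.norm(tail) + rng.random()
    return np.concatenate(([head], tail))

# C = sum of rank-one terms x x^T with x in the second-order cone
C = sum(np.outer(x, x) for x in (soc_point() for _ in range(8)))

assert np.linalg.eigvalsh(C).min() >= -1e-9   # C is positive semidefinite
assert np.trace(C @ J) <= 1e-9                # C . J <= 0

# A PSD matrix violating C . J <= 0, e.g. e_2 e_2^T, lies outside CPP(SOC):
E2 = np.zeros((n, n)); E2[1, 1] = 1.0
assert np.trace(E2 @ J) > 0
```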

As for the lower bound in Theorem 19, the main assumption is fulfilled since for the implicit minimization problem \(\inf _{\mathbf {x}\in {\mathcal {X}}} \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }\mathbf {x}+{\mathbf {q}}^{{\textsf {T} }}\mathbf {x}+\omega \) we can again invoke Burer (2012). In consequence the matrix cone \({\mathcal {C}}^*\) is given by \(\mathcal {COP}({\mathcal {K}})\), which, in case \({\mathcal {K}}\) is the second order cone, is exactly characterized by \(\mathcal {COP}({\mathcal {K}}) = \left\{ {\textsf {C} }\,{:}\, {\textsf {C} }+\tau {\textsf {J} }\in {\mathcal {S}}^n_+,\ \tau \geqslant 0 \right\} \). For each instance type, characterized by the sizes of \(n_1\), \(n_2\), \(m_1\), \(m_2\) and the choice of \({\mathcal {K}}\in \{\mathbb {R}^{n_1}_+,{\mathcal {S}}{\mathcal {O}}{\mathcal {C}}\}\), indicated by the tuples \(({\mathcal {K}},n_1,n_2,m_1,m_2)\) in the first column, we randomly generated 100 instances. In Table 1 we give average values for the relative optimality gap (avg. gap), the maximum relative optimality gap (max. gap), the number of times a model gave the better solution (# best) and the respective average computation times (avg. time). We see that our model can outperform the copositive bound in cases where the dimension of \({\mathcal {Y}}\) is significantly larger than that of \({\mathcal {X}}\) and the number of constraints in the former is not too small. In these cases the computation time of our model also tends to be lower, and in some of them the solution quality is slightly better. For both models the average optimality gap was almost vanishing, indicating that random generation is not a good way to create challenging instances. However, finding special structure that guarantees hardness is a major task in and of itself, and we do not pursue it in this paper.

Table 1 Comparison between the copositive and the adjustable robust lower bound

Remark

It should be noted that for \(\mathcal {CPP}\left( {\mathcal {K}}\times \mathbb {R}^{n_2}_+\right) ,\ {\mathcal {K}}\in \left\{ \mathbb {R}^{n_1}_+,{\mathcal {S}}{\mathcal {O}}{\mathcal {C}} \right\} \) we could have also employed approximation hierarchies such as the ones described in Zuluaga et al. (2006), which provide a sequence of cones that approximate \(\mathcal {CPP}\left( {\mathcal {K}}\times \mathbb {R}^{n_2}_+\right) \) with increasing precision, and where the computational cost of the respective bounds increases accordingly. One could argue that a fairer comparison between the two lower bounds would use that approximation of \(\mathcal {CPP}\left( {\mathcal {K}}\times \mathbb {R}^{n_2}_+\right) \) whose computational effort is similar to that of our proposed bound. However, these hierarchies are notoriously hard to work with, and their computational cost quickly explodes with increasing precision, so it is questionable whether we would have succeeded in fairly balancing the computational effort.

7.2 Comparison with linear policy for the bilinear case

As shown in Zhen et al. (2019) we have

$$\begin{aligned} \inf _{(\mathbf {x},\mathbf {y})\in {\mathcal {X}}\times {\mathcal {Y}}} \left\{ \mathbf {x}^{{\textsf {T} }}{\textsf {Q} }_{xy}\mathbf {y} \right\} = \sup _{\tau ,\mathbf {z}(\mathbf {x})}\left\{ \tau \,{:}\, \tau \leqslant \mathbf {d}^{{\textsf {T} }}\mathbf {z}(\mathbf {x}) , {\textsf {F} }^{{\textsf {T} }}\mathbf {z}(\mathbf {x})\leqslant {\textsf {Q} }_{xy}^{{\textsf {T} }}\mathbf {x}\ \ \forall \mathbf {x}\in {\mathcal {X}} \right\} \end{aligned}$$
(32)

where \(\mathbf {z}(\mathbf {x})\) is a function-valued decision variable and \({\mathcal {Y}}\) is defined as above. If, further, \({\mathcal {X}}\) is defined as in Sect. 7.1 and we employ an affine decision rule, i.e. \(\mathbf {z}(\mathbf {x}) = {\textsf {Z} }\mathbf {x}+\mathbf {z}\), we can reformulate the latter maximization problem via linear conic duality in a standard manner (see e.g. Ben-Tal et al. 2004) to obtain a lower bound. In contrast, in Theorem 19 we employ a quadratic decision rule for \(\mathbf {z}(\mathbf {x})\) (\(\mathbf {w}(\mathbf {x})\) does not appear if the convex quadratic term is dropped), which potentially tightens the bound. However, moving from a linear or second order cone programming formulation to an SDP comes at a significant computational cost, and our experiment aims to quantify the benefit in order to see whether this trade-off is justified. Table 2 summarizes the results of our experiments. We again present the average relative optimality gap (avg. gap) to an upper bound computed using fmincon, the maximum relative optimality gap (max. gap) to that upper bound, the number of times our model outperformed the linear policy (# improved) and the average CPU time (avg. time). All averages are taken across 100 replications per instance.
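To make the affine baseline concrete, substituting \(\mathbf {z}(\mathbf {x}) = {\textsf {Z} }\mathbf {x}+\mathbf {z}\) into (32) gives the semi-infinite problem

$$\begin{aligned} \sup _{\tau ,{\textsf {Z} },\mathbf {z}}\left\{ \tau \,{:}\, \tau \leqslant \mathbf {d}^{{\textsf {T} }}({\textsf {Z} }\mathbf {x}+\mathbf {z}),\ {\textsf {F} }^{{\textsf {T} }}({\textsf {Z} }\mathbf {x}+\mathbf {z})\leqslant {\textsf {Q} }_{xy}^{{\textsf {T} }}\mathbf {x}\ \ \forall \mathbf {x}\in {\mathcal {X}} \right\} , \end{aligned}$$

in which every constraint is linear in \(\mathbf {x}\) and has to hold for all \(\mathbf {x}\in {\mathcal {X}}\). Since \({\mathcal {X}}\) is conic-representable, each of these finitely many robust linear constraints can be dualized into finitely many linear conic constraints, yielding a tractable problem. Restricting \(\mathbf {z}(\mathbf {x})\) to affine functions shrinks the feasible set of (32), so the resulting optimal value is indeed a lower bound.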

We see that the linear policy is outperformed regularly, with the biggest advantages achieved when \(m_2\) is small. This makes sense, since the number of coefficients of both the linear and the quadratic policy depends on \(m_2\), which in turn determines the flexibility of these decision rules. Since the quadratic decision rule is inherently more flexible than the linear one, a small \(m_2\) hurts the linear rule more. The computational cost of this benefit is, however, substantial, which is not surprising, as SDPs are known to scale poorly unless special structure is exploited, and no such structure is present in the generic framework under which we operate here. Exploring cases where such structure is available is an interesting topic for future research.

Table 2 Comparison between linear and quadratic decision rules

8 Conclusion

In this paper we have outlined a general strategy by which robust optimization problems with quadratic uncertainty can be reformulated using convex reformulation results from quadratic optimization. These convex reformulations of QCQPs can be used as an alternative way to establish duality for quadratic optimization problems: one first provides a convex conic reformulation and then invokes conic optimization duality, as opposed to the classical approach of establishing duality for QCQPs via the S-Lemma. We introduced a new result on these QCQP reformulations and explored its connection to existing S-Lemma type results. We also explored a copositive perspective on the general strategy, which enabled us to investigate the effect of a failure of full strong duality, where the duality gap is zero but dual attainment is not guaranteed. We then introduced a new application of the general strategy, in which a special type of QCQP is reformulated as an adjustable robust optimization problem that, after the introduction of a quadratic policy, becomes amenable to reformulations based on the general strategy. In numerical experiments we evaluated the merits of our model, finding that it may outperform existing approaches in some cases.