1 Introduction

The uncertain characteristics of a system’s performance often depend on its design decisions. This type of uncertainty is called endogenous uncertainty. For example, in a newsvendor model the product demand may depend on the selling price [1]. Additional examples of decision problems with endogenous uncertainty from finance, resource management, process design, and network design are given in Sect. 2. The goal of this paper is to present decision dependent ambiguity frameworks for modeling problems involving endogenous uncertainty. The main contribution is to show that the dualization of a certain inner problem remains applicable in this more general setting. This dualization has a unique advantage for the problems under consideration: it allows the application of algorithms from nonlinear global optimization to solve the resulting reformulations.

Specifically, we study optimization problems in which the ambiguity set of distributions may depend on the decisions, using the following modeling framework:

$$\begin{aligned} \underset{\varvec{x}\in X}{\text {min}}\;\; f(\varvec{x}) + \underset{P\in {\mathcal {P}}(\varvec{x})}{\text {max}}\; {\mathbb {E}}_P[h(\varvec{x},\varvec{\xi })] \end{aligned}$$
(\(\hbox {D}^3\hbox {RO}\))

Here \(\varvec{x}\) is the vector of decision variables with the feasible set \(X\subseteq {\mathbb {R}}^n\), and \(\varvec{\xi }\) is the vector of uncertain model parameters, which is defined on a measurable space \(({\varXi },{\mathcal {F}})\); \({\varXi }\) is the support in \({\mathbb {R}}^d\), and \({\mathcal {F}}\) is a \(\sigma \)-algebra. We may allow \(\varvec{\xi }\) to also depend on \(\varvec{x}\), i.e., replace \(h(\varvec{x},\varvec{\xi })\) in (\(\hbox {D}^3\hbox {RO}\)) with \(h(\varvec{x},\varvec{\xi }(\varvec{x}))\), but we have not done so here for notational simplicity. For a given \({\varvec{x}}\), the ambiguity set \({\mathcal {P}}(\varvec{x})\) of the unknown probability distribution depends on the decision variables \({\varvec{x}}\), and \({\mathcal {P}}({\varvec{x}})\subseteq {\mathcal {P}}({\varXi },{\mathcal {F}})\), where \({\mathcal {P}}({\varXi },{\mathcal {F}})\) is the set of probability distributions defined on \(({\varXi },{\mathcal {F}})\). The function \(f(\varvec{x})\) is the deterministic part of the objective and contains no uncertain parameters. Keeping this function in (\(\hbox {D}^3\hbox {RO}\)) allows us to consider decision models involving two-stage decision making.

We denote the inner problem \(\mathop {\text {max}}\nolimits _{P\in {\mathcal {P}}(\varvec{x})}\; {\mathbb {E}}_P[h(\varvec{x},\varvec{\xi })]\) as (\(\hbox {D}^3\hbox {RO}\))-inner. Note that if \(h(\varvec{x},\varvec{\xi })\) is a recourse function in a two-stage stochastic program, i.e.,

$$\begin{aligned} h(\varvec{x},\varvec{\xi })=\;\underset{\varvec{y}\in {\mathbb {R}}^q}{\text {min}}\;\; g(\varvec{x},\varvec{y},\varvec{\xi }),\quad \text {s.t.}\, \psi _i({\varvec{x}},\varvec{y},\varvec{\xi })\ge 0 \quad \forall i\in [m], \end{aligned}$$
(1)

where \(g(\varvec{x},\varvec{y},\varvec{\xi })\) and \(\psi _i({\varvec{x}},\varvec{y},\varvec{\xi })\) are bounded and continuous functions of \({\varvec{x}}\), \(\varvec{y}\) and \(\varvec{\xi }\), then (\(\hbox {D}^3\hbox {RO}\)) becomes a two-stage decision dependent distributionally robust stochastic program (TSD\(^3\)SP), which is an important application of (\(\hbox {D}^3\hbox {RO}\)). We assume that the minimization problem in (1) is feasible for any \({\varvec{x}}\in X\) and \(\varvec{\xi }\in {\varXi }\), and that \(h({\varvec{x}},\varvec{\xi })\) is finite. In other words, we assume that (TSD\(^3\)SP) has complete recourse [2]. Moreover, we assume a certain Slater-type condition for the set \({\mathcal {P}}({\varvec{x}})\), as needed.
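To make the two-stage structure concrete, the recourse function (1) can be evaluated for a fixed pair \((\varvec{x},\varvec{\xi })\) by solving a small linear program. The following is a minimal sketch with a hypothetical linear second stage; the cost and constraint data are invented for illustration and are not from the paper:

```python
import numpy as np
from scipy.optimize import linprog

def recourse_h(x, xi):
    """Evaluate h(x, xi) = min_y g(x, y, xi)  s.t.  psi(x, y, xi) >= 0
    for a hypothetical linear second stage:
      g(x, y, xi) = xi * (y1 + y2),  psi: y1 + y2 - (1 - x) >= 0,  y >= 0."""
    c = np.array([xi, xi])            # linear second-stage cost g
    A_ub = np.array([[-1.0, -1.0]])   # -(y1 + y2) <= -(1 - x)
    b_ub = np.array([-(1.0 - x)])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * 2, method="highs")
    return res.fun

# Feasible for every x and xi > 0, so h is finite: complete recourse holds.
print(recourse_h(0.25, 2.0))  # min cost = 2.0 * (1 - 0.25) = 1.5
```

Any second-stage LP with a feasible region that is nonempty for all \((\varvec{x},\varvec{\xi })\) plays the same role; the example only illustrates the shape of (1).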

The ambiguity set \({\mathcal {P}}({\varvec{x}})\) can be constructed in many different ways. The reformulations given in this chapter are for the decision dependent generalizations of the most common types of ambiguity sets proposed in the distributionally robust optimization literature [3]. In Sect. 3 we investigate the reformulation of (\(\hbox {D}^3\hbox {RO}\)) for five different possible specifications of \({\mathcal {P}}(\varvec{x})\): (i) ambiguity sets defined by using component-wise moment inequalities and bounds on the scenario probabilities [4, 5]; (ii) ambiguity sets defined by using the mean vector and covariance matrix inequalities [6]; (iii) ambiguity sets defined by using the Wasserstein metric [7,8,9,10]; (iv) ambiguity sets defined by using \(\phi \)-divergence [11,12,13,14,15,16]; and (v) ambiguity sets defined by using the multi-variate Kolmogorov–Smirnov test [17]. The reformulations are given in Sects. 3.1 to 3.5, respectively. The choice of the specific ambiguity set when modeling a problem is context dependent. The choice depends on the data being represented by the set and the needs of the modeler. Moment-based sets might be natural when their estimates are available from a prior statistical analysis. However, Wasserstein-distance based sets have an underlying polyhedral structure that makes them more amenable to algorithm development within the linear programming or linear conic duality frameworks. Total variation based sets specify bounds on the probabilities of each scenario more naturally. The Kolmogorov–Smirnov test is commonly used in hypothesis tests that compare probability distributions.

The basic tool used in arriving at the reformulations of (\(\hbox {D}^3\hbox {RO}\))-inner is linear programming duality or conic duality, as needed in the specific setting. Lagrangian duality is used in situations where considering the saddle point problem appears more suitable.

We note that the computational complexity of the reformulated problem is not the main motivation of this paper. Our goal is to study modeling frameworks that more realistically represent the underlying phenomenon. We also do not focus on developing new (possibly more efficient) algorithms, as that is left for future studies. In general, we refer the reader to global optimization techniques for solving the non-convex optimization problems resulting from our reformulations [18]. Moreover, to simplify the presentation, we consider the finite support case in the main text. Reformulations (without proof) are also given for the ambiguity sets (i)–(iii) allowing for continuous support. In these cases semi-infinite programming reformulations of the corresponding models are given. These semi-infinite reformulations allow the use of a cutting surface algorithm, given in Sect. 4.1, to solve the semi-infinite problems. Models having first-stage binary variables allow for the use of a technique that reformulates the product of a binary variable with a continuous variable (a Lagrange multiplier). The central idea behind this technique is given in Sect. 4.2. A newsvendor example illustrating the reformulations and the relevance of decision dependent ambiguity is given in Sect. 5. The construction of the decision dependent parameters appearing in the specification of the ambiguity set is discussed briefly in the concluding remarks.

2 Literature review on optimization with decision dependent uncertainty

Endogenous uncertainty has been considered in dynamic programming [19], stochastic programming [20], and robust optimization [21, 22], with applications in financial market modeling [23,24,25], resource management [26], stochastic traffic assignment [27], oil and natural gas exploration [28,29,30], and robust network design [31, 32].

In the framework of stochastic optimization, the endogenous uncertainty affects the underlying probability distribution and the scenario tree. Jonsbråten et al. [33] studied stochastic programming problems with decision dependent scenario distributions, where the distribution is indexed by a Boolean vector. They provided an implicit enumeration algorithm for solving these problems based on a branch-and-bound scheme. This model and the proposed branch-and-bound method were applied to the optimal selection and sequencing of oil well exploration under reservoir capacity uncertainty [28]. Ahmed [31] investigated a class of single stage stochastic programs with discrete candidate probability distributions that are based on Luce’s choice axiom [34]. The decision affects the utility functions of the choices, and hence the probability distribution [35]. These types of problems arise from network design and server selection applications [31]. It is shown that stochastic programs of this class can be reformulated as 0-1 hyperbolic programs. Viswanath et al. [32] investigated a two-stage shortest path problem in a stochastic network arising from disaster relief services. Here the first stage investment decisions can reduce the failure probability of links in the network, and a shortest path is identified based on the post-event network. Held et al. [36] developed a heuristic algorithm to solve a two-stage stochastic network interdiction problem, where the interdictor is the first-stage decision maker whose objective is to maximize the probability that the minimum path length exceeds a certain value after the interdiction. The interdictor’s decision changes the network topology and the uncertainty description. The structure of single-stage stochastic programs with a decision dependent probability distribution was also studied in [37]. Lee et al. 
[38] investigated a newsvendor model under decision dependent uncertainty, where sequential decisions are made after a re-estimation of the demand distribution. They provide conditions under which the estimation and decision process converges.

Grossmann et al. [20, 30] developed a disjunctive programming reformulation for multistage decision dependent stochastic programs. They investigated this problem with finitely many scenarios, where exogenous and endogenous uncertain parameters are involved. In their models, endogenous parameters are resolved after the operational decisions are made (e.g., a facility is installed or an investment is made). A branch-and-bound algorithm is developed to solve the disjunctive program by branching on the logic variables involved in the disjunctive clauses [20, 39], and a lower bound is obtained at each node by solving a Lagrangian dual sub-problem [40]. More solution strategies for the disjunctive program are given in [41]. This framework is applied to model and solve the offshore oil or gas field infrastructure multi-stage planning problem with uncertainty in estimating parameters that are not immediately realized [29]. The framework is also applied to optimize process network synthesis problems with yield uncertainty that can be reduced by investing in pilot plants [42]. Tarhan et al. [43] developed a computational strategy that combines global optimization and outer-approximation to solve multistage nonlinear mixed-integer programs with decision dependent uncertainty.

Decision dependent uncertainty is also considered in the framework of robust optimization by letting the uncertainty set depend on the decision variables. Spacey et al. [44] studied the problem of minimizing the run time of a computer program by assigning code segments to execution locations, where the scheduling of code segment execution depends on the assignment. In robust combinatorial optimization, a decision dependent uncertainty set is used to ensure the same relative protection level for all binary decision vectors [21]. To model a robust task scheduling problem with uncertainty in the processing time, Vujanic et al. [45] proposed a decision dependent uncertainty set as a Minkowski sum of static sets, so that the uncertain completion time interval of a task can naturally depend on the starting time of the task. Hu et al. [1] studied a newsvendor model where the product demand may depend on the selling price. Since the analytical relationship between the demand and the selling price is unknown, they construct a family of decreasing and convex functions from historical data as the functional ambiguity set of the true demand function, and solve the resulting functionally robust optimization problem using a univariate reformulation. Nohadani et al. [22] investigated robust linear programs with decision dependent budget-type uncertainty (RLP-DDU). They showed that this problem is NP-hard even in the case where the uncertainty set is a polyhedron and the decision dependence is affine. RLP-DDU can be reformulated as a mixed integer linear program (MILP) if the decision variables affect the uncertain variables by controlling their upper bounds. This concept is demonstrated on a robust shortest-path problem, where the uncertainty is resolved progressively when approaching the destination.

3 Reformulation of (D\(^3\)RO)

We investigate the dual of the inner problem of (\(\hbox {D}^3\hbox {RO}\)) under the assumption that the probability distributions are specified over a finite support on \({\varXi }\). Reformulations for the case where \({\varXi }\) is continuous are also given for the first three cases. We make the following assumption to simplify the presentation of our key observations:

Assumption 1

Every \(P\in {\mathcal {P}}({\varvec{x}})\) has a decision independent finite support \({\varXi }:=\{ \varvec{\xi }^k \}^{N}_{k=1}\), \(\forall {\varvec{x}}\in X\), for a fixed N.

The finite support assumption is commonly used when developing computational algorithms for general two-stage stochastic optimization problems. The finite support framework arises either from applying the sample average approximation approach to problems whose parameters have continuous support, or in a data-driven setting where the data provides the nominal samples for the model. For a more general model, the set \({\varXi }\) can be a compact set in the Euclidean space. The reformulations given in Sects. 3.1–3.3, which have finitely many constraints due to Assumption 1 (specified by \(\varvec{\xi }^k\), \(k=1,\ldots ,N\)), become semi-infinite programs in which the constraints are parameterized over the continuous support \({\varXi }\). Dual formulations for the finite support case provide a bridge to understanding dual formulations for the case where the distributions have continuous support. In the finite support case, because of LP duality, we often do not need Slater-type conditions; however, these conditions are required in the more general case. It is important to be able to see such distinctions when considering applications. Moreover, all reformulations adapt to the case where \(\varvec{\xi }^k\) is replaced with \(\varvec{\xi }^k({\varvec{x}})\).

An interpretation of Assumption 1 is as follows. The candidate probability distributions in \({\mathcal {P}}({\varvec{x}})\) are represented by a vector \(\varvec{w}({\varvec{x}})\in {\mathbb {R}}^N\) such that \(w_i({\varvec{x}})\) is the probability assigned to scenario \(\varvec{\xi }^i\) (\(i\in [N]\)) and \(\Vert \varvec{w}({\varvec{x}}) \Vert _1=1\) for all \({\varvec{x}}\in X\). By judicious specification of the set \({\mathcal {P}}({\varvec{x}})\), the support of the distribution may also be allowed to change with \({\varvec{x}}\) (turning a scenario on or off) by forcing certain scenarios to have zero probability. Therefore, this finite support framework allows for a great deal of generality.
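The on/off mechanism described above can be sketched as follows. The bound functions and the threshold rule are hypothetical, chosen only to illustrate how a decision can force scenario probabilities to zero:

```python
import numpy as np

def prob_bounds(x, N=4):
    """Hypothetical decision dependent bounds on scenario probabilities.
    Scenario k is 'switched off' (its probability forced to zero) whenever
    the decision x falls below the threshold (k + 1) / N."""
    lower = np.zeros(N)
    upper = np.ones(N)
    for k in range(N):
        if x < (k + 1) / N:   # decision x disables the later scenarios
            upper[k] = 0.0    # w_k(x) = 0 is forced by the upper bound
    return lower, upper

lo, up = prob_bounds(0.6)
# thresholds are 0.25, 0.5, 0.75, 1.0, so only scenarios 0 and 1 remain on
print(up)  # [1. 1. 0. 0.]
```

Any candidate \(\varvec{w}({\varvec{x}})\) must then satisfy `lower <= w <= upper` together with \(\Vert \varvec{w}({\varvec{x}})\Vert _1=1\).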

3.1 Ambiguity sets defined by simple measure and moment inequalities

We consider the moment robust set defined as follows:

$$\begin{aligned} {\mathcal {P}}^{SM}({\varvec{x}}):=\left\{ P\in {\mathcal {P}}({\varXi },{\mathcal {F}}):\; \nu _1({\varvec{x}})\preceq P \preceq \nu _2({\varvec{x}}),\;\; l_i({\varvec{x}})\le \int _{{\varXi }} f_i(\varvec{\xi })\,P(d\varvec{\xi }) \le u_i({\varvec{x}})\;\;\forall i\in [m] \right\} , \end{aligned}$$
(2)

where \({\mathcal {M}}({\varXi },{\mathcal {F}})\) is the set of positive measures defined on \(({\varXi },{\mathcal {F}})\), \(\nu _1({\varvec{x}}),\nu _2({\varvec{x}})\in {\mathcal {M}}({\varXi },{\mathcal {F}})\) are two given measures for a fixed \({\varvec{x}}\) that serve as lower and upper bounds on the candidate probability measures, and \(\varvec{f}:=[f_1(\varvec{\xi }),\dotsc ,f_m(\varvec{\xi })]\) is a vector of moment functions. The ordering “\(\preceq \)” is defined as follows: for any two measures (not necessarily probability measures) \({\mathcal {M}}_1\) and \({\mathcal {M}}_2\) defined on \(({\varXi },{\mathcal {F}})\), \({\mathcal {M}}_1\preceq {\mathcal {M}}_2\) is equivalent to \({\mathcal {M}}_1(S)\le {\mathcal {M}}_2(S)\;\;\forall S\in {\mathcal {F}}\). If Assumption 1 is satisfied, then \({\mathcal {M}}_1\preceq {\mathcal {M}}_2\) is equivalent to \({\mathcal {M}}_1(\varvec{\xi }^k)\le {\mathcal {M}}_2(\varvec{\xi }^k)\;\;\forall k\in [N]\). To ensure that P is a probability distribution, we set \(l_1({\varvec{x}})=u_1({\varvec{x}})=1\) and \(f_1(\varvec{\xi })=1\) in the above definition of \({\mathcal {P}}^{SM}({\varvec{x}})\). For any \(\varvec{\xi }\in {\varXi }\), let \(\varvec{\xi }:=[\xi _1,\ldots ,\xi _d]\). When standard moments are used, the ith (\(i\in [m]\)) entry of \(\varvec{f}\) has the form \(f_i(\varvec{\xi }):=(\xi _1)^{k_{i1}}\cdot (\xi _2)^{k_{i2}} \cdots (\xi _d)^{k_{id}}\), where \(k_{ij}\) is a nonnegative integer indicating the power of \(\xi _j\) in the ith moment function. The framework also allows the use of generalized moments by choosing alternative base functions. The ambiguity set (2) generalizes the set in [46] to the decision dependent case.
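A minimal sketch of evaluating the standard moment functions, assuming the exponents \(k_{ij}\) are collected in a matrix `K` with `K[i, j]` equal to \(k_{ij}\) (the exponent matrix below is hypothetical):

```python
import numpy as np

def moment_vector(xi, K):
    """f_i(xi) = prod_j xi_j ** k_ij, one entry per row i of the exponent
    matrix K; broadcasting xi over the rows of K computes all m moments."""
    xi = np.asarray(xi, dtype=float)
    return np.prod(xi ** K, axis=1)

# d = 2, m = 4: the constant 1, both first moments, and one cross moment
K = np.array([[0, 0],   # f_1 = 1 (normalization row, l_1 = u_1 = 1)
              [1, 0],   # f_2 = xi_1
              [0, 1],   # f_3 = xi_2
              [1, 1]])  # f_4 = xi_1 * xi_2
print(moment_vector([2.0, 3.0], K))  # [1. 2. 3. 6.]
```

Generalized moments amount to replacing the monomial rows by other base functions evaluated at \(\varvec{\xi }\).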
The following theorem gives a reformulation of (\(\hbox {D}^3\hbox {RO}\)) with moment robust ambiguity set \({\mathcal {P}}^{SM}(\varvec{x})\).

Theorem 3.1

Let Assumption 1 hold, and therefore the ambiguity set (2) be given by \({\mathcal {P}}^{SM}_0({\varvec{x}})=\{\varvec{p}\in {\mathbb {R}}^N: \;\varvec{l}({\varvec{x}})\le \sum ^N_{k=1}p_k\varvec{f}(\varvec{\xi }^k)\le \varvec{u}({\varvec{x}}),\;{\underline{p}}_k({\varvec{x}})\le p_k\le {\overline{p}}_k({\varvec{x}})\;\;\forall k\in [N]\}\), where \({\underline{p}}_k({\varvec{x}})\) and \({\overline{p}}_k({\varvec{x}})\) are given lower and upper bounds on each \(p_k\). If for any \({\varvec{x}}\in X\), \(h({\varvec{x}},\varvec{\xi }^k)\) is finite for every \(k\in [N]\) and the ambiguity set \({\mathcal {P}}^{SM}_0({\varvec{x}})\) is nonempty, then the (\(\hbox {D}^3\hbox {RO}\)) problem with the ambiguity set \({\mathcal {P}}^{SM}_0({\varvec{x}})\) can be reformulated as the following nonlinear program:

$$\begin{aligned} \begin{aligned} \underset{\varvec{x},\varvec{\alpha },\varvec{\beta },\varvec{\gamma },\varvec{\mu }}{min }&\quad f(\varvec{x}) + \varvec{\alpha }^T\varvec{l}({\varvec{x}}) + \varvec{\beta }^T\varvec{u}({\varvec{x}}) +\varvec{\gamma }^T\underline{\varvec{p}}({\varvec{x}}) + \varvec{\mu }^T\overline{\varvec{p}}({\varvec{x}}) \\ s.t.&\quad (\varvec{\alpha }+\varvec{\beta })^T \varvec{f}(\varvec{\xi }^k) + \gamma _k + \mu _k \ge h(\varvec{x},\varvec{\xi }^k) \quad \forall k\in [N], \\&\quad \varvec{x}\in X,\; \varvec{\alpha },\varvec{\beta }\in {\mathbb {R}}^{m},\; \; \varvec{\gamma }, \varvec{\mu }\in {\mathbb {R}}^N,\; \varvec{\alpha }\le \varvec{0},\; \varvec{\beta }\ge \varvec{0},\; \varvec{\gamma }\le 0, \; \varvec{\mu }\ge 0. \end{aligned} \end{aligned}$$
(3)

Proof

Under Assumption 1, the inner problem of (\(\hbox {D}^3\hbox {RO}\)) becomes the following linear program:

$$\begin{aligned}&\underset{\varvec{p}\in {\mathbb {R}}^{N}}{\text {max}}\;\; \sum ^{N}_{k=1} p_k h(\varvec{x},\varvec{\xi }^k) \nonumber \\&\text {s.t.}\,\quad \varvec{l}(\varvec{x})\le \sum ^{N}_{k=1} p_k\varvec{f}(\varvec{\xi }^k) \le \varvec{u}(\varvec{x}), \nonumber \\&\qquad \quad {\underline{p}}_k({\varvec{x}}) \le p_k\le {\overline{p}}_k({\varvec{x}}) \;\;\forall k\in [N]. \end{aligned}$$
(4)

Based on the hypothesis of the theorem, the above linear program is feasible for any \({\varvec{x}}\in X\), and its objective value is bounded. Taking the dual of (4), applying strong duality, and combining the dual problem with the outer problem yields the desired reformulation. \(\square \)
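The strong duality step in the proof can be checked numerically. The sketch below uses made-up data for a fixed \(\varvec{x}\) (three scenarios, the normalization moment \(f_1=1\), and one mean moment): it solves the primal inner LP (4) and the dual from (3), and the two optimal values coincide:

```python
import numpy as np
from scipy.optimize import linprog

# Fixed-x data: N = 3 scenarios, m = 2 moment functions. The row f_1 = 1
# with l_1 = u_1 = 1 forces p to be a probability vector; f_2 bounds the mean.
h = np.array([2.0, 5.0, 1.0])              # h(x, xi^k)
F = np.array([[1.0, 1.0, 1.0],             # f_1(xi^k) = 1
              [1.0, 2.0, 3.0]])            # f_2(xi^k) = xi^k
l, u = np.array([1.0, 1.5]), np.array([1.0, 2.5])
p_lo, p_hi = 0.1 * np.ones(3), 0.8 * np.ones(3)

# Primal (4): max h^T p  s.t.  l <= F p <= u,  p_lo <= p <= p_hi
primal = linprog(-h, A_ub=np.vstack([F, -F]), b_ub=np.hstack([u, -l]),
                 bounds=list(zip(p_lo, p_hi)), method="highs")

# Dual from (3): variables [alpha, beta, gamma, mu] with alpha, gamma <= 0,
# beta, mu >= 0, and (alpha + beta)^T f(xi^k) + gamma_k + mu_k >= h_k.
m, N = F.shape
c = np.hstack([l, u, p_lo, p_hi])
A = -np.hstack([F.T, F.T, np.eye(N), np.eye(N)])   # encode >= as -(...) <= -h
bounds = [(None, 0)] * m + [(0, None)] * m + [(None, 0)] * N + [(0, None)] * N
dual = linprog(c, A_ub=A, b_ub=-h, bounds=bounds, method="highs")

print(-primal.fun, dual.fun)   # equal, by LP strong duality
```

With these numbers the worst case distribution puts as much mass as possible on the scenario with the largest \(h\), subject to the mean and probability bounds.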

A reformulation for the two-stage case is given in the following corollary.

Corollary 3.1

If \(h(\cdot ,\cdot )\) is a recourse function defined in (1), and the ambiguity set \({\mathcal {P}}^{SM}_0({\varvec{x}})\) is non-empty for any \({\varvec{x}}\in X\), then the (\(\hbox {D}^3\hbox {RO}\)) problem with the ambiguity set \({\mathcal {P}}^{SM}_0(\varvec{x})\) can be formulated as follows:

$$\begin{aligned} \begin{aligned} \underset{\varvec{x},\varvec{y},\varvec{\alpha },\varvec{\beta },\varvec{\gamma },\varvec{\mu }}{min }&\quad f(\varvec{x}) + \varvec{\alpha }^T\varvec{l}({\varvec{x}}) + \varvec{\beta }^T\varvec{u}({\varvec{x}}) +\varvec{\gamma }^T\underline{\varvec{p}}({\varvec{x}}) + \varvec{\mu }^T\overline{\varvec{p}}({\varvec{x}}) \\ s.t.&\quad (\varvec{\alpha }+\varvec{\beta })^T \varvec{f}(\varvec{\xi }^k) + \gamma _k + \mu _k \ge g(\varvec{x},\varvec{y}^k,\varvec{\xi }^k) \quad \forall k\in [N] \\&\quad \psi _i(\varvec{x},\varvec{y}^k,\varvec{\xi }^k)\ge 0 \quad \forall i\in [m],\;\forall k\in [N], \\&\quad \varvec{x}\in X,\; \varvec{\alpha },\varvec{\beta }\in {\mathbb {R}}^{m},\; \; \varvec{\gamma }, \varvec{\mu }\in {\mathbb {R}}^N,\; \varvec{\alpha }\le \varvec{0},\; \varvec{\beta }\ge \varvec{0},\; \varvec{\gamma }\le 0, \; \varvec{\mu }\ge 0. \end{aligned} \end{aligned}$$
(5)

Proof

Recall that the complete recourse assumption ensures that \(h({\varvec{x}},\varvec{\xi }^k)\) is finite for any given \({\varvec{x}}\). One can then take the dual of the inner problem and use strong duality as in the proof of Theorem 3.1. \(\square \)

Theorem 3.3, which is an analogue of Theorem 3.1 for the continuous case, can be proved using conic duality theory in functional spaces.

Theorem 3.2

(conic duality in functional spaces [47]) Let X and Y be linear spaces, \(C\subset X\) and \(K\subset Y\) be convex cones, \(b\in Y\) and \(A:\;X\rightarrow Y\) be a linear mapping. The spaces X and Y are paired with linear spaces (dual spaces) \(X^{\prime }\) and \(Y^{\prime }\), respectively, in the sense that bilinear forms \(\langle \cdot ,\cdot \rangle :\;X^{\prime }\times X\rightarrow {\mathbb {R}}\) and \(\langle \cdot ,\cdot \rangle :\;Y^{\prime }\times Y\rightarrow {\mathbb {R}}\) are defined. Consider the following conic linear optimization problem and its dual:

$$\begin{aligned}&\underset{x\in C}{inf }\; \langle c, x\rangle \quad s.t. \, Ax+b\in K, \end{aligned}$$
(P)
$$\begin{aligned}&\underset{y\in -K^*}{sup } \langle y, b \rangle \quad s.t. \, A^*y+c\in C^*, \end{aligned}$$
(D)

where \(C^*\) is the polar (positive dual) cone of C, \(K^*\) is the polar cone of K, and \(A^*\) is the adjoint mapping of A. Suppose that X and Y are Banach spaces, the cones C and K are closed, the mappings \(\langle c,\cdot \rangle \) and \(A:\;X\rightarrow Y\) are continuous, and \(b\in ri [A(C)-K]\). Then \(val (P)=val (D)\), and an optimal solution of (D) exists.

Theorem 3.3

Let \({\varXi }\) be a closed and bounded set in the Euclidean space. Let the probability measure P and the positive measures \(\nu _1(x),\nu _2(x)\) be defined on the measurable space \(({\varXi },{\mathcal {F}})\), where the \(\sigma \)-algebra \({\mathcal {F}}\) contains all singleton subsets, i.e., \(\{\xi \}\in {\mathcal {F}}\) for all \(\xi \in {\varXi }\). If for any \(x\in X\), the feasible set of the inner problem of (\(\hbox {D}^3\hbox {RO}\)) with the ambiguity set \({\mathcal {P}}^{SM}(x)\) has a non-empty relative interior, then the inner problem has the following dual reformulation:

$$\begin{aligned} \begin{aligned} \underset{\alpha ,\beta ,\gamma ,\mu }{min }&\alpha ^{\top }l(x) + \beta ^{\top }u(x) +\int _{{\varXi }}\gamma (\xi )\,d\nu _1(x)(\xi ) + \int _{{\varXi }}\mu (\xi )\,d\nu _2(x)(\xi ) \\ s.t.&\quad (\alpha +\beta )^{\top } f(\xi ) + \gamma (\xi ) + \mu (\xi ) = h(x,\xi ) \quad \forall \xi \in {\varXi }, \\&\quad x\in X,\; \alpha ,\beta \in {\mathbb {R}}^{m},\; \; \gamma (\xi ), \mu (\xi )\in {\mathbb {R}}\;\forall \xi \in {\varXi },\; \alpha \le 0,\; \beta \ge 0,\; \\&\quad \gamma (\xi )\le 0, \; \mu (\xi )\ge 0. \end{aligned} \end{aligned}$$

Furthermore, strong duality holds and a dual optimal solution exists.

Proof

For notational convenience, we omit the x variable in the proof. Let \({\mathcal {M}}^+({\varXi },{\mathcal {F}})\) be the cone of positive measures on \(({\varXi },{\mathcal {F}})\). The inner problem of the original model is a conic linear program in a functional space, which we rewrite as follows:

$$\begin{aligned} \begin{aligned}&\underset{P}{\text {sup}}\;\;{\mathbb {E}}_P[h(\xi )] \\&\text {s.t.}\,\nu _1\preceq P \preceq \nu _2, \\&\qquad l_i\le \int _{{\varXi }} f_i(\xi ) P(d\xi ) \le u_i, \quad i\in [m], \end{aligned} \end{aligned}$$
(6)

where \(P\in {\mathcal {M}}^+({\varXi },{\mathcal {F}})\). We write a dual of (6) as follows:

$$\begin{aligned}&\underset{\alpha ,\beta ,\gamma ,\mu }{\text {inf}}\;\; \alpha ^{\top }l+\beta ^{\top }u+\int _{{\varXi }}\gamma (\xi )d\nu _1(\xi ) +\int _{{\varXi }}\mu (\xi )d\nu _2(\xi ) \nonumber \\&\text {s.t.}\, (\alpha +\beta )^{\top }f(\xi )+\gamma (\xi )+\mu (\xi )= h(\xi )\qquad \forall \xi \in {\varXi }, \nonumber \\&\qquad \alpha ,\beta \in {\mathbb {R}}^m,\;\alpha \le 0,\;\beta \ge 0,\;\gamma (\xi )\le 0,\;\mu (\xi )\ge 0 \;\forall \xi \in {\varXi }. \end{aligned}$$
(7)

We now verify that the conditions in Theorem 3.2 hold for (6) and (7). The primal decision variable P is an element of \({\mathcal {M}}({\varXi },{\mathcal {F}})\), the Banach space of finite signed measures over \(({\varXi },{\mathcal {F}})\).

The Banach space \({\mathcal {M}}({\varXi },{\mathcal {F}})\) is equipped with the norm \(\Vert \mu \Vert =|\mu |({\varXi })\), where \(\mu =\mu ^+-\mu ^-\) and \(|\mu |=\mu ^++\mu ^-\); here \(\mu ^+\) and \(\mu ^-\) are the positive and negative parts of \(\mu \) in its Jordan decomposition. The dual decision variable \([\alpha ,\beta ,\gamma ,\mu ]\) is an element of the Banach space \({\mathbb {R}}^m\times {\mathbb {R}}^m\times L\times L\), where L is the Banach space of bounded \({\mathcal {F}}\)-measurable functions on \({\varXi }\), equipped with the supremum norm.

The primal problem can be rewritten as the following linear conic optimization problem:

$$\begin{aligned} \begin{aligned}&\underset{P\in {\mathcal {M}}({\varXi },{\mathcal {F}})}{\text {sup}}\;\langle h, P \rangle \\&\text {s.t.}\, \langle f_i, P \rangle -l_i \in {\mathbb {R}}_+ \quad i\in [m], \qquad -\langle f_i, P \rangle +u_i \in {\mathbb {R}}_+ \quad i\in [m], \\&\qquad P-\nu _1 \in {\mathcal {M}}^+({\varXi },{\mathcal {F}}), \qquad -P+\nu _2\in {\mathcal {M}}^+({\varXi },{\mathcal {F}}), \qquad P\in {\mathcal {M}}^+({\varXi },{\mathcal {F}}), \end{aligned} \end{aligned}$$
(8)

where \(\langle \psi , P\rangle =\int _{{\varXi }} \psi dP\). The linear mapping \(A:\;{\mathcal {M}}({\varXi },{\mathcal {F}})\rightarrow {\mathbb {R}}^m\times {\mathbb {R}}^m \times {\mathcal {M}}({\varXi },{\mathcal {F}})\times {\mathcal {M}}({\varXi },{\mathcal {F}})\) is given as:

$$\begin{aligned} A\circ P:=[\langle f, P\rangle ,\;-\langle f, P\rangle ,\;P,\;-P]. \end{aligned}$$

The dual problem can be rewritten as the following linear conic optimization problem:

$$\begin{aligned} \begin{aligned}&\underset{\begin{array}{c} [\alpha ,\beta ,\gamma ,\mu ]\in \\ {\mathbb {R}}^m\times {\mathbb {R}}^m\times L\times L \end{array}}{\text {inf}}\;\langle [l,u,\nu _1,\nu _2], [\alpha ,\beta ,\gamma ,\mu ] \rangle \\&\text {s.t.}\, (\alpha +\beta )^{\top }f(\xi )+\gamma (\xi )+\mu (\xi )-h(\xi )\in \{0\} \quad \forall \xi \in {\varXi }, \\&\qquad [\alpha ,\beta ,\gamma ,\mu ]\in {\mathbb {R}}^{m}_-\times {\mathbb {R}}^{m}_+\times L^-\times L^+, \end{aligned} \end{aligned}$$
(9)

where \(L^-:=\{\gamma \in L:\;\gamma (\xi )\le 0\;\;\forall \xi \in {\varXi }\}\) and \(L^+:=\{\mu \in L:\;\mu (\xi )\ge 0\;\;\forall \xi \in {\varXi }\}\).

By definition, it is clear that the cones \({\mathcal {M}}^+({\varXi },{\mathcal {F}})\) and \({\mathbb {R}}^m_+\times {\mathbb {R}}^m_+ \times {\mathcal {M}}^+({\varXi },{\mathcal {F}})\times {\mathcal {M}}^+({\varXi },{\mathcal {F}})\) are closed. To show that the linear functional \(\langle h, \cdot \rangle \) and the linear mapping A are continuous, it suffices to show that if \(P_n\rightarrow 0\), then \(\langle h, P_n\rangle \rightarrow 0\) and \(A\circ P_n\rightarrow 0\) as \(n\rightarrow \infty \). Indeed, if \(P_n\rightarrow 0\), we have

$$\begin{aligned} |\langle h, P_n\rangle |= & {} \left|\int _{{\varXi }}h(\xi )dP_n(\xi ) \right|\le \text {max}_{\xi \in {\varXi }}|h(\xi )|\cdot |P_n|({\varXi }) \rightarrow 0, \text {and similarly,}\, |\langle f_i, P_n \rangle | \rightarrow 0. \end{aligned}$$

Therefore, \(\langle h, P_n\rangle \rightarrow 0\) and \(A\circ P_n \rightarrow 0\) as \(n\rightarrow \infty \). Since the feasible set of the primal problem has a non-empty relative interior, by Theorem 3.2, strong duality holds and the dual problem has an optimal solution. \(\square \)

3.2 Covariance matrix based ambiguity set

We now consider a distributional ambiguity set with multi-variate bounds defined as follows:

$$\begin{aligned} {\mathcal {P}}^{DY}({\varvec{x}}):=\left\{ P\in {\mathcal {P}}({\varXi },{\mathcal {F}}):\; \begin{array}{l} ({\mathbb {E}}_P[\varvec{\xi }]-\varvec{\mu }({\varvec{x}}))^T\varvec{Q}({\varvec{x}})^{-1}({\mathbb {E}}_P[\varvec{\xi }]-\varvec{\mu }({\varvec{x}}))\le \alpha ({\varvec{x}}), \\ {\mathbb {E}}_P[(\varvec{\xi }-\varvec{\mu }({\varvec{x}}))(\varvec{\xi }-\varvec{\mu }({\varvec{x}}))^T]\preceq \beta ({\varvec{x}})\varvec{Q}({\varvec{x}}) \end{array} \right\} . \end{aligned}$$
(10)

This set generalizes the set used in [6] to the decision dependent case. In the finite support case this set is written as follows:

$$\begin{aligned} {\mathcal {P}}^{DY}_0({\varvec{x}}):=\left\{ \varvec{p}\in {\mathbb {R}}^{N}_+:\; \begin{array}{l} \sum ^{N}_{i=1}p_i=1,\;\;\varvec{\tau }=\sum ^{N}_{i=1}p_i\varvec{\xi }^i, \\ (\varvec{\tau }-\varvec{\mu }({\varvec{x}}))^T\varvec{Q}({\varvec{x}})^{-1}(\varvec{\tau }-\varvec{\mu }({\varvec{x}}))\le \alpha ({\varvec{x}}), \\ \sum ^{N}_{i=1}p_i(\varvec{\xi }^i-\varvec{\mu }({\varvec{x}}))(\varvec{\xi }^i-\varvec{\mu }({\varvec{x}}))^T\preceq \beta ({\varvec{x}})\varvec{Q}({\varvec{x}}) \end{array} \right\} . \end{aligned}$$
(11)

Note that in a special case of (10), \(\varvec{\mu }({\varvec{x}})\) and \(\varvec{Q}({\varvec{x}})\) may not depend on \({\varvec{x}}\), and only \(\alpha ({\varvec{x}})\) and \(\beta ({\varvec{x}})\) depend on the decision variables. In this case, the decision only affects the size of the ambiguity set. Note that in \({\mathcal {P}}^{DY}_0({\varvec{x}})\) we could also allow the probabilities \(p_i\) to depend on \({\varvec{x}}\), as in \({\mathcal {P}}^{DY}({\varvec{x}})\); this possible generalization is ignored here for simplicity. The following theorem, which adapts Lemma 1 in [6] to the finite support and decision dependent case, gives a reformulation of (\(\hbox {D}^3\hbox {RO}\)) with the ambiguity set \({\mathcal {P}}^{DY}_0(\varvec{x})\).

Theorem 3.4

Let Assumption 1 hold. Suppose that Slater’s constraint qualification condition is satisfied, i.e., for any \(\varvec{x}\in X\), there exists a vector \(\varvec{p}^{\prime }\) in the interior of \({\mathcal {P}}^{DY}_0({\varvec{x}})\). Then the (\(\hbox {D}^3\hbox {RO}\)) problem with the ambiguity set \({\mathcal {P}}^{DY}_0({\varvec{x}})\) can be reformulated as:

$$\begin{aligned} \begin{aligned} \underset{{\varvec{x}},s,\varvec{z},\varvec{Y}}{min }&\quad f(\varvec{x}) + s + [\sqrt{\alpha ({\varvec{x}})}, -(\varvec{Q}({\varvec{x}})^{-1/2}\varvec{\mu }(\varvec{x}))^T]\varvec{z} + \beta ({\varvec{x}})\varvec{Q}({\varvec{x}})\bullet \varvec{Y} \\ s.t.&\quad s - (\varvec{\xi }^{i})^T\varvec{Q}({\varvec{x}})^{-1/2}\varvec{z}_1 + (\varvec{\xi }^i-\varvec{\mu }({\varvec{x}}))(\varvec{\xi }^i-\varvec{\mu }({\varvec{x}}))^T\bullet \varvec{Y} \ge h({\varvec{x}},\varvec{\xi }^i) \quad \forall \; i\in [N], \\&\quad \varvec{x}\in X,\; \varvec{z}:=[z_0,\varvec{z}_1]\in {\mathcal {N}}_{SOC },\; \varvec{Y}\in {\mathbb {R}}^{d\times d}, \; \varvec{Y} \succeq 0, \end{aligned} \end{aligned}$$
(12)

where \({\mathcal {N}}_{SOC }\) is the second order cone defined as \({\mathcal {N}}_{SOC }:=\{[z_0,\varvec{z}_1]\in {\mathbb {R}}^{1+d}:\;\Vert \varvec{z}_1\Vert _2\le z_0\}\), \(\varvec{z}:=[z_0,\;\varvec{z}_1]\) with \(z_0\in {\mathbb {R}}\) and \(\varvec{z}_1\in {\mathbb {R}}^d\), and \(A\bullet B=Tr (A^TB)\) for matrices A and B.

Proof

With the ambiguity set \({\mathcal {P}}^{DY}_0\) the inner problem of (\(\hbox {D}^3\hbox {RO}\)) is given as follows:

$$\begin{aligned} \underset{\varvec{p},\tau }{\text {max}}&\quad \sum ^{N}_{i=1} p_i h({\varvec{x}},\varvec{\xi }^i)&\nonumber \\ \text {s.t.}&\quad \sum ^{N}_{i=1} p_i = 1,&:\; s \in {\mathbb {R}}, \nonumber \\&\quad \varvec{\tau } = \sum ^{N}_{i=1} p_i \varvec{\xi }^i,&:\; \varvec{u}\in {\mathbb {R}}^d, \nonumber \\&\quad (\varvec{\tau }-\varvec{\mu }({\varvec{x}}))^T\varvec{Q}({\varvec{x}})^{-1}(\varvec{\tau }-\varvec{\mu }({\varvec{x}})) \le \alpha ({\varvec{x}}),&:\; \varvec{z}\in {\mathcal {N}}_{\text {SOC}}, \nonumber \\&\quad \sum ^{N}_{i=1} p_i(\varvec{\xi }^i-\varvec{\mu }({\varvec{x}}))(\varvec{\xi }^i-\varvec{\mu }({\varvec{x}}))^T \preceq \beta ({\varvec{x}})\varvec{Q}({\varvec{x}}),&:\; \varvec{Y}\succeq 0, \nonumber \\&\quad \varvec{p}:=[p_1,\ldots ,p_{N}]\in {\mathbb {R}}^{N}_+,&\forall i\in [N]. \end{aligned}$$
(13)

Under the Slater’s condition using the strong duality for semi-definite programming, we reformulate the dual problem as:

$$\begin{aligned} \begin{aligned} \underset{s,\varvec{u},\varvec{z},\varvec{Y}}{\text {min}}&\quad s + \beta ({\varvec{x}})\varvec{Q}({\varvec{x}})\bullet \varvec{Y} + \big [\sqrt{\alpha ({\varvec{x}})}, -\left( \varvec{Q}({\varvec{x}})^{-1/2}\varvec{\mu }({\varvec{x}})\right) ^T \big ]\varvec{z} \\ \text {s.t.}&\quad s + \varvec{u}^T\varvec{\xi }^i +(\varvec{\xi }^i-\varvec{\mu }({\varvec{x}}))(\varvec{\xi }^i-\varvec{\mu }({\varvec{x}}))^T\bullet \varvec{Y} \ge h({\varvec{x}},\varvec{\xi }^i) \qquad \forall i\in [N], \\&\quad \varvec{u} + \varvec{Q}({\varvec{x}})^{-1/2}\varvec{z}_1 = 0, \\&\quad \varvec{z}\in {\mathcal {N}}_{\text {SOC}},\; \varvec{Y}\in {\mathbb {R}}^{d\times d}, \; \varvec{Y}\succeq 0. \end{aligned}\quad \end{aligned}$$
(14)

Substituting (14) for the inner problem of (\(\hbox {D}^3\hbox {RO}\)) and eliminating \(\varvec{u}\) via \(\varvec{u}=-\varvec{Q}({\varvec{x}})^{-1/2}\varvec{z}_1\), we obtain (12). \(\square \)

Corollary 3.2

If \(h(\cdot ,\cdot )\) is a recourse function defined in (1) and the ambiguity set \({\mathcal {P}}^{DY}_0({\varvec{x}})\) is non-empty for any \({\varvec{x}}\in X\), then the (\(\hbox {D}^3\hbox {RO}\)) problem with the ambiguity set \({\mathcal {P}}^{DY}_0({\varvec{x}})\) can be reformulated as follows:

$$\begin{aligned} \begin{aligned} \underset{{\varvec{x}},s, \varvec{y},\varvec{z},\varvec{Y}}{min }&\quad f(\varvec{x}) + s + [\sqrt{\alpha ({\varvec{x}})}, -(\varvec{Q}({\varvec{x}})^{-1/2}\varvec{\mu }(\varvec{x}))^T]\varvec{z} + \beta ({\varvec{x}})\varvec{Q}({\varvec{x}})\bullet \varvec{Y} \\ s.t.&\quad s - (\varvec{\xi }^{k})^T\varvec{Q}({\varvec{x}})^{-1/2}\varvec{z}_1 + (\varvec{\xi }^k-\varvec{\mu }({\varvec{x}}))(\varvec{\xi }^k-\varvec{\mu }({\varvec{x}}))^T\bullet \varvec{Y} \ge g({\varvec{x}},\varvec{y}^k,\varvec{\xi }^k) \quad \\&\qquad \forall \; k\in [N], \\&\quad \psi _i(\varvec{x},\varvec{y}^k,\varvec{\xi }^k)\ge 0 \quad \forall i\in [m],\;\forall k\in [N], \\&\quad \varvec{x}\in X,\; \varvec{z}:=[z_0,\varvec{z}_1]\in {\mathcal {N}}_{SOC },\; \varvec{Y}\in {\mathbb {R}}^{d\times d}, \; \varvec{Y} \succeq 0. \end{aligned}\nonumber \\ \end{aligned}$$
(15)

The following theorem, which is an analogue of Theorem 3.4, can be proved using conic duality.

Theorem 3.5

Let \({\varXi }\) be a closed and bounded set in the Euclidean space. Suppose the \(\sigma \)-algebra \(\mathcal {F}\) contains all singleton sets, i.e., \(\{\xi \}\in \mathcal {F}\) for all \(\xi \in {\varXi }\). Suppose the following Slater-type conditions are satisfied: for all \({\varvec{x}}\in X\), there exists a probability measure \(P\in {\mathcal {P}}({\varXi },{\mathcal {F}})\) satisfying \(({\mathbb {E}}_P[\varvec{\xi }] - \varvec{\mu }({\varvec{x}}))^T\varvec{Q}({\varvec{x}})^{-1}({\mathbb {E}}_P[\varvec{\xi }] - \varvec{\mu }({\varvec{x}})) < \alpha ({\varvec{x}})\) and \({\mathbb {E}}_P[(\varvec{\xi }-\varvec{\mu }({\varvec{x}}))(\varvec{\xi }-\varvec{\mu }({\varvec{x}}))^T]\prec \beta ({\varvec{x}}) \varvec{Q}({\varvec{x}})\). Then problem (\(\hbox {D}^3\hbox {RO}\)) with the ambiguity set \({\mathcal {P}}^{DY}({\varvec{x}})\) can be reformulated as

$$\begin{aligned} \begin{aligned} \underset{{\varvec{x}},s,\varvec{u},\varvec{z},\varvec{Y}}{\text {min}}&\quad f(\varvec{x}) + s + [\sqrt{\alpha ({\varvec{x}})}, -(\varvec{Q}({\varvec{x}})^{-1/2}\varvec{\mu }(\varvec{x}))^T]\varvec{z} + \beta ({\varvec{x}})\varvec{Q}({\varvec{x}})\bullet \varvec{Y} \\ \text {s.t.}&\quad s - \varvec{\xi }^T\varvec{Q}({\varvec{x}})^{-1/2}\varvec{z}_1 + (\varvec{\xi }-\varvec{\mu }({\varvec{x}}))(\varvec{\xi }-\varvec{\mu }({\varvec{x}}))^T\bullet \varvec{Y} \ge h({\varvec{x}},\varvec{\xi }) \quad \forall \; \varvec{\xi }\in {\varXi }, \\&\quad \varvec{x}\in X,\; \varvec{z}:=[z_0,\varvec{z}_1]\in {\mathcal {N}}_{SOC },\; \varvec{Y}\in {\mathbb {R}}^{d\times d}, \; \varvec{Y} \succeq 0. \end{aligned} \end{aligned}$$

Proof

For notational convenience, we omit the \({\varvec{x}}\) argument in all functions and parameters in this proof. Let \(\mathcal {M}^+({\varXi },\mathcal {F})\) be the cone of positive measures defined on the measurable space \(({\varXi },\mathcal {F})\). The inner problem of (\(\hbox {D}^3\hbox {RO}\)) under the ambiguity set \({\mathcal {P}}^{DY}\) is written as:

$$\begin{aligned} \begin{aligned}&\text {max}\;\;{\mathbb {E}}_P[h(\xi )] \\&\text {s.t.}\,({\mathbb {E}}_P[\xi ]-\mu )^{\top }Q^{-1}({\mathbb {E}}_P[\xi ]-\mu )\le \alpha , \\&\qquad {\mathbb {E}}_P[(\xi -\mu )(\xi -\mu )^{\top }]\preceq \beta Q, \\&\qquad {\mathbb {E}}_P[1]=1,\; P\in \mathcal {M}^+({\varXi },\mathcal {F}). \end{aligned} \end{aligned}$$
(16)

Define the linear mapping \(\mathcal {A}_1:\;\mathcal {M}^+({\varXi },\mathcal {F})\longmapsto {\mathbb {R}}\) as \(\mathcal {A}_1P:=\int _{{\varXi }}1dP(\xi )\), define the linear mapping \(\mathcal {A}_2:\;\mathcal {M}^+({\varXi },\mathcal {F})\longmapsto {\mathbb {R}}^{d+1}\) as \(\mathcal {A}_2P:=[0,Q^{-1/2}\int _{{\varXi }}\xi dP(\xi )]\), and define the linear mapping \(\mathcal {A}_3:\;\mathcal {M}^+({\varXi },\mathcal {F})\longmapsto {\mathbb {R}}^{d\times d}\) as \(\mathcal {A}_3P:=-\int _{{\varXi }}(\xi -\mu )(\xi -\mu )^{\top }dP(\xi )\). Define the linear functional \(\langle h,\cdot \rangle :\;\mathcal {M}^+({\varXi },\mathcal {F})\longmapsto {\mathbb {R}}\) as \(\langle h,P\rangle :=\int _{{\varXi }}h(\xi )dP(\xi )\). Let \(b = [\sqrt{\alpha },\;-Q^{-1/2}\mu ]\) be a constant vector in \({\mathbb {R}}^{d+1}\). Then (16) can be reformulated as the following conic linear program:

$$\begin{aligned} \begin{aligned}&\text {max}\;\;\langle h, P\rangle \\&\text {s.t.}\, \mathcal {A}_1P-1=0, \\&\qquad \mathcal {A}_2P+b \in \mathcal {K}_{SOC}, \\&\qquad \mathcal {A}_3P+\beta Q\in \mathcal {K}_{SD}, \\&\qquad P\in \mathcal {M}^+({\varXi },\mathcal {F}), \end{aligned} \end{aligned}$$
(17)

where \(\mathcal {K}_{SOC}\) and \(\mathcal {K}_{SD}\) are the second-order cone and the positive semi-definite cone, respectively. Applying the duality theory of conic linear programs in functional spaces [47], we obtain the dual of (16) as given in the statement of the theorem. The Slater-type condition is equivalent to the existence of a probability measure \(P^{\prime }\) such that \(\mathcal {A}_2P^{\prime }+b\in \text {int}(\mathcal {K}_{SOC})\) and \(\mathcal {A}_3P^{\prime }+\beta Q\in \text {int}(\mathcal {K}_{SD})\). By (23) in [47], the statement following it, and Proposition 2.9 of [47], strong duality holds. \(\square \)
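For readers unfamiliar with the functional-space duality step, the Lagrangian pairing behind (17) can be sketched as follows; this is our own presentation of the standard argument, not the derivation in [47]:

```latex
% Lagrangian of (17) with multipliers s \in \mathbb{R} (equality constraint),
% z \in \mathcal{K}_{SOC}, and Y \succeq 0:
\mathcal{L}(P; s, z, Y)
  = \langle h, P\rangle + s\,(1 - \mathcal{A}_1 P)
  + \langle z,\; \mathcal{A}_2 P + b\rangle
  + \langle Y,\; \mathcal{A}_3 P + \beta Q\rangle .
% Boundedness of \sup_{P \in \mathcal{M}^+(\Xi,\mathcal{F})} \mathcal{L}
% forces, pointwise in \xi \in \Xi,
%   h(\xi) - s + z_1^{\top} Q^{-1/2}\xi
%     - (\xi-\mu)(\xi-\mu)^{\top} \bullet Y \;\le\; 0,
% which is the constraint in the statement of the theorem; the remaining
% terms  s + \langle b, z\rangle + \beta Q \bullet Y  form its objective.
```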

3.3 Ambiguity sets defined by Wasserstein metric

Instead of using moment based definitions of the ambiguity set, we may define this set using a statistical distance, such as the Wasserstein metric. We now study the (\(\hbox {D}^3\hbox {RO}\)) problem with a decision dependent ambiguity set defined using the \(L_1\)-Wasserstein metric as follows:

$$\begin{aligned} {\mathcal {P}}^{W}({\varvec{x}}):=\big \{ P\in {\mathcal {P}}({\varXi },{\mathcal {F}}):\; {\mathcal {W}}(P,P_0)\le r({\varvec{x}}) \big \}, \end{aligned}$$
(18)

where \(P_0\) is a nominal probability distribution, and \({\mathcal {W}}(\cdot ,\cdot ):\;{\mathcal {P}}({\varXi },{\mathcal {F}})\times {\mathcal {P}}({\varXi },{\mathcal {F}})\rightarrow {\mathbb {R}}\) is the \(L_1\)-Wasserstein metric defined in [48]:

$$\begin{aligned} {\mathcal {W}}(P_1, P_2):= \underset{K\in {\mathcal {S}}(P_1,P_2)}{\text {inf}} \int _{{\varXi }\times {\varXi }} \Vert \varvec{s}_1 - \varvec{s}_2 \Vert K(d\varvec{s}_1\times d \varvec{s}_2), \end{aligned}$$
(19)

where \({\mathcal {S}}(P_1,P_2):=\big \{ K\in {\mathcal {P}}({\varXi }\times {\varXi },{\mathcal {F}}\times {\mathcal {F}}):\; K(A\times {\varXi }) = P_1(A),\; K({\varXi }\times A) = P_2(A),\; \forall A\in {\mathcal {F}} \big \}\) is the set of all joint probability distributions whose marginals are \(P_1\) and \(P_2\), and \(\Vert \cdot \Vert \) is an arbitrary norm defined on \({\mathbb {R}}^d\). The ambiguity set (18) generalizes the one considered in [7,8,9,10] to the decision dependent case. As a special case of (18), under Assumption 1, \({\mathcal {P}}^W({\varvec{x}})\) is written as:

$$\begin{aligned} {\mathcal {P}}^W({\varvec{x}})=\left\{ \varvec{p}\in {\mathbb {R}}^N\;|\; {\mathcal {W}}(\varvec{p},\hat{\varvec{p}})\le r({\varvec{x}}),\; \sum ^N_{i=1}p_i=1,\; p_i\ge 0,\; \forall i\in [N] \right\} , \end{aligned}$$
(20)

where \(\hat{\varvec{p}}\) is a given empirical probability distribution on \({\varXi }\), and the Wasserstein metric can be simplified as

$$\begin{aligned}&{\mathcal {W}}(\varvec{p},\hat{\varvec{p}})=\left\{ \underset{\varvec{w}}{\text {min}}\; \sum ^N_{i=1}\sum ^N_{j=1}\Vert \varvec{\xi }^i-\varvec{\xi }^j\Vert w_{ij}\;\; \big \vert \; \sum ^N_{j=1}w_{ij}=p_i \; \forall i\in [N], \right. \\&\left. \quad \sum ^N_{i=1}w_{ij}={\hat{p}}_j\; \forall j\in [N],\; w_{ij}\ge 0\;\forall i,j\in [N] \right\} , \end{aligned}$$

where \({\hat{p}}_j\) is the probability of scenario \(\varvec{\xi }^j\). The following theorem gives a reformulation of (\(\hbox {D}^3\hbox {RO}\)) for the ambiguity set (20).
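For intuition, the simplified metric above can be evaluated in closed form when the support is scalar and sorted in increasing order, using the classical CDF representation of the \(L_1\)-Wasserstein distance; the general vector-support case requires solving the transportation linear program. A minimal sketch (function names are illustrative):

```python
def wasserstein_1d(support, p, p_hat):
    """L1-Wasserstein distance between two discrete distributions on a
    common scalar support sorted in increasing order, via the CDF formula
    W = sum_k |F_p(d_k) - F_phat(d_k)| * (d_{k+1} - d_k)."""
    dist, cdf_gap = 0.0, 0.0
    for k in range(len(support) - 1):
        cdf_gap += p[k] - p_hat[k]          # running CDF difference
        dist += abs(cdf_gap) * (support[k + 1] - support[k])
    return dist
```

For example, moving half of the mass from point 0 to point 1 and half from point 1 to point 2 costs a distance of 1.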

Theorem 3.6

Let Assumption 1 hold. If for any \({\varvec{x}}\in X\), \(h({\varvec{x}},\varvec{\xi }^i)\) is finite for any \(i\in [N]\) and the ambiguity set (20) is nonempty, then (\(\hbox {D}^3\hbox {RO}\)) problem with the ambiguity set (20) can be reformulated as:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{x},\varvec{\alpha },\varvec{\beta },\varvec{\mu },\varvec{\lambda },\gamma ,\eta }{min }\;\; f(\varvec{x})-\sum ^N_{i=1}{\hat{p}}_i\beta _i - r(\varvec{x})\gamma + \eta \\&\quad s.t. \, -\alpha _i-\mu _i + \eta \ge h(\varvec{x},\varvec{\xi }^i) \qquad \forall i\in [N], \\&\qquad \quad -\alpha _i+\beta _j + \Vert \varvec{\xi }^i - \varvec{\xi }^j\Vert \gamma + \lambda _{ij} \le 0 \qquad \forall i\in [N],\;\forall j\in [N], \\&\qquad \quad \varvec{x}\in X, \;\; \alpha _i\in {\mathbb {R}},\;\beta _i\in {\mathbb {R}},\;\mu _i\ge 0,\;\lambda _{ij}\ge 0,\;\gamma \le 0,\;\eta \in {\mathbb {R}}\quad \\&\quad \forall i\in [N],\;\forall j\in [N]. \end{aligned} \end{aligned}$$
(21)

Proof

Since \({\varXi }\) is finite, the (\(\hbox {D}^3\hbox {RO}\))-inner problem with ambiguity set \({\mathcal {P}}^W(\varvec{x})\) can be formulated as the following linear program:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{p},\varvec{w}}{\text {max}}\;\;\sum ^N_{i=1}h(\varvec{x},\varvec{\xi }^i)p_i \\&\;\text {s.t.}\, \sum ^N_{j=1}w_{ij}=p_i \qquad \forall i\in [N], \\&\qquad \sum ^N_{i=1}w_{ij}={\hat{p}}_j \qquad \forall j\in [N], \\&\qquad \sum ^N_{i=1}\sum ^N_{j=1} \Vert \varvec{\xi }^i - \varvec{\xi }^j\Vert w_{ij} \le r(\varvec{x}), \\&\qquad \sum ^N_{i=1}p_i=1,\;\; p_i\ge 0,\;w_{ij}\ge 0\;\; \forall i\in [N],\;\forall j\in [N], \end{aligned} \end{aligned}$$
(22)

where \(\varvec{w}\) is a joint probability distribution with two marginal distributions given by \(\varvec{p}\) and \(\hat{\varvec{p}}\), respectively. The dual of the above linear program is:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{\alpha },\varvec{\beta },\varvec{\mu },\varvec{\lambda },\gamma ,\eta }{\text {min}}\;\; -\sum ^N_{i=1}{\hat{p}}_i\beta _i - r(\varvec{x})\gamma + \eta \\&\quad \text {s.t.}\, -\alpha _i-\mu _i + \eta \ge h(\varvec{x},\varvec{\xi }^i) \qquad \forall i\in [N], \\&\qquad \quad -\alpha _i+\beta _j + \Vert \varvec{\xi }^i - \varvec{\xi }^j\Vert \gamma + \lambda _{ij} \le 0 \qquad \forall i\in [N],\;\forall j\in [N], \\&\qquad \quad \alpha _i\in {\mathbb {R}},\;\beta _i\in {\mathbb {R}},\;\mu _i\ge 0,\;\lambda _{ij}\ge 0,\;\gamma \le 0,\;\eta \in {\mathbb {R}}\quad \forall i\in [N],\;\forall j\in [N]. \end{aligned} \end{aligned}$$
(23)

After substituting (23) into (\(\hbox {D}^3\hbox {RO}\)), we obtain the desired reformulation (21). \(\square \)

A reformulation for the two-stage case is given in the following corollary.

Corollary 3.3

If \(h(\cdot ,\cdot )\) is a recourse function defined in (1), and the ambiguity set \({\mathcal {P}}^W({\varvec{x}})\) is non-empty for any \({\varvec{x}}\in X\), then the (\(\hbox {D}^3\hbox {RO}\)) problem with the ambiguity set \({\mathcal {P}}^{W}({\varvec{x}})\) can be reformulated as follows:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{x},\varvec{y},\varvec{\alpha },\varvec{\beta },\varvec{\mu },\varvec{\lambda },\gamma }{min }\;\; f(\varvec{x})-\sum ^N_{k=1}{\hat{p}}_k\beta _k - r(\varvec{x})\gamma \\&\quad s.t. \, -\alpha _k-\mu _k\ge g(\varvec{x},\varvec{y}^k,\varvec{\xi }^k) \qquad \forall k\in [N], \\&\qquad \quad -\alpha _i+\beta _j + \Vert \varvec{\xi }^i-\varvec{\xi }^j\Vert \gamma + \lambda _{ij} \le 0 \qquad \forall i\in [N],\;\forall j\in [N], \\&\qquad \qquad \psi _i(\varvec{x},\varvec{y}^k,\varvec{\xi }^k)\ge 0 \quad \forall i\in [m],\;\forall k\in [N], \\&\qquad \quad \varvec{x}\in X,\;\; \alpha _i\in {\mathbb {R}},\;\beta _i\in {\mathbb {R}},\;\mu _i\ge 0,\;\lambda _{ij}\ge 0,\;\gamma \le 0\quad \forall i\in [N],\;\forall j\in [N]. \end{aligned} \end{aligned}$$
(24)

A generalization of the reformulation (21) of (\(\hbox {D}^3\hbox {RO}\)) to the continuous support case is given in Appendix A.

3.4 Ambiguity sets defined using \(\phi \)-divergence

We now study the (\(\hbox {D}^3\hbox {RO}\)) problem with a decision dependent ambiguity set defined via the notion of \(\phi \)-divergence:

$$\begin{aligned} {\mathcal {P}}^{\phi }(\varvec{x}):= \big \{ P\in {\mathcal {P}}({\varXi },{\mathcal {F}}):\; {\mathcal {D}}_{\phi }(P||P_0)\le \eta (\varvec{x}) \big \}, \end{aligned}$$
(25)

where \({\mathcal {D}}_{\phi }(P||P_0)=\int _{{\varXi }}\phi \left( \frac{dP}{dP_0}\right) dP_0\), and \(\phi \) is a non-negative convex function. This type of ambiguity set generalizes the one considered in [11, 12, 14,15,16, 49] to the decision dependent case. Under Assumption 1, and letting \({\hat{p}}_i\) be the probability of scenario \(\varvec{\xi }^i\), the ambiguity set (25) is written as:

$$\begin{aligned} {\mathcal {P}}^{\phi }(\varvec{x})= \left\{ P=\sum ^N_{i=1}p_i\delta _{\varvec{\xi }^i}:\; \sum ^N_{i=1}{\hat{p}}_i\phi (p_i/{\hat{p}}_i) \le \eta (\varvec{x}), \; \sum ^N_{i=1}p_i=1, \; p_i\ge 0 \;\;\forall i\le N \right\} .\nonumber \\ \end{aligned}$$
(26)
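The discrete divergence in (26) is straightforward to evaluate; a minimal sketch with the Kullback–Leibler generator \(\phi (t)=t\log t\) (one common choice, used here purely for illustration):

```python
import math

def phi_kl(t):
    """KL generator phi(t) = t*log(t), with the convention 0*log(0) = 0."""
    return t * math.log(t) if t > 0 else 0.0

def phi_divergence(p, p_hat, phi=phi_kl):
    """Discrete phi-divergence sum_i p_hat_i * phi(p_i / p_hat_i) as in (26)."""
    return sum(ph * phi(pi / ph) for pi, ph in zip(p, p_hat) if ph > 0)
```

Since \(\phi (1)=0\) for this generator, the divergence vanishes at \(P=P_0\), and it is non-negative by Jensen's inequality.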

Two reformulations of (\(\hbox {D}^3\hbox {RO}\)) with ambiguity set \({\mathcal {P}}^{\phi }(\varvec{x})\) are given in the following theorem.

Theorem 3.7

Let Assumption 1 hold, and let \(\phi \) be a non-negative convex function. Assume that the following Slater’s condition is satisfied for every \(\varvec{x}\in X\): there exists a \(\varvec{p}\in {\mathbb {R}}^N\) such that \(p_i>0\) for all \(i\in [N]\), \(\sum ^N_{i=1}p_i=1\), and \(\sum ^N_{i=1}{\hat{p}}_i\phi (p_i/{\hat{p}}_i)<\eta (\varvec{x})\). Then (\(\hbox {D}^3\hbox {RO}\)) with the ambiguity set \({\mathcal {P}}^{\phi }(\varvec{x})\) can be reformulated as the following semi-infinite program:

$$\begin{aligned} \begin{aligned}&\underset{{\varvec{x}},\alpha ,\beta ,\varvec{\lambda },z}{min } \;\; f(\varvec{x}) + z \\&\;\;s.t. \, z\ge \sum ^N_{i=1}h(\varvec{x},\varvec{\xi }^i)p_i + \alpha \Big ( \frac{1}{N}\sum ^N_{i=1}\phi (Np_i)-\eta (\varvec{x}) \Big ) \\&\qquad \qquad + \beta \Big (\sum ^N_{i=1}p_i-1\Big ) + \sum ^N_{i=1}p_i\lambda _i \qquad \forall \varvec{p}\in S,\\&\qquad \quad \varvec{x}\in X,\;\alpha \le 0,\; \beta \in {\mathbb {R}},\; \lambda _i\ge 0\;\;\forall i\in [N], \end{aligned} \end{aligned}$$
(27)

where \(S=\{\varvec{p}\in {\mathbb {R}}^N:\; \sum ^N_{i=1}p_i=1,\;p_i\ge 0\;\;\forall i\in [N]\}\). Alternatively, the problem can also be reformulated as:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{x},\varvec{p},\alpha ,\beta ,\varvec{\lambda }}{min }\;\; f(\varvec{x}) + \sum ^N_{i=1}p_ih\big (\varvec{x},\varvec{\xi }^i\big ) + \alpha \Big (\frac{1}{N}\sum ^N_{i=1}\phi (Np_i)-\eta (\varvec{x}) \Big )\\&\qquad + \beta \Big (\sum ^N_{i=1}p_i-1\Big ) + \sum ^N_{i=1}\lambda _ip_i \\&\;\;s.t. \,\ \alpha \phi ^{\prime }(Np_i) +\beta + h(\varvec{x},\varvec{\xi }^i) + \lambda _i = 0 \qquad \forall i\in [N], \\&\qquad \quad \varvec{x}\in X,\;\alpha \le 0,\; \beta \in {\mathbb {R}},\; \lambda _i\ge 0,\; p_i\ge 0\;\;\forall i\in [N]. \end{aligned} \end{aligned}$$
(28)

Proof

The (\(\hbox {D}^3\hbox {RO}\)) problem can be written as \(\underset{\varvec{x}\in X}{\text {min}}\;\; f(\varvec{x}) + {\varPhi }(\varvec{x})\), where \({\varPhi }(\varvec{x})\) is the optimal objective value of the following optimization problem:

$$\begin{aligned} \begin{aligned} {\varPhi }(\varvec{x})=\;&\underset{\varvec{p}}{\text {max}}\;\sum ^N_{i=1}p_ih\big (\varvec{x},\varvec{\xi }^i\big ) \\&\text {s.t.}\, \frac{1}{N}\sum ^N_{i=1}\phi (Np_i) \le \eta (\varvec{x}), \quad \sum ^N_{i=1}p_i=1, \\&\qquad p_i\ge 0 \;\;\forall i\le N, \;\; \varvec{x}\in X. \end{aligned} \end{aligned}$$
(29)

Since \(\phi \) is convex, (29) is a convex program with respect to the decision variable \(\varvec{p}\). For a fixed \(\varvec{x}\in X\), the Lagrangian dual of (29) is written as follows:

$$\begin{aligned} \begin{aligned}&\underset{\alpha ,\beta ,\varvec{\lambda }}{\text {min}}\; \underset{\varvec{p}}{\text {max}}\;{\mathcal {L}}(\varvec{p};\alpha ,\beta ,\varvec{\lambda }) \\&\text {s.t.}\, \alpha \le 0,\; \beta \in {\mathbb {R}}, \; \lambda _i\ge 0,\quad \forall i\in [N], \end{aligned} \end{aligned}$$
(30)

where \({\mathcal {L}}(\varvec{p};\alpha ,\beta ,\varvec{\lambda })=\sum ^N_{i=1}h(\varvec{x},\varvec{\xi }^i)p_i + \alpha \Big ( \frac{1}{N}\sum ^N_{i=1}\phi (Np_i)-\eta (\varvec{x}) \Big ) + \beta \Big (\sum ^N_{i=1}p_i-1\Big ) + \sum ^N_{i=1}p_i\lambda _i\), and \(\alpha \), \(\beta \), \(\varvec{\lambda }\) are the Lagrange multipliers. Since Slater’s condition is satisfied for any \(\varvec{x}\in X\), strong duality holds. The inner maximization problem of (30) is equivalent to \(\{\text {max}\;z,\;\; \text {s.t. }z\ge {\mathcal {L}}(\varvec{p};\alpha ,\beta ,\varvec{\lambda })\;\;\forall \varvec{p}\in S\}\), which gives the reformulation (27).

Note that the inner problem of (30) is an unconstrained concave maximization problem. The first-order optimality condition gives:

$$\begin{aligned} \frac{\partial {\mathcal {L}}}{\partial p_i}=\alpha \phi ^{\prime }(Np_i)+h(\varvec{x},\xi ^i)+\beta +\lambda _i=0,\qquad \forall i\in [N]. \end{aligned}$$
(31)

Substituting the expression of the Lagrangian into (30), adding the optimality condition (31), and using strong duality, we obtain the reformulation given in (28). \(\square \)

We remark that in the cutting-surface algorithm (Algorithm 1) given in Sect. 4, the separation problem is specified over the set S: it maximizes \(\sum ^N_{i=1}h(\varvec{x},\varvec{\xi }^i)p_i + \alpha \Big ( \frac{1}{N}\sum ^N_{i=1}\phi (Np_i)-\eta (\varvec{x}) \Big ) + \beta \Big (\sum ^N_{i=1}p_i-1\Big ) + \sum ^N_{i=1}p_i\lambda _i\) over \(\varvec{p}\in S\), where \({\varvec{x}},\alpha ,\beta ,\varvec{\lambda }\) form the current solution obtained from solving the master problem at an iteration. Since \(\alpha \le 0\) and \(\phi \) is convex, the separation problem maximizes a concave function, which is a convex optimization problem.
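The concavity of the separation objective can also be checked numerically. Below is a sketch with the KL generator \(\phi (t)=t\log t\) (all numerical values are illustrative, not from the paper); for \(\alpha \le 0\) the objective satisfies the midpoint-concavity inequality along any segment in S:

```python
import math

def phi_kl(t):
    # KL generator with the convention 0*log(0) = 0
    return t * math.log(t) if t > 0 else 0.0

def sep_obj(p, h, alpha, beta, lam, eta):
    """Separation objective for fixed (x, alpha, beta, lambda):
    sum_i h_i p_i + alpha*((1/N) sum_i phi(N p_i) - eta)
    + beta*(sum_i p_i - 1) + sum_i lam_i p_i."""
    N = len(p)
    return (sum(h[i] * p[i] for i in range(N))
            + alpha * (sum(phi_kl(N * p[i]) for i in range(N)) / N - eta)
            + beta * (sum(p) - 1)
            + sum(lam[i] * p[i] for i in range(N)))
```

Evaluating `sep_obj` at two simplex vertices and at their midpoint confirms that the midpoint value dominates the average of the endpoint values whenever `alpha <= 0`.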

A reformulation for the two-stage stochastic optimization case is given in the following corollary.

Corollary 3.4

If \(h(\cdot ,\cdot )\) is a recourse function defined in (1) and the ambiguity set \({\mathcal {P}}^{\phi }({\varvec{x}})\) is non-empty for any \({\varvec{x}}\in X\), then the (\(\hbox {D}^3\hbox {RO}\)) problem with the ambiguity set \({\mathcal {P}}^{\phi }({\varvec{x}})\) can be reformulated as follows:

$$\begin{aligned} \begin{aligned}&\underset{\begin{array}{c} \varvec{x},\varvec{y},z,\varvec{p},\\ \alpha ,\beta ,\varvec{\lambda } \end{array}}{min }\;\; z \\&\;\;s.t. \,\ z\ge f(\varvec{x}) + \sum ^N_{k=1}p_kg\big (\varvec{x},\varvec{y}^k,\xi ^k\big )\\&\qquad \qquad + \alpha \Big (\frac{1}{N}\sum ^N_{k=1}\phi (Np_k)-\eta (\varvec{x}) \Big ) + \beta \Big (\sum ^N_{k=1}p_k-1\Big ) + \sum ^N_{k=1}\lambda _kp_k \; \forall \varvec{p}\in S, \\&\qquad \quad \psi _i(\varvec{x},\varvec{y}^k,\varvec{\xi }^k)\ge 0 \quad \forall i\in [m],\;\forall k\in [N], \\&\qquad \quad z\in {\mathbb {R}},\;\varvec{x}\in X,\;\alpha \le 0,\; \beta \in {\mathbb {R}},\; \lambda _k\ge 0,\; p_k\ge 0\;\;\forall k\in [N], \end{aligned} \end{aligned}$$
(32)

where \(S=\{\varvec{p}\in {\mathbb {R}}^N:\; \sum ^N_{k=1}p_k=1,\;p_k\ge 0\;\;\forall k\in [N]\}\).

3.5 Ambiguity sets defined based on the Kolmogorov–Smirnov test

The Kolmogorov–Smirnov (KS) distance was used in [17] to define an ambiguity set for a data-driven distributionally robust optimization model. For two univariate probability distributions \(P_1\) and \(P_2\) with cumulative distribution functions \(F_1\) and \(F_2\), the KS-distance is defined as:

$$\begin{aligned} D(P_1,P_2)=\underset{s}{\text {sup}}\; |F_1(s)-F_2(s)|. \end{aligned}$$
(33)

We now study the (\(\hbox {D}^3\hbox {RO}\)) problem with the ambiguity set defined using the KS-distance. Note that although (33) is defined for a univariate random variable, this definition can be directly generalized to the probability distribution of a random vector with a finite support. Specifically, under Assumption 1, let \(P_0=\sum ^N_{i=1}{\hat{p}}_i\delta _{\varvec{\xi }^i}\) be an empirical probability distribution, where \(\delta _{\varvec{\xi }^i}\) is the Dirac measure at \(\varvec{\xi }^i\). The KS-distance between a discrete probability distribution \(P=\sum ^N_{i=1}p_i\delta _{\varvec{\xi }^i}\) and \(P_0\) can be written as:

$$\begin{aligned} D(P,P_0)=\underset{k\in [N]}{\text {sup}}\; \Big | \sum ^k_{i=1}p_i - \sum ^k_{i=1}{\hat{p}}_i \Big |. \end{aligned}$$
(34)
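The distance (34) reduces to a maximum over running sums and can be computed in one pass; a minimal sketch:

```python
def ks_distance(p, p_hat):
    """KS-distance (34) between two discrete distributions on a common
    ordered support: the largest absolute gap between partial sums."""
    gap, cum_p, cum_hat = 0.0, 0.0, 0.0
    for pi, hi in zip(p, p_hat):
        cum_p += pi
        cum_hat += hi
        gap = max(gap, abs(cum_p - cum_hat))
    return gap
```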

The decision dependent ambiguity set of probability distributions is constructed using the KS-distance as follows:

$$\begin{aligned} {\mathcal {P}}^{KS}(\varvec{x})=\left\{ \varvec{p}\in {\mathbb {R}}^{N}:\;\; \underset{k\in [N]}{\text {sup}}\;\Big |\sum ^k_{i=1}p_i-\sum ^k_{i=1}{\hat{p}}_i \Big |\le \eta (\varvec{x}), \;\; \sum ^N_{i=1}p_i=1,\;\; p_i\ge 0\;\;\forall i\in [N] \right\} .\nonumber \\ \end{aligned}$$
(35)

A reformulation of the (\(\hbox {D}^3\hbox {RO}\)) problem is given in the following theorem.

Theorem 3.8

Let Assumption 1 hold. If for any \({\varvec{x}}\in X\), \(h({\varvec{x}},\varvec{\xi }^k)\) is finite for any \(k\in [N]\) and the ambiguity set (35) is nonempty, then the (\(\hbox {D}^3\hbox {RO}\)) problem with the ambiguity set (35) can be reformulated as:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{x},\lambda ,\varvec{\alpha },\varvec{\beta },\varvec{\gamma }}{min }\; f(\varvec{x})+\lambda + \sum ^N_{k=1}\sum ^k_{i=1}(\alpha _k+\beta _k){\hat{p}}_i +\sum ^N_{k=1}(\alpha _k-\beta _k)\eta (\varvec{x}) \\&\quad s.t. \, \lambda + \sum ^N_{k=i}(\alpha _k+\beta _k) + \gamma _i \ge h(\varvec{x},\varvec{\xi }^i) \qquad \forall i\in [N], \\&\qquad \quad \varvec{x}\in X,\; \lambda \in {\mathbb {R}},\; \alpha _i\ge 0,\; \beta _i\le 0,\; \gamma _i\le 0\;\;\forall i\in [N]. \end{aligned} \end{aligned}$$
(36)

Proof

The (\(\hbox {D}^3\hbox {RO}\)) problem with the ambiguity set (35) can be written as:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{x}}{\text {min}}\;f(\varvec{x}) + \underset{\varvec{p}}{\text {max}}\;\sum ^N_{i=1}h(\varvec{x},\xi ^i)p_i \\&\text {s.t.}\, \underset{k\in [N]}{\text {sup}}\;\Big |\sum ^k_{i=1}p_i-\sum ^k_{i=1}{\hat{p}}_i \Big |\le \eta (\varvec{x}), \\&\qquad \sum ^N_{i=1}p_i=1, \quad p_i\ge 0 \quad \forall i\in [N]. \end{aligned} \end{aligned}$$
(37)

Note that the inner problem of (37) can be reformulated as the following linear program:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{p}}{\text {max}}\; \sum ^N_{i=1}h(\varvec{x},\xi ^i)p_i \\&\text {s.t.}\, \sum ^k_{i=1}p_i-\sum ^k_{i=1}{\hat{p}}_i\le \eta (\varvec{x}) \qquad \forall k\in [N], \\&\qquad \sum ^k_{i=1}p_i-\sum ^k_{i=1}{\hat{p}}_i\ge -\eta (\varvec{x}) \qquad \forall k\in [N], \\&\qquad \sum ^N_{i=1}p_i=1,\quad p_i\ge 0 \quad \forall i\in [N]. \end{aligned} \end{aligned}$$
(38)

After taking the dual of the above linear program and combining it with the outer problem, we obtain (36). \(\square \)

A reformulation for the two-stage stochastic optimization case is given in the following corollary.

Corollary 3.5

Suppose \(h(\cdot ,\cdot )\) is a recourse function defined in (1) and the ambiguity set is given by (35). If for any \({\varvec{x}}\in X\), \(h({\varvec{x}},\varvec{\xi }^k)\) is finite for any \(k\in [N]\) and the ambiguity set (35) is nonempty, then the (\(\hbox {D}^3\hbox {RO}\)) problem can be reformulated as follows:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{x},\varvec{y},\lambda ,\varvec{\alpha },\varvec{\beta },\varvec{\gamma }}{min }\; f(\varvec{x})+\lambda + \sum ^N_{k=1}\sum ^k_{i=1}(\alpha _k+\beta _k){\hat{p}}_i +\sum ^N_{k=1}(\alpha _k-\beta _k)\eta (\varvec{x}) \\&\quad s.t. \, \lambda + \sum ^N_{k=i}(\alpha _k+\beta _k) + \gamma _i \ge g(\varvec{x},\varvec{y}^i,\varvec{\xi }^i) \qquad \forall i\in [N], \\&\qquad \quad \psi _i(\varvec{x},\varvec{y}^k,\varvec{\xi }^k)\ge 0 \quad \forall i\in [m],\;\forall k\in [N], \\&\qquad \quad {\varvec{x}}\in X,\; \lambda \in {\mathbb {R}},\; \alpha _i\ge 0,\; \beta _i\le 0,\; \gamma _i\le 0\;\;\forall i\in [N]. \end{aligned} \end{aligned}$$
(39)

The reformulation (36) of (\(\hbox {D}^3\hbox {RO}\)) with the ambiguity set defined using the K–S distance can be generalized for the case where the support \({\varXi }\) is continuous. The details of this generalization are given in Appendix B.

4 Solution approaches

4.1 Models with continuous support

For models with a continuous and compact support \({\varXi }\), the reformulations given in Sect. 3.1 (without the measure bound constraint), Sect. 3.2, Appendices A and B, and Sect. 3.4 are semi-infinite programs of the form (gen-SIP):

$$\begin{aligned} \begin{array}{cll} \underset{x}{\text {min}} &{} \displaystyle f(x) &{} \\ \text {s.t.} &{}\displaystyle g(x,t)\le 0, &{} \forall t\in T, \\ &{} x\in X, \end{array} \end{aligned}$$
(gen-SIP)

by appropriately specifying g(x, t). Here \(X\subseteq {\mathbb {R}}^{k_1}\) and \(T\subseteq {\mathbb {R}}^{k_2}\times {\mathbb {Z}}^{k_3}\). A cutting-surface algorithm from [10] (Algorithm 1) is now presented to solve such problems. We assume access to an oracle that can solve a deterministic non-convex optimization problem with finitely many constraints. At each iteration, the basic cutting-surface algorithm solves a relaxation with finitely many constraints:

$$\begin{aligned} \underset{x\in X}{\text {min}}\;\{ f(x):\; \text {s.t.}\, g(x,t)\le 0, \; t\in T^{\prime } \}, \end{aligned}$$
(40)

where \(T^{\prime }\) is a finite set of samples from T. This relaxation is called the master problem. The algorithm generates the required samples as it progresses; they are identified from the support by solving a separation problem (see, for example, the algorithm given in [10]). Specifically, a new constraint is added to the master problem at each iteration. This constraint is identified by finding a \(t\in T\) at which the incumbent solution of the master problem violates the constraint. Given the incumbent solution \({\hat{x}}\in X\), the following separation problem

$$\begin{aligned} \underset{t\in T}{\text {max}}\; g({\hat{x}}, t), \end{aligned}$$
(41)

is solved to generate the next constraint. The algorithm outputs an \(\varepsilon \)-optimal solution (Definition 4.1) to (gen-SIP) in a finite number of iterations (Theorem 4.1).

Definition 4.1

For a general semi-infinite program in the form of (gen-SIP), a point \(x_0\in X\) is an \(\varepsilon \)-feasible solution of (gen-SIP) if \(\underset{t\in T}{\text {max}}\; g(x_0,t)\le \varepsilon \). A point \(x_0\in X\) is an \(\varepsilon \)-optimal solution of (gen-SIP) if \(x_0\) is an \(\varepsilon \)-feasible solution of (gen-SIP) and \(f(x_0)\le \text {Val}\)(gen-SIP).

figure b

Theorem 4.1

(Theorem 3.2 in [10]) If \(X\times T\) is compact, and g(xt) is continuous on \(X\times T\), then Algorithm 1 terminates in finitely many iterations and returns an \(\varepsilon \)-optimal solution of (gen-SIP).

We note that for the finite support case, the oracle problem (41) in the cutting-surface algorithm reduces to evaluating finitely many constraints. In this case the algorithm can be adapted to sequentially add constraints as violated inequalities are identified, which may be useful when N is large.
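To make the loop concrete, here is a toy instance of the cutting-surface scheme (our own example, not from the paper): minimize x subject to \(\sin (t)-x\le 0\) for all \(t\in [0,\pi ]\), whose optimal value is 1. The master problem reduces to taking the largest sampled \(\sin (t)\), and the separation oracle is a grid search:

```python
import math

def cutting_surface(eps=1e-3):
    """Cutting-surface loop for the toy (gen-SIP):
    min_x x  s.t.  g(x, t) = sin(t) - x <= 0  for all t in [0, pi]."""
    T_prime = [0.0]                                    # initial sample set
    grid = [math.pi * k / 1000 for k in range(1001)]   # oracle's search grid
    while True:
        x = max(math.sin(t) for t in T_prime)              # master problem (40)
        t_star = max(grid, key=lambda t: math.sin(t) - x)  # separation (41)
        if math.sin(t_star) - x <= eps:                    # eps-feasible: stop
            return x
        T_prime.append(t_star)
```

On this instance the loop terminates quickly with x close to 1, since the first separation call already returns a point near t = π/2.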

4.2 Models with binary decision variables and finite support

The reformulations developed in Sect. 3 have nonlinear terms that are products of dual variables and primal variables, and are therefore non-convex programming problems in general. A solution approach is needed to handle this nonlinearity. However, when the first-stage decision variables \(\varvec{x}\) are binary and the dual variables/Lagrange multipliers are bounded, the nonlinear terms may become products of a binary variable and a bounded continuous variable. For example, this is the case when in Sect. 3.1 the expressions \(\varvec{l}({\varvec{x}})\), \(\varvec{u}({\varvec{x}})\), \({\underline{p}}_k({\varvec{x}})\), \({\overline{p}}_k({\varvec{x}})\) are linear functions of \({\varvec{x}}\).

Such terms can be linearized as follows. Consider a bilinear term xy where \(x\in \{0,1\}\) is a binary variable and y is a bounded continuous variable with the bounds \(a\le y\le b\). We introduce a continuous variable z and the following constraints to represent xy:

$$\begin{aligned} ax\le z\le bx,\qquad y-b(1-x) \le z\le y-a(1-x). \end{aligned}$$
(42)

When \(x=1\), the second constraint of (42) is equivalent to \(z=y\), and the first constraint \(a\le z\le b\) becomes redundant. When \(x=0\), the first constraint of (42) is equivalent to \(z=0\), and the second constraint \(y-b\le z\le y-a\) becomes redundant. With this linearization technique, problems in which the nonlinear terms in the dual formulations of Sects. 3.1–3.4 involve a bilinear product of continuous and binary variables, and \(h({\varvec{x}},\varvec{\xi })\) is convex in \({\varvec{x}}\), can be reformulated as mixed 0-1 convex programs. As an example, in [50], we investigate a distributionally-robust service center location problem with decision dependent utilities and show that this problem can be reformulated as a two-stage stochastic mixed 0-1 second-order cone program using such a technique.
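The linearization (42) can be verified exhaustively on a small instance; the sketch below checks that, for binary x and \(y\in [a,b]\), the feasible values of z are exactly \(z=xy\) (the bounds a, b are chosen arbitrarily):

```python
def z_feasible(x, y, z, a, b, tol=1e-9):
    """Check whether (x, y, z) satisfies the linearization constraints (42):
    a*x <= z <= b*x  and  y - b*(1-x) <= z <= y - a*(1-x)."""
    return (a * x - tol <= z <= b * x + tol
            and y - b * (1 - x) - tol <= z <= y - a * (1 - x) + tol)
```

For every x in {0, 1} and y in [a, b], the point z = x*y is feasible, while any perturbed value of z violates one of the two constraint pairs.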

5 Illustrative numerical example

We now use a numerical example to illustrate the relevance of incorporating endogenous ambiguity in a distributionally-robust optimization model. Specifically, we consider a distributionally-robust newsvendor model:

$$\begin{aligned} \underset{x\in {\mathcal {X}},q\in {\mathcal {Q}}}{\text {max}}\;\underset{P\in {\mathcal {P}}(x)}{\text {min}}\;{\mathbb {E}}_P[x\cdot \text {min}(q,D)]-cq, \end{aligned}$$
(DNV)

where c is the unit cost, D is the stochastic demand, x is the selling price, and q is the number of units ordered/purchased by the vendor. The variables x and q are decision variables. The sets \({\mathcal {X}}\) and \({\mathcal {Q}}\) are the feasible sets of selling price and order quantity. The probability distribution of the demand D depends on the selling price. The set \({\mathcal {P}}(x)\) is a decision dependent ambiguity set of the probability distribution of D. The objective is to maximize the risk-averse profit. Product demand is a decreasing function of the selling price. For every candidate selling price \(x_i\), there is a nominal deterministic demand value \({\widehat{D}}_i\) corresponding to \(x_i\). In the numerical illustration, we consider a finite support \({\varXi }\) for D which is given by \({\varXi }=\{d_i\}^8_{i=1}\) where \(d_1=5,\; d_2=10,\; d_3=15,\; d_4=20,\; d_5=25,\; d_6=30,\; d_7=35,\; d_8=40\). The set \({\mathcal {X}}\) of candidate selling prices is given by \({\mathcal {X}}=\{x_1,x_2,x_3,x_4\}\) where \(x_1=1.2,\; x_2=1.4,\; x_3=1.6,\; x_4=1.8\). The set \({\mathcal {Q}}\) of order quantities is given by \({\mathcal {Q}}=\{q_i\}^8_{i=1}\) where \(q_i=d_i\) for all \(i\in \{1,\ldots ,8\}\). The cost is \(c=1.0\). The nominal demand values are given by \({\widehat{D}}_1=d_8,\;{\widehat{D}}_2=d_6,\;{\widehat{D}}_3=d_4,\;{\widehat{D}}_4=d_2\). Let \({\mathcal {P}}_i={\mathcal {P}}(x_i)\) for \(i\in \{1,\ldots ,4\}\). The nominal optimal solution (when no ambiguity exists) is \(x^*=x_2=1.4\), \(q^*=q_6=30\), and the nominal optimal value is \(V^*=11.30\). We consider different ways to define \({\mathcal {P}}_i\) and study the optimal solution of (DNV) under these definitions. In the current illustration, we enumerate all the possible combinations of \((x,q)\in {\mathcal {X}}\times {\mathcal {Q}}\) and solve (DNV) to determine the optimal solution. Future work addresses this problem through systematic algorithm development [51].
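As a point of reference for the enumeration, the sketch below computes the profit grid under the simplifying assumption that demand equals its nominal value \({\widehat{D}}_i\) with certainty; the nominal value \(V^*=11.30\) above is computed with a nominal demand distribution that we do not reproduce here, so the numbers differ:

```python
def profit(x, q, d, c=1.0):
    """Newsvendor profit x*min(q, d) - c*q for realized demand d."""
    return x * min(q, d) - c * q

prices  = [1.2, 1.4, 1.6, 1.8]
demands = [5, 10, 15, 20, 25, 30, 35, 40]        # support; order grid equals it
nominal = {1.2: 40, 1.4: 30, 1.6: 20, 1.8: 10}   # nominal demand per price

# enumerate all (x, q) pairs against the deterministic nominal demand
best = max(((x, q) for x in prices for q in demands),
           key=lambda xq: profit(xq[0], xq[1], nominal[xq[0]]))
```

Under this deterministic simplification the best profit is 12.0, attained at (1.4, 30) and also at (1.6, 20); introducing ambiguity, as in Sects. 5.1 and 5.2, lowers the worst-case profit.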

5.1 Definition of \({\mathcal {P}}_i\) using mean and variance

Consider the case of defining \({\mathcal {P}}_i\) using mean and variance as described in Sect. 3.2. Specifically, \({\mathcal {P}}_i\) is defined as

$$\begin{aligned} {\mathcal {P}}_i=\Big \{\varvec{p}\in {\mathbb {R}}^K\;\Big |\; \sum ^K_{k=1}p_k=1,\; {\widehat{D}}_i-\delta _i\le \sum ^K_{k=1}p_kd_k\le {\widehat{D}}_i+\delta _i,\; \sum ^K_{k=1}p_k(d_k-{\widehat{D}}_i)^2\le \gamma _i,\; p_k\ge 0\;\;\forall k\in [K]\Big \} \qquad \forall i\in \{1,\ldots ,4\}, \end{aligned}$$
(43)

where \(K=8\). As before, we enumerate all possible combinations of \((x,q)\in {\mathcal {X}}\times {\mathcal {Q}}\) and solve (DNV) in each case to determine the optimal solution. When \((x,q)=(x_i,q_j)\), the objective value of (DNV) is given by

$$\begin{aligned} \begin{aligned} V(x_i,q_j)=\;&\text {min}\; \sum ^K_{k=1}p_k x_i\cdot \text {min}(q_j,d_k) - cq_j \\&\text {s.t.}\, \sum ^K_{k=1} p_k = 1, \\&\qquad {\widehat{D}}_i-\delta _i \le \sum ^K_{k=1}p_kd_k\le {\widehat{D}}_i+\delta _i, \\&\qquad \sum ^K_{k=1}p_k(d_k-{\widehat{D}}_i)^2\le \gamma _i, \\&\qquad p_k\ge 0\qquad \forall k\in [K]. \end{aligned} \end{aligned}$$
(44)

The optimization problem in (44) is a linear program in the decision variables \(\varvec{p}\). In the first ambiguity setting, we set \(\delta _i=2\) and \(\gamma _i=5\) for \(i\in \{1,2,3,4\}\). The optimal solution is \(x^*=x_2=1.4\), \(q^*=q_6=30\), and the optimal value is \(V^*=10.60\). In the second setting, we increase the ambiguity by setting \(\delta _i=3\) and \(\gamma _i=10\) for \(i\in \{1,2,3,4\}\). The optimal solution becomes \(x^*=x_2=1.4\), \(q^*=q_5=25\), and the optimal value is \(V^*=9.30\). Note that although the optimal selling price does not change, the optimal order quantity changes with the decision dependent ambiguity.
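As a concrete sketch of this computation, the inner problem (44) can be solved for each pair \((x_i,q_j)\) with an off-the-shelf LP solver and the pairs enumerated to find the maximizer. The snippet below uses `scipy.optimize.linprog` for the first ambiguity setting (\(\delta _i=2\), \(\gamma _i=5\)); the function and variable names are our own, not from the paper.

```python
import numpy as np
from scipy.optimize import linprog

# Data from the illustration: support Xi, candidate prices x_i,
# nominal demands D_hat_i, and unit cost c = 1.0.
d = np.arange(5, 45, 5, dtype=float)      # d_1..d_8 = 5, 10, ..., 40
x_prices = [1.2, 1.4, 1.6, 1.8]
D_hat = [40.0, 30.0, 20.0, 10.0]
c = 1.0

def worst_case_profit(i, q, delta=2.0, gamma=5.0):
    """Inner LP (44): the adversary picks p in P_i minimizing the
    expected profit for price x_i and order quantity q."""
    K = len(d)
    rev = x_prices[i] * np.minimum(q, d)  # x_i * min(q, d_k)
    # Constraints: mean within D_hat_i +- delta, variance about
    # D_hat_i at most gamma, probabilities nonnegative, summing to 1.
    A_ub = np.vstack([d, -d, (d - D_hat[i]) ** 2])
    b_ub = [D_hat[i] + delta, -(D_hat[i] - delta), gamma]
    res = linprog(rev, A_ub=A_ub, b_ub=b_ub,
                  A_eq=np.ones((1, K)), b_eq=[1.0],
                  bounds=[(0, None)] * K)
    return res.fun - c * q if res.success else -np.inf

# Enumerate all (x_i, q_j) pairs and keep the best worst-case profit.
best = max(((worst_case_profit(i, q), x_prices[i], float(q))
            for i in range(4) for q in d), key=lambda t: t[0])
print(best)   # best worst-case profit with its price and quantity
```

Running this enumeration recovers the values reported above for the first setting: the worst-case profit at \((x_2,q_6)=(1.4,30)\) is 10.60, and no other pair does better.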

5.2 Definition of \({\mathcal {P}}_i\) using the Wasserstein metric

The set \({\mathcal {P}}_i\) can also be defined using the Wasserstein metric (20). Specifically, let \({\mathcal {P}}_i\) be defined as

$$\begin{aligned} {\mathcal {P}}_i=\{\varvec{p}\in {\mathbb {R}}^K\;|\; {\mathcal {W}}(\varvec{p},\hat{\varvec{p}}^i)\le r_i,\; \sum ^K_{k=1}p_k=1,\; p_k\ge 0,\; \forall k\in [K] \} \qquad \forall i\in [I],\qquad \end{aligned}$$
(45)

where \(I=4\), \(K=8\), and \(\hat{\varvec{p}}^i\) is a nominal probability distribution on \({\varXi }\) satisfying

$$\begin{aligned} {\hat{p}}^i_k = \left\{ \begin{array}{ll} 1 &{} \text {if}\,d_k={\widehat{D}}_i, \\ 0 &{} \text {otherwise}. \end{array} \right. \end{aligned}$$
(46)

When \((x,q)=(x_i,q_j)\), the objective value of (DNV) is given by

$$\begin{aligned} \begin{aligned} V(x_i,q_j)=\;&\text {min}\; \sum ^K_{k=1}p_k x_i\cdot \text {min}(q_j,d_k) - cq_j \\&\text {s.t.}\, \sum ^K_{k=1} p_k = 1, \\&\qquad \sum ^K_{k=1} t_{kl} = {\hat{p}}^i_l \qquad \forall l\in [K], \\&\qquad \sum ^K_{l=1} t_{kl} = p_k \qquad \forall k\in [K], \\&\qquad \sum ^{K}_{k=1}\sum ^K_{l=1}|d_k-d_l|t_{kl}\le r_i, \\&\qquad p_k\ge 0,\; t_{kl}\ge 0\qquad \forall k,l\in [K]. \end{aligned} \end{aligned}$$
(47)

Notice that (47) is a linear program in the decision variables \(p_k\) and \(t_{kl}\) (\(k,l\in [K]\)). In the first ambiguity setting, we set \(r_i=0.5\) for \(i\in \{1,2,3,4\}\). The optimal solution is \(x^*=x_2=1.4\), \(q^*=q_6=30\), and the optimal value is \(V^*=11.30\). In the second setting, we increase the ambiguity of the demand at the price \(x_2\) by setting \(r_2=0.6\) and \(r_i=0.5\) for \(i\in \{1,3,4\}\). The optimal solution becomes \(x^*=x_3=1.6\), \(q^*=q_4=20\), and the optimal value is \(V^*=11.20\). Note that in this case both the optimal selling price and the optimal order quantity change with the ambiguity.
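The inner problem (47) is likewise a small LP, with the transport plan \(t_{kl}\) appearing alongside \(p_k\). A sketch of solving it for one pair \((x,q)\), again with `scipy.optimize.linprog` and our own naming conventions, is:

```python
import numpy as np
from scipy.optimize import linprog

d = np.arange(5, 45, 5, dtype=float)   # support d_1..d_8, K = 8
K = len(d)
c = 1.0

def worst_case_profit_wass(x, q, p_hat, r):
    """Inner LP (47): minimize expected profit over distributions p
    within Wasserstein radius r of p_hat.  Decision variables are
    stacked as [p (K entries), t (K*K entries, row-major)]."""
    rev = x * np.minimum(q, d)
    cvec = np.concatenate([rev, np.zeros(K * K)])
    A_eq, b_eq = [], []
    row = np.zeros(K + K * K); row[:K] = 1.0       # sum_k p_k = 1
    A_eq.append(row); b_eq.append(1.0)
    for l in range(K):                             # sum_k t_kl = p_hat_l
        row = np.zeros(K + K * K); row[K + l::K] = 1.0
        A_eq.append(row); b_eq.append(p_hat[l])
    for k in range(K):                             # sum_l t_kl = p_k
        row = np.zeros(K + K * K)
        row[K + k * K: K + (k + 1) * K] = 1.0
        row[k] = -1.0
        A_eq.append(row); b_eq.append(0.0)
    # Transport budget: sum_{k,l} |d_k - d_l| t_kl <= r
    A_ub = np.concatenate([np.zeros(K),
                           np.abs(d[:, None] - d[None, :]).ravel()])[None, :]
    res = linprog(cvec, A_ub=A_ub, b_ub=[r],
                  A_eq=np.array(A_eq), b_eq=b_eq,
                  bounds=[(0, None)] * (K + K * K))
    return res.fun - c * q

# Nominal distribution (46) for price x_2 = 1.4: point mass at D_hat_2 = 30.
p_hat = (d == 30.0).astype(float)
print(worst_case_profit_wass(1.4, 30.0, p_hat, r=0.5))  # 11.30
```

With \(r=0.5\) the adversary can move 0.1 of the mass from \(d_6=30\) to \(d_5=25\), which reproduces the reported worst-case profit 11.30 at \((x_2,q_6)\).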

6 Concluding remarks

We have established a framework for reformulating distributionally robust optimization problems with important types of decision dependent ambiguity sets. These ambiguity sets contain decision dependent parameters. For example, the moment robust ambiguity set (10) contains the parameters \(\alpha ({\varvec{x}})\), \(\beta ({\varvec{x}})\), \(\varvec{\mu }({\varvec{x}})\) and \(\varvec{Q}({\varvec{x}})\), which are functions of the decision \({\varvec{x}}\). We now briefly discuss the estimation of these functions using a data-driven approach. Ambiguity sets for \(\varvec{\xi }\) under an arbitrary decision \({\varvec{x}}\) can be constructed if such information is available from past decisions, or if we can experiment with trial decisions \(\{{\varvec{x}}^i\}^k_{i=1}\) and collect samples of the random vector \(\varvec{\xi }\) under each decision \({\varvec{x}}^i\). From these samples we can establish an analytical relation between the parameters defining the ambiguity set and the decision using statistical learning models. We can subsequently extrapolate this relation to a general decision \({\varvec{x}}\) to obtain an empirical decision dependent ambiguity set description.
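A minimal hypothetical sketch of this data-driven step, in the setting of the newsvendor illustration: the linear demand model, sample sizes, and noise level below are illustrative assumptions, not taken from the paper. Demand samples are collected under trial prices, the mean-demand function \(\mu (x)\) is fit by least squares, and a conservative variance bound is extracted for the moment-based ambiguity set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical experiment: 50 demand samples under each of four trial
# prices; the (unknown) generating relation is assumed linear with noise.
trial_prices = np.array([1.2, 1.4, 1.6, 1.8])
samples = {x: 70.0 - 33.0 * x + rng.normal(0.0, 2.0, size=50)
           for x in trial_prices}

# Fit the ambiguity-set parameters as functions of the decision x:
# the mean mu(x) by least-squares regression on the per-price sample
# means, and a constant variance cap gamma from the sample variances.
means = np.array([s.mean() for s in samples.values()])
slope, intercept = np.polyfit(trial_prices, means, 1)

def mu(x):
    """Extrapolated mean-demand function used to define P(x)."""
    return intercept + slope * x

gamma = max(s.var() for s in samples.values())  # conservative cap
print(mu(1.5), gamma)   # estimated mean demand at a new price, and gamma
```

Richer statistical learning models (e.g. for a price-dependent variance \(\gamma (x)\)) fit the same pattern: estimate the ambiguity-set parameters on trial decisions, then extrapolate to arbitrary \({\varvec{x}}\).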

The goal of this paper was to show that the dual formulations in DRO extend to the setting where the ambiguity sets are decision dependent. The analysis suggests that the situations in which DRO models admit a dual reformulation also allow dual reformulations in the decision dependent case. The reformulated models are generally non-convex optimization problems, and further investigation is needed to develop efficient algorithms for specific situations. These non-convex problems may have additional structure when further assumptions on the decision dependent parameters and the feasible set X are imposed; this structure may be exploited for refined reformulations and the development of efficient algorithms.