
On linear optimization over Wasserstein balls

  • Short Communication
  • Mathematical Programming, Series A

Abstract

Wasserstein balls, which contain all probability measures within a pre-specified Wasserstein distance to a reference measure, have recently enjoyed wide popularity in the distributionally robust optimization and machine learning communities as a vehicle for formulating and solving data-driven optimization problems with rigorous statistical guarantees. In this technical note we prove that the Wasserstein ball is weakly compact under mild conditions, and we offer necessary and sufficient conditions for the existence of optimal solutions. We also characterize the sparsity of solutions if the Wasserstein ball is centred at a discrete reference measure. In comparison with the existing literature, which has proved similar results under different conditions, our proofs are self-contained and shorter, yet mathematically rigorous, and our necessary and sufficient conditions for the existence of optimal solutions are easily verifiable in practice.


Notes

  1. We are grateful to Lorenzo Dello Schiavo, who communicated this result to us.

References

  1. Aliprantis, C.D., Border, K.C.: Infinite Dimensional Analysis: A Hitchhiker’s Guide, 3rd edn. Springer, New York (2006)

  2. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: Proceedings of the 34th International Conference on Machine Learning, pp. 214–223 (2017)

  3. Billingsley, P.: Convergence of Probability Measures, 2nd edn. Wiley, Boca Raton (1992)

  4. Billingsley, P.: Probability and Measure, 3rd edn. Wiley, Boca Raton (1995)

  5. Blanchet, J., Murthy, K.: Quantifying distributional model risk via optimal transport. Math. Oper. Res. 44(2), 565–600 (2019)

  6. Bogachev, V.I.: Measure Theory, vol. II. Springer, New York (2007)

  7. Clément, P., Desch, W.: Wasserstein metric and subordination. Stud. Math. 189(1), 35–52 (2008)

  8. Dello Schiavo, L.: Heat equation on metric measure spaces. Master’s thesis, Sapienza University of Rome (2015)

  9. Dudley, R.M.: Real Analysis and Probability. Wadsworth & Brooks/Cole, New York (1989)

  10. Frogner, C., Zhang, C., Mobahi, H., Araya, M., Poggio, T.A.: Learning with a Wasserstein loss. Adv. Neural Inf. Process. Syst. 28, 2053–2061 (2015)

  11. Gao, R., Kleywegt, A.J.: Distributionally robust stochastic optimization with Wasserstein distance. arXiv preprint arXiv:1604.02199 (2016)

  12. Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. Int. Stat. Rev. 70(3), 419–435 (2002)

  13. Ho, N., Nguyen, X.L., Yurochkin, M., Bui, H.H., Huynh, V., Phung, D.: Multilevel clustering via Wasserstein means. In: Proceedings of the 34th International Conference on Machine Learning, pp. 1501–1509 (2017)

  14. Kuhn, D., Esfahani, P.M., Nguyen, V.A., Shafieezadeh-Abadeh, S.: Wasserstein distributionally robust optimization: theory and applications in machine learning. In: Operations Research & Management Science in the Age of Analytics, pp. 130–166. INFORMS (2019)

  15. Esfahani, P.M., Kuhn, D.: Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations. Math. Program. 171(1–2), 115–166 (2018)

  16. Nguyen, V.A., Shafieezadeh-Abadeh, S., Yue, M.-C., Kuhn, D., Wiesemann, W.: Calculating optimistic likelihoods using (geodesically) convex optimization. Adv. Neural Inf. Process. Syst. 32 (2019)

  17. Nguyen, V.A., Shafieezadeh-Abadeh, S., Yue, M.-C., Kuhn, D., Wiesemann, W.: Optimistic distributionally robust optimization for nonparametric likelihood approximation. Adv. Neural Inf. Process. Syst. 32 (2019)

  18. Owhadi, H., Scovel, C.: Extreme points of a ball about a measure with finite support. Commun. Math. Sci. 15(1), 77–96 (2017)

  19. Pflug, G., Wozabal, D.: Ambiguity in portfolio selection. Quant. Finance 7(4), 435–442 (2007)

  20. Pichler, A., Xu, H.: Quantitative stability analysis for minimax distributionally robust risk optimization. Math. Program., available online (2018)

  21. Pinelis, I.: On the extreme points of moments sets. Math. Methods Oper. Res. 83(3), 325–349 (2016)

  22. Shapiro, A., Dentcheva, D., Ruszczynski, A.: Lectures on Stochastic Programming. SIAM, New York (2009)

  23. Villani, C.: Optimal Transport: Old and New, vol. 338. Springer, New York (2008)

  24. Wozabal, D.: A framework for optimization under ambiguity. Ann. Oper. Res. 193(1), 21–47 (2012)

  25. Yue, M.-C., Kuhn, D., Wiesemann, W.: On linear optimization over Wasserstein balls. arXiv preprint arXiv:2004.07162 (2021)

  26. Zhao, C., Guan, Y.: Data-driven risk-averse stochastic optimization with Wasserstein metric. Oper. Res. Lett. 46(2), 262–267 (2018)


Acknowledgements

The authors gratefully acknowledge funding from the Swiss National Science Foundation under Grant BSCGI0_157733, the UK’s Engineering and Physical Sciences Research Council under Grant EP/R045518/1 and the Hong Kong Research Grants Council under Grant 25302420.

Author information

Corresponding author

Correspondence to Man-Chung Yue.


Appendices

Appendix A: Auxiliary measure-theoretic results

We review some well-known facts from measure theory that we use to prove our results. We first recall a connection between the notions of tightness and weak sequential compactness of collections of probability measures.

Definition 1

A collection \({\mathcal {S}} \subseteq {\mathcal {P}} (X)\) of probability measures is tight if for any \(\epsilon >0\), there exists a compact subset \(B \subseteq X\) such that \(\mu (X \setminus B) \le \epsilon \) for all \(\mu \in {\mathcal {S}}\).
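For intuition, consider \(X = {\mathbb {R}}\) (a toy example of our own, not from the paper): the family \(\{\delta _k\}_{k}\) of Dirac measures at the integers is not tight, since for any compact \(B \subseteq [-R, R]\) the measure \(\delta _k\) with \(k > R\) places all of its mass outside B, whereas \(\{\delta _{1/k}\}_k\) is tight because every member concentrates on \([0,1]\). A minimal numerical check:

```python
def dirac_mass_outside(atom, R):
    # mass that the Dirac measure at `atom` assigns to the complement of [-R, R]
    return 0.0 if -R <= atom <= R else 1.0

R = 100.0
escaping = [dirac_mass_outside(float(k), R) for k in range(1, 201)]
tight = [dirac_mass_outside(1.0 / k, R) for k in range(1, 201)]

print(max(escaping))  # 1.0: some delta_k puts all of its mass outside [-R, R]
print(max(tight))     # 0.0: every delta_{1/k} lives inside [-R, R]
```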

Definition 2

A sequence \(\{\mu ^k\}_k \subseteq {\mathcal {P}} (X)\) of probability measures converges weakly to \(\mu ^\infty \in {\mathcal {P}} (X)\) if for any bounded and continuous function g on X, we have

$$\begin{aligned} \lim _{k\longrightarrow \infty } \int _X g(x) \,\mathrm {d}\mu ^k \;\; = \;\; \int _X g(x) \,\mathrm {d}\mu ^\infty . \end{aligned}$$
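As a numerical illustration of Definition 2 (our own example, not from the paper), the discrete measures \(\mu ^k = \frac{1}{k} \sum _{i=1}^{k} \delta _{i/k}\) converge weakly to the uniform distribution on \([0,1]\), so the integrals of any bounded continuous g converge:

```python
import math

def integral_discrete(g, k):
    # integral of g against mu^k = (1/k) * sum of Dirac masses at i/k
    return sum(g(i / k) for i in range(1, k + 1)) / k

def integral_uniform(g, n=100000):
    # midpoint-rule approximation of the integral of g against Uniform[0, 1]
    return sum(g((j + 0.5) / n) for j in range(n)) / n

g = math.cos                    # a bounded, continuous test function
limit = integral_uniform(g)     # integral of cos over [0, 1], i.e. sin(1)

for k in (10, 100, 1000):
    gap = abs(integral_discrete(g, k) - limit)
    print(k, gap)               # the gap shrinks as k grows
```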

Definition 3

A collection \({\mathcal {S}} \subseteq {\mathcal {P}} (X)\) of probability measures is weakly sequentially compact if every sequence in \({\mathcal {S}}\) has a subsequence that converges weakly to an element of \({\mathcal {S}}\).

The concepts of tightness and weak sequential compactness are connected by Prokhorov’s Theorem, see for example [3, Theorem 5.1].

Theorem 5

(Prokhorov’s Theorem) A collection \({\mathcal {S}} \subseteq {\mathcal {P}} (X)\) of probability measures is tight if and only if the closure of \({\mathcal {S}}\) is weakly sequentially compact in \({\mathcal {P}}(X)\).

Note that since the space \({\mathcal {P}}(X)\) is metrizable, sequential compactness and compactness of subsets of \({\mathcal {P}}(X)\) are equivalent to each other.

The following lemma, which is excerpted from the Portmanteau Theorem (see for example [4, Problem 29.1(c)]), provides a useful characterization of weak convergence.

Lemma 3

A sequence \(\{\mu ^k \}_k \subseteq {\mathcal {P}} (X)\) of probability measures converges weakly to \(\mu ^\infty \in {\mathcal {P}} (X)\) if and only if for any upper bounded and upper semi-continuous function g on X, we have

$$\begin{aligned} \limsup _{k\longrightarrow \infty } \int _X g(x) \,\mathrm {d}\mu ^k (x) \;\; \le \;\; \int _X g(x) \,\mathrm {d}\mu ^\infty (x). \end{aligned}$$
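The inequality in Lemma 3 can be strict. In a toy example of our own, take \(\mu ^k = \delta _{1/k}\), which converges weakly to \(\mu ^\infty = \delta _0\), and let g be the bounded, upper semi-continuous indicator of the point \(\{0\}\):

```python
def g(x):
    # indicator of {0}: upper semi-continuous and bounded above by 1
    return 1.0 if x == 0 else 0.0

# mu^k = Dirac measure at 1/k, so the integral of g against mu^k is g(1/k)
lhs = [g(1.0 / k) for k in range(1, 6)]   # every term equals 0.0
rhs = g(0.0)                              # integral against mu^infty = delta_0
print(max(lhs), "<=", rhs)                # the limsup 0.0 is strictly below 1.0
```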

Appendix B: Basic feasible solutions in infinite-dimensional linear programming

It is well known that if a finite-dimensional linear program with m equality constraints has an optimal solution, then there must be an optimal basic feasible solution with at most m non-zero entries. An infinite-dimensional analogue of this fact is proved in [21, Corollary 5 and Proposition 6(v)]. To state this result, let Z be a topological space, let \({\mathcal {M}}_+ (Z)\) be the set of non-negative finite Borel measures supported on Z, let \(\psi , \phi _1,\dots ,\phi _m : Z \rightarrow {\mathbb {R}}\) be Borel functions, and let \(v \in {\mathbb {R}}^m\). Consider now the optimization problem

$$\begin{aligned} \begin{array}{l@{\quad }l@{\quad }l} \displaystyle \mathop {\text {maximize}}_{\gamma } &{} \displaystyle \int _Z \psi (z) \, \mathrm {d}\gamma (z) \\ \displaystyle \text {subject to} &{} \gamma \in {\mathcal {M}}_+ (Z) \\ &{} \displaystyle \int _Z \phi _i (z) \, \mathrm {d}\gamma (z)= v_i &{} \displaystyle \forall i =1,\dots ,m, \end{array} \end{aligned}$$
(11)

and denote by \({\mathcal {F}}\) the feasible region of (11) and by \(\mathrm {ext}({\mathcal {F}})\) the set of extreme points of \({\mathcal {F}}\).

Proposition 1

Suppose that for all \(\gamma \in {\mathcal {F}}\), at least one of the integrals \(\int _Z [\psi (z)]_+ \, \mathrm {d} \gamma (z)\) and \(\int _Z [-\psi (z)]_+ \, \mathrm {d} \gamma (z)\) is finite and that \(\int _Z |\phi _i| (z) \, \mathrm {d}\gamma (z) <\infty \) for all \(i = 1,\dots ,m\). If

$$\begin{aligned} \sup \left\{ \int _Z \psi (z) \, \mathrm {d}\gamma (z): \gamma \in {\mathcal {F}} \right\} \;\; = \;\; \sup \left\{ \int _Z \psi (z) \, \mathrm {d}\gamma (z): \gamma \in \mathrm {ext}({\mathcal {F}}) \right\} , \end{aligned}$$
(12)

then it holds that

$$\begin{aligned} \sup \left\{ \int _Z \psi (z) \, \mathrm {d}\gamma (z) : \gamma \in {\mathcal {F}} \right\} \;\; = \;\; \sup \left\{ \int _Z \psi (z) \, \mathrm {d}\gamma (z) : \gamma \in {\mathcal {F}}\cap {\mathcal {D}}_m (Z) \right\} , \end{aligned}$$

where \({\mathcal {D}}_m (Z)\) is the set of non-negative discrete measures supported on at most m points in Z. Furthermore, if \({\mathcal {F}}\subseteq {\mathcal {P}}(Z)\) and Z is Hausdorff, then the condition (12) is satisfied.
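Proposition 1 can be illustrated on a small instance of problem (11). The following sketch uses entirely hypothetical data of our own choosing: Z is replaced by a finite grid on \([0,1]\), with \(\psi (z) = z^2\), \(\phi _1 \equiv 1\), \(\phi _2 (z) = z\) and \(v = (1, 0.5)\), so \(m = 2\). Since the extreme points of the feasible set put mass on at most \(m = 2\) grid points, brute-force enumeration over pairs suffices:

```python
import itertools

# Discretised instance of problem (11) with m = 2 moment constraints:
# maximise sum_z psi(z) * w(z) over non-negative weights w subject to
#   sum_z w(z) = 1  and  sum_z z * w(z) = 0.5   (hypothetical data).
Z = [i / 10 for i in range(11)]   # grid standing in for the space Z
psi = lambda z: z ** 2

best_val, best_support = -float("inf"), None
for i, j in itertools.combinations(range(len(Z)), 2):
    zi, zj = Z[i], Z[j]
    # Solve wi + wj = 1 and zi*wi + zj*wj = 0.5 for the two weights
    wi = (zj - 0.5) / (zj - zi)
    wj = 1.0 - wi
    if wi >= 0 and wj >= 0:      # keep only feasible (non-negative) measures
        val = psi(zi) * wi + psi(zj) * wj
        if val > best_val:
            best_val, best_support = val, {zi: wi, zj: wj}

print(best_val, best_support)    # mass 1/2 at z = 0 and z = 1, value 0.5
```

The maximizer places mass 1/2 at each endpoint of \([0,1]\), a discrete measure with exactly \(m = 2\) atoms, in line with the proposition.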

We note that the conclusion of Proposition 1 cannot readily be drawn from the Richter-Rogosinski theorem [22, Theorem 7.32]. Indeed, in our context the Richter-Rogosinski theorem would only ensure the existence of a non-negative discrete measure \(\gamma ^\star \) that is supported on at most \(m + 1\) (instead of m) points since \(\gamma ^\star \) would have to satisfy \(m + 1\) moment conditions: the m moment constraints of problem (11) as well as the additional constraint that \(\gamma ^\star \) attains the optimal objective value of problem (11).

About this article


Cite this article

Yue, MC., Kuhn, D. & Wiesemann, W. On linear optimization over Wasserstein balls. Math. Program. 195, 1107–1122 (2022). https://doi.org/10.1007/s10107-021-01673-8
