On the Representation and Learning of Monotone Triangular Transport Maps

Abstract

Transportation of measure provides a versatile approach for modeling complex probability distributions, with applications in density estimation, Bayesian inference, generative modeling, and beyond. Monotone triangular transport maps—approximations of the Knothe–Rosenblatt (KR) rearrangement—are a canonical choice for these tasks. Yet the representation and parameterization of such maps have a significant impact on their generality and expressiveness, and on properties of the optimization problem that arises in learning a map from data (e.g., via maximum likelihood estimation). We present a general framework for representing monotone triangular maps via invertible transformations of smooth functions. We establish conditions on the transformation such that the associated infinite-dimensional minimization problem has no spurious local minima, i.e., all local minima are global minima; and we show for target distributions satisfying certain tail conditions that the unique global minimizer corresponds to the KR map. Given a sample from the target, we then propose an adaptive algorithm that estimates a sparse semi-parametric approximation of the underlying KR map. We demonstrate how this framework can be applied to joint and conditional density estimation, likelihood-free inference, and structure learning of directed graphical models, with stable generalization performance across a range of sample sizes.

Notes

  1. For any \({\varvec{z}}\in \mathbb {R}^d\), \({\varvec{x}}=S^{-1}({\varvec{z}})\) can be computed recursively as \(x_{k}=T^{k}({\varvec{x}}_{< k},z_k)\) for \(k=1,\dots ,d\), where the function \(T^{k}({\varvec{x}}_{< k},\cdot )\) is the inverse of \(x_k\mapsto S_{k}({\varvec{x}}_{< k},x_k)\). In practice, evaluating \(T^{k}\) requires solving a root-finding problem which is guaranteed to have a unique (real) root, and for which the bisection method converges geometrically fast. Therefore, \(S^{-1}({\varvec{z}})\) can be evaluated to machine precision in negligible computational time.

  2. That is, \(\Vert v_1\otimes \cdots \otimes v_k\Vert _{V_k} = \Vert v_1\Vert _{L^2_{\eta _1}}\Vert v_2\Vert _{L^2_{\eta _2}}\cdots \Vert v_{k-1}\Vert _{L^2_{\eta _{k-1}}} \Vert v_k\Vert _{H^1_{\eta _k}}\) for any \(v_j\in L^2_{\eta _j}\), \(j<k\), and \(v_k\in H^1_{\eta _k}\).

  3. https://github.com/baptistar/ATM.
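
As a companion to Note 1, here is a minimal numerical sketch of inverting a monotone triangular map coordinate by coordinate. The two components below are arbitrary illustrative choices (not parameterizations from the paper), and SciPy's bracketing root-finder stands in for plain bisection; both exploit the fact that each one-dimensional root is unique.

```python
import numpy as np
from scipy.optimize import brentq  # bracketing root-finder; plain bisection would also work


def S_1(x_prev, x1):
    # Illustrative first component, strictly increasing in x1.
    return x1 + 0.5 * np.tanh(x1)


def S_2(x_prev, x2):
    # Illustrative second component, depending on x1 = x_prev[0] and increasing in x2.
    return 0.3 * x_prev[0] ** 2 + 2.0 * x2 + np.sin(x2)


def invert_component(S_k, x_prev, z_k):
    """Solve S_k(x_prev, x_k) = z_k for x_k; the root is unique by monotonicity."""
    lo, hi = -1.0, 1.0
    while S_k(x_prev, lo) > z_k:   # expand the bracket until it contains the root
        lo *= 2.0
    while S_k(x_prev, hi) < z_k:
        hi *= 2.0
    return brentq(lambda t: S_k(x_prev, t) - z_k, lo, hi, xtol=1e-12)


def invert_triangular(components, z):
    """Compute x = S^{-1}(z) one coordinate at a time, as described in Note 1."""
    x = np.zeros(len(z))
    for k, S_k in enumerate(components):
        x[k] = invert_component(S_k, x[:k], z[k])
    return x


z = np.array([0.7, -1.2])
x = invert_triangular([S_1, S_2], z)
print(x, [S_1(x[:0], x[0]), S_2(x[:1], x[1])])  # the second list recovers z
```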

References

  1. Ambrogioni, L., Güçlü, U., van Gerven, M. A. and Maris, E. (2017). The kernel mixture network: A nonparametric method for conditional density estimation of continuous random variables. arXiv preprint arXiv:1705.07111.

  2. Anderes, E. and Coram, M. (2012). A general spline representation for nonparametric and semiparametric density estimates using diffeomorphisms. arXiv preprint arXiv:1205.5314.

  3. Baptista, R., Hosseini, B., Kovachki, N. B. and Marzouk, Y. (2023). Conditional sampling with monotone GANs: from generative models to likelihood-free inference. arXiv preprint arXiv:2006.06755v3.

  4. Baptista, R., Marzouk, Y., Morrison, R. E. and Zahm, O. (2021). Learning non-Gaussian graphical models via Hessian scores and triangular transport. arXiv preprint arXiv:2101.03093.

  5. Bertsekas, D. P. (1997). Nonlinear programming. Journal of the Operational Research Society 48 334–334.

  6. Bigoni, D., Marzouk, Y., Prieur, C. and Zahm, O. (2022). Nonlinear dimension reduction for surrogate modeling using gradient information. Information and Inference: A Journal of the IMA.

  7. Bishop, C. M. (1994). Mixture density networks. Technical Report NCRG/94/004, Neural Computing Research Group, Aston University.

  8. Bogachev, V. I., Kolesnikov, A. V. and Medvedev, K. V. (2005). Triangular transformations of measures. Sbornik: Mathematics 196 309.

  9. Boyd, J. P. (1984). Asymptotic coefficients of Hermite function series. Journal of Computational Physics 54 382–410.

  10. Brennan, M., Bigoni, D., Zahm, O., Spantini, A. and Marzouk, Y. (2020). Greedy inference with structure-exploiting lazy maps. Advances in Neural Information Processing Systems 33.

  11. Chang, S.-H., Cosman, P. C. and Milstein, L. B. (2011). Chernoff-type bounds for the Gaussian error function. IEEE Transactions on Communications 59 2939–2944.

  12. Chkifa, A., Cohen, A. and Schwab, C. (2015). Breaking the curse of dimensionality in sparse polynomial approximation of parametric PDEs. Journal de Mathématiques Pures et Appliquées 103 400–428.

  13. Cohen, A. (2003). Numerical analysis of wavelet methods. Elsevier.

  14. Cohen, A. and Migliorati, G. (2018). Multivariate approximation in downward closed polynomial spaces. In Contemporary Computational Mathematics-A celebration of the 80th birthday of Ian Sloan 233–282. Springer.

  15. Cui, T. and Dolgov, S. (2021). Deep composition of tensor trains using squared inverse Rosenblatt transports. Foundations of Computational Mathematics 1–60.

  16. Cui, T., Dolgov, S. and Zahm, O. (2023). Scalable conditional deep inverse Rosenblatt transports using tensor trains and gradient-based dimension reduction. Journal of Computational Physics 485 112103.

  17. Cui, T., Tong, X. T. and Zahm, O. (2022). Prior normalization for certified likelihood-informed subspace detection of Bayesian inverse problems. Inverse Problems 38 124002.

  18. Dinh, L., Sohl-Dickstein, J. and Bengio, S. (2017). Density estimation using Real NVP. In International Conference on Learning Representations.

  19. Durkan, C., Bekasov, A., Murray, I. and Papamakarios, G. (2019). Neural spline flows. In Advances in Neural Information Processing Systems 7509–7520.

  20. El Moselhy, T. A. and Marzouk, Y. M. (2012). Bayesian inference with optimal maps. Journal of Computational Physics 231 7815–7850.

  21. Huang, C.-W., Chen, R. T., Tsirigotis, C. and Courville, A. (2020). Convex Potential Flows: Universal Probability Distributions with Optimal Transport and Convex Optimization. In International Conference on Learning Representations.

  22. Huang, C.-W., Krueger, D., Lacoste, A. and Courville, A. (2018). Neural Autoregressive Flows. In International Conference on Machine Learning 2083–2092.

  23. Irons, N. J., Scetbon, M., Pal, S. and Harchaoui, Z. (2022). Triangular flows for generative modeling: Statistical consistency, smoothness classes, and fast rates. In International Conference on Artificial Intelligence and Statistics 10161–10195. PMLR.

  24. Jaini, P., Kobyzev, I., Yu, Y. and Brubaker, M. (2020). Tails of Lipschitz triangular flows. In International Conference on Machine Learning 4673–4681. PMLR.

  25. Jaini, P., Selby, K. A. and Yu, Y. (2019). Sum-of-squares polynomial flow. In International Conference on Machine Learning 3009–3018.

  26. Katzfuss, M. and Schäfer, F. (2023). Scalable Bayesian transport maps for high-dimensional non-Gaussian spatial fields. Journal of the American Statistical Association 0 1–15.

  27. Kingma, D. P. and Dhariwal, P. (2018). Glow: Generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems 10215–10224.

  28. Kobyzev, I., Prince, S. and Brubaker, M. (2020). Normalizing flows: An introduction and review of current methods. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  29. Koller, D. and Friedman, N. (2009). Probabilistic graphical models: principles and techniques. MIT press.

  30. Kufner, A. and Opic, B. (1984). How to define reasonably weighted Sobolev spaces. Commentationes Mathematicae Universitatis Carolinae 25 537–554.

  31. Lezcano Casado, M. (2019). Trivializations for gradient-based optimization on manifolds. Advances in Neural Information Processing Systems 32 9157–9168.

  32. Lichman, M. (2013). UCI Machine Learning Repository. http://archive.ics.uci.edu/ml.

  33. Lueckmann, J.-M., Boelts, J., Greenberg, D., Goncalves, P. and Macke, J. (2021). Benchmarking simulation-based inference. In International Conference on Artificial Intelligence and Statistics 343–351. PMLR.

  34. Mallat, S. (1999). A wavelet tour of signal processing. Elsevier.

  35. Marzouk, Y., Moselhy, T., Parno, M. and Spantini, A. (2016). Sampling via Measure Transport: An Introduction In Handbook of Uncertainty Quantification 1–41. Springer International Publishing.

  36. Migliorati, G. (2015). Adaptive polynomial approximation by means of random discrete least squares. In Numerical Mathematics and Advanced Applications-ENUMATH 2013 547–554. Springer.

  37. Migliorati, G. (2019). Adaptive approximation by optimal weighted least-squares methods. SIAM Journal on Numerical Analysis 57 2217–2245.

  38. Morrison, R., Baptista, R. and Marzouk, Y. (2017). Beyond normality: Learning sparse probabilistic graphical models in the non-Gaussian setting. In Advances in Neural Information Processing Systems 2359–2369.

  39. Muckenhoupt, B. (1972). Hardy’s inequality with weights. Studia Mathematica 44 31–38.

  40. Nocedal, J. and Wright, S. (2006). Numerical optimization. Springer Science & Business Media.

  41. Novak, E., Ullrich, M., Woźniakowski, H. and Zhang, S. (2018). Reproducing kernels of Sobolev spaces on \(\mathbb{R}^d\) and applications to embedding constants and tractability. Analysis and Applications 16 693–715.

  42. Oord, A. V. D., Li, Y., Babuschkin, I., Simonyan, K., Vinyals, O., Kavukcuoglu, K., Driessche, G. V. D., Lockhart, E., Cobo, L. C., Stimberg, F. et al. (2017). Parallel WaveNet: Fast high-fidelity speech synthesis. arXiv preprint arXiv:1711.10433.

  43. Papamakarios, G. and Murray, I. (2016). Fast \(\varepsilon \)-free inference of simulation models with Bayesian conditional density estimation. In Advances in Neural Information Processing Systems 1028–1036.

  44. Papamakarios, G., Nalisnick, E., Rezende, D. J., Mohamed, S. and Lakshminarayanan, B. (2021). Normalizing flows for probabilistic modeling and inference. Journal of Machine Learning Research 22 1–64.

  45. Papamakarios, G., Pavlakou, T. and Murray, I. (2017). Masked autoregressive flow for density estimation. In Advances in Neural Information Processing Systems 2338–2347.

  46. Parno, M. D. and Marzouk, Y. M. (2018). Transport map accelerated Markov chain Monte Carlo. SIAM/ASA Journal on Uncertainty Quantification 6 645–682.

  47. Radev, S. T., Mertens, U. K., Voss, A., Ardizzone, L. and Köthe, U. (2020). BayesFlow: Learning complex stochastic models with invertible neural networks. IEEE transactions on neural networks and learning systems.

  48. Ramsay, J. O. (1998). Estimating smooth monotone functions. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 60 365–375.

  49. Raskutti, G. and Uhler, C. (2018). Learning directed acyclic graph models based on sparsest permutations. Stat 7 e183.

  50. Rezende, D. and Mohamed, S. (2015). Variational inference with normalizing flows. In International conference on machine learning 1530–1538. PMLR.

  51. Rosenblatt, M. (1952). Remarks on a multivariate transformation. The Annals of Mathematical Statistics 23 470–472.

  52. Rothfuss, J., Ferreira, F., Walther, S. and Ulrich, M. (2019). Conditional density estimation with neural networks: Best practices and benchmarks. arXiv preprint arXiv:1903.00954.

  53. Santambrogio, F. (2015). Optimal Transport for Applied Mathematicians. Springer International Publishing.

  54. Schäfer, F., Katzfuss, M. and Owhadi, H. (2021). Sparse Cholesky Factorization by Kullback–Leibler Minimization. SIAM Journal on Scientific Computing 43 A2019–A2046.

  55. Schmuland, B. (1992). Dirichlet forms with polynomial domain. Math. Japon 37 1015–1024.

  56. Schölkopf, B., Herbrich, R. and Smola, A. J. (2001). A generalized representer theorem. In International conference on computational learning theory 416–426. Springer.

  57. Shin, Y. E., Zhou, L. and Ding, Y. (2022). Joint estimation of monotone curves via functional principal component analysis. Computational Statistics & Data Analysis 166 107343.

  58. Silverman, B. W. (1982). On the estimation of a probability density function by the maximum penalized likelihood method. The Annals of Statistics 795–810.

  59. Sisson, S. A., Fan, Y. and Tanaka, M. M. (2007). Sequential Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences 104 1760–1765.

  60. Spantini, A., Baptista, R. and Marzouk, Y. (2022). Coupling techniques for nonlinear ensemble filtering. SIAM Review 64 921–953.

  61. Spantini, A., Bigoni, D. and Marzouk, Y. (2018). Inference via low-dimensional couplings. The Journal of Machine Learning Research 19 2639–2709.

  62. Tabak, E. G. and Turner, C. V. (2013). A family of nonparametric density estimation algorithms. Communications on Pure and Applied Mathematics 66 145–164.

  63. Teshima, T., Ishikawa, I., Tojo, K., Oono, K., Ikeda, M. and Sugiyama, M. (2020). Coupling-based invertible neural networks are universal diffeomorphism approximators. In Advances in Neural Information Processing Systems 33 3362–3373.

  64. Trippe, B. L. and Turner, R. E. (2018). Conditional density estimation with Bayesian normalising flows. In Bayesian Deep Learning: NIPS 2017 Workshop.

  65. Truong, T. T. and Nguyen, H.-T. (2021). Backtracking Gradient Descent Method and Some Applications in Large Scale Optimisation. Part 2: Algorithms and Experiments. Applied Mathematics & Optimization 84 2557–2586.

  66. Uria, B., Murray, I. and Larochelle, H. (2013). RNADE: The real-valued neural autoregressive density-estimator. arXiv preprint arXiv:1306.0186.

  67. Vershynin, R. (2018). High-dimensional probability: An introduction with applications in data science 47. Cambridge university press.

  68. Vidakovic, B. (2009). Statistical modeling by wavelets 503. John Wiley & Sons.

  69. Villani, C. (2008). Optimal transport: old and new 338. Springer Science & Business Media.

  70. Wang, S. and Marzouk, Y. (2022). On minimax density estimation via measure transport. arXiv preprint arXiv:2207.10231.

  71. Wasserman, L. (2013). All of statistics: a concise course in statistical inference. Springer Science & Business Media.

  72. Wehenkel, A. and Louppe, G. (2019). Unconstrained monotonic neural networks. In Advances in Neural Information Processing Systems 1543–1553.

  73. Wenliang, L., Sutherland, D., Strathmann, H. and Gretton, A. (2019). Learning deep kernels for exponential family densities. In International Conference on Machine Learning 6737–6746.

  74. Zahm, O., Cui, T., Law, K., Spantini, A. and Marzouk, Y. (2022). Certified dimension reduction in nonlinear Bayesian inverse problems. Mathematics of Computation 91 1789–1835.

  75. Zech, J. and Marzouk, Y. (2022). Sparse approximation of triangular transports. Part II: the infinite dimensional case. Constructive Approximation 55 987–1036.

  76. Zech, J. and Marzouk, Y. (2022). Sparse Approximation of triangular transports. Part I: the finite-dimensional case. Constructive Approximation 55 919–986.

Acknowledgements

RB, YM, and OZ gratefully acknowledge support from the INRIA associate team Unquestionable. RB and YM are also grateful for support from the AFOSR Computational Mathematics program (MURI award FA9550-15-1-0038) and the US Department of Energy AEOLUS center. RB acknowledges support from an NSERC PGSD-D fellowship. OZ also acknowledges support from the ANR JCJC project MODENA (ANR-21-CE46-0006-01).

Author information

Corresponding author

Correspondence to Ricardo Baptista.

Additional information

Communicated by Albert Cohen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proofs and Theoretical Details

1.1 A.1. Proof of Proposition 1

Proof

Recall that the KR rearrangement \(S_{\text {KR}}\) is a transport map that satisfies \(S_{\text {KR}}^\sharp \eta =\pi \), where \(\eta \) is the density of the standard Gaussian measure on \(\mathbb {R}^d\) and \(\pi \) is the target density. Corollary 3.10 in [8] states that for any PDF \(\varrho \) on \(\mathbb {R}^d\) of the form \(\varrho ({\varvec{x}}):=f({\varvec{x}})\eta ({\varvec{x}})\) with \(f\log f\in L^1_\eta \), the inequality

$$\begin{aligned} \int \Vert {\varvec{x}}-T({\varvec{x}})\Vert ^2 \eta ({\varvec{x}})\text {d}{\varvec{x}}\le 2 \int f({\varvec{x}})\log f({\varvec{x}}) \eta ({\varvec{x}})\text {d}{\varvec{x}}, \end{aligned}$$
(38)

holds, where T is the KR rearrangement such that \(T_\sharp \eta =\varrho \). Let S be an increasing lower triangular map as in (1) and let \(\varrho = S_\sharp \pi \). Thus, we have \(T = S\circ S_{\text {KR}}^{-1}\), and so, the left-hand side of (38) becomes

$$\begin{aligned} \int \Vert {\varvec{x}}-T({\varvec{x}})\Vert ^2 \eta ({\varvec{x}})\text {d}{\varvec{x}}= & {} \int \Vert {\varvec{x}}-S\circ S_{\text {KR}}^{-1}({\varvec{x}})\Vert ^2 \eta ({\varvec{x}})\text {d}{\varvec{x}}\\= & {} \int \Vert S_{\text {KR}}({\varvec{x}})-S({\varvec{x}})\Vert ^2 \pi ({\varvec{x}})\text {d}{\varvec{x}}, \end{aligned}$$

and the right-hand side becomes

$$\begin{aligned} 2 \int f({\varvec{x}})\log f({\varvec{x}}) \eta ({\varvec{x}})\text {d}{\varvec{x}}= 2 \mathcal {D}_\text {KL}( \varrho ||\eta ) = 2\mathcal {D}_\text {KL}( \pi || S^\sharp \eta ), \end{aligned}$$

which yields (4). \(\square \)
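
As a quick sanity check of (4), consider the one-dimensional case \(\pi = \eta = N(0,1)\), so that \(S_{\text {KR}}\) is the identity, together with the deliberately suboptimal map \(S(x)=ax\). The Monte Carlo sketch below is only an illustration of the bound, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)   # samples from pi = N(0, 1); here S_KR is the identity
a = 2.0                            # candidate increasing map S(x) = a * x

lhs = np.mean((x - a * x) ** 2)    # int |S_KR(x) - S(x)|^2 pi(x) dx

# S^sharp eta has density a * eta(a * x), i.e. N(0, 1/a^2); compare log-densities under pi.
log_pi = -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)
log_pullback = np.log(a) - 0.5 * (a * x) ** 2 - 0.5 * np.log(2.0 * np.pi)
rhs = 2.0 * np.mean(log_pi - log_pullback)   # 2 * KL(pi || S^sharp eta)

print(lhs, "<=", rhs)              # approximately 1.0 <= 1.61 for a = 2
```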

1.2 A.2. Convexity of \(s\mapsto \mathcal {J}_k(s)\)

Lemma 10

The optimization problem \(\min _{\{s:\partial _k s > 0\}} \mathcal {J}_{k}(s)\) is strictly convex.

Proof

Let \(s_{1}, s_{2}:\mathbb {R}^k \rightarrow \mathbb {R}\) be two functions such that \(\partial _{k} s_{1}({\varvec{x}}_{\le k}) > 0\) and \(\partial _{k} s_{2}({\varvec{x}}_{\le k}) > 0\). Let \( s_t = t s_1 + (1-t)s_2\) for \(0<t<1\). Then \(s_t\) also satisfies \(\partial _{k} s_{t}({\varvec{x}}_{\le k}) > 0\). Finally, because both \(\xi \mapsto \frac{1}{2}\xi ^2\) and \(\xi \mapsto -\log (\xi )\) are strictly convex functions, we have

$$\begin{aligned} \mathcal {J}_{k}(s_t)&\overset{(6)}{=}\ \int \left( \frac{1}{2}s_t({\varvec{x}}_{\le k})^2-\log \partial _k s_t({\varvec{x}}_{\le k}) \right) \pi ({\varvec{x}})\text {d}{\varvec{x}}\\&< \int \Big ( t \frac{1}{2}s_1({\varvec{x}}_{\le k})^2 + (1-t) \frac{1}{2}s_2({\varvec{x}}_{\le k})^2 \Big )\\&\quad -\Big ( t \log \partial _k s_1({\varvec{x}}_{\le k}) + (1-t) \log \partial _k s_2({\varvec{x}}_{\le k}) \Big ) \pi ({\varvec{x}})\text {d}{\varvec{x}}\\&= t \mathcal {J}_{k}(s_1) + (1-t)\mathcal {J}_{k}(s_2) , \end{aligned}$$

which shows that \(\mathcal {J}_{k}\) is strictly convex. \(\square \)
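
The strict convexity of \(\mathcal {J}_{k}\) can also be observed numerically in one dimension. In the sketch below (purely illustrative), Monte Carlo samples stand in for the target \(\pi \), and \(s_1,s_2\) are two arbitrary strictly increasing candidates.

```python
import numpy as np

rng = np.random.default_rng(1)
x = 1.5 * rng.standard_normal(100_000)   # samples standing in for the target pi (k = 1)


def s1(u):  return u
def ds1(u): return np.ones_like(u)
def s2(u):  return 2.0 * u + np.sin(u)
def ds2(u): return 2.0 + np.cos(u)       # >= 1 > 0 everywhere


def J(s, ds):
    # Monte Carlo estimate of J_k(s) = E_pi[ 0.5 * s(X)^2 - log ds(X) ]
    return np.mean(0.5 * s(x) ** 2 - np.log(ds(x)))


t = 0.3                                   # convex-combination weight


def st(u):  return t * s1(u) + (1.0 - t) * s2(u)
def dst(u): return t * ds1(u) + (1.0 - t) * ds2(u)


print(J(st, dst), "<=", t * J(s1, ds1) + (1.0 - t) * J(s2, ds2))
```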

1.3 A.3. Proof of Proposition 2

To prove Proposition 2, we need the following lemma.

Lemma 11

Let

$$\begin{aligned} H^1([0,1])=\left\{ f:[0,1]\rightarrow \mathbb {R}\text { such that } \Vert f\Vert ^2_{H^1([0,1])}{:}{=}\int _0^1 f(t)^2+f'(t)^2 \, \text {d}t <\infty \right\} . \end{aligned}$$

Then

$$\begin{aligned} |f(0)|\le \sqrt{2} \Vert f\Vert _{H^1([0,1])} \end{aligned}$$
(39)

holds for any \(f\in H^1([0,1])\).

Proof

Because \(\mathcal {C}^{\infty }([0,1])\) is dense in \(H^1([0,1])\), it suffices to show (39) for any \(f\in \mathcal {C}^{\infty }([0,1])\). By the mean value theorem, there exists \(0\le z \le 1\) such that

$$\begin{aligned} f(z) = \frac{1}{1-0}\int _{0}^{1} f(t) \text {d}t. \end{aligned}$$

Thus, we can write

$$\begin{aligned} |f(0)|^2&\le 2|f(z)-f(0)|^2 + 2|f(z)|^2 \\&= 2\left| \int _{0}^{z} f'(t)\text {d}t\right| ^2 + 2\left| \int _{0}^{1} f(t) \text {d}t \right| ^2 \\&\le 2\int _{0}^{1} \left| f'(t) \right| ^2 \text {d}t + 2\int _{0}^{1} \left| f(t) \right| ^2 \text {d}t . \end{aligned}$$

This concludes the proof. \(\square \)
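
A quick numerical check of (39) on an arbitrary smooth test function (purely illustrative):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 20_001)
f = np.cos(3.0 * t) + t**2               # arbitrary smooth test function on [0, 1]
df = -3.0 * np.sin(3.0 * t) + 2.0 * t    # its derivative

integrand = f**2 + df**2
h1_sq = np.sum(0.5 * (integrand[:-1] + integrand[1:]) * np.diff(t))  # trapezoidal rule

print(abs(f[0]), "<=", np.sqrt(2.0 * h1_sq))   # |f(0)| <= sqrt(2) * ||f||_{H^1([0,1])}
```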

We now prove Proposition 2.

Proof

For any \(f\in V_k\), Lemma 11 permits us to write

$$\begin{aligned}&\int |f({\varvec{x}}_{<k},0)|^2 \eta _{<k}({\varvec{x}}_{<k})\text {d}{\varvec{x}}_{<k} \\&\quad \overset{(39)}{\le }\int \left( 2\int _0^1 |f({\varvec{x}}_{<k},t)|^2 + |\partial _k f({\varvec{x}}_{<k},t)|^2 \text {d}t \right) \eta _{<k}({\varvec{x}}_{<k})\text {d}{\varvec{x}}_{<k}\\&\quad \le C_T\int \int _0^1 \Big ( |f({\varvec{x}}_{<k},t)|^2 + |\partial _k f({\varvec{x}}_{<k},t)|^2 \Big ) \eta _{<k}({\varvec{x}}_{<k})\eta _1(t) \text {d}{\varvec{x}}_{<k}\text {d}t\\&\quad \le C_T\int \int _{-\infty }^{+\infty } \Big ( |f({\varvec{x}}_{<k},t)|^2 + |\partial _k f({\varvec{x}}_{<k},t)|^2 \Big ) \eta _{\le k}({\varvec{x}}_{<k},t) \text {d}{\varvec{x}}_{<k}\text {d}t \\&\quad = C_T\Vert f\Vert _{V_k}^2 , \end{aligned}$$

where \(C_T=2 \sup _{0\le t\le 1} \eta _1(t)^{-1}\). \(\square \)

1.4 A.4. Proof of Proposition 3

The proof relies on Proposition 2 and on the following generalized integral Hardy inequality, see [39].

Lemma 12

Let \(\eta _{\le k}\) be the standard Gaussian density on \(\mathbb {R}^k\). Then there exists a constant \(C_{H}\) such that for any \(v \in L^2_{\eta }(\mathbb {R}^{k})\),

$$\begin{aligned} \int \left( \int _{0}^{x_{k}} v({\varvec{x}}_{<k},t)\text {d}t\right) ^2\eta _{\le k}({\varvec{x}})\text {d}{\varvec{x}}\le C_{H} \int v({\varvec{x}})^2\eta _{\le k}({\varvec{x}})\text {d}{\varvec{x}}. \end{aligned}$$
(40)

Proof of Lemma 12

Let us recall the integral Hardy inequality [39].

Theorem 13

(from [39]) Let \(\rho :\mathbb {R}_{+} \rightarrow \mathbb {R}_{+}\) be a weight function. There exists a constant \(C_{H} < \infty \) such that, for all \(u \in L^2_{\rho }(\mathbb {R}_{+})\),

$$\begin{aligned} \int _{0}^{+\infty } \left( \int _{0}^{x} u(t)\text {d}t\right) ^2 \rho (x) \text {d}x \le C_{H} \int _{0}^{+\infty } u(x)^2\rho (x)\text {d}x \end{aligned}$$
(41)

if and only if

$$\begin{aligned} \sup _{x > 0} \, \left( \int _{x}^{+\infty }\rho (t)\text {d}t \right) ^{1/2} \left( \int _{0}^{x} \rho (t)^{-1} \text {d}t \right) ^{1/2} < +\infty . \end{aligned}$$
(42)

We apply Theorem 13 with the one-dimensional standard Gaussian density \(\rho = \eta \) for \(x > 0\). In order to check condition (42), we need to show that

$$\begin{aligned} D(x) {:}{=}\left( \int _{x}^{+\infty }\rho (t)\text {d}t \right) ^{1/2} \left( \int _{0}^{x} \rho (t)^{-1}\text {d}t \right) ^{1/2} = \left( \int _{x}^{+\infty } e^{-t^2/2}\text {d}t \right) ^{1/2} \left( \int _{0}^{x} e^{t^2/2}\text {d}t \right) ^{1/2}, \end{aligned}$$

is bounded. Since \(x\mapsto D(x)\) is a continuous function with a finite limit as \(x \rightarrow 0\), it is sufficient to show that D(x) has a finite limit when \(x \rightarrow \infty \). For \(x > 1\), \(\int _{x}^{+\infty } e^{-t^2/2} \text {d}t \le e^{-x^2/2}\) and \(D(x)^2 \le e^{-x^2/2} \int _{0}^{x} e^{t^2/2} \text {d}t\). Furthermore, using integration by parts we have \(\int _{0}^{x} e^{t^2/2} \text {d}t = \int _{0}^{1} e^{t^2/2}\text {d}t + e^{x^2/2}/x - \sqrt{e} + \int _{1}^{x} e^{t^2/2}/t^2\text {d}t\). As \(x \rightarrow \infty \), the dominating term in the sum is \(e^{x^2/2}/x\). Thus, \(e^{-x^2/2} \int _{0}^{x} e^{t^2/2}\text {d}t\) behaves asymptotically as \(\mathcal {O}(\frac{1}{x})\), so that \(D(x)\rightarrow 0\) when \(x\rightarrow \infty \). Thus, condition (42) is satisfied.

Then, by the Hardy inequality in (41) for \(u \in L_{\eta }^2(\mathbb {R})\) we have

$$\begin{aligned} \int _{0}^{+\infty } \left( \int _{0}^{x_{k}} u(t)\text {d}t\right) ^2 \eta (x_{k}) \text {d}x_{k} \le C_{H} \int _{0}^{+\infty } u(x_{k})^2\eta (x_{k}) \text {d}x_{k}. \end{aligned}$$
(43)

For the symmetric density \(\eta (x_{k}) = \eta (-x_{k})\), we also have

$$\begin{aligned} \int _{-\infty }^{0} \left( \int _{0}^{x_{k}} u(t)\text {d}t\right) ^2 \eta (x_{k}) \text {d}x_{k} \le C_{H} \int _{-\infty }^{0} u(x_{k})^2 \eta (x_{k}) \text {d}x_{k}. \end{aligned}$$
(44)

Combining the results in (43) and (44), we have

$$\begin{aligned} \int _{-\infty }^{+\infty } \left( \int _{0}^{x_{k}} u(t)\text {d}t\right) ^2 \eta (x_{k}) \text {d}x_{k} \le C_{H} \int _{-\infty }^{+\infty } u(x_{k})^2 \eta (x_{k}) \text {d}x_{k}. \end{aligned}$$

Setting \(u(t) = v({\varvec{x}}_{<k},t)\) and integrating both sides over \({\varvec{x}}_{<k} \in \mathbb {R}^{k-1}\) with the standard Gaussian weight \(\eta _{<k}({\varvec{x}}_{<k})\) gives the result. \(\square \)
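
The boundedness of \(x\mapsto D(x)\) used above can also be seen numerically. The sketch below (an illustration, not part of the argument) evaluates D for the standard Gaussian weight using SciPy quadrature.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def D(x):
    # First factor: int_x^inf exp(-t^2/2) dt = sqrt(2*pi) * P(Z > x) for Z ~ N(0, 1).
    tail = np.sqrt(2.0 * np.pi) * norm.sf(x)
    # Second factor: int_0^x exp(t^2/2) dt, computed by adaptive quadrature.
    grow, _ = quad(lambda t: np.exp(t**2 / 2.0), 0.0, x)
    return np.sqrt(tail * grow)


for x in (0.1, 0.5, 1.0, 2.0, 4.0, 6.0):
    print(x, D(x))   # stays bounded and tends to 0 as x grows, so condition (42) holds
```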

We now prove Proposition 3.

Proof

By Proposition 2, by Lemma 12, and by the Lipschitz property of g, we can write

$$\begin{aligned}&\Vert \mathcal {R}_k(f_1) - \mathcal {R}_k(f_2)\Vert _{L^2_{\eta _{\le k}}}^2\nonumber \\&\quad \le 2 \int \Big ( f_1({\varvec{x}}_{<k},0)-f_2({\varvec{x}}_{<k},0) \Big )^2 \eta _{\le k}({\varvec{x}})\text {d}{\varvec{x}}\nonumber \\&\qquad + 2 \int \Big ( \int _0^{x_k} g\big (\partial _k f_1({\varvec{x}}_{<k},t)\big ) - g\big (\partial _k f_2({\varvec{x}}_{<k},t)\big ) \text {d}t \Big )^2\eta _{\le k}({\varvec{x}})\text {d}{\varvec{x}}\nonumber \\&\quad \le 2C_T \Vert f_1-f_2 \Vert _{V_k}^2 + 2 C_H \Vert g(\partial _k f_1)-g(\partial _k f_2) \Vert _{L^2_{\eta _{\le k}}}^2 \nonumber \\&\quad \le 2C_T \Vert f_1-f_2 \Vert _{V_k}^2 + 2 C_H L^2 \Vert \partial _k f_1-\partial _k f_2 \Vert _{L^2_{\eta _{\le k}}}^2 \nonumber \\&\quad \le 2(C_T+C_H L^2) \Vert f_1-f_2 \Vert _{V_k}^2 , \end{aligned}$$
(45)

for any \(f_1,f_2\in V_k\). Furthermore, using the Lipschitz property of g we have

$$\begin{aligned} \Vert \partial _k \mathcal {R}_k(f_1) - \partial _k \mathcal {R}_k(f_2)\Vert _{L^2_{\eta _{\le k}}}^2&= \int \Big ( g(\partial _k f_1({\varvec{x}}_{<k},t)) - g(\partial _k f_2({\varvec{x}}_{<k},t)) \Big )^2 \eta _{\le k}({\varvec{x}})\text {d}{\varvec{x}}\nonumber \\&\le L^2 \int \Big ( \partial _k f_1({\varvec{x}}_{<k},t) - \partial _k f_2({\varvec{x}}_{<k},t) \Big )^2 \eta _{\le k}({\varvec{x}})\text {d}{\varvec{x}}\nonumber \\&\le L^2 \Vert f_1-f_2 \Vert _{V_k}^2. \end{aligned}$$
(46)

Combining (45) with (46), we obtain (19) with \(C = \sqrt{2(C_T+C_H L^2)+L^2}\).

It remains to show that \(\Vert \mathcal {R}_k(f) \Vert _{V_k}<\infty \) for any \(f\in V_k\). Setting \(f_1=f\) and \(f_2=0\) in (19), the triangle inequality yields

$$\begin{aligned} \Vert \mathcal {R}_k(f) \Vert _{V_k} \le \Vert \mathcal {R}_k(0) \Vert _{V_k} + C \Vert f\Vert _{V_k}. \end{aligned}$$

Because \(\mathcal {R}_k(0)\) is the affine function \({\varvec{x}}\mapsto g(0)x_k\), we have that \(\Vert \mathcal {R}_k(0) \Vert _{L^2_{\eta _{\le k}}}^2= g(0)^2\int x_k^2 \eta ({\varvec{x}})\text {d}{\varvec{x}}\) and \(\Vert \partial _k \mathcal {R}_k(0) \Vert _{L^2_{\eta _{\le k}}}^2 = g(0)^2\) are finite and so is \(\Vert \mathcal {R}_k(0) \Vert _{V_k}\). Thus, \(\mathcal {R}_k(f)\in V_k\) for all \(f\in V_k\). \(\square \)
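
To make the operator \(\mathcal {R}_k\) concrete, here is a one-dimensional (k = 1) sketch. Taking g to be the softplus function is only one admissible choice of a positive Lipschitz rectifier with \(g(0)>0\); the results above do not single it out. The integral is evaluated by quadrature, and the output is increasing even though the input function is not.

```python
import numpy as np
from scipy.integrate import quad


def g(xi):
    # Softplus, an illustrative positive Lipschitz rectifier with g(0) = log 2 > 0.
    return np.logaddexp(0.0, xi)


def f(t):                      # an arbitrary smooth, non-monotone input function
    return np.sin(2.0 * t) - 0.3 * t


def df(t):
    return 2.0 * np.cos(2.0 * t) - 0.3


def R_f(x):
    # One-dimensional rectification: R(f)(x) = f(0) + int_0^x g(f'(t)) dt.
    return f(0.0) + quad(lambda t: g(df(t)), 0.0, x)[0]


xs = np.linspace(-3.0, 3.0, 13)
vals = np.array([R_f(x) for x in xs])
print(np.all(np.diff(vals) > 0.0))   # True: R(f) is strictly increasing although f is not
```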

1.5 A.5. Proof of Proposition 4

Proof

For any \(f\in V_k\), we have

$$\begin{aligned} ~~\left| \mathcal {L}_{k}(f) \right|&= \left| \int \left( \frac{1}{2}\mathcal {R}_k(f)^2 - \log (\partial _{k} \mathcal {R}_k(f)) \right) \text {d}\pi \right| \\&\!\!\overset{(20)}{\le }\ \frac{C_\pi }{2}\Vert \mathcal {R}_k(f)\Vert _{L^2_{\eta _{\le k}}}^2 + C_\pi \int \left| \log (g(\partial _{k} f)) \right| \text {d}\eta _{\le k} \\&\!\le \frac{C_\pi }{2} \Vert \mathcal {R}_k(f)\Vert _{L^2_{\eta _{\le k}}}^2 + C_\pi \int |\log (g(0))| + \left| \log (g(\partial _{k} f)) -\log (g(0)) \right| \text {d}\eta _{\le k} \\&\!\!\overset{(22)}{\le }\ \frac{C_\pi }{2} \Vert \mathcal {R}_k(f)\Vert _{L^2_{\eta _{\le k}}}^2 + C_\pi |\log (g(0))| + C_\pi L \int \left| \partial _{k} f -0 \right| \text {d}\eta _{\le k} \\&\le \frac{C_\pi }{2} \Vert \mathcal {R}_k(f)\Vert _{L^2_{\eta _{\le k}}}^2 +C_\pi |\log (g(0))| + C_\pi L \Vert f\Vert _{V_k}^2 . \end{aligned}$$

Because Proposition 3 ensures \(\mathcal {R}_k(f)\in V_k\subset L^2_{\eta _{\le k}}\), we have that \(\mathcal {L}_{k}(f)\) is finite for any \(f\in V_k\). Now, for any \(f_{1},f_2\in V_k\), we can write

$$\begin{aligned}&\left| \mathcal {L}_{k}(f_1) - \mathcal {L}_{k}(f_2) \right| \\&\quad = \left| \int \left( \frac{1}{2}\mathcal {R}_k(f_{1})^2 - \frac{1}{2}\mathcal {R}_k(f_{2})^2 - \log (\partial _{k} \mathcal {R}_k(f_{1})) + \log (\partial _{k} \mathcal {R}_k(f_{2})) \right) \text {d}\pi \right| \\&\quad \overset{(20)}{\le }C_\pi \int \frac{1}{2}\Big |\mathcal {R}_k(f_{1})^2 - \mathcal {R}_k(f_{2})^2 \Big | + \Big |\log (g(\partial _{k} f_{1})) - \log (g(\partial _{k}f_{2}))\Big | \text {d}\eta \\&\quad \overset{(22)}{\le }\ \frac{C_\pi }{2} \Vert \mathcal {R}_k(f_1)+\mathcal {R}_k(f_2)\Vert _{L^2_{\eta _{\le k}}}\Vert \mathcal {R}_k(f_1)-\mathcal {R}_k(f_2)\Vert _{L^2_{\eta _{\le k}}} + C_\pi L \Vert \partial _{k} f_{1} - \partial _{k}f_{2}\Vert _{L^2_{\eta _{\le k}}} \\&\quad \overset{(19)}{\le }\ C_\pi \frac{\Vert \mathcal {R}_k(f_1)\Vert _{L^2_{\eta _{\le k}}} + \Vert \mathcal {R}_k(f_2)\Vert _{L^2_{\eta _{\le k}}}}{2} C\Vert f_1-f_2\Vert _{V_k} + C_\pi L \Vert f_{1} - f_{2}\Vert _{V_k}. \end{aligned}$$

This shows that \(\mathcal {L}_{k}:V_k\rightarrow \mathbb {R}\) is continuous. To show that \(\mathcal {L}_{k}\) is differentiable, we let \(f,\varepsilon \in V_k\) so that

$$\begin{aligned}&\mathcal {L}_{k}(f+\varepsilon ) \\&\quad = \int \left( \frac{1}{2}\mathcal {R}_k(f+\varepsilon )^2 - \log (\partial _{k} \mathcal {R}_k(f+\varepsilon )) \right) \text {d}\pi \\&\quad = \int \left( \frac{1}{2}\left( f({\varvec{x}}_{<k},0) + \varepsilon ({\varvec{x}}_{<k},0) + \int _0^{x_k} g\big (\partial _k (f+\varepsilon )({\varvec{x}}_{<k},t)\big )\text {d}t\right) ^2 - \log \circ g(\partial _{k}(f+\varepsilon )({\varvec{x}})) \right) \pi ({\varvec{x}})\text {d}{\varvec{x}}\\&\quad = \int \left( \frac{1}{2}\mathcal {R}_k(f)({\varvec{x}})^2 + \mathcal {R}_k(f)({\varvec{x}})\left( \varepsilon ({\varvec{x}}_{<k},0) + \int _0^{x_k} g'\big (\partial _k f({\varvec{x}}_{<k},t)\big )\partial _k \varepsilon ({\varvec{x}}_{<k},t)\text {d}t\right) \right) \pi ({\varvec{x}})\text {d}{\varvec{x}}\\&\qquad - \int \log \circ g(\partial _{k}f) +( \log \circ g)'(\partial _{k}f)\partial _{k}\varepsilon \text {d}\pi + \mathcal {O}(\Vert \varepsilon \Vert _{V_k}^2) \\&\quad = \mathcal {L}_{k}(f) + \ell (\varepsilon ) + \mathcal {O}(\Vert \varepsilon \Vert _{V_k}^2), \end{aligned}$$

where \(\ell :V_k\rightarrow \mathbb {R}\) is the linear form defined by

$$\begin{aligned} \ell (\varepsilon )= & {} \int \mathcal {R}_k(f)({\varvec{x}})\left( \varepsilon ({\varvec{x}}_{<k},0) + \int _0^{x_k} g'\big (\partial _k f({\varvec{x}}_{<k},t)\big )\partial _k \varepsilon ({\varvec{x}}_{<k},t)\text {d}t\right) \\{} & {} - (\log \circ g)'(\partial _{k}f({\varvec{x}}))\partial _{k}\varepsilon ({\varvec{x}}) ~\pi ({\varvec{x}})\text {d}{\varvec{x}}. \end{aligned}$$

If \(\ell \) is continuous, meaning if there exists a constant \(C_\ell \) such that \(|\ell (\varepsilon )|\le C_\ell \Vert \varepsilon \Vert _{V_k}\) for any \(\varepsilon \in V_k\), then the Riesz representation theorem states that there exists a vector \(\nabla \mathcal {L}_k(f)\in V_k\) such that \(\ell (\varepsilon )=\langle \nabla \mathcal {L}_k(f),\varepsilon \rangle _{V_k}\). This proves \(\mathcal {L}_k\) is differentiable everywhere.

To show that \(\ell \) is continuous, we write

$$\begin{aligned} |\ell (\varepsilon )|&\overset{(20)}{\le }\ C_\pi \int \Big |\mathcal {R}_k(f)({\varvec{x}})\left( \varepsilon ({\varvec{x}}_{<k},0) + \int _0^{x_k} g'\big (\partial _k f({\varvec{x}}_{<k},t)\big )\partial _k \varepsilon ({\varvec{x}}_{<k},t)\text {d}t\right) \Big | \eta _{\le k}({\varvec{x}})\text {d}{\varvec{x}}\\&\quad + C_\pi \int \Big |(\log \circ g)'(\partial _{k}f({\varvec{x}}))\partial _{k}\varepsilon ({\varvec{x}}) \Big |\eta _{\le k}({\varvec{x}})\text {d}{\varvec{x}}\\&\overset{(22)}{\le }\ C_\pi \Vert \mathcal {R}_k(f)\Vert _{L^2_{\eta _{\le k}}} \sqrt{\int \Big |\varepsilon ({\varvec{x}}_{<k},0) + \int _0^{x_k} g'\big (\partial _k f({\varvec{x}}_{<k},t)\big )\partial _k \varepsilon ({\varvec{x}}_{<k},t)\text {d}t \Big |^2 \eta _{\le k}({\varvec{x}})\text {d}{\varvec{x}}}+ C_\pi L\Vert \partial _{k}\varepsilon \Vert _{L_{\eta _{\le k}}^2} \\&\overset{(21)}{\le }\ C_\pi \Vert \mathcal {R}_k(f)\Vert _{L^2_{\eta _{\le k}}} \sqrt{2 C_T\Vert \varepsilon \Vert _{V_k}^2 +2C_H L^2\int \big | \partial _k \varepsilon ({\varvec{x}}) \big |^2 \eta _{\le k}({\varvec{x}})\text {d}{\varvec{x}}} + C_\pi L\Vert \partial _{k}\varepsilon \Vert _{L_{\eta _{\le k}}^2} \\&\le C_\pi \Big ( \Vert \mathcal {R}_k(f)\Vert _{L^2_{\eta _{\le k}}} \sqrt{2 C_T + 2C_H L^2} + L \Big ) \Vert \varepsilon \Vert _{V_k}, \end{aligned}$$

where the second last inequality also uses Proposition 2 and Lemma 12. This concludes the proof. \(\square \)

1.6 A.6. Proof of the Local Lipschitz Regularity (24)

Proposition 14

In addition to the assumptions of Theorem 4, we further assume there exists a constant \(L<\infty \) such that for all \(\xi ,\xi '\in \mathbb {R}\) we have

$$\begin{aligned} |g'(\xi )-g'(\xi ')|&\le L |\xi -\xi '| \end{aligned}$$
(47)
$$\begin{aligned} |(\log \circ g)'(\xi )-(\log \circ g)'(\xi ')|&\le L |\xi -\xi '|. \end{aligned}$$
(48)

Then there exists \(M<\infty \) such that

$$\begin{aligned} \Vert \nabla \mathcal {L}_k(f_1)-\nabla \mathcal {L}_k(f_2) \Vert _{V_k} \le M(1+\Vert \mathcal {R}_k(f_2)\Vert _{V_k}) \Vert f_1 - f_2 \Vert _{\overline{V}_k}, \end{aligned}$$

for any \(f_1,f_2 \in \overline{V}_k\), where \(\overline{V}_k = \{f \in V_k, \partial _k f \in L^\infty \}\) is the space endowed with the norm \(\Vert f \Vert _{\overline{V}_k} = \Vert f\Vert _{V_k} + \Vert \partial _k f \Vert _{L^\infty }\).

Proof

Recall the definition (23) of \(\nabla \mathcal {L}_k(f)\)

$$\begin{aligned}&\langle \nabla \mathcal {L}_k(f) , \varepsilon \rangle _{V_k} \\&\quad = \int \mathcal {R}_k(f)({\varvec{x}})\left( \varepsilon ({\varvec{x}}_{<k},0) + \int _0^{x_k} g'\big (\partial _k f({\varvec{x}}_{<k},t)\big )\partial _k \varepsilon ({\varvec{x}}_{<k},t)\text {d}t\right) \pi ({\varvec{x}})\text {d}{\varvec{x}}\\&\qquad - \int (\log \circ g)'(\partial _{k}f({\varvec{x}}))\partial _{k}\varepsilon ({\varvec{x}}) \pi ({\varvec{x}})\text {d}{\varvec{x}}. \end{aligned}$$

Then for any \(f_1,f_2\in \overline{V}_k\), we can write

$$\begin{aligned} \langle \nabla \mathcal {L}_k(f_1)-\nabla \mathcal {L}_k(f_2), \varepsilon \rangle _{V_k} = A+B+C+D, \end{aligned}$$

where

$$\begin{aligned} A&= \int \Big (\mathcal {R}_k(f_1)({\varvec{x}}) - \mathcal {R}_k(f_2)({\varvec{x}}) \Big )\varepsilon ({\varvec{x}}_{<k},0) \pi ({\varvec{x}}) \text {d}{\varvec{x}}\\ B&=\int \Big (\mathcal {R}_k(f_1)({\varvec{x}}) - \mathcal {R}_k(f_2)({\varvec{x}})\Big )\left( \int _0^{x_k} g'(\partial _k f_1({\varvec{x}}_{<k},t)) \partial _k \varepsilon ({\varvec{x}}_{<k},t) \text {d}t \right) \pi ({\varvec{x}}) \text {d}{\varvec{x}}\\ C&= \int \mathcal {R}_k(f_2)({\varvec{x}})\left( \int _0^{x_k} \Big (g'(\partial _k f_1({\varvec{x}}_{<k},t)) - g'(\partial _k f_2({\varvec{x}}_{<k},t)) \Big ) \partial _k \varepsilon ({\varvec{x}}_{<k},t) \text {d}t \right) \pi ({\varvec{x}}) \text {d}{\varvec{x}}\\ D&= \int \Big ( (\log \circ g)'(\partial _k f_1({\varvec{x}})) - (\log \circ g)'(\partial _k f_2({\varvec{x}})) \Big )\partial _k \varepsilon ({\varvec{x}}) \pi ({\varvec{x}}) \text {d}{\varvec{x}}. \end{aligned}$$

For the first term A, we write

$$\begin{aligned} |A|&\overset{(20)}{\le }\ C_\pi \int \Big |\mathcal {R}_k(f_1)({\varvec{x}}) - \mathcal {R}_k(f_2)({\varvec{x}}) \Big | |\varepsilon ({\varvec{x}}_{<k},0)| \eta ({\varvec{x}}) \text {d}{\varvec{x}}\\&\le C_\pi \Vert \mathcal {R}_k(f_1) - \mathcal {R}_k(f_2)\Vert _{V_k} \left( \int |\varepsilon ({\varvec{x}}_{<k},0)|^2 \eta _{<k}({\varvec{x}})\text {d}{\varvec{x}}\right) ^{1/2} \\&\overset{(16)}{\le }\ C_\pi \sqrt{C_T} \Vert \mathcal {R}_k(f_1) - \mathcal {R}_k(f_2)\Vert _{V_k} \Vert \varepsilon \Vert _{V_k} \\&\overset{(19)}{\le }\ C_\pi \sqrt{C_T} C \Vert f_1 - f_2 \Vert _{V_k} \Vert \varepsilon \Vert _{V_k}. \end{aligned}$$

For the second term B, we write

$$\begin{aligned} |B|&\overset{(20)}{\le }\ C_\pi \Vert \mathcal {R}_k(f_1) - \mathcal {R}_k(f_2)\Vert _{V_k}\left( \int \left( \int _0^{x_k} g'(\partial _k f_1({\varvec{x}}_{<k},t)) \partial _k \varepsilon ({\varvec{x}}_{<k},t) \text {d}t \right) ^2\eta ({\varvec{x}}) \text {d}{\varvec{x}}\right) ^{1/2}\\&\overset{\begin{array}{c} (40) \\ (19) \end{array}}{\le } C_\pi \sqrt{C_H} C \Vert f_1-f_2\Vert _{V_k}\left( \int \Big ( g'(\partial _k f_1({\varvec{x}}_{\le k})) \partial _k \varepsilon ({\varvec{x}}_{\le k}) \Big )^2\eta ({\varvec{x}}) \text {d}{\varvec{x}}\right) ^{1/2} \\&\overset{(21)}{\le }\ C_\pi \sqrt{C_H} C L \Vert f_1-f_2\Vert _{V_k} \left( \int \Big ( \partial _k \varepsilon ({\varvec{x}}_{\le k}) \Big )^2\eta ({\varvec{x}}) \text {d}{\varvec{x}}\right) ^{1/2}\\&\le C_\pi \sqrt{C_H} C L\Vert f_1-f_2\Vert _{V_k} \Vert \varepsilon \Vert _{V_k} . \end{aligned}$$

For the third term C, we write

$$\begin{aligned} |C|&\overset{(20)}{\le }\ C_\pi \Vert \mathcal {R}_k(f_2)\Vert _{V_k} \left( \int \left( \int _0^{x_k} \Big (g'(\partial _k f_1({\varvec{x}}_{<k},t)) - g'(\partial _k f_2({\varvec{x}}_{<k},t)) \Big ) \partial _k \varepsilon ({\varvec{x}}_{<k},t) \text {d}t \right) ^2\eta ({\varvec{x}}) \text {d}{\varvec{x}}\right) ^{1/2} \\&\overset{(40)}{\le }\ C_\pi \sqrt{C_H} \Vert \mathcal {R}_k(f_2)\Vert _{V_k} \left( \int \left( \Big (g'(\partial _k f_1({\varvec{x}}_{\le k})) - g'(\partial _k f_2({\varvec{x}}_{\le k})) \Big ) \partial _k \varepsilon ({\varvec{x}}_{\le k}) \right) ^2\eta ({\varvec{x}}) \text {d}{\varvec{x}}\right) ^{1/2} \\&\le C_\pi \sqrt{C_H} \Vert \mathcal {R}_k(f_2)\Vert _{V_k}\left( {\textrm{ess}}\,{\textrm{sup}\,}\Big | g'\circ \partial _k f_1 - g'\circ \partial _k f_2 \Big | \right) \left( \int \left( \partial _k \varepsilon ({\varvec{x}}_{\le k}) \right) ^2\eta ({\varvec{x}}) \text {d}{\varvec{x}}\right) ^{1/2} \\&\overset{(47)}{\le }\ C_\pi \sqrt{C_H} L \Vert \mathcal {R}_k(f_2)\Vert _{V_k} \left( {\textrm{ess}}\,{\textrm{sup}\,}\Big | \partial _k f_1 - \partial _k f_2 \Big | \right) \Vert \varepsilon \Vert _{V_k} \\&\le C_\pi \sqrt{C_H} L \Vert \mathcal {R}_k(f_2)\Vert _{V_k} \Vert f_1 - f_2 \Vert _{\overline{V}_k} \Vert \varepsilon \Vert _{V_k}. \end{aligned}$$

For the last term D, we write

$$\begin{aligned} |D|&\overset{(20)}{\le }\ C_\pi \left( \int \Big ( (\log \circ g)'(\partial _k f_1({\varvec{x}})) - (\log \circ g)'(\partial _k f_2({\varvec{x}})) \Big )^2 \eta ({\varvec{x}}) \text {d}{\varvec{x}}\right) ^{1/2} \Vert \varepsilon \Vert _{V_k}\\&\overset{(48)}{\le }\ C_\pi L \left( \int \Big ( \partial _k f_1({\varvec{x}}) - \partial _k f_2({\varvec{x}}) \Big )^2 \eta ({\varvec{x}}) \text {d}{\varvec{x}}\right) ^{1/2} \Vert \varepsilon \Vert _{V_k} \\&\le C_\pi L \Vert f_1-f_2\Vert _{V_k} \Vert \varepsilon \Vert _{V_k} . \end{aligned}$$

Thus, because \(\Vert f_1-f_2\Vert _{V_k} \le \Vert f_1-f_2\Vert _{\overline{V}_k} \) we obtain

$$\begin{aligned}&\frac{|\langle \nabla \mathcal {L}_k(f_1)-\nabla \mathcal {L}_k(f_2) , \varepsilon \rangle _{V_k} |}{\Vert \varepsilon \Vert _{V_k}} \\&\quad \le C_\pi \Big ( \sqrt{C_T} C + \sqrt{C_H} C L + \sqrt{C_H} L \Vert \mathcal {R}_k(f_2)\Vert _{V_k}+ L\Big )\Vert f_1 - f_2 \Vert _{\overline{V}_k}\\&\quad \le M(1+\Vert \mathcal {R}_k(f_2)\Vert _{V_k}) \Vert f_1 - f_2 \Vert _{\overline{V}_k} , \end{aligned}$$

where

$$\begin{aligned} M = C_\pi \max \{ \sqrt{C_T} C + \sqrt{C_H} C L + L; \sqrt{C_H} L \}. \end{aligned}$$

This concludes the proof. \(\square \)

1.7 A.7. Proof of Proposition 6

Proof

To show that \(\mathcal {R}_k(V_k)=\{\mathcal {R}(f):f\in V_k\}\) is convex, let \(f_1,f_2\in V_k\) and \(0\le \alpha \le 1\). We need to show that there exists \(f_\alpha \in V_k\) such that \(\mathcal {R}(f_\alpha ) = S_\alpha \) where

$$\begin{aligned} S_\alpha {:}{=}\alpha \mathcal {R}_k(f_1) + (1 - \alpha )\mathcal {R}_k(f_2). \end{aligned}$$

Let

$$\begin{aligned} f_\alpha ({\varvec{x}}_{\le k}) {:}{=}\mathcal {R}^{-1}(S_\alpha )({\varvec{x}}_{\le k}) = S_\alpha ({\varvec{x}}_{<k},0) + \int _0^{x_k} g^{-1}(\partial _k S_\alpha ({\varvec{x}}_{<k},t)) \text {d}t. \end{aligned}$$

It remains to show that \(f_\alpha \in V_k\), meaning that \(f_\alpha \in L^2_{\eta _{\le k}}\) and \(\partial _k f_\alpha \in L^2_{\eta _{\le k}}\). By convexity of \(\xi \mapsto g^{-1}(\xi )^2\), we have

$$\begin{aligned} \Vert \partial _k f_\alpha \Vert _{L^2_{\eta _{\le k}}}^2&= \int g^{-1}\big (\alpha \partial _k \mathcal {R}_k(f_1) + (1 - \alpha ) \partial _k \mathcal {R}_k(f_2)\big )^2 \text {d}\eta _{\le k} \nonumber \\&= \int g^{-1}\big (\alpha g(\partial _k f_1) + (1 - \alpha ) g(\partial _k f_2)\big )^2 \text {d}\eta _{\le k} \nonumber \\&\le \int \alpha g^{-1}(g(\partial _k f_1))^2 + (1 - \alpha ) g^{-1}(g(\partial _k f_2))^2 \text {d}\eta _{\le k} \nonumber \\&= \alpha \Vert \partial _k f_1 \Vert ^2_{L^2_{\eta _{\le k}}} + (1 - \alpha ) \Vert \partial _k f_2 \Vert ^2_{L^2_{\eta _{\le k}}} . \end{aligned}$$
(49)

Thus, \(\partial _k f_\alpha \in L^2_{\eta _{\le k}}\). Furthermore, we have

$$\begin{aligned} \Vert f_\alpha \Vert _{L^2_{\eta _{\le k}}}^2&= \int \left( S_\alpha ({\varvec{x}}_{<k},0) + \int _0^{x_k} g^{-1}(\partial _k S_\alpha ({\varvec{x}}_{<k},t)) \text {d}t \right) ^2 \eta _{\le k}({\varvec{x}}) \text {d}{\varvec{x}}\\&\le 2 \int S_{\alpha }({\varvec{x}}_{<k},0)^2 \eta _{<k}({\varvec{x}}_{<k}) \text {d}{\varvec{x}}+ 2 \int \left( \int _0^{x_k} g^{-1}(\partial _k S_\alpha ({\varvec{x}}_{<k},t)) \text {d}t \right) ^2 \eta _{\le k}({\varvec{x}}) \text {d}{\varvec{x}}. \end{aligned}$$

To show that the above quantity is finite, Proposition 2 permits us to write

$$\begin{aligned} \int S_{\alpha }({\varvec{x}}_{<k},0)^2 \eta _{<k}({\varvec{x}}_{<k}) \text {d}{\varvec{x}}&= \int \Big ( \alpha f_1({\varvec{x}}_{<k},0) + (1 - \alpha ) f_2({\varvec{x}}_{<k},0) \Big )^2 \eta _{<k}({\varvec{x}}_{<k}) \text {d}{\varvec{x}}\\&\le C_T \Vert \alpha f_1 + (1 - \alpha ) f_2 \Vert ^2_{V_k} , \end{aligned}$$

which is finite. Finally, because \(g^{-1}(\partial _k S_\alpha ) =\partial _k f_\alpha \in L^2_{\eta _{\le k}}\) by (49), Lemma 12 yields

$$\begin{aligned} \int \left( \int _0^{x_k} g^{-1}(\partial _k S_\alpha ({\varvec{x}}_{<k},t)) \text {d}t \right) ^2 \eta _{\le k}({\varvec{x}}) \text {d}{\varvec{x}}&\le C_{H} \int g^{-1}(\partial _k S_\alpha ({\varvec{x}}_{\le k}))^2 \eta _{\le k}({\varvec{x}}) \text {d}{\varvec{x}}\\&= C_{H} \Vert \partial _k f_\alpha \Vert _{L^2_{\eta _{\le k}}}^2, \end{aligned}$$

which is finite. We deduce that \(f_\alpha \in L^2_{\eta _{\le k}}\) and therefore that \(f_\alpha \in V_k\). \(\square \)

1.8 A.8. Proof of Proposition 5

Proof

Let \(s_1,s_2 \in V_{k}\) be strictly increasing functions with respect to \(x_k\) that satisfy \(\partial _k s_i({\varvec{x}}_{\le k}) \ge c\) for \(i=1,2\) and all \({\varvec{x}}_{\le k} \in \mathbb {R}^{k}\). By the Lipschitz property of \(g^{-1}\) on the domain \([c,\infty )\) with constant \(L_c\), we can write

$$\begin{aligned} \Vert \partial _k \mathcal {R}_k^{-1}(s_1) - \partial _k \mathcal {R}_k^{-1}(s_2) \Vert _{L^2_{\eta _{\le k}}}^2&= \int (g^{-1}(\partial _k s_1({\varvec{x}}_{\le k})) - g^{-1}(\partial _k s_2({\varvec{x}}_{\le k})))^2 \eta _{\le k}({\varvec{x}}) \text {d}{\varvec{x}}\nonumber \\&\le L_{c}^2 \int (\partial _k s_1({\varvec{x}}_{\le k}) - \partial _k s_2({\varvec{x}}_{\le k}))^2 \eta _{\le k}({\varvec{x}}) \text {d}{\varvec{x}}\nonumber \\&\le L_{c}^2 \Vert s_1 - s_2 \Vert _{V_k}^2. \end{aligned}$$
(50)

Applying Proposition 2 to \(s_1,s_2 \in V_k\) and Lemma 12 to \(\partial _k \mathcal {R}_k^{-1}(s_i) = g^{-1}(\partial _k s_i) \in L_{\eta _{\le k}}^2\) for \(i=1,2\), we have

$$\begin{aligned} \Vert \mathcal {R}_k^{-1}(s_1) - \mathcal {R}_k^{-1}(s_2) \Vert _{L^2_{\eta _{\le k}}}^2&\le 2 \int \left( s_1({\varvec{x}}_{<k},0) - s_2({\varvec{x}}_{<k},0) \right) ^2 \eta _{\le k}({\varvec{x}})d{\varvec{x}}\nonumber \\&\quad + 2\int \left( \int _0^{x_k} g^{-1}(\partial _k s_1) - g^{-1}(\partial _k s_2) \text {d}t \right) ^2 \eta _{\le k}({\varvec{x}}) \text {d}{\varvec{x}}\nonumber \\&\le 2 C_T \Vert s_1 - s_2 \Vert _{V_k}^2 \nonumber \\&\quad + 2 C_H \int (g^{-1}(\partial _k s_1({\varvec{x}}_{\le k})) - g^{-1}(\partial _k s_2({\varvec{x}}_{\le k})))^2 \eta _{\le k}({\varvec{x}}) \text {d}{\varvec{x}}\nonumber \\&\le (2C_T + 2C_H L_{c}^2)\Vert s_1 - s_2 \Vert _{V_k}^2, \end{aligned}$$
(51)

where the last inequality follows from (50).

It remains to show that \(\Vert \mathcal {R}_k^{-1}(s)\Vert _{V_k} < \infty \) for any \(s \in V_{k}\) such that \({\textrm{ess}}\,{\textrm{inf}\,}\partial _k s > 0\). Letting \(s_1 = s\) and \(s_2 = g(0)x_k\), the triangle inequality combined with (26) yields

$$\begin{aligned} \Vert \mathcal {R}_k^{-1}(s) \Vert _{V_k} \le \Vert \mathcal {R}_k^{-1}(g(0)x_k) \Vert _{V_k} + C_{c} \Vert s - g(0)x_k \Vert _{V_k}. \end{aligned}$$

The function \(\mathcal {R}_k^{-1}(g(0)x_k)\) is zero. Therefore, \(\Vert \mathcal {R}_k^{-1}(s) \Vert _{V_k} \le C_c\Vert s - g(0)x_k \Vert _{V_k} \le C_c(\Vert s \Vert _{V_k} + \Vert g(0)x_k \Vert _{V_{k}})\). Since \(x_k\mapsto g(0)x_k\) is linear, \(\Vert g(0)x_k \Vert _{V_k}^2 = \Vert g(0)x_k \Vert ^2_{L^2_{\eta _{\le k}}} + \Vert g(0) \Vert ^2_{L^2_{\eta _{\le k}}} = 2\,g(0)^2\) is finite, and so, \(\Vert \mathcal {R}_k^{-1}(s) \Vert _{V_k} < \infty \) for \(s \in V_k\). Furthermore, if \(\partial _k s \ge c > 0\), then \(\partial _k \mathcal {R}_k^{-1}(s) = g^{-1}(\partial _k s) \ge g^{-1}(c) > -\infty \), and so, \({\textrm{ess}}\,{\textrm{inf}\,}\partial _k \mathcal {R}_k^{-1}(s) > -\infty \). \(\square \)
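
The inverse operator used above, \(\mathcal {R}_k^{-1}(s)({\varvec{x}}_{\le k}) = s({\varvec{x}}_{<k},0) + \int _0^{x_k} g^{-1}(\partial _k s({\varvec{x}}_{<k},t))\,\text {d}t\), can be exercised in one dimension with the same illustrative softplus choice for g. The sketch below simply verifies the round trip \(\mathcal {R}_k^{-1}(\mathcal {R}_k(f)) = f\) by quadrature.

```python
import numpy as np
from scipy.integrate import quad


def g(xi):                                   # softplus (illustrative rectifier)
    return np.logaddexp(0.0, xi)


def g_inv(y):                                # inverse of softplus, defined for y > 0
    return np.log(np.expm1(y))


def f(t):                                    # an arbitrary smooth, non-monotone function
    return np.sin(2.0 * t) - 0.3 * t


def df(t):
    return 2.0 * np.cos(2.0 * t) - 0.3


def R_f(x):                                  # R(f)(x) = f(0) + int_0^x g(f'(t)) dt
    return f(0.0) + quad(lambda t: g(df(t)), 0.0, x)[0]


def dR_f(x):                                 # d/dx R(f)(x) = g(f'(x)) > 0
    return g(df(x))


def Rinv_of_Rf(x):                           # R^{-1}(s)(x) = s(0) + int_0^x g^{-1}(s'(t)) dt
    return R_f(0.0) + quad(lambda t: g_inv(dR_f(t)), 0.0, x)[0]


for x in (-2.0, -0.5, 1.0, 2.5):
    print(x, f(x), Rinv_of_Rf(x))            # the last two columns agree
```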

1.9 A.9. Proof for the KR Rearrangement

Proof

Let \(S_{\text {KR},k}\) be the kth component of the KR rearrangement, given by composing the inverse CDF of the standard Gaussian marginal \(F_{\eta ,k}(x_k)\) with the CDF of the target’s kth marginal conditional \(F_{\pi _k}(x_k|{\varvec{x}}_{<k})\). That is,

$$\begin{aligned} S_{\text {KR},k}({\varvec{x}}_{\le k}) = F_{\eta _k}^{-1} \circ F_{\pi _k}(x_k|{\varvec{x}}_{<k}). \end{aligned}$$
(52)

The goal is to show \(S_{\text {KR},k} \in V_k\), that is, \(S_{\text {KR},k} \in L^2_{\eta _{\le k}}\) and \(\partial _k S_{\text {KR},k} \in L^2_{\eta _{\le k}}\).

First we show \(S_{\text {KR},k} \in L^2_{\eta _{\le k}}\). From condition (30), we have \(F_{\eta _k}^{-1}(C_1 F_{\eta _k}(x_k)) \le S_{\text {KR},k}(x_k|{\varvec{x}}_{<k}) \le F_{\eta _k}^{-1}(C_2 F_{\eta _k}(x_k))\) for some constants \(C_1,C_2 > 0\) so that

$$\begin{aligned} S_{\text {KR},k}(x_k|{\varvec{x}}_{<k})^2 \le \max \{ F_{\eta _k}^{-1}(C_1 F_{\eta _k}(x_k))^2 ; F_{\eta _k}^{-1}(C_2 F_{\eta _k}(x_k))^2 \}, \end{aligned}$$
(53)

for all \({\varvec{x}}_{<k} \in \mathbb {R}^{k-1}\). To show that \(S_{\text {KR},k} \in L^2_{\eta _{\le k}}\), it is sufficient to prove that any function of the form \(x_k\mapsto F_{\eta _k}^{-1}( C F_{\eta _k}(x_k))\) is in \( L^2_{\eta _{\le k}}\) for any \(C>0\). From Theorems 1 and 2 in [11], there exist strictly positive constants \(\alpha _i,\beta _i > 0\) for \(i=1,2\) such that

$$\begin{aligned} 1 - \alpha _1 \exp (-\beta _1 x_k^2) \le F_{\eta _k}(x_k) \le 1 - \alpha _2 \exp (-\beta _2 x_k^2), \end{aligned}$$
(54)

for \(x_k > 0\). With a change of variable \(u=F_{\eta _k}(x_k)\), we obtain \(F^{-1}_{\eta _k}(u)^2 \le 1/\beta _{2} \log (\alpha _2/(1-u))\) for all \(u > F_{\eta _k}(0) = 1/2\). Letting \(u=C F_{\eta _k}(x_k)\) yields

$$\begin{aligned} F^{-1}_{\eta _k}( C F_{\eta _k}(x_k) )^2&\le \frac{1}{\beta _{2}} \log \left( \frac{\alpha _2}{C F_{\eta _k}(x_k)}\right) ,\\&\overset{(54)}{\le }\ \frac{1}{\beta _{2}} \log \left( \frac{\alpha _2}{C \alpha _2 \exp (-\beta _2 x_k^2)}\right) \\&\,\,=\frac{1}{\beta _{2}} \log \left( \frac{1}{C}\right) + x_k^2 \end{aligned}$$

for all \(x_k > \max \{ F_{\eta _k}^{-1}(1/(2C)), 0\}\). Using the same argument, we obtain a similar bound on \(F^{-1}_{\eta _k}( C F_{\eta _k}(x_k) )^2\) for all \(x_k\) smaller than a certain value. Together with the continuity of \(x_k\mapsto F^{-1}_{\eta _k}( C F_{\eta _k}(x_k) )^2\), these bounds ensure that \(x_k\mapsto F^{-1}_{\eta _k}( C F_{\eta _k}(x_k) )\) is in \(L^2_{\eta _{\le k}}\) for any C. Then \(S_{\text {KR},k} \in L^2_{\eta _{\le k}}\). Furthermore, we have \(S_{\text {KR},k}({\varvec{x}}_{\le k}) = \mathcal {O}(x_k)\) as \(|x_k| \rightarrow \infty \).

Now we show that \(\partial _k S_{\text {KR},k} \in L^2_{\eta _{\le k}}\) by showing \(\partial _k S_{\text {KR},k}\) is a continuous and bounded function. From the absolute continuity of \(\varvec{\mu }\) and \(\varvec{\nu }\), we have that

$$\begin{aligned} \partial _k S_{\text {KR},k}({\varvec{x}}_{\le k}) = \frac{\pi _k(x_k|{\varvec{x}}_{<k})}{\eta _k(S_{\text {KR},k}({\varvec{x}}_{\le k}))} = \frac{\pi _k(F_{\pi _k}^{-1}(F_{\pi _k}(x_k|{\varvec{x}}_{<k})|{\varvec{x}}_{<k})|{\varvec{x}}_{<k})}{\eta _k(F_{\eta _k}^{-1} \circ F_{\pi _k}(x_k|{\varvec{x}}_{< k}))}, \end{aligned}$$
(55)

is continuous, where \(F_{\pi _k}^{-1}(\cdot |{\varvec{x}}_{<k})\) denotes the inverse of the map \(x_k \mapsto F_{\pi _k}(x_k|{\varvec{x}}_{<k})\) for each \({\varvec{x}}_{<k} \in \mathbb {R}^{k-1}\). Hence, it is sufficient to show that \(\partial _k S_{\text {KR},k}\) goes to a finite limit as \(|x_k| \rightarrow \infty \). For the right-hand limit, we can write

$$\begin{aligned} \lim _{x_k \rightarrow \infty } \partial _k S_{\text {KR},k}({\varvec{x}}_{\le k})&= \lim _{u \rightarrow 1^{-}} \frac{\pi _{k}(F_{\pi _k}^{-1}(u|{\varvec{x}}_{<k})|{\varvec{x}}_{<k})}{\eta _k(F_{\eta _k}^{-1}(u))} \nonumber \\&= \lim _{u \rightarrow 1^{-}} \frac{(F_{\eta ,k}^{-1})'(u)}{(F_{\pi _k}^{-1})'(u|{\varvec{x}}_{<k})} \nonumber \\&= \lim _{u \rightarrow 1^{-}} \frac{F_{\eta _k}^{-1}(u)}{F_{\pi _k}^{-1}(u|{\varvec{x}}_{<k})}, \end{aligned}$$
(56)

where in the second equality we used the inverse function theorem and the third equality follows from l’Hôpital’s rule. To analyze the ratio \(F_{\eta _k}^{-1}(u)/F_{\pi _k}^{-1}(u|{\varvec{x}}_{<k})\), we combine the lower bound in (30) and the bounds in (54) to get

$$\begin{aligned} \frac{F_{\eta _k}^{-1}(u)}{F_{\pi _k}^{-1}(u|{\varvec{x}}_{<k})} \le \frac{F_{\eta _k}^{-1}(u)}{F_{\eta _k}^{-1}((u-1)/C_1 + 1)} \le \sqrt{\frac{\beta _1(\log \alpha _2 - \log (1-u))}{\beta _2(\log \alpha _1 - \log ((1-u)/C_1))}}. \end{aligned}$$

Similarly, from the upper bound in (30) and the bounds in (54), we have

$$\begin{aligned} \frac{F_{\eta _k}^{-1}(u)}{F_{\pi _k}^{-1}(u|{\varvec{x}}_{<k})} \ge \frac{F_{\eta _k}^{-1}(u)}{F_{\eta _k}^{-1}((u-1)/C_2 + 1)} \ge \sqrt{\frac{\beta _2(\log \alpha _1 - \log (1-u))}{\beta _1(\log \alpha _2 - \log ((1-u)/C_2))}}. \end{aligned}$$

Thus, \(\partial _k S_{\text {KR},k}({\varvec{x}}_{\le k}) = \mathcal {O}(1)\) as \(x_k \rightarrow \infty \); the same argument applied to the left-hand limit gives \(\mathcal {O}(1)\) as \(x_k \rightarrow -\infty \), and we conclude that \(\partial _k S_{\text {KR},k} \in L^2_{\eta _{\le k}}\).

Lastly, taking the limit in (56) we have \(\lim _{x_k \rightarrow \infty } \partial _k S_{\text {KR},k}({\varvec{x}}_{\le k}) \ge \sqrt{\beta _2/\beta _1}\). For a target distribution \(\pi \) with full support, all marginal conditional densities satisfy \(\pi _k(x_k|{\varvec{x}}_{<k}) > 0\) for each \({\varvec{x}}_{\le k} \in \mathbb {R}^{k}\). Given that \(\partial _k S_{\text {KR},k}\) does not approach zero as \(|x_k| \rightarrow \infty \), we can find a strictly positive constant \(c_k > 0\) such that \(\partial _k S_{\text {KR},k}({\varvec{x}}_{\le k}) \ge c_k\) for all \({\varvec{x}}_{\le k} \in \mathbb {R}^k\). This shows that \({\textrm{ess}}\,{\textrm{inf}\,}\partial _k S_{\text {KR},k} > 0\). \(\square \)
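
As a concrete one-dimensional instance of (52), the sketch below assembles \(F_{\eta }^{-1}\circ F_{\pi }\) for a two-component Gaussian-mixture target (an arbitrary illustrative choice) and checks that it pushes samples of \(\pi \) to an approximately standard normal distribution.

```python
import numpy as np
from scipy.stats import norm

# Illustrative 1-D target pi: a two-component Gaussian mixture (weights, means, std devs).
w = np.array([0.3, 0.7])
m = np.array([-2.0, 1.5])
s = np.array([0.6, 1.0])


def F_pi(x):
    # CDF of the mixture target.
    return np.sum(w * norm.cdf((np.asarray(x)[..., None] - m) / s), axis=-1)


def S_KR(x):
    # One-dimensional Knothe-Rosenblatt map: F_eta^{-1} composed with F_pi, as in (52).
    return norm.ppf(F_pi(x))


rng = np.random.default_rng(2)
comp = rng.choice(2, size=100_000, p=w)
x = rng.normal(m[comp], s[comp])          # samples from pi
z = S_KR(x)                               # pushed-forward samples
print(z.mean(), z.std())                  # approximately 0 and 1: S_KR pushes pi to N(0, 1)
```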

Appendix B: Multi-index Refinement for the Wavelet Basis

In this section, we show how to greedily enrich the index set \(\Lambda _t\) for a one-dimensional wavelet basis parameterized by the tuple of indices \((l,q)\) representing the level l and translation q of each wavelet \(\psi _{(l,q)}\). To define the allowable indices, we construct a binary tree where each node is indexed by \((l,q)\) and has two children with indices \((l+1,2q)\) and \((l+1,2q+1)\). The root of the tree has index (0, 0) and corresponds to the mother wavelet \(\psi \). Analogously to the downward-closed property for polynomial indices, we only add a node to the tree (i.e., an index to \(\Lambda _t\)) if its parent has already been added. Given any set \(\Lambda _t\), we define its reduced margin as

$$\begin{aligned} \Lambda _{t}^{\text {RM}} = \left\{ \alpha =(l,q) \not \in \Lambda _t \text { such that } \begin{array}{ll} (l-1,q/2) \in \Lambda _t &{} \text { for even } q \\ (l-1,(q-1)/2) \in \Lambda _t &{} \text { for odd } q \end{array} \right\} . \end{aligned}$$

Then, the ATM algorithm with a wavelet basis follows from Algorithm 1 with this construction for the reduced margin at each iteration.
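
A minimal sketch of this reduced-margin construction (assuming, as above, that each index \((l,q)\) has parent \((l-1,\lfloor q/2\rfloor )\) and that the root (0, 0) is already in \(\Lambda _t\)):

```python
def parent(index):
    l, q = index
    return (l - 1, q // 2)   # covers both the even (q/2) and odd ((q-1)/2) translation cases


def reduced_margin(Lambda_t):
    """Indices outside Lambda_t whose parent already belongs to Lambda_t."""
    Lambda_t = set(Lambda_t)
    candidates = {(l + 1, 2 * q + b) for (l, q) in Lambda_t for b in (0, 1)}
    return {idx for idx in candidates if idx not in Lambda_t and parent(idx) in Lambda_t}


# Example: the root and one of its children.
Lambda_t = {(0, 0), (1, 1)}
print(sorted(reduced_margin(Lambda_t)))   # [(1, 0), (2, 2), (2, 3)]
```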

Appendix C: Architecture Details of Alternative Methods

In this section, we present the details of the alternative methods to ATM that we consider in Sect. 5.

For each normalizing flow model, we use the recommended stochastic gradient descent optimizer with a learning rate of \(10^{-3}\). We partition 10% of the samples in each training set to be validation samples and use the remaining samples for training the model. We select the optimal hyperparameters for each dataset by fitting the density with the training data and choosing the parameters that minimize the negative log-likelihood of the approximate density on the validation samples. We also use the validation samples to set the termination criteria during the optimization.

We follow the implementation of [52] to define the architectures of these models. The hyperparameters we consider for the neural networks in the MDN and NF models are: 2 hidden layers, 32 hidden units in each layer, \(\{5,10,20,50,100\}\) centers or flows, weight normalization, and a dropout probability of \(\{0,0.2\}\) for regularizing the neural networks during training. For CKDE and NKDE, we select the bandwidth of the kernel estimators using fivefold cross-validation.
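
For concreteness, the selection procedure described above amounts to a small grid search scored by the negative log-likelihood on the held-out validation split. The sketch below is schematic: `fit_model` and `neg_log_likelihood` are hypothetical placeholders, not functions from [52] or from any particular flow library.

```python
import itertools

import numpy as np


def select_hyperparameters(samples, fit_model, neg_log_likelihood, seed=0):
    """Pick the hyperparameters minimizing validation negative log-likelihood."""
    rng = np.random.default_rng(seed)
    n = len(samples)
    perm = rng.permutation(n)
    valid, train = samples[perm[: n // 10]], samples[perm[n // 10:]]   # 10% validation split

    grid = {
        "n_components": [5, 10, 20, 50, 100],   # centers (MDN) or flow steps (NF)
        "dropout": [0.0, 0.2],
    }
    best_params, best_score = None, np.inf
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        model = fit_model(train, hidden_layers=2, hidden_units=32, **params)
        score = neg_log_likelihood(model, valid)
        if score < best_score:
            best_params, best_score = params, score
    return best_params
```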

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Baptista, R., Marzouk, Y. & Zahm, O. On the Representation and Learning of Monotone Triangular Transport Maps. Found Comput Math (2023). https://doi.org/10.1007/s10208-023-09630-x
