Abstract
Natural gradients are widely used to optimize loss functionals over probability spaces. Important examples include Fisher–Rao gradient descent for the Kullback–Leibler divergence, Wasserstein gradient descent for transport-related functionals, and Mahalanobis gradient descent for quadratic loss functionals. This note considers the case in which the loss is a convex combination of these examples. We propose a new natural gradient algorithm that uses compactly supported wavelets to approximately diagonalize the Hessian of the combined loss. Numerical results demonstrate the efficiency of the proposed algorithm.
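To make the idea of a wavelet-diagonalized Hessian concrete, here is a minimal sketch, not the algorithm of the paper. It assumes a toy quadratic loss on an unconstrained vector (the probability simplex constraint is ignored), the Haar wavelet as the compactly supported wavelet, and an exact line search that is only valid for quadratic losses. The toy Hessian (identity plus an inverse discrete Laplacian) is an assumption meant to loosely mimic a combination of a Mahalanobis-type term and a transport-type term. The names `haar_matrix` and `wavelet_diag_step` are hypothetical.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar wavelet transform matrix of size n x n (n a power of 2)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])                    # coarse-scale rows
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])   # finest-scale detail rows
    return np.vstack([top, bottom]) / np.sqrt(2.0)

def wavelet_diag_step(p, grad, hess, W):
    """One descent step preconditioned by the Hessian diagonal in the wavelet basis."""
    d = np.diag(W @ hess @ W.T)             # diagonal of the Hessian in wavelet coordinates
    direction = W.T @ ((W @ grad) / d)      # approximates hess^{-1} @ grad
    # Exact line search along the direction (assumption: the loss is quadratic).
    step = (grad @ direction) / (direction @ hess @ direction)
    return p - step * direction

n = 64
W = haar_matrix(n)

# Toy symmetric positive definite Hessian (assumption, for illustration only).
lap = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
hess = 0.5 * np.eye(n) + 0.5 * np.linalg.inv(lap + 1e-2 * np.eye(n))

rng = np.random.default_rng(0)
target = rng.random(n)

def loss(x):
    r = x - target
    return 0.5 * r @ hess @ r

p = np.full(n, 0.5)
losses = [loss(p)]
for _ in range(10):
    p = wavelet_diag_step(p, hess @ (p - target), hess, W)
    losses.append(loss(p))

print(losses[0], losses[-1])
```

The design choice illustrated here is that the preconditioner costs only a wavelet transform and an elementwise division per step, rather than a full linear solve with the Hessian. How well this works depends on how close the Hessian is to diagonal in the chosen wavelet basis.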
The work of L.Y. is partially supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program and also by the National Science Foundation under Award DMS-1818449.
Cite this article
Ying, L. Natural Gradient for Combined Loss Using Wavelets. J Sci Comput 86, 26 (2021). https://doi.org/10.1007/s10915-020-01367-x
DOI: https://doi.org/10.1007/s10915-020-01367-x
Keywords
- Natural gradient
- Fisher–Rao metric
- Wasserstein metric
- Mahalanobis metric
- Compactly supported wavelet
- Diagonal approximation