Natural Gradient for Combined Loss Using Wavelets

  • Technical Note
Journal of Scientific Computing

Abstract

Natural gradients have been widely used in the optimization of loss functionals over probability space, with important examples such as Fisher–Rao gradient descent for Kullback–Leibler divergence, Wasserstein gradient descent for transport-related functionals, and Mahalanobis gradient descent for quadratic loss functionals. This note considers the situation in which the loss is a convex linear combination of these examples. We propose a new natural gradient algorithm that uses compactly supported wavelets to approximately diagonalize the Hessian of the combined loss. Numerical results demonstrate the efficiency of the proposed algorithm.
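The idea of preconditioning a gradient step with a wavelet-diagonal approximation of the Hessian can be illustrated with a minimal sketch. This is not the algorithm of the note, whose details are not shown here. It assumes a 1D grid, the Haar wavelet (the simplest compactly supported wavelet), a toy quadratic loss, and a made-up Hessian built as a convex combination of three stand-in terms. The names `haar_matrix` and `wavelet_preconditioned_step`, the weights 0.4/0.3/0.3, and the backtracking constants are all hypothetical choices for illustration.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar transform matrix (n must be a power of two)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    coarse = np.kron(h, [1.0, 1.0])               # coarser-scale rows
    fine = np.kron(np.eye(n // 2), [1.0, -1.0])   # finest-scale wavelets
    return np.vstack([coarse, fine]) / np.sqrt(2.0)

def wavelet_preconditioned_step(x, grad, hess, W, loss, c1=1e-4, max_halvings=50):
    """One step along W^T D^{-1} W grad, where D = diag(W hess W^T).

    The step size is chosen by Armijo backtracking, so the loss never increases.
    """
    d = np.einsum("ij,jk,ik->i", W, hess, W)      # diagonal of W H W^T
    direction = W.T @ ((W @ grad) / d)
    f0, slope, t = loss(x), grad @ direction, 1.0
    for _ in range(max_halvings):
        trial = x - t * direction
        if loss(trial) <= f0 - c1 * t * slope:
            return trial
        t *= 0.5
    return x  # no acceptable step found; keep the current iterate

# Toy setup (all quantities below are illustrative assumptions).
n = 16
W = haar_matrix(n)
grid = (np.arange(n) + 0.5) / n
p = 1.0 + 0.5 * np.sin(2 * np.pi * grid)
p /= p.sum()                                      # reference density on the grid
lap = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
kernel = np.exp(-4.0 * np.abs(np.subtract.outer(grid, grid)))

# Convex combination of three symmetric positive definite stand-ins:
# diag(1/p) (Fisher-Rao-like), inverse Laplacian (transport-like), kernel (quadratic).
H = 0.4 * np.diag(1.0 / p) + 0.3 * np.linalg.inv(lap) + 0.3 * kernel
b = np.cos(2 * np.pi * grid)
loss = lambda x: 0.5 * x @ H @ x - b @ x

x = np.zeros(n)
losses = [loss(x)]
for _ in range(20):
    x = wavelet_preconditioned_step(x, H @ x - b, H, W, loss)
    losses.append(loss(x))
```

The sketch only keeps the diagonal of the Hessian in the wavelet basis, so each step costs one transform, one elementwise division, and one inverse transform. How well this diagonal captures a given combined Hessian depends on the wavelet and the loss, which the sketch does not attempt to quantify.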

Author information

Corresponding author

Correspondence to Lexing Ying.

Additional information

The work of L.Y. is partially supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program and also by the National Science Foundation under Award DMS-1818449.

About this article

Cite this article

Ying, L. Natural Gradient for Combined Loss Using Wavelets. J Sci Comput 86, 26 (2021). https://doi.org/10.1007/s10915-020-01367-x

  • DOI: https://doi.org/10.1007/s10915-020-01367-x
