
Minimum-correction second-moment matching: theory, algorithms and applications


Abstract

We address the problem of finding the closest matrix \(\tilde{\varvec{U}}\) to a given \(\varvec{U}\) under the constraint that a prescribed second-moment matrix \(\tilde{\varvec{P}}\) must be matched, i.e. \(\tilde{\varvec{U}}^{\mathrm {T}}\tilde{\varvec{U}}=\tilde{\varvec{P}}\). We obtain a closed-form formula for the unique global optimizer \(\tilde{\varvec{U}}\) in the full-rank case, which is related to \(\varvec{U}\) by an SPD (symmetric positive definite) linear transform. This result is generalized to rank-deficient cases as well as to infinite dimensions. We highlight the geometric intuition behind the theory and study the problem’s rich connections to minimum congruence transform, generalized polar decomposition, optimal transport, and rank-deficient data assimilation. In the special case of \(\tilde{\varvec{P}}=\varvec{I}\), minimum-correction second-moment matching reduces to the well-studied optimal orthonormalization problem. We investigate general strategies for numerically computing the optimizer and analyze existing polar decomposition and matrix square root algorithms. We modify and stabilize two Newton iterations previously deemed unstable for computing the matrix square root, so that they can now be used to efficiently compute both the orthogonal polar factor and the SPD square root. We then verify the improved performance of the new algorithms on benchmark cases with randomly generated matrices. Lastly, we demonstrate two applications for the stochastic Lorenz-96 dynamical system in a chaotic regime. In reduced subspace tracking using dynamically orthogonal equations, we maintain the numerical orthonormality and continuity of time-varying base vectors. In ensemble square root filtering for data assimilation, the prior samples are transformed into posterior ones by matching the covariance given by the Kalman update while also minimizing the corrections to the prior samples.


Notes

  1. Since we will exclusively focus on real matrices in this paper, the explicit specification “\(({\mathbb {R}})\)” of the field of matrix elements will be dropped hereafter.

  2. For square \(\varvec{A}\) diagonalizable with nonnegative eigenvalues and EVD \(\varvec{A}=\varvec{V}\varvec{\varLambda }\varvec{V}^{-1}\), we define \(\varvec{A}^{1/2} \triangleq \varvec{V}\varvec{\varLambda }^{1/2}\varvec{V}^{-1}\). Since \(\varvec{P}\tilde{\varvec{P}}= \sqrt{\tilde{\varvec{P}}}^{-1} \left( \sqrt{\tilde{\varvec{P}}}\varvec{P}\sqrt{\tilde{\varvec{P}}}^{\mathrm {T}}\right) \sqrt{\tilde{\varvec{P}}}\) is similar to \(\sqrt{\tilde{\varvec{P}}}\varvec{P}\sqrt{\tilde{\varvec{P}}}^{\mathrm {T}}\), \(\varvec{P}\tilde{\varvec{P}}\), though not symmetric in general, is diagonalizable with positive eigenvalues and EVD \(\varvec{P}\tilde{\varvec{P}}=\varvec{V}\varvec{\varLambda }\varvec{V}^{-1}\). Therefore, \(\mathrm {tr}((\varvec{P}\tilde{\varvec{P}})^{1/2}) = \sum _{i=1}^{n}\sqrt{\lambda _i} = \mathrm {tr}((\sqrt{\tilde{\varvec{P}}}\varvec{P}\sqrt{\tilde{\varvec{P}}}^{\mathrm {T}})^{1/2})\).
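    A hedged numerical check of this trace identity is sketched in code at the end of these notes.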

  3. See [48, sec. 5.1] for the definition of terms related to dynamical systems.

  4. The explicit dependence on \(\varvec{X}_*\) is omitted in the notation since it should be clear from context.

  5. For example, [49] uses the leapfrog scheme and [13] uses the implicit-explicit backward difference schemes for the DO equations.

  6. Here the role of rows and columns flips compared to previous sections, since we want to follow the notation convention in the field of data assimilation.

  7. See [42] for how these matrix gradients are computed or refer to “Appendix A” of [37].

  8. Since \((1-\sigma _{k+1})= (1-\sigma _k)(1-\gamma \lambda \sigma _k(\sigma _k+1))\), if \(0<\sigma _0 \le 1\) and \(0<\gamma \le 1/(2\lambda )\), \(\sigma _k\) monotonically increases and converges to 1.
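
The following is a minimal numerical sketch of the trace identity in Note 2 (an editorial illustration, assuming NumPy and SciPy; sizes and seeds are arbitrary):

```python
# Check Note 2: tr((P Pt)^(1/2)) = sum_i sqrt(lambda_i(P Pt))
#             = tr((sqrt(Pt) P sqrt(Pt)^T)^(1/2)).
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)); P = A @ A.T + n * np.eye(n)    # SPD
B = rng.standard_normal((n, n)); Pt = B @ B.T + n * np.eye(n)   # SPD

S = sqrtm(Pt)                      # symmetric SPD square root of Pt
lam = np.linalg.eigvals(P @ Pt)    # positive, although P @ Pt is not symmetric
lhs = np.sum(np.sqrt(lam.real))
rhs = np.trace(sqrtm(S @ P @ S.T)).real
print(abs(lhs - rhs))              # ~ 1e-12
```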

References

  1. Absil, P.A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2009)

  2. Absil, P.A., Malick, J.: Projection-like retractions on matrix manifolds. SIAM J. Optim. 22(1), 135–158 (2012)

  3. Anderson, J.L.: An ensemble adjustment Kalman filter for data assimilation. Mon. Weather Rev. 129(12), 2884–2903 (2001)

  4. Benamou, J.D., Brenier, Y.: A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numerische Mathematik 84(3), 375–393 (2000)

  5. Bergemann, K., Gottwald, G., Reich, S.: Ensemble propagation and continuous matrix factorization algorithms. Q. J. R. Meteorol. Soc. 135(643), 1560–1572 (2009)

  6. Bishop, C.H., Etherton, B.J., Majumdar, S.J.: Adaptive sampling with the ensemble transform Kalman filter. Part I: theoretical aspects. Mon. Weather Rev. 129(3), 420–436 (2001)

  7. Brenier, Y.: Polar factorization and monotone rearrangement of vector-valued functions. Commun. Pure Appl. Math. 44(4), 375–417 (1991)

  8. Brockett, R.W.: Dynamical systems that learn subspaces. In: Mathematical System Theory, pp. 579–592. Springer (1991)

  9. Chandrasekaran, S., Ipsen, I.C.: Backward errors for eigenvalue and singular value decompositions. Numerische Mathematik 68(2), 215–223 (1994)

  10. Cheng, M., Hou, T.Y., Zhang, Z.: A dynamically bi-orthogonal method for time-dependent stochastic partial differential equations I: derivation and algorithms. J. Comput. Phys. 242, 843–868 (2013)

  11. Dieci, L., Russell, R.D., Van Vleck, E.S.: Unitary integrators and applications to continuous orthonormalization techniques. SIAM J. Numer. Anal. 31(1), 261–281 (1994)

  12. Dowson, D., Landau, B.: The Fréchet distance between multivariate normal distributions. J. Multivar. Anal. 12(3), 450–455 (1982)

  13. Dutt, A.: High order stochastic transport and Lagrangian data assimilation. Master’s thesis, Massachusetts Institute of Technology, Department of Mechanical Engineering, Cambridge, Massachusetts (2018)

  14. Edelman, A., Arias, T.A., Smith, S.T.: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20(2), 303–353 (1998)

  15. Evensen, G.: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res. Oceans 99(C5), 10143–10162 (1994)

  16. Evensen, G.: The ensemble Kalman filter: theoretical formulation and practical implementation. Ocean Dyn. 53(4), 343–367 (2003)

  17. Feppon, F., Lermusiaux, P.F.J.: A geometric approach to dynamical model-order reduction. SIAM J. Matrix Anal. Appl. 39(1), 510–538 (2018). https://doi.org/10.1137/16M1095202

  18. Feppon, F., Lermusiaux, P.F.J.: Dynamically orthogonal numerical schemes for efficient stochastic advection and Lagrangian transport. SIAM Rev. 60(3), 595–625 (2018). https://doi.org/10.1137/16M1109394

  19. Feppon, F., Lermusiaux, P.F.J.: The extrinsic geometry of dynamical systems tracking nonlinear matrix projections. SIAM J. Matrix Anal. Appl. 40(2), 814–844 (2019). https://doi.org/10.1137/18M1192780

  20. Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. JHU Press, Baltimore (2013)

  21. Govaerts, W., Werner, B.: Continuous bordering of matrices and continuous matrix decompositions. Numerische Mathematik 70(3), 303–310 (1995)

  22. Hairer, E., Lubich, C., Wanner, G.: Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations, vol. 31. Springer, Berlin (2006)

  23. Hale, N., Higham, N.J., Trefethen, L.N.: Computing \(A^\alpha \), \(\log (A)\), and related matrix functions by contour integrals. SIAM J. Numer. Anal. 46(5), 2505–2523 (2008)

  24. Higham, D.J.: Runge-Kutta type methods for orthogonal integration. Appl. Numer. Math. 22(1–3), 217–223 (1996)

  25. Higham, D.J.: Time-stepping and preserving orthonormality. BIT Numer. Math. 37(1), 24–36 (1997)

  26. Higham, N.J.: Computing the polar decomposition - with applications. SIAM J. Sci. Stat. Comput. 7(4), 1160–1174 (1986)

  27. Higham, N.J.: Newton’s method for the matrix square root. Math. Comput. 46(174), 537–549 (1986)

  28. Higham, N.J.: Matrix Nearness Problems and Applications. Tech. rep., Department of Mathematics, University of Manchester (1988)

  29. Higham, N.J., Schreiber, R.S.: Fast polar decomposition of an arbitrary matrix. SIAM J. Sci. Stat. Comput. 11(4), 648–655 (1990)

  30. Koch, O., Lubich, C.: Dynamical low-rank approximation. SIAM J. Matrix Anal. Appl. 29(2), 434–454 (2007)

  31. Laub, A.J.: Matrix Analysis for Scientists and Engineers. SIAM, Philadelphia (2005)

  32. Lermusiaux, P.F.J.: Error subspace data assimilation methods for ocean field estimation: theory, validation and applications. Ph.D. thesis, Harvard University, Cambridge, Massachusetts (1997)

  33. Lermusiaux, P.F.J.: Data assimilation via error subspace statistical estimation, part II: Mid-Atlantic Bight shelfbreak front simulations, and ESSE validation. Mon. Weather Rev. 127(7), 1408–1432 (1999). https://doi.org/10.1175/1520-0493

  34. Lermusiaux, P.F.J.: Evolving the subspace of the three-dimensional multiscale ocean variability: Massachusetts Bay. J. Mar. Syst. 29(1), 385–422 (2001)

  35. Lermusiaux, P.F.J., Robinson, A.R.: Data assimilation via error subspace statistical estimation, part I: theory and schemes. Mon. Weather Rev. 127(7), 1385–1407 (1999). https://doi.org/10.1175/1520-0493

  36. Lin, J.: Bayesian learning for high-dimensional nonlinear dynamical systems: methodologies, numerics and applications to fluid flows. Ph.D. thesis, Massachusetts Institute of Technology, Department of Mechanical Engineering, Cambridge, Massachusetts (2020)

  37. Lin, J.: Minimum-correction second-moment matching: theory, algorithms and applications. Master’s thesis, Massachusetts Institute of Technology, Department of Mechanical Engineering, Cambridge, Massachusetts (2020)

  38. Lorenz, E.N.: Predictability: a problem partly solved. In: Proc. Seminar on Predictability, vol. 1 (1996)

  39. Musharbash, E., Nobile, F.: Dual dynamically orthogonal approximation of incompressible Navier Stokes equations with random boundary conditions. J. Comput. Phys. 354, 135–162 (2018)

  40. Olkin, I., Pukelsheim, F.: The distance between two random vectors with given dispersion matrices. Linear Algebra Appl. 48, 257–263 (1982)

  41. Ott, E., Hunt, B.R., Szunyogh, I., Zimin, A.V., Kostelich, E.J., Corazza, M., Kalnay, E., Patil, D., Yorke, J.A.: A local ensemble Kalman filter for atmospheric data assimilation. Tellus A 56(5), 415–428 (2004)

  42. Petersen, K.B., Pedersen, M.S.: The Matrix Cookbook. Technical University of Denmark (2012)

  43. Philippe, B.: An algorithm to improve nearly orthonormal sets of vectors on a vector processor. SIAM J. Algebr. Discrete Methods 8(3), 396–403 (1987)

  44. Reich, S., Cotter, C.: Probabilistic Forecasting and Bayesian Data Assimilation. Cambridge University Press, Cambridge (2015)

  45. Rheinboldt, W.C.: On the computation of multi-dimensional solution manifolds of parametrized equations. Numerische Mathematik 53(1–2), 165–181 (1988)

  46. Sapsis, T.P., Lermusiaux, P.F.J.: Dynamically orthogonal field equations for continuous stochastic dynamical systems. Physica D: Nonlinear Phenomena 238(23–24), 2347–2360 (2009). https://doi.org/10.1016/j.physd.2009.09.017

  47. Schatzman, M.: Numerical Analysis: A Mathematical Introduction. Oxford University Press, Oxford (2002)

  48. Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering. Westview Press, Boulder (1994)

  49. Subramani, D.N.: Probabilistic regional ocean predictions: stochastic fields and optimal planning. Ph.D. thesis, Massachusetts Institute of Technology, Department of Mechanical Engineering, Cambridge, Massachusetts (2018)

  50. Tagare, H.D.: Notes on Optimization on Stiefel Manifolds. Yale University, New Haven (2011)

  51. Trefethen, L.N., Bau, D., III: Numerical Linear Algebra, vol. 50. SIAM, Philadelphia (1997)

  52. Ueckermann, M., Lermusiaux, P., Sapsis, T.: Numerical schemes and computational studies for dynamically orthogonal equations. MSEAS Report 11, Department of Mechanical Engineering, Massachusetts Institute of Technology (2011). http://mseas.mit.edu/?p=1928

  53. Ueckermann, M.P., Lermusiaux, P.F.J., Sapsis, T.P.: Numerical schemes for dynamically orthogonal equations of stochastic fluid and ocean flows. J. Comput. Phys. 233, 272–294 (2013). https://doi.org/10.1016/j.jcp.2012.08.041

  54. Vetra-Carvalho, S., van Leeuwen, P.J., Nerger, L., Barth, A., Altaf, M.U., Brasseur, P., Kirchgessner, P., Beckers, J.M.: State-of-the-art stochastic data assimilation methods for high-dimensional non-Gaussian problems. Tellus A: Dyn. Meteorol. Oceanogr. 70(1), 1–43 (2018). https://doi.org/10.1080/16000870.2018.1445364

  55. Villani, C.: Optimal Transport: Old and New, vol. 338. Springer, Berlin (2008)

  56. Wen, Z., Yin, W.: A feasible method for optimization with orthogonality constraints. Math. Program. 142(1–2), 397–434 (2013)

  57. Yang, H., Li, H.: Weighted polar decomposition and WGL partial ordering of rectangular complex matrices. SIAM J. Matrix Anal. Appl. 30(2), 898–924 (2008)


Acknowledgements

We thank the members of our MSEAS group at MIT. We especially thank Florian Feppon for several suggestions and for pointing us to the connection to the polar decomposition. We are grateful to the Office of Naval Research for support under grants N00014-14-1-0476 (Science of Autonomy – LEARNS) and N00014-14-1-0725 (Bays-DA) to the Massachusetts Institute of Technology. We also thank the anonymous referees for reading our manuscript carefully and providing helpful feedback that improved the presentation.

Author information

Correspondence to Pierre F. J. Lermusiaux.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A: Proof of Theorem 2.1

Proof

First, since the objective function of (4) can be rewritten as

$$\begin{aligned} \Vert \tilde{\varvec{U}}-\varvec{U}\Vert _{\mathrm {F}}^2 = \mathrm {tr}[ (\tilde{\varvec{U}}-\varvec{U})^{\mathrm {T}}(\tilde{\varvec{U}}-\varvec{U}) ] = \mathrm {tr} \tilde{\varvec{P}} + \mathrm {tr} \varvec{P} -2 \mathrm {tr} (\varvec{U}^{\mathrm {T}}\tilde{\varvec{U}}), \end{aligned}$$

we have

$$\begin{aligned} \mathop {\mathrm{arg \, min}}\limits _{\tilde{\varvec{U}}\in {\mathcal {M}}_{m\times n}:~ \tilde{\varvec{U}}^{\mathrm {T}}\tilde{\varvec{U}}=\tilde{\varvec{P}}} \Vert \tilde{\varvec{U}}-\varvec{U}\Vert _{\mathrm {F}}^2 = \mathop {\mathrm{arg \, min}}\limits _{\tilde{\varvec{U}}\in {\mathcal {M}}_{m\times n}:~ \tilde{\varvec{U}}^{\mathrm {T}}\tilde{\varvec{U}}=\tilde{\varvec{P}}} -2\mathrm {tr} (\varvec{U}^{\mathrm {T}}\tilde{\varvec{U}}). \end{aligned}$$
(59)

The Lagrangian of this optimization is thus

$$\begin{aligned} L(\tilde{\varvec{U}},\varvec{\varLambda }) = -2\mathrm {tr} (\varvec{U}^{\mathrm {T}}\tilde{\varvec{U}}) + \mathrm {tr} (\varvec{\varLambda }^{\mathrm {T}}(\tilde{\varvec{U}}^{\mathrm {T}}\tilde{\varvec{U}} - \tilde{\varvec{P}})), \end{aligned}$$
(60)

where \(\varvec{\varLambda }\in {\mathcal {M}}_{n\times n}\) is the Lagrange multiplier for the constraint \(\tilde{\varvec{U}}^{\mathrm {T}}\tilde{\varvec{U}}=\tilde{\varvec{P}}\). The global optimizer must be one of the critical points, so (see Note 7)

$$\begin{aligned} \left. \nabla _{\tilde{\varvec{U}}} L \right| _{\tilde{\varvec{U}}_*,\varvec{\varLambda }_*} = -2\varvec{U} + 2 \tilde{\varvec{U}}_* \varvec{\varLambda }_* = \varvec{0}, \qquad \left. \nabla _{\varvec{\varLambda }} L \right| _{\tilde{\varvec{U}}_*,\varvec{\varLambda }_*} = \tilde{\varvec{U}}^{\mathrm {T}}_* \tilde{\varvec{U}}_* - \tilde{\varvec{P}} = \varvec{0}. \end{aligned}$$
(61)

The first condition gives \(\varvec{U}=\tilde{\varvec{U}}_*\varvec{\varLambda }_*\) and the second is simply the second-moment constraint on \(\tilde{\varvec{U}}_*\). The symmetry of this constraint implies the symmetry of the Lagrange multiplier \(\varvec{\varLambda }_*\). Inserting these results into the objective function (59), we obtain

$$\begin{aligned} -2\mathrm {tr} (\varvec{U}^{\mathrm {T}}\tilde{\varvec{U}}_*) = -2\mathrm {tr} (\varvec{\varLambda }_* \tilde{\varvec{U}}_*^{\mathrm {T}}\tilde{\varvec{U}}_*) = -2\mathrm {tr} (\varvec{\varLambda }_* \tilde{\varvec{P}}) = -2\mathrm {tr} (\sqrt{\tilde{\varvec{P}}} \varvec{\varLambda }_* \sqrt{\tilde{\varvec{P}}}^{\mathrm {T}}). \end{aligned}$$
(62)

The reason for symmetrizing the matrix in the last step will soon become clear. Since

$$\begin{aligned} \varvec{P} = \varvec{U}^{\mathrm {T}}\varvec{U} = \varvec{\varLambda }_* \tilde{\varvec{U}}_*^{\mathrm {T}}\tilde{\varvec{U}}_* \varvec{\varLambda }_* = \varvec{\varLambda }_* \tilde{\varvec{P}} \varvec{\varLambda }_*, \end{aligned}$$

the second-moment constraint reduces to \(\varvec{P}=\varvec{\varLambda }_* \tilde{\varvec{P}} \varvec{\varLambda }_*\). To relate the objective function (62) to this constraint, we use a small trick:

$$\begin{aligned} (\sqrt{\tilde{\varvec{P}}} \varvec{\varLambda }_* \sqrt{\tilde{\varvec{P}}}^{\mathrm {T}})^2&= (\sqrt{\tilde{\varvec{P}}} \varvec{\varLambda }_* \sqrt{\tilde{\varvec{P}}}^{\mathrm {T}}) (\sqrt{\tilde{\varvec{P}}} \varvec{\varLambda }_* \sqrt{\tilde{\varvec{P}}}^{\mathrm {T}}) \\&= \sqrt{\tilde{\varvec{P}}} (\varvec{\varLambda }_* \tilde{\varvec{P}} \varvec{\varLambda }_*) \sqrt{\tilde{\varvec{P}}}^{\mathrm {T}}\\&= \sqrt{\tilde{\varvec{P}}} \varvec{P} \sqrt{\tilde{\varvec{P}}}^{\mathrm {T}}. \end{aligned}$$

Therefore, \(\sqrt{\tilde{\varvec{P}}} \varvec{\varLambda }_* \sqrt{\tilde{\varvec{P}}}^{\mathrm {T}}\) must be a symmetric square root of \(\sqrt{\tilde{\varvec{P}}} \varvec{P} \sqrt{\tilde{\varvec{P}}}^{\mathrm {T}}\), which is not unique. This characterizes all the critical points of (4), among which is the desired global optimizer \(\tilde{\varvec{U}}_*=\varvec{U}\varvec{\varLambda }^{-1}_*\).

Next, we identify the global optimizer by comparing the values of the objective function evaluated at these critical points. Since \(\sqrt{\tilde{\varvec{P}}} \varvec{\varLambda }_* \sqrt{\tilde{\varvec{P}}}^{\mathrm {T}}\) is symmetric, it has an eigendecomposition

$$\begin{aligned} \sqrt{\tilde{\varvec{P}}} \varvec{\varLambda }_* \sqrt{\tilde{\varvec{P}}}^{\mathrm {T}}= \varvec{V} \mathrm {diag}\left( \lambda _1,\ldots ,\lambda _n \right) \varvec{V}^{\mathrm {T}}\end{aligned}$$

and its square \(\sqrt{\tilde{\varvec{P}}} \varvec{P} \sqrt{\tilde{\varvec{P}}}^{\mathrm {T}}\) will have the corresponding eigendecomposition

$$\begin{aligned} \sqrt{\tilde{\varvec{P}}} \varvec{P} \sqrt{\tilde{\varvec{P}}}^{\mathrm {T}}= \varvec{V} \mathrm {diag}\left( \lambda _1^2,\ldots ,\lambda _n^2 \right) \varvec{V}^{\mathrm {T}}. \end{aligned}$$
(63)

If we denote the n positive eigenvalues of \(\sqrt{\tilde{\varvec{P}}} \varvec{P} \sqrt{\tilde{\varvec{P}}}^{\mathrm {T}}\) by \(\sigma _1,\ldots ,\sigma _n\), then without loss of generality, we must have \(\sigma _i = \lambda _i^2\) and hence \(\lambda _i = \pm \sqrt{\sigma _i}\). Therefore, the objective function (62) reduces to

$$\begin{aligned} -2\mathrm {tr} \left( \varvec{U}^{\mathrm {T}}\tilde{\varvec{U}}_* \right)= & {} -2 \mathrm {tr} \left( \varvec{V} \mathrm {diag}\left( \lambda _1,\ldots ,\lambda _n \right) \varvec{V}^{\mathrm {T}} \right) \nonumber \\= & {} -2\sum _{i=1}^{n} \pm \sqrt{\sigma _i} \ge -2\sum _{i=1}^{n} \sqrt{\sigma _i}. \end{aligned}$$
(64)

We can see that the lower bound is attained if and only if \(\lambda _i = \sqrt{\sigma _i}\), \(i=1,\ldots ,n\), i.e. when \(\sqrt{\tilde{\varvec{P}}} \varvec{\varLambda }_* \sqrt{\tilde{\varvec{P}}}^{\mathrm {T}}\) takes the unique SPD square root:

$$\begin{aligned} \sqrt{\tilde{\varvec{P}}} \varvec{\varLambda }_* \sqrt{\tilde{\varvec{P}}}^{\mathrm {T}}= (\sqrt{\tilde{\varvec{P}}} \varvec{P} \sqrt{\tilde{\varvec{P}}}^{\mathrm {T}})^{1/2}. \end{aligned}$$
(65)

Therefore, the global minimizer to (4) is \(\tilde{\varvec{U}}_*=\varvec{U}\varvec{\varLambda }_*^{-1}\) with

$$\begin{aligned} \varvec{\varLambda }_*^{-1} = \sqrt{\tilde{\varvec{P}}}^{\mathrm {T}}(\sqrt{\tilde{\varvec{P}}} \varvec{P} \sqrt{\tilde{\varvec{P}}}^{\mathrm {T}})^{-1/2} \sqrt{\tilde{\varvec{P}}}. \end{aligned}$$
(66)

Note that the particular choice of square root \(\sqrt{\tilde{\varvec{P}}}\) is irrelevant because both the spectrum of \(\sqrt{\tilde{\varvec{P}}} \varvec{P} \sqrt{\tilde{\varvec{P}}}^{\mathrm {T}}\) and the global optimizer (66) are invariant under the unitary freedom \(\sqrt{\tilde{\varvec{P}}}\rightarrow \varvec{Q}\sqrt{\tilde{\varvec{P}}}\), due to the fact that \(\varvec{Q}\varvec{A}^{1/2}\varvec{Q}^{\mathrm {T}}= (\varvec{Q}\varvec{A}\varvec{Q}^{\mathrm {T}})^{1/2}\) for any SPD \(\varvec{A}\).

We remark that an alternative to using (62) is to use

$$\begin{aligned} -2\mathrm {tr} (\varvec{U}^{\mathrm {T}}\tilde{\varvec{U}}_*) = -2\mathrm {tr} (\varvec{U}^{\mathrm {T}}\varvec{U} \varvec{\varLambda }_*^{-1}) = -2\mathrm {tr} \left( \varvec{P}\varvec{\varLambda }_*^{-1} \right) = -2\mathrm {tr} (\sqrt{\varvec{P}} \varvec{\varLambda }_*^{-1} \sqrt{\varvec{P}}^{\mathrm {T}}) \end{aligned}$$

and correspondingly \(\tilde{\varvec{P}} = \varvec{\varLambda }_*^{-1}\varvec{P}\varvec{\varLambda }_*^{-1}\). This simply switches the roles of \(\varvec{P}\) and \(\tilde{\varvec{P}}\) and replaces \(\varvec{\varLambda }_*\) with \(\varvec{\varLambda }_*^{-1}\). Therefore, we obtain an alternative expression for the \(\varvec{\varLambda }_*^{-1}\) in (66), i.e.

$$\begin{aligned} \varvec{\varLambda }_*^{-1} = \sqrt{\varvec{P}}^{-1} (\sqrt{\varvec{P}} \tilde{\varvec{P}} \sqrt{\varvec{P}}^{\mathrm {T}})^{1/2} \sqrt{\varvec{P}}^{-\mathrm {T}}. \end{aligned}$$
(67)

\(\square \)
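
As an editorial aid, the following sketch (assuming NumPy and SciPy; sizes and the random seed are arbitrary) builds \(\varvec{\varLambda }_*^{-1}\) from (66), confirms that it agrees with (67) and matches the constraint, and checks that flipping a sign in the square root, i.e. choosing another critical point, only increases the correction:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(1)
m, n = 10, 4
U = rng.standard_normal((m, n)); P = U.T @ U                   # full rank
B = rng.standard_normal((n, n)); Pt = B @ B.T + n * np.eye(n)  # SPD target

St, Sp = sqrtm(Pt), sqrtm(P)       # symmetric SPD square roots

# (66): Lambda_*^{-1} = sqrt(Pt)^T (sqrt(Pt) P sqrt(Pt)^T)^(-1/2) sqrt(Pt)
Lam_inv_66 = St.T @ np.linalg.inv(sqrtm(St @ P @ St.T)) @ St
# (67): Lambda_*^{-1} = sqrt(P)^(-1) (sqrt(P) Pt sqrt(P)^T)^(1/2) sqrt(P)^(-T)
Spi = np.linalg.inv(Sp)
Lam_inv_67 = Spi @ sqrtm(Sp @ Pt @ Sp.T) @ Spi.T

Ut = U @ Lam_inv_66
print(np.allclose(Lam_inv_66, Lam_inv_67))  # True: the two expressions agree
print(np.allclose(Ut.T @ Ut, Pt))           # True: second moment matched

# Another critical point: flip one sign in the square root of
# sqrt(Pt) P sqrt(Pt)^T; it still matches Pt but has a larger correction.
sig, V = np.linalg.eigh(St @ P @ St.T)
d = np.sqrt(sig); d[0] = -d[0]
Ut_bad = U @ (St.T @ V @ np.diag(1.0 / d) @ V.T @ St)
print(np.allclose(Ut_bad.T @ Ut_bad, Pt))                   # True
print(np.linalg.norm(Ut - U) < np.linalg.norm(Ut_bad - U))  # True
```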

B: Generalization from \({\mathbb {R}}^m\) to a Hilbert space

In Theorem 2.1, each column of \(\varvec{U}\) or \(\tilde{\varvec{U}}\) is an element in \({\mathbb {R}}^m\) and the second-moment matrices are computed based on an inner product on \({\mathbb {R}}^m\). A natural question is whether Theorem 2.1 still holds when \({\mathbb {R}}^m\) is replaced by an infinite-dimensional Hilbert space \({\mathcal {H}}\) equipped with an inner product \(\left\langle \cdot ,\cdot \right\rangle _{\mathcal {H}}\).

More precisely, assume \(u_1,\ldots ,u_n\in {\mathcal {H}}\) are linearly independent and form an n-vector of \({\mathcal {H}}\) denoted by \(\varvec{u}=\left[ u_1,\ldots ,u_n \right] ^{\mathrm {T}}\in {\mathcal {H}}^n\). Then we can compute the second-moment matrix (a.k.a. the Gram matrix) of \(\varvec{u}\) as \(\varvec{P}=\mathrm {Gr}(\varvec{u},\varvec{u})\), where

$$\begin{aligned} \mathrm {Gr}(\varvec{u},\varvec{v}) \triangleq \left\langle \varvec{u},\varvec{v}^{\mathrm {T}} \right\rangle _{\mathcal {H}} \in {\mathcal {M}}_{n\times n} \end{aligned}$$
(68)

i.e. with the (ij)-th element of \(\mathrm {Gr}(\varvec{u},\varvec{v})\) being \(\left\langle u_i,v_j \right\rangle _{\mathcal {H}}\). If \({\mathcal {H}}^n\) is equipped with an inner product \(\left\langle \cdot ,\cdot \right\rangle _{{\mathcal {H}}^n}\) defined as

$$\begin{aligned} \left\langle \varvec{u},\varvec{v} \right\rangle _{{\mathcal {H}}^n} \triangleq \mathrm {tr} \left( \sqrt{\varvec{\varGamma }_n} \mathrm {Gr}(\varvec{u},\varvec{v}) \sqrt{\varvec{\varGamma }_n}^{\mathrm {T}} \right) \end{aligned}$$
(69)

where \(\varvec{\varGamma }_n\) is again an SPD weight matrix, we want to solve the following minimum-correction second-moment matching problem

$$\begin{aligned} \mathop {\mathrm{arg \, min}}\limits _{\tilde{\varvec{u}}\in {\mathcal {H}}^n: \mathrm {Gr}(\tilde{\varvec{u}},\tilde{\varvec{u}})=\tilde{\varvec{P}}} \left\| \tilde{\varvec{u}}-\varvec{u} \right\| _{{\mathcal {H}}^n}^2 \end{aligned}$$
(70)

for a given SPD \(\tilde{\varvec{P}}\in {\mathcal {M}}_{n\times n}\). Here the norm is induced by the inner product (69). As a common scenario, if \({\mathcal {H}}=L_2(\varOmega )\), which is the space of square-integrable real functions on some measure space \((\varOmega ,{\mathcal {A}},m)\) equipped with inner product \(\left\langle u(\cdot ),v(\cdot ) \right\rangle _{\mathcal {H}}=\int _\varOmega u(\omega )v(\omega ) {\mathrm {d}}m(\omega )\), then the inner product (69) on \({\mathcal {H}}^n\) can be rewritten as

$$\begin{aligned} \left\langle \varvec{u},\varvec{v} \right\rangle _{{\mathcal {H}}^n} = \int _{\varOmega } \varvec{u}(\omega )^{\mathrm {T}}\varvec{\varGamma }_n \varvec{v}(\omega ) {\mathrm {d}}m(\omega ). \end{aligned}$$
(71)

The definitions (69) and (71) are simply the Hilbert space counterparts of the two expressions in (2). Since we can get rid of \(\varvec{\varGamma }_n\) by scaling the elements in \({\mathcal {H}}^n\) by \(\sqrt{\varvec{\varGamma }_n}\) as before, we will assume \(\varvec{\varGamma }_n=\varvec{I}\) hereafter.

Theorem 2.1 does not apply to (70), but as in Sect. 2.2, if we restrict the candidate set to \(\left\{ \tilde{\varvec{u}}=\varvec{A}\varvec{u}: \varvec{A}\in {\mathcal {M}}_{n\times n} \right\} \), the Hilbert space complication will be confined to computing the Gram matrices only. Since

$$\begin{aligned} \mathrm {Gr}(\varvec{A}\varvec{u},\varvec{B}\varvec{v}) = \left\langle \varvec{A}\varvec{u},(\varvec{B}\varvec{v})^{\mathrm {T}} \right\rangle _{\mathcal {H}} = \varvec{A} \left\langle \varvec{u},\varvec{v}^{\mathrm {T}} \right\rangle _{\mathcal {H}} \varvec{B}^{\mathrm {T}}= \varvec{A} \mathrm {Gr}(\varvec{u},\varvec{v}) \varvec{B}^{\mathrm {T}}\end{aligned}$$

due to the bilinearity of an inner product, we have

$$\begin{aligned} \left\| \tilde{\varvec{u}}-\varvec{u} \right\| _{{\mathcal {H}}^n}^2 = \left\| (\varvec{A}-\varvec{I})\varvec{u} \right\| _{{\mathcal {H}}^n}^2 = \mathrm {tr} \left( (\varvec{A}-\varvec{I}) \mathrm {Gr}(\varvec{u},\varvec{u}) (\varvec{A}-\varvec{I})^{\mathrm {T}} \right) = \left\| \varvec{A}-\varvec{I} \right\| _{\mathrm {F},\varvec{P}}^2. \end{aligned}$$

Hence (70) reduces to (16) and Corollary 2.2 applies.
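
To make this reduction concrete, here is a hedged sketch for \({\mathcal {H}}=L_2([0,1])\) discretized on a grid with quadrature weights (grid size, test functions and the prescribed Gram matrix are illustrative assumptions; the SPD map is rebuilt from (67)):

```python
import numpy as np
from scipy.linalg import sqrtm

x = np.linspace(0.0, 1.0, 2001)
w = np.gradient(x)                     # simple positive quadrature weights

def gram(u, v):
    # (68): Gr(u, v)_{ij} = <u_i, v_j>_H, discretized by quadrature
    return u @ (w[:, None] * v.T)

# three linearly independent functions, stored as rows
u = np.vstack([np.sin(np.pi * x), np.cos(np.pi * x), np.sin(2 * np.pi * x)])
P = gram(u, u)
B = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.5, 0.2],
              [0.1, 0.2, 1.0]])
Pt = B @ B.T                           # prescribed SPD Gram matrix

Sp = sqrtm(P); Spi = np.linalg.inv(Sp)
A_star = Spi @ sqrtm(Sp @ Pt @ Sp.T) @ Spi.T   # SPD linear map, cf. (67)
ut = A_star @ u                        # minimally corrected functions
print(np.allclose(gram(ut, ut), Pt))   # True: Gram matrix matched
```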

To establish the analog of Theorem 2.1 for (70), we need some techniques from the calculus of variations to generalize optimization in \({\mathbb {R}}^m\) to optimization in a Hilbert space.

Theorem B.1

(Minimum-correction second-moment matching on a Hilbert space) Given \(\varvec{u}\in {\mathcal {H}}^n\) whose entries are linearly independent and an SPD \(\tilde{\varvec{P}}\), we have

$$\begin{aligned} \mathop {\mathrm{arg \, min}}\limits _{\tilde{\varvec{u}}\in {\mathcal {H}}^n: \mathrm {Gr}(\tilde{\varvec{u}},\tilde{\varvec{u}})=\tilde{\varvec{P}}} \left\| \tilde{\varvec{u}}-\varvec{u} \right\| _{{\mathcal {H}}^n}^2 = \varvec{A}_* \varvec{u} \end{aligned}$$
(72)

with \(\varvec{A}_*\) given by (7).

Proof

Compared to the case in \({\mathbb {R}}^m\), although \({\mathcal {S}}_{\tilde{\varvec{P}}} = \{\tilde{\varvec{u}}\in {\mathcal {H}}^n: \mathrm {Gr}(\tilde{\varvec{u}},\tilde{\varvec{u}})=\tilde{\varvec{P}}\}\) is still closed and bounded (\(\left\| \tilde{\varvec{u}} \right\| _{{\mathcal {H}}^n}^2 \equiv \mathrm {tr}(\tilde{\varvec{P}})\) on \({\mathcal {S}}_{\tilde{\varvec{P}}}\)), \({\mathcal {S}}_{\tilde{\varvec{P}}}\) is no longer compact due to the infinite dimensionality. Therefore, although the objective function is continuous and bounded from below, its infimum may not be attainable. In the following, we first assume that a global minimizer \(\tilde{\varvec{u}}_*\) exists and derive an analytic expression for it. After that, we will show that this expression, which is well-defined whether or not a global minimum exists, is indeed the unique global minimizer to (70).

Here the objective function is

$$\begin{aligned} F(\tilde{\varvec{u}}) = \left\| \tilde{\varvec{u}}-\varvec{u} \right\| _{{\mathcal {H}}^n}^2 = \mathrm {tr} \left( \mathrm {Gr}(\tilde{\varvec{u}}-\varvec{u},\tilde{\varvec{u}}-\varvec{u}) \right) = \mathrm {tr} (\tilde{\varvec{P}}) + \mathrm {tr} (\varvec{P}) - 2\mathrm {tr} \left( \mathrm {Gr}(\tilde{\varvec{u}}, \varvec{u}) \right) , \end{aligned}$$

so its first-order variation is

$$\begin{aligned} \delta F(\tilde{\varvec{u}}, \delta \tilde{\varvec{u}}) = - 2\mathrm {tr} \left( \mathrm {Gr}(\delta \tilde{\varvec{u}}, \varvec{u}) \right) . \end{aligned}$$

The constraint on \(\tilde{\varvec{u}}\) is \(\mathrm {Gr}(\tilde{\varvec{u}},\tilde{\varvec{u}}) = \tilde{\varvec{P}}\), so the constraint on \(\delta \tilde{\varvec{u}}\) is

$$\begin{aligned} \mathrm {Gr}(\delta \tilde{\varvec{u}},\tilde{\varvec{u}}) + \mathrm {Gr}(\tilde{\varvec{u}},\delta \tilde{\varvec{u}}) = 0, \end{aligned}$$
(73)

which is simply saying that \(\mathrm {Gr}(\delta \tilde{\varvec{u}},\tilde{\varvec{u}})\) is anti-symmetric. If \(\tilde{\varvec{u}}\) is a local minimizer to (72), we must have \(\delta F(\tilde{\varvec{u}}, \delta \tilde{\varvec{u}})\equiv 0\) for any \(\delta \tilde{\varvec{u}}\in {\mathcal {H}}^n\) that satisfies (73).

Now we prove by contradiction that each entry of \(\varvec{u}\) must be a linear combination of the entries of a local minimizer \(\tilde{\varvec{u}}\), which means \(\varvec{u}=\varvec{A}\tilde{\varvec{u}}\) for some \(\varvec{A}\in {\mathcal {M}}_{n\times n}\). If this were not the case, we could assume without loss of generality that \(u_1\notin \mathrm {span}\left\{ {\tilde{u}}_1,\ldots ,{\tilde{u}}_n \right\} \subset {\mathcal {H}}\). Therefore, the projection of \(u_1\) onto \(\mathrm {span}\left\{ {\tilde{u}}_1,\ldots ,{\tilde{u}}_n \right\} ^\perp \) must be nonzero and we denote it by v. Now we construct \(\delta \tilde{\varvec{u}} = \left[ v, 0, \ldots , 0 \right] ^{\mathrm {T}}\). We can see that \(\mathrm {Gr}(\delta \tilde{\varvec{u}},\tilde{\varvec{u}})=0\) because \(\left\langle v,{\tilde{u}}_i \right\rangle _{\mathcal {H}}=0,~i=1,\ldots ,n\). Hence, (73) is satisfied. On the other hand, since \(\left\langle v,u_1 \right\rangle _{\mathcal {H}} = \left\| v \right\| _{\mathcal {H}}^2 > 0\), we have

$$\begin{aligned} \delta F(\tilde{\varvec{u}}, \delta \tilde{\varvec{u}}) = - 2\mathrm {tr} \left( \mathrm {Gr}(\delta \tilde{\varvec{u}}, \varvec{u}) \right) = - 2 \sum _{i=1}^{n} \left\langle \delta {\tilde{u}}_i,u_i \right\rangle _{\mathcal {H}} = - 2 \left\langle v,u_1 \right\rangle _{\mathcal {H}} < 0, \end{aligned}$$

which contradicts the assumption that \(\tilde{\varvec{u}}\) is a local minimizer. Therefore, any local minimizer \(\tilde{\varvec{u}}\) must satisfy \(\varvec{u}=\varvec{A}\tilde{\varvec{u}}\) for some \(\varvec{A}\in {\mathcal {M}}_{n\times n}\). Since the entries of \(\varvec{u}\) are linearly independent, \(\varvec{A}\) must be invertible and thus \(\tilde{\varvec{u}} = \varvec{A}^{-1}\varvec{u}\).

We have already shown that, with the candidate set restricted to \(\left\{ \tilde{\varvec{u}}=\varvec{A}\varvec{u} \right\} \), Corollary 2.2 applies. This completes the proof of (72) when a global minimum exists.

Note that whether or not a global minimum exists, we can always construct \(\tilde{\varvec{u}}_* = \varvec{A}_* \varvec{u} \in {\mathcal {S}}_{\tilde{\varvec{P}}}\) with \(\varvec{A}_*\) given by (7). Therefore, if we can show that \(\left\| \tilde{\varvec{u}}_*-\varvec{u} \right\| _{{\mathcal {H}}^n}\) is indeed a lower bound of \(\left\| \tilde{\varvec{u}}-\varvec{u} \right\| _{{\mathcal {H}}^n}\) on \({\mathcal {S}}_{\tilde{\varvec{P}}}\), we can prove the existence and uniqueness of a global minimizer.

Since \(F(\tilde{\varvec{u}}) - \mathrm {tr} (\tilde{\varvec{P}}) - \mathrm {tr} (\varvec{P}) = - 2\mathrm {tr} \left( \mathrm {Gr}(\tilde{\varvec{u}}, \varvec{u}) \right) = - 2\left\langle \tilde{\varvec{u}},\varvec{u} \right\rangle _{{\mathcal {H}}^n}\), we want to show \(\left\langle \tilde{\varvec{u}},\varvec{u} \right\rangle _{{\mathcal {H}}^n} \le \left\langle \tilde{\varvec{u}}_*,\varvec{u} \right\rangle _{{\mathcal {H}}^n}\) for any \(\tilde{\varvec{u}} \in {\mathcal {S}}_{\tilde{\varvec{P}}}\) by the Cauchy-Schwarz inequality. Since \(\tilde{\varvec{u}}_*\) is not parallel to \(\varvec{u}\) in general, we need the two linear transforms \(\tilde{\varvec{u}}=\varvec{A}_*^{1/2}\tilde{\varvec{v}}\) and \(\varvec{u}=\varvec{A}_*^{-1/2}\varvec{v}\) to make sure that the Cauchy-Schwarz inequality holds with equality when \(\tilde{\varvec{u}}=\varvec{A}_*\varvec{u}\). Therefore, we have

$$\begin{aligned} \left\langle \tilde{\varvec{u}},\varvec{u} \right\rangle _{{\mathcal {H}}^n} = \mathrm {tr} \left( \varvec{A}_*^{1/2} \left\langle \tilde{\varvec{v}},\varvec{v}^{\mathrm {T}} \right\rangle _{{\mathcal {H}}}\varvec{A}_*^{-1/2} \right) = \mathrm {tr} \left( \left\langle \tilde{\varvec{v}},\varvec{v}^{\mathrm {T}} \right\rangle _{{\mathcal {H}}} \right) = \left\langle \tilde{\varvec{v}},\varvec{v} \right\rangle _{{\mathcal {H}}^n} \le \left\| \tilde{\varvec{v}} \right\| _{{\mathcal {H}}^n} \left\| \varvec{v} \right\| _{{\mathcal {H}}^n}, \end{aligned}$$

where the last step is the Cauchy-Schwarz inequality. On the other hand, we have

$$\begin{aligned} \left\| \varvec{v} \right\| _{{\mathcal {H}}^n}^2&= \left\langle \varvec{A}_*^{1/2}\varvec{u},\varvec{A}_*^{1/2}\varvec{u} \right\rangle _{{\mathcal {H}}^n} = \mathrm {tr} \left( \varvec{A}_*^{1/2} \left\langle \varvec{u},\varvec{u}^{\mathrm {T}} \right\rangle _{{\mathcal {H}}} \varvec{A}_*^{1/2} \right) = \mathrm {tr} \left( \varvec{A}_* \varvec{P} \right) , \\ \left\| \tilde{\varvec{v}} \right\| _{{\mathcal {H}}^n}^2&= \left\langle \varvec{A}_*^{-1/2}\tilde{\varvec{u}},\varvec{A}_*^{-1/2}\tilde{\varvec{u}} \right\rangle _{{\mathcal {H}}^n} = \mathrm {tr} \left( \tilde{\varvec{P}}\varvec{A}_*^{-1} \right) = \mathrm {tr} \left( \varvec{A}_* \varvec{P} \right) , \\ \left\langle \tilde{\varvec{u}}_*,\varvec{u} \right\rangle _{{\mathcal {H}}^n}&= \left\langle \varvec{A}_*\varvec{u},\varvec{u} \right\rangle _{{\mathcal {H}}^n} = \mathrm {tr} \left( \varvec{A}_* \varvec{P} \right) , \end{aligned}$$

where the last equality in the second line follows from the second-moment matching constraint \(\varvec{A}_*\varvec{P}\varvec{A}_*=\tilde{\varvec{P}}\). Combining the above results, we have shown

$$\begin{aligned} \left\langle \tilde{\varvec{u}},\varvec{u} \right\rangle _{{\mathcal {H}}^n} \le \mathrm {tr} \left( \varvec{A}_* \varvec{P} \right) = \left\langle \tilde{\varvec{u}}_*,\varvec{u} \right\rangle _{{\mathcal {H}}^n} \end{aligned}$$

for any \(\tilde{\varvec{u}}\in {\mathcal {S}}_{\tilde{\varvec{P}}}\), which completes the proof. \(\square \)
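
A finite-dimensional spot check of the final bound (an editorial sketch with \({\mathcal {H}}={\mathbb {R}}^m\) standing in for the Hilbert space, assuming NumPy and SciPy): random feasible points never exceed \(\mathrm {tr}(\varvec{A}_*\varvec{P})\), which is attained at \(\tilde{\varvec{u}}_*=\varvec{A}_*\varvec{u}\).

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(2)
m, n = 30, 4
U = rng.standard_normal((m, n)); P = U.T @ U   # entries of u as columns of U
B = rng.standard_normal((n, n)); Pt = B @ B.T + n * np.eye(n)

Sp = sqrtm(P); Spi = np.linalg.inv(Sp)
A_star = Spi @ sqrtm(Sp @ Pt @ Sp.T) @ Spi.T
bound = np.trace(A_star @ P)                   # <u~_*, u> = tr(A_* P)

SPt = sqrtm(Pt)
best = -np.inf
for _ in range(2000):
    Q, _ = np.linalg.qr(rng.standard_normal((m, n)))  # orthonormal columns
    Ut = Q @ SPt                               # feasible: Ut.T @ Ut = Pt
    best = max(best, np.trace(Ut.T @ U))       # <u~, u> for this candidate
print(best <= bound + 1e-9)                    # True: bound never exceeded
print(np.isclose(np.trace((U @ A_star).T @ U), bound))  # True: attained
```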

Theorem 2.2 can also be generalized to the case with \({\mathbb {R}}^m\) replaced by a Hilbert space \({\mathcal {H}}\). This continuous setup can be intuitively viewed as the discrete one with \(m=\infty \), which implies that we always have \(m>n\ge {\tilde{r}}\), so the modification in step 1 is never needed. We state this generalization in the following theorem, whose proof is omitted since it exactly parallels that of Theorem 2.2.

Theorem B.2

(Degenerate counterpart of Theorem B.1) Given \(\varvec{u}\in {\mathcal {H}}^n\) with \(\varvec{P}=\mathrm {Gr}(\varvec{u},\varvec{u})\in {\mathcal {M}}_{n\times n}\) and \(\mathrm {rank}(\varvec{P})=\mathrm {dim}(\mathrm {span}\{u_1,\ldots ,u_n\})=r\), and semi-SPD \(\tilde{\varvec{P}}\in {\mathcal {M}}_{n\times n}\) with \(\mathrm {rank}(\tilde{\varvec{P}})={\tilde{r}}\), we have

$$\begin{aligned} \mathop {\mathrm{arg \, min}}\limits _{\tilde{\varvec{u}}\in {\mathcal {H}}^n: \mathrm {Gr}(\tilde{\varvec{u}},\tilde{\varvec{u}})=\tilde{\varvec{P}}} \left\| \tilde{\varvec{u}}-\varvec{u} \right\| _{{\mathcal {H}}^n}^2 = \tilde{\varvec{u}}_* = \tilde{\varvec{Z}} \tilde{\varvec{s}}_* = \tilde{\varvec{Z}} (\varvec{Z} \tilde{\varvec{w}}_* + \varvec{Z}_\perp \tilde{\varvec{w}}_{\perp *}) \end{aligned}$$
(74)

where \(\tilde{\varvec{s}}_* = \varvec{Z} \tilde{\varvec{w}}_* + \varvec{Z}_\perp \tilde{\varvec{w}}_{\perp *} \in {\mathcal {H}}^{{\tilde{r}}}\). Moreover, the columns of \(\tilde{\varvec{Z}}\in {\mathcal {M}}_{n\times {\tilde{r}}}\) form an orthonormal basis of \(\mathrm {Row}(\tilde{\varvec{P}})\) and the columns of \(\varvec{Z}\in {\mathcal {M}}_{{\tilde{r}}\times r'}\) form an orthonormal basis of \(\mathrm {Row}(\tilde{\varvec{Z}}^{\mathrm {T}}\varvec{P}\tilde{\varvec{Z}})\) with

$$\begin{aligned} r' = {\tilde{r}} - \mathrm {dim}(\mathrm {Row}(\tilde{\varvec{P}}) \cap \mathrm {Row}(\varvec{P})^\perp ) = r - \mathrm {dim}(\mathrm {Row}(\varvec{P})\cap \mathrm {Row}(\tilde{\varvec{P}})^\perp ). \end{aligned}$$
(75)

Finally, \(\tilde{\varvec{w}}_*\) and \(\tilde{\varvec{w}}_{\perp *}\) are given by

$$\begin{aligned} \tilde{\varvec{w}}_* = \mathop {\mathrm{arg \, min}}\limits _{\tilde{\varvec{w}}\in {\mathcal {H}}^{r'}:~ \mathrm {Gr}(\tilde{\varvec{w}}, \tilde{\varvec{w}}) = \varvec{Z}^{\mathrm {T}}\tilde{\varvec{Z}}^{\mathrm {T}}\tilde{\varvec{P}}\tilde{\varvec{Z}} \varvec{Z}} \Vert \tilde{\varvec{w}} - \varvec{Z}^{\mathrm {T}}\tilde{\varvec{Z}}^{\mathrm {T}}\varvec{u}\Vert _{{\mathcal {H}}^{r'}}^2 \end{aligned}$$
(76)

and \(\tilde{\varvec{w}}_{\perp *} = \varvec{B}_{12}^{\mathrm {T}}\varvec{B}_{11}^{-1} \tilde{\varvec{w}}_* + \varvec{v}_* \in {\mathcal {H}}^{{\tilde{r}}-r'}\) with arbitrary \(\varvec{v}_*\) such that \(\mathrm {Gr}(\varvec{v}_*,\tilde{\varvec{w}}_*)=\varvec{0}\), i.e. \(v_{*,i}\in \mathrm {span}\{{\tilde{w}}_{*,1},\ldots ,{\tilde{w}}_{*,r'}\}^\perp \subset {\mathcal {H}}\) for \(i=1,\ldots ,({\tilde{r}}-r')\), and \(\mathrm {Gr}(\varvec{v}_*,\varvec{v}_*) =\varvec{B}_{22}-\varvec{B}_{12}^{\mathrm {T}}\varvec{B}_{11}^{-1}\varvec{B}_{12}\). The \(\varvec{B}_{ij}\) blocks are defined as:

$$\begin{aligned} \begin{bmatrix} \varvec{B}_{11} &{}\quad \varvec{B}_{12} \\ \varvec{B}_{12}^{\mathrm {T}}&{}\quad \varvec{B}_{22} \end{bmatrix} \triangleq \mathrm {Gr}\left( \begin{bmatrix} \tilde{\varvec{w}} \\ \tilde{\varvec{w}}_\perp \end{bmatrix}, \begin{bmatrix} \tilde{\varvec{w}} \\ \tilde{\varvec{w}}_\perp \end{bmatrix} \right) = \begin{bmatrix} \varvec{Z}^{\mathrm {T}}\\ \varvec{Z}^{\mathrm {T}}_\perp \end{bmatrix} \tilde{\varvec{Z}}^{\mathrm {T}}\tilde{\varvec{P}}\tilde{\varvec{Z}} \begin{bmatrix} \varvec{Z}, \varvec{Z}_\perp \end{bmatrix}. \end{aligned}$$

In addition, the global minimum of the objective function is

$$\begin{aligned} \Vert \tilde{\varvec{u}}_*-\varvec{u} \Vert _{{\mathcal {H}}^n}^2 = \mathrm {tr}[\varvec{P} + \tilde{\varvec{P}} - 2 (\varvec{P}\tilde{\varvec{P}})^{1/2}]. \end{aligned}$$
(77)

All previous remarks regarding Theorem 2.2 apply here as well. Moreover, this theorem also extends our optimal transport interpretation in “Appendix C” to the most general case with arbitrary semi-SPD covariance matrices \(\varvec{\varSigma }_\mu \) and \(\varvec{\varSigma }_\nu \), unlike most previous literature on this topic (e.g. [12, 40]), which essentially restricts itself to cases with \(\mathrm {Row}(\varvec{\varSigma }_\nu ) \subset \mathrm {Row}(\varvec{\varSigma }_\mu )\). In particular, Theorem B.2 implies that the optimal coupling \(\pi _*\) between \(\mu \) and \(\nu \) (a joint distribution with \(\mu \) and \(\nu \) as its marginals) for matching only the first two moments is deterministic (i.e. corresponds to a bijection between \(\varvec{u}\sim \mu \) and \(\tilde{\varvec{u}}\sim \nu \)) if and only if \(\mathrm {rank}(\varvec{\varSigma }_\mu )=\mathrm {rank}(\varvec{\varSigma }_\nu ) =\mathrm {rank}(\varvec{\varSigma }_\mu \varvec{\varSigma }_\nu )\). The optimal coupling is deterministic in the direction from \(\mu \) to \(\nu \) (i.e. \(\pi _*\) induces a map from \(\varvec{u}\) to \(\tilde{\varvec{u}}\), or the conditional \(\pi _{\nu |\mu }\) is always a Dirac delta) if and only if \(\mathrm {Row}(\varvec{\varSigma }_\nu ) \cap \mathrm {Row}(\varvec{\varSigma }_\mu )^\perp = 0\), of which the assumption \(\mathrm {Row}(\varvec{\varSigma }_\nu ) \subset \mathrm {Row}(\varvec{\varSigma }_\mu )\) usually made in previous literature is a special case.
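
In the full-rank case, the global minimum (77) can be checked directly against the optimizer of Theorem 2.1. The sketch below (an editorial addition; \(\mathrm {tr}((\varvec{P}\tilde{\varvec{P}})^{1/2})\) is evaluated through the eigenvalues of \(\varvec{P}\tilde{\varvec{P}}\) as in Note 2) does exactly this:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(3)
m, n = 12, 5
U = rng.standard_normal((m, n)); P = U.T @ U
B = rng.standard_normal((n, n)); Pt = B @ B.T + n * np.eye(n)

Sp = sqrtm(P); Spi = np.linalg.inv(Sp)
Ut = U @ (Spi @ sqrtm(Sp @ Pt @ Sp.T) @ Spi.T)   # optimizer, cf. (67)

lhs = np.linalg.norm(Ut - U) ** 2
rhs = np.trace(P + Pt) - 2.0 * np.sum(np.sqrt(np.linalg.eigvals(P @ Pt).real))
print(np.isclose(lhs, rhs))                      # True: matches (77)
```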

C: Connections to optimal transport

Our main results Theorems 2.1 and B.1 are also closely connected to the problem of optimal transport (a.k.a. the Monge-Kantorovich minimization problem [55, p. 10]), which concerns the coupling (in the form of a joint distribution) between two given probability distributions that minimizes an associated transport cost. More precisely, in a particular setting that is relevant to our context, given two PDFs \(\mu (\cdot )\) and \(\nu (\cdot )\) with finite second moments over \({\mathbb {R}}^n\), the problem concerns the joint distribution \(\pi (\cdot ,\cdot )\) over \({\mathbb {R}}^n\times {\mathbb {R}}^n\), which can be degenerate, such that \(\pi \) has \(\mu \) and \(\nu \) as marginals and the \(L_2\) distance between the two marginal random variables is minimized, i.e.

$$\begin{aligned} \pi _* = \mathop {\mathrm{arg \, min}}\limits _{\pi \in \varPi _{\mu ,\nu }} \int _{{\mathbb {R}}^n\times {\mathbb {R}}^n} \Vert \varvec{x}-\varvec{y}\Vert ^2 \pi (\varvec{x},\varvec{y}) {\mathrm {d}}\varvec{x}{\mathrm {d}}\varvec{y}, \end{aligned}$$
(78)

where \(\Vert \cdot \Vert \) is the Euclidean norm on \({\mathbb {R}}^n\) and

$$\begin{aligned} \varPi _{\mu ,\nu } = \left\{ \pi (\cdot ,\cdot )\in \mathrm {PDFs}: \int _{{\mathbb {R}}^n} \pi (\varvec{x},\varvec{y}){\mathrm {d}}\varvec{y}=\mu (\varvec{x}), \int _{{\mathbb {R}}^n} \pi (\varvec{x},\varvec{y}){\mathrm {d}}\varvec{x}=\nu (\varvec{y}) \right\} . \end{aligned}$$
(79)

Under the above conditions, a unique optimizer \(\pi _*\) exists [55, p. 11] and it turns out to be deterministic, i.e. \(\pi _*(\varvec{x},\varvec{y})=\delta _{\varvec{y}=\varvec{T}_*(\varvec{x})}\mu (\varvec{x})\) for some mapping \(\varvec{T}_*:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^n\) such that \(\int \pi _*(\varvec{x},\varvec{y}){\mathrm {d}}\varvec{x}=\nu (\varvec{y})\). Moreover, the quantity

$$\begin{aligned} W_2(\mu ,\nu ) = \min _{\pi \in \varPi _{\mu ,\nu }} \left( \int _{{\mathbb {R}}^n\times {\mathbb {R}}^n} \Vert \varvec{x}-\varvec{y}\Vert ^2 \pi (\varvec{x},\varvec{y}) {\mathrm {d}}\varvec{x}{\mathrm {d}}\varvec{y} \right) ^{1/2} \end{aligned}$$
(80)

is called the \(L_2\) Wasserstein distance between \(\mu \) and \(\nu \) [55, ch. 6].
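
For intuition, a hedged sketch of the discrete counterpart of (78)-(80) (assuming SciPy’s linprog; supports and sizes are arbitrary): for two empirical distributions, the Kantorovich problem is a linear program over the coupling \(\pi \), and \(W_2\) is the square root of its optimal value.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
k, n = 6, 2
X = rng.standard_normal((k, n))            # support of mu
Y = rng.standard_normal((k, n)) + 1.0      # support of nu
mu = np.full(k, 1.0 / k); nu = np.full(k, 1.0 / k)

# squared Euclidean cost, flattened row-major: c[i*k + j] = ||x_i - y_j||^2
c = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1).ravel()

# marginal constraints (79): row sums equal mu, column sums equal nu
A_eq = np.zeros((2 * k, k * k))
for i in range(k):
    A_eq[i, i * k:(i + 1) * k] = 1.0       # sum_j pi_ij = mu_i
    A_eq[k + i, i::k] = 1.0                # sum_i pi_ij = nu_j (index j = i)
b_eq = np.concatenate([mu, nu])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0.0, None), method="highs")
print(np.sqrt(res.fun))                    # the L2 Wasserstein distance (80)
```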

In general, the optimal transport mapping \(\varvec{T}_*(\cdot )\) is nonlinear. However, if we relax the constraint of marginal matching to only first- and second-moment matching, Theorem B.1 implies that \(\varvec{T}_*(\cdot )\) is indeed affine with the linear part being SPD, even when \(\mu \) and \(\nu \) do not have well-defined PDFs (i.e. are not absolutely continuous with respect to the Lebesgue measure). To show this, first notice that the optimization (78) over joint distributions on \({\mathbb {R}}^n\times {\mathbb {R}}^n\) can be translated into an equivalent problem over random vectors in \({\mathbb {R}}^n\):

$$\begin{aligned} \tilde{\varvec{u}}_* = \mathop {\mathrm{arg \, min}}\limits _{\tilde{\varvec{u}}\sim \nu } {\mathbb {E}}\left[ \Vert \tilde{\varvec{u}}-\varvec{u}\Vert ^2 \right] \end{aligned}$$
(81)

with some \(\varvec{u}\sim \mu \). Next, denote the means of \(\mu \) and \(\nu \) by \(\varvec{m}_\mu \) and \(\varvec{m}_\nu \), respectively, and the covariance matrices by \(\varvec{\varSigma }_\mu \) and \(\varvec{\varSigma }_\nu \). Then the objective function in (81) becomes

$$\begin{aligned} {\mathbb {E}}\left[ \Vert \tilde{\varvec{u}}-\varvec{u}\Vert ^2 \right] = \Vert \varvec{m}_\mu -\varvec{m}_\nu \Vert ^2 + {\mathbb {E}}\left[ \Vert \tilde{\varvec{u}}'-\varvec{u}'\Vert ^2 \right] , \end{aligned}$$

where the primes indicate the centered random vectors. Now if we relax the distribution matching constraint \(\tilde{\varvec{u}}\sim \nu \) to only mean and covariance matching, i.e. \({\mathbb {E}}[\tilde{\varvec{u}}]=\varvec{m}_\nu \) and \({\mathbb {E}}[\tilde{\varvec{u}}^{'}\tilde{\varvec{u}}^{'\mathrm {T}}]=\varvec{\varSigma }_\nu \), the optimal transport problem (81) is simplified to the task of minimum-correction second-moment matching:

$$\begin{aligned} \tilde{\varvec{u}}'_* = \mathop {\mathrm{arg \, min}}\limits _{{\mathbb {E}}[\tilde{\varvec{u}}^{'}\tilde{\varvec{u}}^{'\mathrm {T}}] =\varvec{\varSigma }_\nu } {\mathbb {E}}\left[ \Vert \tilde{\varvec{u}}-\varvec{u}\Vert ^2 \right] . \end{aligned}$$
(82)

If we set \({\mathcal {H}}=L_2(\varOmega )\) as the Hilbert space of all square-integrable random variables defined over the probability space \((\varOmega ,{\mathcal {A}},{\mathcal {P}})\) on which \(\varvec{u}\) is defined, then Theorem B.1 applies to (82) and yields \(\tilde{\varvec{u}}_*=\varvec{T}_*(\varvec{u})=\varvec{A}_*(\varvec{u}-\varvec{m}_\mu )+\varvec{m}_\nu \) with

$$\begin{aligned} \varvec{A}_* = \sqrt{\varvec{\varSigma }_\mu }^{-1} (\sqrt{\varvec{\varSigma }_\mu } \varvec{\varSigma }_\nu \sqrt{\varvec{\varSigma }_\mu }^{\mathrm {T}})^{1/2} \sqrt{\varvec{\varSigma }_\mu }^{-\mathrm {T}}. \end{aligned}$$
(83)

Moreover, since

$$\begin{aligned} {\mathbb {E}}\left[ \Vert \tilde{\varvec{u}}'-\varvec{u}'\Vert ^2 \right] = \mathrm {tr}(\varvec{\varSigma }_\mu ) + \mathrm {tr}(\varvec{\varSigma }_\nu ) -2\mathrm {tr}[\mathrm {Cov}(\tilde{\varvec{u}},\varvec{u})] \end{aligned}$$
(84)

and

$$\begin{aligned} \mathrm {tr}[\mathrm {Cov}(\tilde{\varvec{u}}_*,\varvec{u})] = \mathrm {tr}[\mathrm {Cov}(\varvec{A}_*\varvec{u}',\varvec{u}')] = \mathrm {tr} [(\sqrt{\varvec{\varSigma }_\mu } \varvec{\varSigma }_\nu \sqrt{\varvec{\varSigma }_\mu }^{\mathrm {T}})^{1/2}] = \mathrm {tr} [(\varvec{\varSigma }_\mu \varvec{\varSigma }_\nu )^{1/2}], \end{aligned}$$

we have

$$\begin{aligned} W_{2}^2(\mu ,\varvec{T}_*\mu ) = \Vert \varvec{m}_\mu -\varvec{m}_\nu \Vert ^2 + \mathrm {tr}[\varvec{\varSigma }_\mu + \varvec{\varSigma }_\nu -2(\varvec{\varSigma }_\mu \varvec{\varSigma }_\nu )^{1/2}], \end{aligned}$$
(85)

where the distribution \(\varvec{T}_*\mu \) is the push-forward of \(\mu \) by \(\varvec{T}_*\). Note that although \(\varvec{T}_*\mu \) shares the same mean and covariance with \(\nu \), in general \(\varvec{T}_*\mu \ne \nu \) and \(W_{2}(\mu ,\varvec{T}_*\mu ) < W_{2}(\mu ,\nu )\), so (85) indeed provides a lower bound for \(W_2(\mu ,\nu )\) based on the first two moments of \(\mu \) and \(\nu \).
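
A Monte Carlo sketch of (83)-(85) (an editorial addition with assumed Gaussian inputs and an illustrative sample size): the affine map \(\varvec{T}_*\) pushes samples of \(\mu \) to the prescribed moments, and the empirical cost reproduces the closed form (85).

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(5)
n = 3
A = rng.standard_normal((n, n)); Sig_mu = A @ A.T + np.eye(n)
B = rng.standard_normal((n, n)); Sig_nu = B @ B.T + np.eye(n)
m_mu, m_nu = rng.standard_normal(n), rng.standard_normal(n)

S = sqrtm(Sig_mu); Si = np.linalg.inv(S)
A_star = Si @ sqrtm(S @ Sig_nu @ S.T) @ Si.T    # SPD linear part, cf. (83)

X = rng.multivariate_normal(m_mu, Sig_mu, size=200_000)
Y = (X - m_mu) @ A_star.T + m_nu                # T_*(x), applied row-wise

print(np.allclose(Y.mean(axis=0), m_nu, atol=2e-2))  # mean matched
print(np.allclose(np.cov(Y.T), Sig_nu, atol=5e-2))   # covariance matched

cost = np.mean(np.sum((Y - X) ** 2, axis=1))    # Monte Carlo E||T_*(u) - u||^2
closed = (np.sum((m_mu - m_nu) ** 2)
          + np.real(np.trace(Sig_mu + Sig_nu - 2.0 * sqrtm(S @ Sig_nu @ S.T))))
print(abs(cost - closed) / closed)              # small relative error
```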

If we partition the set of all distributions (possibly degenerate) over \({\mathbb {R}}^n\) that have finite second moments into equivalence classes by their means and covariances, then (85) induces a metric on this quotient space. Denote by \({\mathcal {S}}_{\varvec{m},\varvec{\varSigma }}\) the equivalence class of all distributions with mean \(\varvec{m}\) and covariance \(\varvec{\varSigma }\). Given \({\mathcal {S}}_{\varvec{m}_2,\varvec{\varSigma }_2}\) and \(\mu \in {\mathcal {S}}_{\varvec{m}_1,\varvec{\varSigma }_1}\), there exists a unique \({\tilde{\mu }}=\varvec{T}_*\mu \in {\mathcal {S}}_{\varvec{m}_2,\varvec{\varSigma }_2}\) that is the closest to \(\mu \) under the distance \(W_2(\cdot ,\cdot )\). As a generalization of the one-to-one correspondence of (9) and the intuition illustrated in Fig. 1, the members of any two equivalence classes \({\mathcal {S}}_{\varvec{m}_1,\varvec{\varSigma }_1}\) and \({\mathcal {S}}_{\varvec{m}_2,\varvec{\varSigma }_2}\) can be paired up by such minimum-\(W_2\)-distance matching. Moreover, since this distance between \(\mu \) and \(\varvec{T}_*\mu \) depends only on \({\mathcal {S}}_{\varvec{m}_1,\varvec{\varSigma }_1}\) and \({\mathcal {S}}_{\varvec{m}_2,\varvec{\varSigma }_2}\) and not on the particular member \(\mu \), \(W_2(\mu ,\varvec{T}_*\mu )\) can be used to define the distance between \({\mathcal {S}}_{\varvec{m}_1,\varvec{\varSigma }_1}\) and \({\mathcal {S}}_{\varvec{m}_2,\varvec{\varSigma }_2}\). This is sometimes called the Fréchet distance [12]. Note that the covariance part of (85) can also be isolated and used as a metric among SPD matrices.

Although in general (85) is only a lower bound for \(W_2(\mu ,\nu )\), if both \(\mu \) and \(\nu \) belong to a family of distributions that is closed under affine transforms and in which each member is uniquely determined by its mean and covariance, then we have \(\varvec{T}_*\mu =\nu \) and thus \(W_2(\mu ,\nu )\) coincides with (85). [12] mentions one typical example of such a family, characterized by a radial kernel \(\phi :[0,\infty )\rightarrow [0,\infty )\) that satisfies \(0<\int _{0}^{\infty }r^{2n+1}\phi (r^2){\mathrm {d}}r < \infty \), which guarantees the finiteness of second moments. In this family, a member \(\mu \)’s PDF takes the form \(\mu (\varvec{x})\propto \phi ((\varvec{x}-\varvec{m})^{\mathrm {T}}\varvec{A}(\varvec{x}-\varvec{m}))\) with some SPD \(\varvec{A}\), where \(\varvec{m}\) is the mean and \(\varvec{A}^{-1}\) is a multiple of the covariance. In other words, this family is generated by pushing forward a non-degenerate elliptically contoured distribution with finite second moments by invertible affine transforms. It can be checked that within such a family, a distribution is uniquely identified by its mean and covariance. Therefore, given \(\mu \) and \(\nu \), we always have \(\varvec{T}_*\mu =\nu \) and thus \(W_2(\mu ,\nu )\) can be computed by (85). The most familiar such example is probably the family of Gaussian distributions, corresponding to the exponential kernel \(\phi (r)=e^{-r}\), for which the above results are well known.

To the best of our knowledge, the earliest works that derive the right-hand side of (85) are [12, 40]. However, they did not connect this result to the problem of optimal transport. Moreover, they translate the optimization into one over the cross-covariance matrix \(\varvec{B}=\mathrm {Cov}(\varvec{u},\tilde{\varvec{u}})\), which does not necessarily determine \(\tilde{\varvec{u}}\) uniquely given \(\varvec{u}\) if the optimizer \(\varvec{B}_*\) does not correspond to a deterministic affine coupling between the two. In Theorem B.1, although the objective function is also simplified to \(\mathrm {tr}(\mathrm {Cov}(\varvec{u},\tilde{\varvec{u}}))\) (or its empirical counterpart (59) in the discrete case of Theorem 2.1), we have provided an alternative proof based on optimizing over \(\tilde{\varvec{u}}\) directly rather than over \(\varvec{B}=\mathrm {Cov}(\varvec{u},\tilde{\varvec{u}})\).

Finally, note also that the more general result Theorem B.1 on a Hilbert space indeed contains the special case on \({\mathbb {R}}^m\) described by Theorem 2.1. Moreover, if we view Theorem 2.1 through the lens of Theorem B.1, we can identify a discrete counterpart of the optimal transport interpretation for Theorem 2.1. The key observation is that Theorem 2.1 is equivalent to Theorem B.1 with \({\mathcal {H}}={\mathbb {R}}^m\), which in turn corresponds to (82) with \(\mu \) being a discrete distribution over \({\mathbb {R}}^n\) with m particles, i.e. with its (generalized) PDF being a sum of m Dirac deltas. Therefore, the optimal transport interpretation of Theorem B.1 implies that \(\tilde{\varvec{u}}_*\) also follows a discrete distribution, with its m particles given by the rows of \(\tilde{\varvec{U}}_*\) in Theorem 2.1. This is actually a stronger result than Theorem 2.1 because in the latter, we have restricted the feasible \({\tilde{\mu }}\)’s to m-particle discrete distributions a priori.

For an exposition of other connections between optimal transport and the polar decomposition, see [4, 7].

D: Gradient descent for matrix square root

[18] proposes to solve the equation \(\varvec{X}^{\mathrm {T}}\varvec{P}\varvec{X}=\varvec{I}\) for an \(\varvec{X}\) close to \(\varvec{I}\) by solving the unconstrained optimization \(\mathop {\mathrm{arg \, min}}\limits _{\varvec{X}}\Vert \varvec{X}^{\mathrm {T}}\varvec{PX}-\varvec{I}\Vert _\mathrm {F}^2\) iteratively with the initial guess \(\varvec{X}_0 = \varvec{I}\).

The objective function attains its global minimum 0 at every \(\varvec{X}\) that solves \(\varvec{X}^{\mathrm {T}}\varvec{P}\varvec{X}=\varvec{I}\). The hope is that if we use an iterative scheme and start from \(\varvec{I}\), the iterates will converge to a solution \(\varvec{X}\) close to \(\varvec{I}\). [18] proposes to use a simple gradient descent iteration. Since \(f(\varvec{X}) = \Vert \varvec{X}^{\mathrm {T}}\varvec{P}\varvec{X}-\varvec{I}\Vert _{\mathrm {F}}^2 = \mathrm {tr}(\varvec{X}^{\mathrm {T}}\varvec{P}\varvec{X}\varvec{X}^{\mathrm {T}}\varvec{P}\varvec{X}) -2\mathrm {tr}(\varvec{X}^{\mathrm {T}}\varvec{P}\varvec{X}) + \mathrm {tr}(\varvec{I})\), we can compute the gradient analytically as

$$\begin{aligned} \nabla _{\varvec{X}} f = 4\varvec{PX}(\varvec{X}^{\mathrm {T}}\varvec{PX} - \varvec{I}). \end{aligned}$$
(86)

Therefore, the gradient descent iteration takes the form

$$\begin{aligned} \varvec{X}_{n+1} = \varvec{X}_n - \gamma \varvec{P}\varvec{X}_n(\varvec{X}_n^{\mathrm {T}}\varvec{P} \varvec{X}_n - \varvec{I}) \end{aligned}$$
(87)

with \(\gamma \) being a tunable step size. This is Algorithm 3 in [18, sec. 3.5]. However, [18] does not discuss how to pick \(\gamma \), nor when and how fast the iteration converges.
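
For reference, a direct implementation of (87) is only a few lines. In the hedged sketch below, \(\varvec{P}\) is scaled so that \(\Vert \varvec{P}\Vert _2\le 1\), which keeps the initial guess \(\varvec{X}_0=\varvec{I}\) inside the convergence conditions (88):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(6)
n = 6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
P = Q @ np.diag(rng.uniform(0.2, 1.0, n)) @ Q.T     # SPD with ||P||_2 <= 1

gamma = 1.0 / (2.0 * np.linalg.norm(P, 2))          # step size, cf. (88)
X = np.eye(n)                                       # X_0 = I, so Q_0 = I
for _ in range(500):
    X = X - gamma * P @ X @ (X.T @ P @ X - np.eye(n))   # iteration (87)

print(np.linalg.norm(X - np.linalg.inv(sqrtm(P))))  # ~ 0: X_k -> P^{-1/2}
print(np.allclose(X.T @ P @ X, np.eye(n)))          # constraint satisfied
```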

We can analyze this iteration by exactly the same techniques used in Sect. 3.2. First, we have \(\varvec{X}_k \rightarrow \varvec{P}^{-1/2} \varvec{Q}_0\) for

$$\begin{aligned} \varvec{X}_0 = \varvec{S}_0 \varvec{Q}_0, \quad \varvec{S}_0\in {\mathcal {C}}_{\varvec{P}}, \quad \Vert \varvec{X}_0\Vert _2\le 1/\Vert \varvec{P}\Vert _2^{1/2}, \quad \gamma \le 1/(2\Vert \varvec{P}\Vert _2), \end{aligned}$$
(88)

since the matrix iteration can be reduced to the scalar one \(\sigma _{k+1}=\sigma _k -\gamma \lambda \sigma _k(\sigma _k^2 - 1)\) (see Note 8), where \(\lambda \) is an eigenvalue of \(\varvec{P}\) and \(\sigma _k/\sqrt{\lambda }\) is the corresponding singular value of \(\varvec{X}_k\). Next, the stability analysis shows that the Fréchet derivative of the iteration map at the limit \(\varvec{X}_*\) acts on a perturbation \(\varvec{X}\) as

$$\begin{aligned} \mathrm {D}\varvec{F}(\varvec{X}) = \varvec{X} -\gamma \varvec{\varLambda } \varvec{X} - \gamma \varvec{\varLambda }^{1/2}\varvec{X}^{\mathrm {T}}\varvec{\varLambda }^{1/2}, \end{aligned}$$
(89)

whose eigenvectors and eigenvalues are

$$\begin{aligned} \begin{aligned} \varvec{Z}_{ij}&= \varvec{E}_{ij} + \sqrt{\lambda _j/\lambda _i}\varvec{E}_{ji}, \quad \mu _{ij} = 1-\gamma (\lambda _i+\lambda _j), \quad i\le j, \\ \varvec{Z}_{ij}&= \varvec{E}_{ij} - \sqrt{\lambda _i/\lambda _j}\varvec{E}_{ji}, \quad \mu _{ij} = 1, \quad i>j. \end{aligned} \end{aligned}$$
(90)

Here the \(\lambda _i\)’s are the diagonal entries of \(\varvec{\varLambda }\) and are also the eigenvalues of \(\varvec{P}\). The neutral modes with \(\mu _{ij}=1\) correspond to the unitary freedom \(\varvec{Q}_0\) and move along the set of solutions; for the remaining modes, \(\rho (\mathrm {D}\varvec{F})<1\) requires \(\gamma < 1/\Vert \varvec{P}\Vert _2\), which is less stringent than (88). Moreover, the iteration converges only linearly, with the asymptotic error reduction factor no smaller than \((\lambda _\mathrm {max}-\lambda _\mathrm {min}) /(\lambda _\mathrm {max}+\lambda _\mathrm {min})\). Gradient descent is therefore unattractive here compared to the quadratically convergent iterations (45) and (46).
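
The predicted linear rate is easy to observe numerically. In the sketch below (editorial; the step size \(\gamma =1/(\lambda _\mathrm {max}+\lambda _\mathrm {min})\) minimizes the spectral radius over the contracting modes and satisfies \(\gamma <1/\Vert \varvec{P}\Vert _2\), though not the more conservative (88)), the observed error ratio matches \((\lambda _\mathrm {max}-\lambda _\mathrm {min})/(\lambda _\mathrm {max}+\lambda _\mathrm {min})\):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(7)
n = 5
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.linspace(0.2, 1.0, n)                   # prescribed eigenvalues of P
P = Q @ np.diag(lam) @ Q.T
X_star = np.linalg.inv(sqrtm(P))                 # the limit P^{-1/2}

gamma = 1.0 / (lam.max() + lam.min())            # rate-optimal step size
X = np.eye(n)
errs = []
for _ in range(40):
    X = X - gamma * P @ X @ (X.T @ P @ X - np.eye(n))
    errs.append(np.linalg.norm(X - X_star))

print(errs[-1] / errs[-2])                               # observed ratio
print((lam.max() - lam.min()) / (lam.max() + lam.min())) # predicted: 2/3
```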


About this article


Cite this article

Lin, J., Lermusiaux, P.F.J. Minimum-correction second-moment matching: theory, algorithms and applications. Numer. Math. 147, 611–650 (2021). https://doi.org/10.1007/s00211-021-01178-8
