Modeling of missing dynamical systems: deriving parametric models using a nonparametric framework

Research in the Mathematical Sciences

Abstract

In this paper, we consider modeling missing dynamics with a nonparametric non-Markovian model, constructed using the theory of kernel embedding of conditional distributions on appropriate reproducing kernel Hilbert spaces (RKHS), equipped with orthonormal basis functions. Depending on the choice of the basis functions, the closure model resulting from this nonparametric formulation takes the form of a parametric model. This suggests that the success of parametric modeling approaches proposed across various application domains can be understood through their RKHS representations. When the missing dynamical terms evolve faster than the relevant observable of interest, the proposed approach is consistent with the effective dynamics derived from the classical averaging theory. In the linear Gaussian case without a time-scale gap, we show that the proposed non-Markovian model with a very long memory yields an accurate estimate of the nontrivial autocovariance function of the relevant variable of the full dynamics. Supporting numerical results on instructive nonlinear dynamics show that the proposed approach is able to replicate high-dimensional missing dynamical terms on problems with and without a separation of temporal scales.

References

  1. Berry, T., Harlim, J.: Linear theory for filtering nonlinear multiscale systems with model error. Proc. R. Soc. A 20140168, 168 (2014)

  2. Berry, T., Harlim, J.: Semiparametric modeling: correcting low-dimensional model error in parametric models. J. Comput. Phys. 308, 305–321 (2016)

  3. Berry, T., Harlim, J.: Correcting biased observation model error in data assimilation. Mon. Weather Rev. 145(7), 2833–2853 (2017)

  4. Chorin, A., Hald, O., Kupferman, R.: Optimal prediction with memory. Phys. D Nonlinear Phenom. 166(3), 239–257 (2002)

  5. Chorin, A., Stinis, P.: Problem reduction, renormalization, and memory. Commun. Appl. Math. Comput. Sci. 1(1), 1–27 (2007)

  6. Christmann, A., Steinwart, I.: Support Vector Machines. Springer, Berlin (2008)

  7. Crommelin, D., Vanden-Eijnden, E.: Subgrid-scale parameterization with conditional Markov chains. J. Atmos. Sci. 65(8), 2661–2675 (2008)

  8. Fatkullin, I., Vanden-Eijnden, E.: A computational strategy for multiscale systems with applications to Lorenz 96 model. J. Comput. Phys. 200(2), 605–638 (2004)

  9. Frederiksen, J., O’Kane, T.: Entropy, closures and subgrid modeling. Entropy 10, 635–683 (2008)

  10. Givon, D., Kupferman, R., Stuart, A.: Extracting macroscopic dynamics: model problems and algorithms. Nonlinearity 17(6), R55 (2004)

  11. Gottwald, G.A., Harlim, J.: The role of additive and multiplicative noise in filtering complex dynamical systems. Proc. R. Soc. A Math. Phys. Eng. Sci. 469(2155), 20130096 (2013)

  12. Gouasmi, A., Parish, E.J., Duraisamy, K.: A priori estimation of memory effects in reduced-order models of nonlinear systems using the Mori–Zwanzig formalism. Proc. R. Soc. A Math. Phys. Eng. Sci. 473(2205), 20170385 (2017)

  13. Grabowski, W.: An improved framework for superparameterization. J. Atmos. Sci. 61, 1940–1952 (2004)

  14. Hamill, T.M.: Interpretation of rank histograms for verifying ensemble forecasts. Mon. Weather Rev. 129(3), 550–560 (2001)

  15. Harlim, J.: Data-Driven Computational Methods: Parameter and Operator Estimations. Cambridge University Press, Cambridge (2018)

  16. Harlim, J., Jiang, S., Liang, S., Yang, H.: Machine learning for prediction with missing dynamics. arXiv:1910.05861 (2019)

  17. Harlim, J., Li, X.: Parametric reduced models for the nonlinear Schrödinger equation. Phys. Rev. E. 91, 053306 (2015)

  18. Harlim, J., Mahdi, A., Majda, A.: An ensemble Kalman filter for statistical estimation of physics constrained nonlinear regression models. J. Comput. Phys. 257(Part A), 782–812 (2014)

  19. Jiang, S.W., Harlim, J.: Parameter estimation with data-driven nonparametric likelihood functions. Entropy 21(6), 559 (2019)

  20. Kerstein, A.: A linear-eddy model of turbulent scalar transport and mixing. Combust. Sci. Technol. 60(4–6), 391–421 (1988)

  21. Kerstein, A.: One-dimensional turbulence: model formulation and application to homogeneous turbulence, shear flows, and buoyant stratified flows. J. Fluid Mech. 392, 277–334 (1999)

  22. Khasminskii, R.: On averaging principle for Itô stochastic differential equations. Kybern. Chekhoslovakia 4(3), 260–279 (1968). (in Russian)

  23. Khouider, B., Biello, J.A., Majda, A.J.: A stochastic multicloud model for tropical convection. Commun. Math. Sci. 8, 187–216 (2010)

  24. Khouider, B., St-Cyr, A., Majda, A., Tribbia, J.: The MJO and convectively coupled waves in a coarse-resolution GCM with a simple multicloud parameterization. J. Atmos. Sci. 68, 240–264 (2011)

  25. Kondrashov, D., Chekroun, M.D., Ghil, M.: Data-driven non-Markovian closure models. Phys. D Nonlinear Phenom. 297, 33–55 (2015)

  26. Kraichnan, R.H.: The structure of isotropic turbulence at very high Reynolds numbers. J. Fluid Mech. 5, 497–543 (1959)

  27. Kravtsov, S., Kondrashov, D., Ghil, M.: Multilevel regression modeling of nonlinear processes: derivation and applications to climatic variability. J. Clim. 18(21), 4404–4424 (2005)

  28. Kurtz, T.: Semigroups of conditional shifts and approximations of Markov processes. Ann. Probab. 3, 618–642 (1975)

  29. Kwasniok, F.: Data-based stochastic subgrid-scale parametrization: an approach using cluster-weighted modelling. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 370(1962), 1061–1086 (2012)

  30. Lei, H., Baker, N.A., Li, X.: Data-driven parameterization of the generalized Langevin equation. Proc. Natl. Acad. Sci. 113(50), 14183–14188 (2016)

  31. Lorenz, E.: Predictability: a problem partly solved. In Seminar on Predictability, 4–8 September 1995, vol 1, pp. 1–18, Shinfield Park, Reading. ECMWF (1995)

  32. Lu, F., Lin, K., Chorin, A.: Comparison of continuous and discrete-time data-based modeling for hypoelliptic systems. Commun. Appl. Math. Comput. Sci. 11(2), 187–216 (2016)

  33. Lu, F., Lin, K., Chorin, A.: Data-based stochastic model reduction for the Kuramoto–Sivashinsky equation. Phys. D Nonlinear Phenom. 340, 46–57 (2017)

  34. Lu, F., Tu, X., Chorin, A.J.: Accounting for model error from unresolved scales in ensemble Kalman filters by stochastic parameterization. Mon. Weather Rev. 145(9), 3709–3723 (2017)

  35. Majda, A., Abramov, R.V., Grote, M.J.: Information Theory and Stochastics for Multiscale Nonlinear Systems, vol. 25. American Mathematical Society, Providence (2005)

  36. Majda, A., Grooms, I.: New perspectives on superparameterization for geophysical turbulence. J. Comput. Phys. 271, 60–77 (2014)

  37. Majda, A., Harlim, J.: Physics constrained nonlinear regression models for time series. Nonlinearity 26, 201–217 (2013)

  38. Majda, A., Timofeyev, I., Vanden-Eijnden, E.: Stochastic models for selected slow variables in large deterministic systems. Nonlinearity 19(4), 769 (2006)

  39. Majda, A., Timofeyev, I.: Statistical mechanics for truncations of the Burgers-Hopf equation: a model for intrinsic stochastic behavior with scaling. Milan J. Math. 70(1), 39–96 (2002)

  40. Majda, A.J., Harlim, J.: Physics constrained nonlinear regression models for time series. Nonlinearity 26(1), 201 (2012)

  41. Majda, A.J., Timofeyev, I.: Remarkable statistical behavior for truncated Burgers-Hopf dynamics. Proc. Natl. Acad. Sci. 97(23), 12413–12417 (2000)

  42. Majda, A.J., Timofeyev, I., Vanden-Eijnden, E.: Models for stochastic climate prediction. Proc. Natl. Acad. Sci. 96(26), 14687–14691 (1999)

  43. Majda, A.J., Timofeyev, I., Vanden-Eijnden, E.: A mathematical framework for stochastic climate models. Commun. Pure Appl. Math. J. Issued Courant Inst. Math. Sci. 54(8), 891–974 (2001)

  44. Mori, H.: Transport, collective motion, and Brownian motion. Prog. Theor. Phys. 33, 423–450 (1965)

  45. Nemtsov, A., Averbuch, A., Schclar, A.: Matrix compression using the Nyström method. Intell. Data Anal. 20(5), 997–1019 (2016)

  46. Papanicolaou, G.C., et al.: Some probabilistic problems and methods in singular perturbations. Rocky Mt. J. Math. 6(4), 653–674 (1976)

  47. Pavliotis, G., Stuart, A.: Multiscale Methods: Averaging and Homogenization. Springer, Berlin (2008)

  48. Song, L., Fukumizu, K., Gretton, A.: Kernel embeddings of conditional distributions: a unified kernel framework for nonparametric inference in graphical models. IEEE Signal Process. Mag. 30(4), 98–111 (2013)

  49. Song, L., Huang, J., Smola, A., Fukumizu, K.: Hilbert space embeddings of conditional distributions with applications to dynamical systems. In Proceedings of 26th Annual International Conference on Machine Learning, pp. 961–968. ACM (2009)

  50. Weinan, E., Engquist, B., Li, X., Ren, W., Vanden-Eijnden, E.: Heterogeneous multiscale methods: a review. Commun. Comput. Phys. 2(3), 367–450 (2007)

  51. Wilks, D.S.: Effects of stochastic parametrizations in the Lorenz’96 system. Q. J. R. Meteorol. Soc. 131(606), 389–407 (2005)

  52. Zhang, H., Harlim, J., Li, X.: Computing linear response statistics using orthogonal polynomial based estimators: An RKHS formulation. arXiv:1912.11110 (2019)

  53. Zwanzig, R.: Statistical mechanics of irreversibility. Lect. Theor. Phys. 3, 106–141 (1961)

  54. Zwanzig, R.: Nonlinear generalized Langevin equations. J. Stat. Phys. 9, 215–220 (1973)

Acknowledgements

It is a great pleasure to dedicate this paper to Andrew Majda on the occasion of his 70th birthday. The research of J.H. was partially supported by the ONR Grant N00014-16-1-2888, NSF Grants DMS-1619661 and DMS-1854299. S.W.J. was supported as a postdoctoral fellow under the ONR Grant N00014-16-1-2888.

Author information

Corresponding author

Correspondence to John Harlim.

Appendices

Appendix A: Kernel mean embedding of conditional distributions

The purpose of this review is to verify Eq. (7). While the derivation here closely follows the description in [48, 49], we present a formulation with Mercer-type kernels induced by orthonormal bases of \(L^2\)-spaces. The basic theory of RKHS can be found in many texts, such as [6].

First, let us repeat the discussion in Sect. 2.1 on \(\mathcal {Z}\). Let \(\mathcal {Z}\) be a compact set and let \(\hat{K}:\mathcal {Z}\times \mathcal {Z}\rightarrow \mathbb {R}\) be a bounded kernel, that is, a bounded, symmetric, positive definite function. By the Moore–Aronszajn theorem, there exists a unique Hilbert space \(\mathcal {H}_Z=\overline{\text{ span }\{\hat{K}(\textit{\textbf{z}},\cdot ),\forall \textit{\textbf{z}}\in \mathcal {Z}\}}\) for which \(\hat{K}\) is the reproducing kernel. Let \(\hat{q}:\mathcal {Z}\rightarrow \mathbb {R}\) be a positive weight function and let \(\{\varphi _k\}_{k\ge 1}\) be a set of eigenfunctions, with corresponding eigenvalues \(\{\xi _k\}\), of the integral operator \(\mathcal {\hat{K}}:L^2(\mathcal {Z},\hat{q}) \rightarrow L^2(\mathcal {Z},\hat{q})\) defined as

$$\begin{aligned} \mathcal {\hat{K}} f(\textit{\textbf{z}}) := \int _{\mathcal {Z}} \hat{K}(\textit{\textbf{z}},\textit{\textbf{z}}') f(\textit{\textbf{z}}') \hat{q}(\textit{\textbf{z}}') \hbox {d}\textit{\textbf{z}}'. \end{aligned}$$
(34)

By Mercer’s theorem, the kernel \(\hat{K}\) has the following representation:

$$\begin{aligned} \hat{K}(\textit{\textbf{z}},\textit{\textbf{z}}') = \sum _{k=1}^{\infty } \xi _k \varphi _k(\textit{\textbf{z}})\varphi _k(\textit{\textbf{z}}'). \end{aligned}$$
(35)

We should point out that if \(\mathcal {Z}\) is a non-compact domain such as \(\mathbb {R}^n\), then with an exponentially decaying \(\hat{q}\) one can still construct a bounded Mercer-type kernel as in (35) with an appropriate choice of decreasing sequence \(\{\xi _k\}\) (see Lemma 3.2 in [52]), and this kernel is the reproducing kernel of the RKHS \(\mathcal {H}_Z\) (see Proposition 3.4 in [52]).

In this case, the RKHS \(\mathcal {H}_Z\) induced by the Mercer-type kernel in (35) is a subspace of \(L^2(\mathcal {Z},\hat{q})\) with the reproducing property corresponding to an inner product defined as \(\langle f,g\rangle _{\mathcal {H}_Z} = \sum _{k=1}^\infty \frac{f_k g_k}{\xi _k}\), for all \(f,g\in \mathcal {H}_Z\) where \(f_k = \langle f,\varphi _k\rangle _{L^2(\mathcal {Z},\hat{q})}\) and \(g_k = \langle g,\varphi _k\rangle _{L^2(\mathcal {Z},\hat{q})}\) . Then, for any \(f\in \mathcal {H}_Z\) and \(\textit{\textbf{z}}\in \mathcal {Z}\), we can represent

$$\begin{aligned} f(\textit{\textbf{z}}) = \langle f,\hat{K}(\textit{\textbf{z}},\cdot )\rangle _{\mathcal {H}_Z} = \sum _{k=1}^\infty \frac{f_k \xi _k \varphi _k(\textit{\textbf{z}})}{\xi _k} = \sum _{k=1}^\infty f_k \varphi _k(\textit{\textbf{z}}), \end{aligned}$$
(36)

with \(\{\varphi _k\}\) an orthonormal basis of \(L^2(\mathcal {Z},\hat{q})\), where the series converges uniformly (or in \(C_0(\mathbb {R}^n)\) for non-compact \(\mathcal {Z}=\mathbb {R}^n\)).
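The reproducing property in (36) can be checked numerically on a small example. The following is a minimal sketch, not the authors' code: it assumes \(\mathcal {Z}=[0,1]\), a uniform weight \(\hat{q}=1\), a cosine basis, eigenvalues \(\xi _k=2^{-k}\), and a truncation to eight terms. All names (`phi`, `kernel`, `l2_coeffs`, `rkhs_inner`) are hypothetical.

```python
import numpy as np

M = 8                               # truncation level (assumption)
xi = 0.5 ** np.arange(1, M + 1)     # decreasing eigenvalues xi_k (assumption)

def phi(z):
    """Cosine basis, orthonormal in L^2([0,1]) with weight q_hat = 1."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    vals = np.sqrt(2.0) * np.cos(np.pi * np.outer(z, np.arange(M)))
    vals[:, 0] = 1.0                # the constant function has unit norm
    return vals                     # shape (len(z), M)

def kernel(z, zp):
    """Truncated Mercer-type kernel: sum_k xi_k phi_k(z) phi_k(z')."""
    return (phi(z) * xi) @ phi(zp).T

# Midpoint quadrature on [0, 1] for the L^2 inner products.
N = 2000
grid = (np.arange(N) + 0.5) / N

def l2_coeffs(values_on_grid):
    """Coefficients <g, phi_k>_{L^2} of a function sampled on the grid."""
    return phi(grid).T @ values_on_grid / N

def rkhs_inner(f_coef, g_coef):
    """RKHS inner product: sum_k f_k g_k / xi_k."""
    return np.sum(f_coef * g_coef / xi)

# A test function given by its expansion coefficients f_k.
f_coef = np.array([1.0, -0.5, 0.3, 0.0, 0.2, 0.0, 0.0, -0.1])
z0 = 0.37

k_coef = l2_coeffs(kernel(z0, grid)[0])   # coefficients of K(z0, .)
f_at_z0 = (phi(z0) @ f_coef)[0]           # f(z0) = sum_k f_k phi_k(z0)
reproduced = rkhs_inner(f_coef, k_coef)   # <f, K(z0, .)>_H
print(f_at_z0, reproduced)
```

The coefficients of \(\hat{K}(\textit{\textbf{z}},\cdot )\) come out as \(\xi _k\varphi _k(\textit{\textbf{z}})\), so dividing by \(\xi _k\) in the RKHS inner product recovers the plain basis expansion of \(f\), which is the content of (36).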

We refer to the Hilbert space of functions \(\mathcal {H}_Z\) as an RKHS induced by the orthonormal basis of \(L^2(\mathcal {Z},\hat{q})\). While we have discussed \(\mathcal {H}\) as an RKHS induced by the orthonormal basis of \(L^2(\mathcal {Y},q^{-1})\) in Sect. 2.1, we can also repeat the argument above and construct \(\mathcal {H}_Y\) as an RKHS induced by the orthonormal basis of \(L^2(\mathcal {Y},q)\). In this case, recall that while \(\{\psi _k q\}\) is an orthogonal eigenbasis of the integral operator in (4), the orthogonal basis functions \(\psi _k\in L^2(\mathcal {Y},q)\) are eigenfunctions of an adjoint integral operator of (4). That is, one can verify that

$$\begin{aligned} \langle \psi _kq, \mathcal {K}^* \psi _k \rangle _{L^2(\mathcal {Y})} = \langle \mathcal {K}(\psi _kq), \psi _k\rangle _{L^2(\mathcal {Y})} = \lambda _k\langle \psi _kq,\psi _k\rangle _{L^2(\mathcal {Y})}, \end{aligned}$$
(37)

where for \(f\in L^2(\mathcal {Y},q)\),

$$\begin{aligned} \mathcal {K}^* f(x) := \int _\mathcal {Y} K^*(x,y) f(y) q(y)\,\hbox {d}y, \end{aligned}$$

and \(K^*(x,y)= q(x)^{-1}K(x,y)q(y)^{-1}\) is also a symmetric positive definite kernel. By Mercer’s theorem, one can write

$$\begin{aligned} K^*(y,y') = \sum _{k=1}^{\infty } \lambda _k \psi _k(y)\psi _k(y'). \end{aligned}$$
(38)

Let Y and Z be random variables on \(\mathcal {Y}\) and \(\mathcal {Z}\) with joint distribution P(Y, Z). We define the cross-covariance operator \(\mathcal {C}_{YZ}:\mathcal {H}_Z\rightarrow \mathcal {H}_Y\) and the covariance operator \(\mathcal {C}_{ZZ}:\mathcal {H}_Z\rightarrow \mathcal {H}_Z\) as

$$\begin{aligned} \begin{aligned}&\mathcal {C}_{YZ} := \mathbb {E}_{YZ} [K^*(Y,\cdot )\otimes \hat{K}(Z,\cdot ) ], \\&\mathcal {C}_{ZZ} := \mathbb {E}_{Z} [\hat{K}(Z,\cdot )\otimes \hat{K}(Z,\cdot ) ]. \end{aligned} \end{aligned}$$
(39)

One can immediately see that for any \(f\in \mathcal {H}_Y\) and \(g\in \mathcal {H}_Z\),

$$\begin{aligned} \mathbb {E}_{YZ}[f(Y)\otimes g(Z)]= & {} \int _{\mathcal {Y}\times \mathcal {Z}} f(y)g(\textit{\textbf{z}}) \hbox {d}P(y,\textit{\textbf{z}})\nonumber \\= & {} \int _{\mathcal {Y}\times \mathcal {Z}} \langle f,K^*(y,\cdot )\rangle _{\mathcal {H}_Y} \langle g,\hat{K}(\textit{\textbf{z}},\cdot )\rangle _{\mathcal {H}_Z} \hbox {d}P(y,\textit{\textbf{z}}) \nonumber \\= & {} \int _{\mathcal {Y}\times \mathcal {Z}} \langle f\otimes g, K^*(y,\cdot )\otimes \hat{K}(\textit{\textbf{z}},\cdot )\rangle _{\mathcal {H}_Y\otimes \mathcal {H}_Z} \hbox {d}P(y,\textit{\textbf{z}})\nonumber \\= & {} \langle f\otimes g, \mathcal {C}_{YZ}\rangle _{\mathcal {H}_Y\otimes \mathcal {H}_Z}. \end{aligned}$$
(40)

Let us define the feature maps \(\varPsi :\mathcal {Y}\rightarrow \mathcal {F}_Y\subset \ell _2\) and \(\varPhi :\mathcal {Z}\rightarrow \mathcal {F}_Z\subset \ell _2\), respectively, as

$$\begin{aligned} \begin{aligned} \varPsi (y)&= (\sqrt{\lambda _1}\psi _1(y), \sqrt{\lambda _2}\psi _2(y),\ldots ), \\ \varPhi (\textit{\textbf{z}})&= (\sqrt{\xi _1}\varphi _1(\textit{\textbf{z}}), \sqrt{\xi _2}\varphi _2(\textit{\textbf{z}}),\ldots ). \end{aligned} \end{aligned}$$
(41)

Then, we can write

$$\begin{aligned} \hat{K}(\textit{\textbf{z}},\textit{\textbf{z}}')= & {} \langle \varPhi (\textit{\textbf{z}}),\varPhi (\textit{\textbf{z}}')\rangle _{\ell _2} = \langle \hat{K}(\textit{\textbf{z}},\cdot ),\hat{K}(\textit{\textbf{z}}',\cdot )\rangle _{\mathcal {H}_Z},\\ K^*(y,y')= & {} \langle \varPsi (y),\varPsi (y')\rangle _{\ell _2} = \langle K^*(y,\cdot ),K^*(y',\cdot )\rangle _{\mathcal {H}_Y}, \end{aligned}$$

where the inner products in \(\mathcal {H}_Z\) and \(\mathcal {H}_Y\) can be identified by \(\ell _2\) inner products in the corresponding feature spaces. Also, for any function \(f\in \mathcal {H}_Z\) and \(\textit{\textbf{z}}\in \mathcal {Z}\), we can rewrite the expansion in (36) as,

$$\begin{aligned} f(\textit{\textbf{z}})= & {} \langle f,\hat{K}(\textit{\textbf{z}},\cdot ) \rangle _{\mathcal {H}_Z} = \sum _{k=1}^\infty \langle f,\varphi _k \rangle _{L^2(\mathcal {Z},\hat{q})} \varphi _k(\textit{\textbf{z}})\nonumber \\= & {} \sum _{k=1}^\infty \frac{\langle f,\varphi _k \rangle _{L^2(\mathcal {Z},\hat{q})}}{\sqrt{\xi _k}} \varPhi _k(\textit{\textbf{z}}) = \sum _{k=1}^\infty \langle f, \varPhi _k\rangle _{\mathcal {H}_Z} \varPhi _k(\textit{\textbf{z}}), \end{aligned}$$
(42)

where we have defined the functions \(\varPhi _k = \sqrt{\xi _k}\varphi _k \in \mathcal {H}_Z\). For convenience of the discussion below, we also define the functions \(\varPsi _k:=\sqrt{\lambda _k}\psi _k\in \mathcal {H}_Y\).

Using the identity in (40), we can represent the operators in (39) in the basis coordinates \(\varPsi _k \in \mathcal {H}_Y\) and \(\varPhi _\ell \in \mathcal {H}_Z\) as follows:

$$\begin{aligned} \begin{aligned} \left[ C_{YZ}\right] _{k\ell }&:=\mathbb {E}_{YZ}[\varPsi _k(Y)\otimes \varPhi _\ell (Z)] = \langle \varPsi _k\otimes \varPhi _\ell , \mathcal {C}_{YZ}\rangle _{\mathcal {H}_Y\otimes \mathcal {H}_Z},\\ \left[ C_{ZZ}\right] _{k\ell }&:=\mathbb {E}_{ZZ}[\varPhi _k(Z)\otimes \varPhi _\ell (Z)]=\langle \varPhi _k\otimes \varPhi _\ell , \mathcal {C}_{ZZ}\rangle _{\mathcal {H}_Z\otimes \mathcal {H}_Z}=\langle \varPhi _k, \mathcal {C}_{ZZ} \varPhi _\ell \rangle _{\mathcal {H}_Z} . \end{aligned}\nonumber \\ \end{aligned}$$
(43)

Thus, the components of the following matrix multiplication are given as

$$\begin{aligned} \left[ C_{YZ}C_{ZZ}^{-1} \right] _{k\ell }= & {} \sum _{j} \left[ C_{YZ}\right] _{kj}\left[ C_{ZZ}^{-1}\right] _{j\ell } \nonumber \\= & {} \sum _j \langle \varPsi _k\otimes \varPhi _j, \mathcal {C}_{YZ}\rangle _{\mathcal {H}_Y\otimes \mathcal {H}_Z}\langle \varPhi _j, \mathcal {C}_{ZZ}^{-1}\varPhi _\ell \rangle _{\mathcal {H}_Z} \nonumber \\= & {} \left\langle \mathcal {C}_{YZ}, \varPsi _k\otimes \left( \sum _j \langle \varPhi _j, \mathcal {C}_{ZZ}^{-1}\varPhi _\ell \rangle _{\mathcal {H_Z}} \varPhi _j \right) \right\rangle _{\mathcal {H}_Y\otimes \mathcal {H_Z}} \nonumber \\= & {} \left\langle \mathcal {C}_{YZ}, \varPsi _k \otimes \mathcal {C}_{ZZ}^{-1} \varPhi _\ell \right\rangle _{\mathcal {H}_Y\otimes \mathcal {H}_Z} \nonumber \\= & {} \left\langle \mathcal {C}_{YZ} \mathcal {C}_{ZZ}^{-1}, \varPsi _k \otimes \varPhi _\ell \right\rangle _{\mathcal {H}_Y\otimes \mathcal {H}_Z} \nonumber \\= & {} \left\langle \mathcal {C}_{YZ} \mathcal {C}_{ZZ}^{-1}\varPhi _\ell , \varPsi _k \right\rangle _{\mathcal {H}_Y}. \end{aligned}$$
(44)

To clarify this derivation, the second equality uses the definition in (43), the fourth line uses the fact that \(\mathcal {C}_{ZZ}^{-1}\varPhi _\ell \in \mathcal {H}_Z\) can be expanded as in (42), and the remaining lines use standard tensor identities.

The theory of kernel mean embedding of conditional distributions (see [48, 49]) suggests that

$$\begin{aligned} \mathbb {E}_{Y|\textit{\textbf{z}}} [\varPsi _k(Y)] = \langle \varPsi _k, \mathcal {C}_{YZ} \mathcal {C}_{ZZ}^{-1} \hat{K}(\textit{\textbf{z}},\cdot ) \rangle _{\mathcal {H}_Y}. \end{aligned}$$
(45)

Since \(\hat{K}(\textit{\textbf{z}},\cdot ) \in \mathcal {H}_Z\), we can employ the expansion in (42) and deduce

$$\begin{aligned} \mathbb {E}_{Y|\textit{\textbf{z}}} [\varPsi _k(Y)]= & {} \langle \varPsi _k, \mathcal {C}_{YZ} \mathcal {C}_{ZZ}^{-1} \sum _{j=1}^{\infty } \frac{\langle \hat{K}(\textit{\textbf{z}},\cdot ),\varphi _j\rangle _{L^2(\mathcal {Z},\hat{q})}}{\sqrt{\xi _j}}\varPhi _j \rangle _{\mathcal {H}_Y} \nonumber \\= & {} \sum _{j=1}^{\infty } \frac{\langle \hat{K}(\textit{\textbf{z}},\cdot ),\varphi _j\rangle _{L^2(\mathcal {Z},\hat{q})}}{\sqrt{\xi _j}} \langle \varPsi _k, \mathcal {C}_{YZ} \mathcal {C}_{ZZ}^{-1} \varPhi _j \rangle _{\mathcal {H}_Y} \nonumber \\= & {} \sum _{j=1}^{\infty } \frac{1}{\sqrt{\xi _j}}\left[ C_{YZ}C_{ZZ}^{-1} \right] _{kj} \int _{\mathcal {Z}} \hat{K}(\textit{\textbf{z}},\textit{\textbf{z}}') \varphi _j(\textit{\textbf{z}}')\hat{q}(\textit{\textbf{z}}')\,\hbox {d}\textit{\textbf{z}}' \nonumber \\= & {} \sum _{j=1}^{\infty } \left[ C_{YZ}C_{ZZ}^{-1} \right] _{kj} \varPhi _j(\textit{\textbf{z}}), \end{aligned}$$
(46)

where we have used (44) to deduce the third equality above and, for the last equality, the fact that \(\varphi _j\) is an eigenfunction of the integral operator in (34) with eigenvalue \(\xi _j\). Define,

$$\begin{aligned} \left[ \textit{\textbf{C}}_{{YZ}}\right] _{ks} =\mathbb {E}_{{YZ}} \left[ \psi _{k}(Y)\otimes \varphi _{s}(Z)\right] , \quad \quad \left[ \textit{\textbf{C}}_{{ZZ}}\right] _{sl} =\mathbb {E}_{{ZZ}} \left[ \varphi _{s}(Z)\otimes \varphi _{l}(Z)\right] , \end{aligned}$$

then from (43) and the definitions of the corresponding feature maps in (41),

$$\begin{aligned}&\left[ C_{{YZ}}\right] _{ks} = \sqrt{\lambda _k\xi _s}\left[ \textit{\textbf{C}}_{{YZ}}\right] _{ks}, \quad \quad \left[ C_{{ZZ}}\right] _{sl} = \sqrt{\xi _s\xi _l}\left[ \textit{\textbf{C}}_{{ZZ}}\right] _{sl}, \\&\quad \left[ C_{YZ}C_{ZZ}^{-1} \right] _{k\ell } = \frac{\sqrt{\lambda _k}}{\sqrt{\xi _\ell }} \left[ \textit{\textbf{C}}_{YZ}\textit{\textbf{C}}_{ZZ}^{-1} \right] _{k\ell }. \end{aligned}$$

Substituting the third equation above into (46) and using the definitions of the feature maps in (41), we obtain

$$\begin{aligned} \mathbb {E}_{Y|\textit{\textbf{z}}} [\psi _k(Y)] = \frac{1}{\sqrt{\lambda _k}} \sum _{j=1}^{\infty } \left[ C_{YZ}C_{ZZ}^{-1} \right] _{kj} \varPhi _j(\textit{\textbf{z}}) = \sum _{j=1}^{\infty } \left[ \textit{\textbf{C}}_{YZ}\textit{\textbf{C}}_{ZZ}^{-1} \right] _{kj} \varphi _j(\textit{\textbf{z}}), \end{aligned}$$

which is exactly the claim in (7).
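The final formula can be illustrated with sample averages in place of the expectations. The sketch below is an illustration under stated assumptions rather than the authors' implementation: it takes \(Z\sim N(0,1)\) and \(Y=\rho Z+\sqrt{1-\rho ^2}\,\eta \) with independent standard Gaussian \(\eta \), and uses normalized probabilists' Hermite polynomials for both \(\psi _k\) and \(\varphi _j\). For this pair, Mehler's formula gives \(\mathbb {E}[\psi _k(Y)\,|\,z]=\rho ^k\psi _k(z)\), so the estimated matrix should be close to \(\text {diag}(1,\rho ,\rho ^2)\). The names `hermite_basis` and `conditional_mean` are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermevander

def hermite_basis(x, n_basis):
    """Hermite polynomials He_k / sqrt(k!), orthonormal under N(0, 1)."""
    norms = np.sqrt([math.factorial(k) for k in range(n_basis)])
    return hermevander(x, n_basis - 1) / norms   # shape (len(x), n_basis)

rng = np.random.default_rng(0)
rho, n_samples, n_basis = 0.6, 400_000, 3

# Joint samples of (Y, Z); this Gaussian pair is an assumption.
z = rng.standard_normal(n_samples)
y = rho * z + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_samples)

psi_y = hermite_basis(y, n_basis)   # psi_k(Y_i)
phi_z = hermite_basis(z, n_basis)   # phi_j(Z_i)

# Monte Carlo estimates of the matrices [C_YZ]_{kj} and [C_ZZ]_{jl}.
C_yz = psi_y.T @ phi_z / n_samples
C_zz = phi_z.T @ phi_z / n_samples
A = C_yz @ np.linalg.inv(C_zz)

def conditional_mean(z0):
    """Estimate E[psi_k(Y) | Z = z0] = sum_j A[k, j] phi_j(z0)."""
    return A @ hermite_basis(np.array([z0]), n_basis)[0]

print(np.round(A, 3))
print(conditional_mean(0.5))
```

Because the basis is orthonormal with respect to the sampling density of \(Z\), the estimated \(\textit{\textbf{C}}_{ZZ}\) is close to the identity here; the explicit inverse is kept so that the code mirrors the formula.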

Appendix B: ACV of the multiscale linear Gaussian model

The full model (18) and (19) can be rewritten as

$$\begin{aligned} \dot{x}= & {} \left( a_{11}x+a_{12}y\right) +\sigma _{x}\xi _{x},\nonumber \\ \dot{y}= & {} \frac{1}{\epsilon }\left( a_{21}x+a_{22}y\right) +\frac{\sigma _{y}}{\sqrt{\epsilon }}\xi _{y}, \end{aligned}$$
(47)

where \(\xi _{x}\) and \(\xi _{y}\) are independent standard Gaussian noises. Similarly, the closure model (25) can be rewritten as

$$\begin{aligned} \dot{x}_t=\left( a_{11}x_t+a_{12}\varSigma _{12}\varSigma _{22}^{-1}\textit{\textbf{x}}\right) +\sigma _{x}\xi _{x}, \end{aligned}$$
(48)

where \(\textit{\textbf{x}}:=\textit{\textbf{x}}_{t-m:t} = \left[ x_{t-m},x_{t-m+1},\ldots ,x_{t}\right] ^\top \), and \( \varSigma _{12}\) and \(\varSigma _{22}\) are defined in Eq. (26). To simplify the notation, we drop the time indices \(t-m:t\). We also drop the “hat” notation in \(x_t\) and \(\textit{\textbf{x}}_t\), since in this section the hat denotes the Fourier transform. In this appendix, we show that the autocovariance (ACV) function of the closure model (48) is approximately equal to that of the full model (47) for any value of \(\epsilon \).

The Fourier transform and inverse Fourier transform are defined as

$$\begin{aligned} \widehat{f}\left( \omega \right) =\int f\left( t\right) \hbox {e}^{-i\omega t}\hbox {d}t, \quad f\left( t\right) =\frac{1}{2\pi }\int \widehat{f}\left( \omega \right) \hbox {e}^{i\omega t}\hbox {d}\omega . \end{aligned}$$

The Fourier transforms of variables x and y of the full model (47) can be obtained as

$$\begin{aligned} \widehat{x}= & {} \frac{\left( i\omega -\frac{1}{\epsilon }a_{22}\right) \sigma _{x}\widehat{\xi }_{x}+a_{12}\frac{\sigma _{y}}{\sqrt{\epsilon }}\widehat{ \xi }_{y}}{\left( i\omega -a_{11}\right) \left( i\omega -\frac{1}{\epsilon } a_{22}\right) -a_{12}\frac{1}{\epsilon }a_{21}}, \end{aligned}$$
(49)
$$\begin{aligned} \widehat{y}= & {} \frac{\left( i\omega -a_{11}\right) \frac{\sigma _{y}}{\sqrt{ \epsilon }}\widehat{\xi }_{y}+\frac{1}{\epsilon }a_{21}\sigma _{x}\widehat{ \xi }_{x}}{\left( i\omega -a_{11}\right) \left( i\omega -\frac{1}{\epsilon } a_{22}\right) -a_{12}\frac{1}{\epsilon }a_{21}}. \end{aligned}$$
(50)

Then, for the full model (47), the resulting spectrum of x is

$$\begin{aligned} \left| \widehat{x}\left( \omega \right) \right| ^{2}=\frac{\left( \omega ^{2}+c_{0}^{2}\right) \sigma _{x}^{2}\left| \widehat{\xi } _{x}\right| ^{2}+d_{0}^{2}\sigma _{y}^{2}\left| \widehat{\xi } _{y}\right| ^{2}}{\left( -\omega ^{2}+\omega _{0}^{2}\right) ^{2}+\gamma _{0}^{2}\omega ^{2}}, \end{aligned}$$

where

$$\begin{aligned} c_{0}= & {} \frac{a_{22}}{\epsilon },\quad d_{0}=\frac{a_{12}}{\sqrt{ \epsilon }},\quad \omega _{0}=\sqrt{\frac{1}{\epsilon }\left( a_{11}a_{22}-a_{12}a_{21}\right) }, \\ \gamma _{0}= & {} a_{11}+\frac{1}{\epsilon }a_{22},\quad \left| \widehat{\xi }_{x}\right| ^{2}=1,\quad \left| \widehat{\xi } _{y}\right| ^{2}=1. \end{aligned}$$
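As a sanity check on the constants \(c_0\), \(d_0\), \(\omega _0\) and \(\gamma _0\), the spectrum can be evaluated in two ways: directly from the modulus of (49) with independent unit-spectrum noises, and from the closed form above. The sketch below does this for one stable parameter set. The parameter values and the function names `spectrum_direct` and `spectrum_closed_form` are assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np

def spectrum_direct(omega, a11, a12, a21, a22, eps, sx, sy):
    """|x_hat|^2 from Eq. (49), with |xi_x_hat|^2 = |xi_y_hat|^2 = 1
    and independent noises (no cross term)."""
    denom = (1j * omega - a11) * (1j * omega - a22 / eps) - a12 * a21 / eps
    numer = np.abs(1j * omega - a22 / eps) ** 2 * sx**2 + (a12**2 / eps) * sy**2
    return numer / np.abs(denom) ** 2

def spectrum_closed_form(omega, a11, a12, a21, a22, eps, sx, sy):
    """|x_hat|^2 written with the constants c0, d0, omega0, gamma0."""
    c0 = a22 / eps
    d0 = a12 / np.sqrt(eps)
    omega0_sq = (a11 * a22 - a12 * a21) / eps
    gamma0 = a11 + a22 / eps
    numer = (omega**2 + c0**2) * sx**2 + d0**2 * sy**2
    denom = (omega0_sq - omega**2) ** 2 + gamma0**2 * omega**2
    return numer / denom

# Hypothetical stable parameters (negative trace, positive determinant).
params = dict(a11=-1.0, a12=1.0, a21=-1.0, a22=-1.0, eps=0.1, sx=0.5, sy=0.8)
omega = np.linspace(-20.0, 20.0, 401)

s_direct = spectrum_direct(omega, **params)
s_closed = spectrum_closed_form(omega, **params)
print(np.max(np.abs(s_direct - s_closed)))
```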

Now we compute the Fourier transform of the closure model (48),

$$\begin{aligned} i\omega \widehat{X}=a_{11}\widehat{X}+a_{12}\varSigma _{12}\varSigma _{22}^{-1} \left[ \begin{array}{c} 1 \\ \hbox {e}^{-i\omega \tau } \\ \vdots \\ \hbox {e}^{-i\omega m\tau } \end{array} \right] \widehat{X}+\sigma _{x}\widehat{\xi }_{x}, \end{aligned}$$
(51)

where \(\widehat{X}\) is the Fourier transform of \(x_t\) in Eq. (48). We need to simplify the quantity \(\varSigma _{12}\varSigma _{22}^{-1}\left[ \begin{array}{cccc} 1&\hbox {e}^{-i\omega \tau }&\cdots&\hbox {e}^{-i\omega m\tau } \end{array} \right] ^\top \) in Eq. (51). Let \(S=\varSigma _{12}\varSigma _{22}^{-1}\) be the \(1\times \left( m+1\right) \) vector with components denoted by \(S\left[ n\right] \) for \(n=0,\ldots ,m\). Then, we can write

$$\begin{aligned} \varSigma _{12}\varSigma _{22}^{-1}\left[ \begin{array}{c} 1 \\ \hbox {e}^{-i\omega \tau } \\ \vdots \\ \hbox {e}^{-i\omega m\tau } \end{array} \right] =S\left[ \begin{array}{c} 1 \\ \hbox {e}^{-i\omega \tau } \\ \vdots \\ \hbox {e}^{-i\omega m\tau } \end{array} \right] =\sum _{n=0}^{m}S\left[ n\right] \hbox {e}^{-i\omega n\tau } := \widehat{S}_m\left( \omega \right) , \end{aligned}$$
(52)

which is nothing but the discrete Fourier transform of S. Notice that, for any \(n=0,\ldots ,m\),

$$\begin{aligned} \sum _{k=0}^m S\left[ k\right] \gamma _{xx,m}\left[ n-k\right] = \sum _{k=0}^m S\left[ k\right] \varSigma _{22}\left[ k, n\right] = \varSigma _{12}\left[ n\right] = \gamma _{xy,m} \left[ n\right] \end{aligned}$$
(53)

where the first equality is due to the fact that the process is stationary such that \(\varSigma _{22}[k,n] = \gamma _{xx,m}[n-k]\), the second equality is due to \(S\varSigma _{22}=\varSigma _{12}\), and the last equality is by the definition of the covariance function. By the discrete convolution theorem, we have

$$\begin{aligned} \widehat{S}_m\left( \omega \right) \widehat{\gamma }_{xx,m}\left( \omega \right) =\widehat{\gamma }_{xy,m} \left( \omega \right) , \end{aligned}$$
(54)

where \(\widehat{\gamma }_{xx,m}\) and \(\widehat{\gamma }_{xy,m}\) are the discrete Fourier transforms of \(\gamma _{xx,m}\) and \(\gamma _{xy,m}\), respectively. Substituting \( \widehat{S}_m\left( \omega \right) \) in Eq. (54) into Eq. (52), we obtain

$$\begin{aligned} \varSigma _{12}\varSigma _{22}^{-1}\left[ \begin{array}{c} 1 \\ \hbox {e}^{-i\omega \tau } \\ \vdots \\ \hbox {e}^{-i\omega m\tau } \end{array} \right] = \frac{\widehat{\gamma }_{xy,m}\left( \omega \right) }{\widehat{ \gamma }_{xx,m}\left( \omega \right) } \longrightarrow \frac{\widehat{\gamma }_{xy}\left( \omega \right) }{\widehat{\gamma }_{xx}\left( \omega \right) } \quad \text{ as } m\rightarrow \infty , \end{aligned}$$
(55)

where \(\widehat{\gamma }_{xx}\) and \(\widehat{\gamma }_{xy}\) denote the Fourier transforms of the covariance functions \(\gamma _{xx}\) and \(\gamma _{xy}\), respectively.
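The finite-memory quantities in (52) and (53) are straightforward to compute once the covariance sequences are available. The sketch below is only an illustration: the exponentially decaying sequences `gamma_xx` and `gamma_xy` are placeholders (the actual \(\varSigma _{12}\) and \(\varSigma _{22}\) come from Eq. (26)), and the names `memory_weights` and `S_hat` are hypothetical. It builds the Toeplitz matrix \(\varSigma _{22}[k,n]=\gamma _{xx,m}[n-k]\), solves \(S\varSigma _{22}=\varSigma _{12}\), and evaluates \(\widehat{S}_m(\omega )\).

```python
import numpy as np

def memory_weights(gamma_xx, gamma_xy):
    """Solve S @ Sigma22 = Sigma12 with Sigma22[k, n] = gamma_xx[|n - k|]."""
    m = len(gamma_xx) - 1
    lags = np.abs(np.subtract.outer(np.arange(m + 1), np.arange(m + 1)))
    Sigma22 = gamma_xx[lags]              # symmetric Toeplitz matrix
    # S Sigma22 = Sigma12  <=>  Sigma22^T S^T = Sigma12^T
    S = np.linalg.solve(Sigma22.T, gamma_xy)
    return S, Sigma22

def S_hat(S, omega, tau):
    """Discrete Fourier transform of S as in Eq. (52)."""
    n = np.arange(len(S))
    return np.sum(S * np.exp(-1j * omega * n * tau))

# Placeholder covariance sequences on lags 0, tau, ..., m * tau (assumption).
m, tau = 20, 0.1
lag_times = tau * np.arange(m + 1)
gamma_xx = np.exp(-0.5 * lag_times)
gamma_xy = 0.3 * np.exp(-0.8 * lag_times)

S, Sigma22 = memory_weights(gamma_xx, gamma_xy)
print(S_hat(S, omega=1.0, tau=tau))
```

Solving the linear system rather than forming \(\varSigma _{22}^{-1}\) explicitly is a numerical convenience and does not change the definition of \(S\).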

Substituting the limiting case of Eq. (55) into Eq. (51), we can simplify the Fourier transform of the closure model as follows,

$$\begin{aligned} i\omega \widehat{X}=a_{11}\widehat{X}+a_{12}\frac{\widehat{\gamma } _{xy}\left( \omega \right) }{\widehat{\gamma }_{xx}\left( \omega \right) } \widehat{X}+\sigma _{x}\widehat{\xi }_{x}. \end{aligned}$$
(56)

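Moreover, the Wiener–Khinchin theorem and the cross-correlation theorem give \(\widehat{\gamma }_{xx}=|\widehat{x}|^{2}\) and \(\widehat{\gamma }_{xy}=\overline{\widehat{x}}\,\widehat{y}\), so the ratio \(\widehat{\gamma }_{xy}/\widehat{\gamma }_{xx}\) reduces to \(\widehat{y}/\widehat{x}\) and Eq. (56) simplifies to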

$$\begin{aligned} i\omega \widehat{X}=a_{11}\widehat{X}+a_{12}\frac{\widehat{y}}{\widehat{x}} \widehat{X}+\sigma _{x}\widehat{\xi }_{x}. \end{aligned}$$
(57)

Substituting Eqs. (49) and (50) into Eq. (57), we obtain the Fourier transform of the relevant variable, \(\widehat{X}\), of the closure model,

$$\begin{aligned} \widehat{X}=\frac{\left( i\omega -\frac{1}{\epsilon }a_{22}\right) \sigma _{x}\widehat{\xi }_{x}+a_{12}\frac{\sigma _{y}}{\sqrt{\epsilon }}\widehat{ \xi }_{y}}{\left( i\omega -a_{11}\right) \left( i\omega -\frac{1}{\epsilon } a_{22}\right) -a_{12}\frac{1}{\epsilon }a_{21}}, \end{aligned}$$
(58)

which is the same as \(\widehat{x}\) of the full model in Eq. (49). Therefore, the ACV of the closure model (48) is consistent with that of the full model (47) in the limit \(m\rightarrow \infty \). In numerical computations, the error comes from truncating the memory to a finite number of terms, m, in Eq. (52).
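The algebra leading from (57) to (58) can be verified numerically for arbitrary noise coefficients. The following sketch assumes hypothetical parameter values and function names (`full_model_transforms`, `closure_transform`). It evaluates (49) and (50), solves (57) for \(\widehat{X}\), and compares the result with \(\widehat{x}\).

```python
import numpy as np

def full_model_transforms(omega, xi_x, xi_y, a11, a12, a21, a22, eps, sx, sy):
    """Fourier transforms x_hat and y_hat of the full model, Eqs. (49)-(50)."""
    denom = (1j * omega - a11) * (1j * omega - a22 / eps) - a12 * a21 / eps
    x_hat = ((1j * omega - a22 / eps) * sx * xi_x
             + a12 * sy / np.sqrt(eps) * xi_y) / denom
    y_hat = ((1j * omega - a11) * sy / np.sqrt(eps) * xi_y
             + a21 / eps * sx * xi_x) / denom
    return x_hat, y_hat

def closure_transform(omega, xi_x, x_hat, y_hat, a11, a12, sx):
    """Solve Eq. (57) for X_hat, using the limiting ratio y_hat / x_hat."""
    return sx * xi_x / (1j * omega - a11 - a12 * y_hat / x_hat)

# Hypothetical parameters and random complex noise coefficients.
rng = np.random.default_rng(1)
params = dict(a11=-1.0, a12=1.0, a21=-1.0, a22=-1.0, eps=0.1, sx=0.5, sy=0.8)
omega = np.linspace(-10.0, 10.0, 101)
xi_x = rng.standard_normal(omega.size) + 1j * rng.standard_normal(omega.size)
xi_y = rng.standard_normal(omega.size) + 1j * rng.standard_normal(omega.size)

x_hat, y_hat = full_model_transforms(omega, xi_x, xi_y, **params)
X_hat = closure_transform(omega, xi_x, x_hat, y_hat,
                          params["a11"], params["a12"], params["sx"])
print(np.max(np.abs(X_hat - x_hat)))
```

The agreement is exact up to rounding because the check uses the limiting ratio \(\widehat{y}/\widehat{x}\); it does not quantify the error from a finite memory length m.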

Cite this article

Jiang, S.W., Harlim, J. Modeling of missing dynamical systems: deriving parametric models using a nonparametric framework. Res Math Sci 7, 16 (2020). https://doi.org/10.1007/s40687-020-00217-4
