Abstract
In this paper, we consider modeling missing dynamics with a nonparametric non-Markovian model, constructed using the theory of kernel embedding of conditional distributions on appropriate reproducing kernel Hilbert spaces (RKHS) equipped with orthonormal basis functions. Depending on the choice of the basis functions, the closure model resulting from this nonparametric formulation takes the form of a parametric model. This suggests that the success of parametric modeling approaches proposed in various application domains can be understood through their RKHS representations. When the missing dynamical terms evolve faster than the relevant observable of interest, the proposed approach is consistent with the effective dynamics derived from classical averaging theory. In the linear Gaussian case without a time-scale gap, we show that the proposed non-Markovian model with a very long memory yields an accurate estimate of the nontrivial autocovariance function of the relevant variable of the full dynamics. Supporting numerical results on instructive nonlinear dynamics show that the proposed approach is able to replicate high-dimensional missing dynamical terms on problems with and without separation of temporal scales.
References
Berry, T., Harlim, J.: Linear theory for filtering nonlinear multiscale systems with model error. Proc. R. Soc. A 470, 20140168 (2014)
Berry, T., Harlim, J.: Semiparametric modeling: correcting low-dimensional model error in parametric models. J. Comput. Phys. 308, 305–321 (2016)
Berry, T., Harlim, J.: Correcting biased observation model error in data assimilation. Mon. Weather Rev. 145(7), 2833–2853 (2017)
Chorin, A., Hald, O., Kupferman, R.: Optimal prediction with memory. Phys. D Nonlinear Phenom. 166(3), 239–257 (2002)
Chorin, A., Stinis, P.: Problem reduction, renormalization, and memory. Commun. Appl. Math. Comput. Sci. 1(1), 1–27 (2007)
Christmann, A., Steinwart, I.: Support Vector Machines. Springer, Berlin (2008)
Crommelin, D., Vanden-Eijnden, E.: Subgrid-scale parameterization with conditional Markov chains. J. Atmos. Sci. 65(8), 2661–2675 (2008)
Fatkullin, I., Vanden-Eijnden, E.: A computational strategy for multiscale systems with applications to Lorenz 96 model. J. Comput. Phys. 200(2), 605–638 (2004)
Frederiksen, J., O’Kane, T.: Entropy, closures and subgrid modeling. Entropy 10, 635–683 (2008)
Givon, D., Kupferman, R., Stuart, A.: Extracting macroscopic dynamics: model problems and algorithms. Nonlinearity 17(6), R55 (2004)
Gottwald, G.A., Harlim, J.: The role of additive and multiplicative noise in filtering complex dynamical systems. Proc. R. Soc. A Math. Phys. Eng. Sci. 469(2155), 20130096 (2013)
Gouasmi, A., Parish, E.J., Duraisamy, K.: A priori estimation of memory effects in reduced-order models of nonlinear systems using the Mori–Zwanzig formalism. Proc. R. Soc. A Math. Phys. Eng. Sci. 473(2205), 20170385 (2017)
Grabowski, W.: An improved framework for superparameterization. J. Atmos. Sci. 61, 1940–1952 (2004)
Hamill, T.M.: Interpretation of rank histograms for verifying ensemble forecasts. Mon. Weather Rev. 129(3), 550–560 (2001)
Harlim, J.: Data-Driven Computational Methods: Parameter and Operator Estimations. Cambridge University Press, Cambridge (2018)
Harlim, J., Jiang, S., Liang, S., Yang, H.: Machine learning for prediction with missing dynamics. arXiv:1910.05861 (2019)
Harlim, J., Li, X.: Parametric reduced models for the nonlinear Schrödinger equation. Phys. Rev. E. 91, 053306 (2015)
Harlim, J., Mahdi, A., Majda, A.: An ensemble Kalman filter for statistical estimation of physics constrained nonlinear regression models. J. Comput. Phys. 257(Part A), 782–812 (2014)
Jiang, S.W., Harlim, J.: Parameter estimation with data-driven nonparametric likelihood functions. Entropy 21(6), 559 (2019)
Kerstein, A.: A linear-eddy model of turbulent scalar transport and mixing. Combust. Sci. Technol. 60(4–6), 391–421 (1988)
Kerstein, A.: One-dimensional turbulence: model formulation and application to homogeneous turbulence, shear flows, and buoyant stratified flows. J. Fluid Mech. 392, 277–334 (1999)
Khasminskii, R.: On averaging principle for Itô stochastic differential equations. Kybern. Chekhoslovakia 4(3), 260–279 (1968). (in Russian)
Khouider, B., Biello, J.A., Majda, A.J.: A stochastic multicloud model for tropical convection. Commun. Math. Sci. 8, 187–216 (2010)
Khouider, B., St-Cyr, A., Majda, A., Tribbia, J.: The MJO and convectively coupled waves in a coarse-resolution GCM with a simple multicloud parameterization. J. Atmos. Sci. 68, 240–264 (2011)
Kondrashov, D., Chekroun, M.D., Ghil, M.: Data-driven non-Markovian closure models. Phys. D Nonlinear Phenom. 297, 33–55 (2015)
Kraichnan, R.H.: The structure of isotropic turbulence at very high Reynolds numbers. J. Fluid Mech. 5, 497–543 (1959)
Kravtsov, S., Kondrashov, D., Ghil, M.: Multilevel regression modeling of nonlinear processes: derivation and applications to climatic variability. J. Clim. 18(21), 4404–4424 (2005)
Kurtz, T.: Semigroups of conditional shifts and approximations of Markov processes. Ann. Probab. 3, 618–642 (1975)
Kwasniok, F.: Data-based stochastic subgrid-scale parametrization: an approach using cluster-weighted modelling. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 370(1962), 1061–1086 (2012)
Lei, H., Baker, N.A., Li, X.: Data-driven parameterization of the generalized Langevin equation. Proc. Natl. Acad. Sci. 113(50), 14183–14188 (2016)
Lorenz, E.: Predictability: a problem partly solved. In Seminar on Predictability, 4–8 September 1995, vol 1, pp. 1–18, Shinfield Park, Reading. ECMWF (1995)
Lu, F., Lin, K., Chorin, A.: Comparison of continuous and discrete-time data-based modeling for hypoelliptic systems. Commun. Appl. Math. Comput. Sci. 11(2), 187–216 (2016)
Lu, F., Lin, K., Chorin, A.: Data-based stochastic model reduction for the Kuramoto–Sivashinsky equation. Phys. D Nonlinear Phenom. 340, 46–57 (2017)
Lu, F., Tu, X., Chorin, A.J.: Accounting for model error from unresolved scales in ensemble Kalman filters by stochastic parameterization. Mon. Weather Rev. 145(9), 3709–3723 (2017)
Majda, A., Abramov, R.V., Grote, M.J.: Information Theory and Stochastics for Multiscale Nonlinear Systems, vol. 25. American Mathematical Society, Providence (2005)
Majda, A., Grooms, I.: New perspectives on superparameterization for geophysical turbulence. J. Comput. Phys. 271, 60–77 (2014)
Majda, A., Harlim, J.: Physics constrained nonlinear regression models for time series. Nonlinearity 26, 201–217 (2013)
Majda, A., Timofeyev, I., Vanden-Eijnden, E.: Stochastic models for selected slow variables in large deterministic systems. Nonlinearity 19(4), 769 (2006)
Majda, A., Timofeyev, I.: Statistical mechanics for truncations of the Burgers-Hopf equation: a model for intrinsic stochastic behavior with scaling. Milan J. Math. 70(1), 39–96 (2002)
Majda, A.J., Harlim, J.: Physics constrained nonlinear regression models for time series. Nonlinearity 26(1), 201 (2012)
Majda, A.J., Timofeyev, I.: Remarkable statistical behavior for truncated Burgers-Hopf dynamics. Proc. Natl. Acad. Sci. 97(23), 12413–12417 (2000)
Majda, A.J., Timofeyev, I., Vanden-Eijnden, E.: Models for stochastic climate prediction. Proc. Natl. Acad. Sci. 96(26), 14687–14691 (1999)
Majda, A.J., Timofeyev, I., Vanden-Eijnden, E.: A mathematical framework for stochastic climate models. Commun. Pure Appl. Math. 54(8), 891–974 (2001)
Mori, H.: Transport, collective motion, and Brownian motion. Prog. Theor. Phys. 33, 423–450 (1965)
Nemtsov, A., Averbuch, A., Schclar, A.: Matrix compression using the Nyström method. Intell. Data Anal. 20(5), 997–1019 (2016)
Papanicolaou, G.C., et al.: Some probabilistic problems and methods in singular perturbations. Rocky Mt. J. Math. 6(4), 653–674 (1976)
Pavliotis, G., Stuart, A.: Multiscale Methods: Averaging and Homogenization. Springer, Berlin (2008)
Song, L., Fukumizu, K., Gretton, A.: Kernel embeddings of conditional distributions: a unified kernel framework for nonparametric inference in graphical models. IEEE Signal Process. Mag. 30(4), 98–111 (2013)
Song, L., Huang, J., Smola, A., Fukumizu, K.: Hilbert space embeddings of conditional distributions with applications to dynamical systems. In Proceedings of 26th Annual International Conference on Machine Learning, pp. 961–968. ACM (2009)
Weinan, E., Engquist, B., Li, X., Ren, W., Vanden-Eijnden, E.: Heterogeneous multiscale methods: a review. Commun. Comput. Phys. 2(3), 367–450 (2007)
Wilks, D.S.: Effects of stochastic parametrizations in the Lorenz’96 system. Q. J. R. Meteorol. Soc. 131(606), 389–407 (2005)
Zhang, H., Harlim, J., Li, X.: Computing linear response statistics using orthogonal polynomial based estimators: An RKHS formulation. arXiv:1912.11110 (2019)
Zwanzig, R.: Statistical mechanics of irreversibility. Lect. Theor. Phys. 3, 106–141 (1961)
Zwanzig, R.: Nonlinear generalized Langevin equations. J. Stat. Phys. 9, 215–220 (1973)
Acknowledgements
It is a great pleasure to dedicate this paper to Andrew Majda on the occasion of his 70th birthday. The research of J.H. was partially supported by the ONR Grant N00014-16-1-2888, NSF Grants DMS-1619661 and DMS-1854299. S.W.J. was supported as a postdoctoral fellow under the ONR Grant N00014-16-1-2888.
Appendices
Appendix A: Kernel mean embedding of conditional distributions
The purpose of this review is to verify Eq. (7). While the derivation here closely follows the description in [48, 49], we present a formulation with Mercer-type kernels induced by orthonormal bases of \(L^2\)-spaces. The basic theory of RKHS can be found in many texts, such as [6].
First, let us repeat the discussion in Sect. 2.1 on \(\mathcal {Z}\). Let \(\mathcal {Z}\) be a compact set and let \(\hat{K}:\mathcal {Z}\times \mathcal {Z}\rightarrow \mathbb {R}\) be a bounded kernel, that is, a bounded symmetric positive definite function. By the Moore–Aronszajn theorem, there exists a unique RKHS \(\mathcal {H}_Z=\overline{\text{ span }\{\hat{K}(\textit{\textbf{z}},\cdot ),\forall \textit{\textbf{z}}\in \mathcal {Z}\}}\) with reproducing kernel \(\hat{K}\). Let \(\hat{q}:\mathcal {Z}\rightarrow \mathbb {R}\) be a positive weight function and \(\{\varphi _k\}_{k\ge 1}\) be a set of eigenfunctions corresponding to eigenvalues \(\{\xi _k\}\) of the following integral operator \(\mathcal {\hat{K}}:L^2(\mathcal {Z},\hat{q}) \rightarrow L^2(\mathcal {Z},\hat{q})\), defined as
\[ (\mathcal {\hat{K}}f)(\textit{\textbf{z}}) = \int _{\mathcal {Z}} \hat{K}(\textit{\textbf{z}},\textit{\textbf{z}}')f(\textit{\textbf{z}}')\hat{q}(\textit{\textbf{z}}')\,d\textit{\textbf{z}}'. \tag{34} \]
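By Mercer’s theorem, the kernel \(\hat{K}\) has the following representation:
\[ \hat{K}(\textit{\textbf{z}},\textit{\textbf{z}}') = \sum _{k=1}^\infty \xi _k\varphi _k(\textit{\textbf{z}})\varphi _k(\textit{\textbf{z}}'). \tag{35} \]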
We should point out that if \(\mathcal {Z}\) is a non-compact domain such as \(\mathbb {R}^n\), then, with an exponentially decaying \(\hat{q}\), one can construct a bounded Mercer-type kernel as in (35) with an appropriate choice of decreasing sequence \(\{\xi _k\}\) (see Lemma 3.2 in [52]), and it is a reproducing kernel corresponding to the RKHS \(\mathcal {H}_Z\) (see Proposition 3.4 in [52]).
In this case, the RKHS \(\mathcal {H}_Z\) induced by the Mercer-type kernel in (35) is a subspace of \(L^2(\mathcal {Z},\hat{q})\) with the reproducing property corresponding to the inner product \(\langle f,g\rangle _{\mathcal {H}_Z} = \sum _{k=1}^\infty \frac{f_k g_k}{\xi _k}\), for all \(f,g\in \mathcal {H}_Z\), where \(f_k = \langle f,\varphi _k\rangle _{L^2(\mathcal {Z},\hat{q})}\) and \(g_k = \langle g,\varphi _k\rangle _{L^2(\mathcal {Z},\hat{q})}\). Then, for any \(f\in \mathcal {H}_Z\) and \(\textit{\textbf{z}}\in \mathcal {Z}\), we can represent
\[ f(\textit{\textbf{z}}) = \langle f,\hat{K}(\textit{\textbf{z}},\cdot )\rangle _{\mathcal {H}_Z} = \sum _{k=1}^\infty f_k\varphi _k(\textit{\textbf{z}}) \tag{36} \]
in the basis \(\{\varphi _k\}\) of \(L^2(\mathcal {Z},\hat{q})\), where the convergence of the series holds uniformly (or in \(C_0(\mathbb {R}^n)\) for non-compact \(\mathcal {Z}=\mathbb {R}^n\)).
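The construction above can be illustrated numerically. The following is a minimal sketch, assuming \(\mathcal {Z}=[0,1]\) with uniform weight, a cosine basis, and eigenvalues \(\xi _k=(1+k)^{-2}\); all of these choices (and the names `cosine_basis`, `mercer_kernel`, `rkhs_inner`) are illustrative assumptions, not taken from the paper. It builds a truncated Mercer-type kernel and checks the reproducing property under the weighted inner product.

```python
import numpy as np

def cosine_basis(z, n_modes):
    """Orthonormal cosine basis of L^2([0,1]) with uniform weight (assumed choice).
    Returns an array of shape (len(z), n_modes)."""
    k = np.arange(n_modes)
    phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(z, k))
    phi[:, 0] = 1.0                          # constant mode has unit norm already
    return phi

n_modes = 10
xi = 1.0 / (1.0 + np.arange(n_modes)) ** 2   # assumed decreasing eigenvalues

def mercer_kernel(z1, z2):
    """Truncated Mercer-type kernel: sum_k xi_k phi_k(z1) phi_k(z2)."""
    return (cosine_basis(z1, n_modes) * xi) @ cosine_basis(z2, n_modes).T

def rkhs_inner(f_coeffs, g_coeffs):
    """RKHS inner product: sum_k f_k g_k / xi_k, with L^2 coefficients as inputs."""
    return np.sum(f_coeffs * g_coeffs / xi)

# Gram matrix on a grid: symmetric and positive semidefinite by construction.
z_grid = np.linspace(0.0, 1.0, 50)
gram = mercer_kernel(z_grid, z_grid)

# Reproducing property at a point z0: the L^2 coefficients of K(z0, .) are xi_k phi_k(z0),
# so <f, K(z0, .)>_H collapses to sum_k f_k phi_k(z0) = f(z0).
rng = np.random.default_rng(0)
f_coeffs = rng.normal(size=n_modes) * xi     # coefficients of some f in the truncated RKHS
z0 = np.array([0.3])
phi_z0 = cosine_basis(z0, n_modes)[0]
f_at_z0 = phi_z0 @ f_coeffs
reproduced = rkhs_inner(f_coeffs, xi * phi_z0)
```

Because the kernel is truncated at `n_modes` terms, the check is exact only for functions in the span of the retained basis functions.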
We call the Hilbert space of functions \(\mathcal {H}_Z\) an RKHS induced by the orthonormal basis of \(L^2(\mathcal {Z},\hat{q})\). While we have discussed \(\mathcal {H}\) as an RKHS induced by the orthonormal basis of \(L^2(\mathcal {Y},q^{-1})\) in Sect. 2.1, we can also repeat the argument above and construct \(\mathcal {H}_Y\) as an RKHS induced by the orthonormal basis of \(L^2(\mathcal {Y},q)\). In this case, recall that while \(\{\psi _k q\}\) is an orthogonal eigenbasis of the integral operator in (4), the orthogonal basis functions \(\psi _k\in L^2(\mathcal {Y},q)\) are eigenfunctions of an adjoint integral operator of (4). That is, one can verify that
\[ \mathcal {K}^*\psi _k = \lambda _k\psi _k, \]
where for \(f\in L^2(\mathcal {Y},q)\),
\[ (\mathcal {K}^*f)(x) = \int _{\mathcal {Y}} K^*(x,y)f(y)q(y)\,dy, \]
and \(K^*(x,y)= q(x)^{-1}K(x,y)q^{-1}(y)\) is also a symmetric positive definite kernel. By Mercer’s theorem, one can write
\[ K^*(x,y) = \sum _{k=1}^\infty \lambda _k\psi _k(x)\psi _k(y). \]
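The relation between the two kernels can be checked numerically. The sketch below assumes a one-dimensional domain \(\mathcal {Y}=\mathbb {R}\), a standard normal weight \(q\), normalized Hermite polynomials as \(\psi _k\), and eigenvalues \(\lambda _k=2^{-k}\); these are illustrative assumptions rather than the paper's choices, and the kernel is truncated to a few modes. It verifies by Gauss–Hermite quadrature that \(\psi _k\) are eigenfunctions of the operator with kernel \(K^*\) and weight \(q\), while \(\psi _k q\) are eigenfunctions of the operator with kernel \(K\) and weight \(q^{-1}\).

```python
import numpy as np
from math import factorial, pi, sqrt
from numpy.polynomial.hermite_e import hermeval, hermegauss

def hermite_basis(x, n_modes):
    """Columns psi_k = He_k / sqrt(k!), orthonormal in L^2(R, q) for standard normal q."""
    cols = []
    for k in range(n_modes):
        coeffs = np.zeros(k + 1)
        coeffs[k] = 1.0                       # select the k-th Hermite polynomial
        cols.append(hermeval(x, coeffs) / sqrt(factorial(k)))
    return np.stack(cols, axis=1)

def q(x):
    """Standard normal density, used as the weight function (assumed choice)."""
    return np.exp(-0.5 * x**2) / sqrt(2.0 * pi)

n_modes = 6
lam = 0.5 ** np.arange(n_modes)               # assumed decreasing eigenvalues

def k_star(x, y):
    """Truncated kernel K*(x, y) = sum_k lam_k psi_k(x) psi_k(y)."""
    return (hermite_basis(x, n_modes) * lam) @ hermite_basis(y, n_modes).T

def k_orig(x, y):
    """Kernel K(x, y) = q(x) K*(x, y) q(y), so that K* = q^{-1} K q^{-1}."""
    return q(x)[:, None] * k_star(x, y) * q(y)[None, :]

nodes, weights = hermegauss(20)               # quadrature for weight exp(-y^2 / 2)
w_q = weights / sqrt(2.0 * pi)                # sum_i w_q[i] g(nodes[i]) ~ int g(y) q(y) dy

x = np.linspace(-2.0, 2.0, 9)
psi_x = hermite_basis(x, n_modes)
psi_nodes = hermite_basis(nodes, n_modes)

# Adjoint operator: int K*(x, y) psi_j(y) q(y) dy, one column per j.
adjoint_applied = (k_star(x, nodes) * w_q) @ psi_nodes
# Original operator: int K(x, y) [psi_j q](y) q(y)^{-1} dy = int K(x, y) psi_j(y) dy.
orig_applied = (k_orig(x, nodes) / q(nodes) * w_q) @ psi_nodes
```

Under these assumptions, `adjoint_applied` should match `lam * psi_x` and `orig_applied` should match `lam * psi_x * q(x)[:, None]`, up to quadrature and rounding error.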
Let Y and Z be random variables on \(\mathcal {Y}\) and \(\mathcal {Z}\) with joint distribution P(Y, Z). We define the cross-covariance operators, \(\mathcal {C}_{YZ}:\mathcal {H}_Z\rightarrow \mathcal {H}_Y\) and \(\mathcal {C}_{ZZ}:\mathcal {H}_Z\rightarrow \mathcal {H}_Z\), as
One can immediately see that for any \(f\in \mathcal {H}_Y\) and \(g\in \mathcal {H}_Z\),
Let us define feature maps \(\varPsi :\mathcal {Y}\rightarrow \mathcal {F}_Y\subset \ell _2\) and \(\varPhi :\mathcal {Z}\rightarrow \mathcal {F}_Z\subset \ell _2\), respectively,
Then, we can write
where the inner products in \(\mathcal {H}_Z\) and \(\mathcal {H}_Y\) can be identified by \(\ell _2\) inner products in the corresponding feature spaces. Also, for any function \(f\in \mathcal {H}_Z\) and \(\textit{\textbf{z}}\in \mathcal {Z}\), we can rewrite the expansion in (36) as,
where we have defined the functions \(\varPhi _k = \sqrt{\xi _k}\varphi _k \in \mathcal {H}_Z\). For convenience of the discussion below, we also define the functions \(\varPsi _k:=\sqrt{\lambda _k}\psi _k\in \mathcal {H}_Y\).
Using the identity in (40), we can represent the cross-operators in (39) on the basis coordinates \(\varPsi _k \in \mathcal {H}_Y\) and \(\varPhi _\ell \in \mathcal {H}_Z\) as follows:
Thus, the components of the following matrix multiplication are given as
To clarify this derivation: the second equality uses the definition in (43), the fourth line uses the fact that \(\mathcal {C}_{ZZ}^{-1}\varPhi _\ell \in \mathcal {H}_Z\) can be expanded as in (42), and the remaining lines use the standard tensor identity.
The theory of kernel mean embedding of conditional distributions (see [48, 49]) suggests that
Since \(\hat{K}(\textit{\textbf{z}},\cdot ) \in \mathcal {H}_Z\), we can employ the expansion in (42) and deduce
where we have used (44) to deduce the third equality above and used the fact that \(\varphi _j\) and \(\xi _j\) are eigenfunction and eigenvalue of the integral operator in (34). Define,
then from (43) and the definitions of the corresponding feature maps in (41),
Substituting the third equation above into (46) and using the definitions of the feature maps in (41), we obtain
which is exactly the claim in (7).
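In finite feature coordinates, the conditional embedding can be sketched as an empirical estimate of \(\mathcal {C}_{YZ}\mathcal {C}_{ZZ}^{-1}\) applied to the feature map of \(\textit{\textbf{z}}\). The example below is a minimal sketch under assumptions that are not from the paper: cosine bases on \([0,1]\) for both variables, eigenvalues \((1+k)^{-2}\), a toy joint distribution in which \(Y=Z\) with probability 0.8 and \(Y=1-Z\) otherwise, and a small ridge term `reg` added for numerical stability. The function name `conditional_expectation` is hypothetical.

```python
import numpy as np

def cosine_basis(x, n_modes):
    """Orthonormal cosine basis of L^2([0,1]) with uniform weight (assumed choice)."""
    k = np.arange(n_modes)
    phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, k))
    phi[:, 0] = 1.0
    return phi

rng = np.random.default_rng(0)
n_samples, n_modes = 20000, 8
xi = 1.0 / (1.0 + np.arange(n_modes)) ** 2    # assumed eigenvalues for H_Z
lam = xi.copy()                               # assumed eigenvalues for H_Y

# Toy joint samples: Y = Z with probability 0.8, otherwise Y = 1 - Z.
z = rng.uniform(0.0, 1.0, n_samples)
keep = rng.uniform(0.0, 1.0, n_samples) < 0.8
y = np.where(keep, z, 1.0 - z)

# Feature maps with components sqrt(xi_k) phi_k(z) and sqrt(lam_k) psi_k(y).
Phi = cosine_basis(z, n_modes) * np.sqrt(xi)
Psi = cosine_basis(y, n_modes) * np.sqrt(lam)

# Empirical cross-covariance matrices in the basis coordinates.
C_yz = Psi.T @ Phi / n_samples
C_zz = Phi.T @ Phi / n_samples

def conditional_expectation(f_coeffs, z_query, reg=1e-8):
    """Estimate E[f(Y) | Z = z] for f = sum_k f_coeffs[k] psi_k at the points z_query."""
    a = f_coeffs / np.sqrt(lam)               # coordinates of f with respect to Psi_k
    Phi_q = cosine_basis(z_query, n_modes) * np.sqrt(xi)
    # Regularized C_zz^{-1} Phi(z) for every query point (columns).
    W = np.linalg.solve(C_zz + reg * np.eye(n_modes), Phi_q.T)
    return a @ C_yz @ W

z_query = np.linspace(0.05, 0.95, 10)
est_odd = conditional_expectation(np.eye(n_modes)[1], z_query)   # f = psi_1
est_even = conditional_expectation(np.eye(n_modes)[2], z_query)  # f = psi_2
```

For this toy distribution the exact answers are known: odd cosine modes are damped by the factor \(2(0.8)-1=0.6\), so `est_odd` should be close to \(0.6\sqrt{2}\cos (\pi z)\) up to sampling error, while even modes are unchanged, so `est_even` should be close to \(\sqrt{2}\cos (2\pi z)\).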
Appendix B: ACV of the multiscale linear Gaussian model
The full model (18) and (19) can be rewritten as
where \(\xi _{x}\) and \(\xi _{y}\) are independent standard Gaussian noises. Similarly, the closure model (25) can be rewritten as
where \(\textit{\textbf{x}}:=\textit{\textbf{x}}_{t-m:t} = \left[ x_{t-m},x_{t-m+1},\ldots ,x_{t}\right] ^\top \) and \( \varSigma _{12}\) and \(\varSigma _{22}\) are defined in Eq. (26). To simplify the notation, we drop the time indices \(t-m:t\). We also drop the “hat” notation in \(x_t\) and \(\textit{\textbf{x}}_t\), since in this section the hat denotes Fourier coefficients. In this appendix, we show that the autocovariance (ACV) function of the closure model (48) is approximately equal to that of the full model (47) for any value of \(\epsilon \).
The Fourier transform and inverse Fourier transform are defined as
The Fourier transforms of variables x and y of the full model (47) can be obtained as
Then, for the full model (47), the resulting spectrum of x is
where
Now we compute the Fourier transform of the closure model (48),
where \(\widehat{X}\) is the Fourier transform of \(x_t\) in Eq. (48). We need to simplify the quantity \(\varSigma _{12}\varSigma _{22}^{-1}\left[ \begin{array}{cccc} 1&\hbox {e}^{-i\omega \tau }&\cdots&\hbox {e}^{-i\omega m\tau } \end{array} \right] ^\top \) in Eq. (51). Let \(S=\varSigma _{12}\varSigma _{22}^{-1}\) be the \(1\times \left( m+1\right) \) vector with components denoted by \(S\left[ n\right] \) for \(n=0,\ldots ,m\). Then, we can write
which is the discrete Fourier transform of S. Notice that, for any \(n=0,\ldots ,m\),
where the first equality is due to the fact that the process is stationary such that \(\varSigma _{22}[k,n] = \gamma _{xx,m}[n-k]\), the second equality is due to \(S\varSigma _{22}=\varSigma _{12}\), and the last equality is by the definition of the covariance function. By the discrete convolution theorem, we have
where \(\widehat{\gamma }_{xx,m}\) and \(\widehat{\gamma }_{xy,m}\) are the discrete Fourier transforms of \(\gamma _{xx,m}\) and \(\gamma _{xy,m}\), respectively. Substituting \( \widehat{S}_m\left( \omega \right) \) in Eq. (54) into Eq. (52), we obtain
where \(\widehat{\gamma }_{xx}\) and \(\widehat{\gamma }_{xy}\) denote the Fourier transform of the covariance functions \(\gamma _{xx}\) and \(\gamma _{xy}\).
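The role of stationarity in the identity \(S\varSigma _{22}=\varSigma _{12}\) can be illustrated with a small numerical sketch. The example assumes a hypothetical stationary setting that is not the model of the paper: \(x_t\) has an AR(1)-type autocovariance \(\gamma _{xx}[n]=\sigma ^2a^{|n|}/(1-a^2)\), and \(y_t=\sum _{j=0}^{m}h_jx_{t-j}\) is a finite linear filter of \(x\) with made-up coefficients `h`. In that case the Toeplitz system has the closed-form solution \(S[k]=h_{m-k}\), which the code recovers.

```python
import numpy as np

m = 4                                         # memory length (m + 1 lags)
a, sigma2 = 0.7, 1.0                          # assumed AR(1) parameters

def gamma_xx(n):
    """Autocovariance of a stationary AR(1) process at (array of) integer lags n."""
    return sigma2 * a ** np.abs(n) / (1.0 - a**2)

h = np.array([0.5, -0.2, 0.1, 0.0, 0.3])      # hypothetical filter: y_t = sum_j h[j] x_{t-j}
idx = np.arange(m + 1)

# Stationarity makes Sigma22 Toeplitz: Sigma22[k, n] = gamma_xx[n - k].
Sigma22 = gamma_xx(idx[None, :] - idx[:, None])
# Sigma12[n] = Cov(y_t, x_{t-m+n}) = sum_j h[j] gamma_xx[m - n - j].
Sigma12 = np.array([np.sum(h * gamma_xx(m - n - idx)) for n in idx])

# Sigma22 is symmetric, so S Sigma22 = Sigma12 is the same as Sigma22 S^T = Sigma12^T.
S = np.linalg.solve(Sigma22, Sigma12)

# Convolution form of the same identity: sum_k S[k] gamma_xx[n - k] = gamma_xy[n].
conv = np.array([np.sum(S * gamma_xx(n - idx)) for n in idx])

def S_hat(omega, tau=1.0):
    """Discrete Fourier transform of S: sum_n S[n] exp(-i omega n tau)."""
    return np.sum(S * np.exp(-1j * omega * tau * idx))
```

In this toy setting the filter is recovered in reversed order, `S[::-1]` equal to `h`, because the first component of \(\textit{\textbf{x}}\) is the oldest lag \(x_{t-m}\).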
Substituting the limiting case of Eq. (55) into Eq. (51), we can simplify the Fourier transform of the closure model as follows,
Moreover, based on the Wiener–Khinchin theorem and the cross-correlation theorem, we can further simplify Eq. (56) as
Substituting Eqs. (49) and (50) into Eq. (57), we obtain the Fourier transform of the relevant variable, \(\widehat{X}\), of the closure model,
which is the same as \(\widehat{x}\) of the full model in Eq. (49). Therefore, the ACV of the closure model (48) is consistent with that of the full model (47) in the limit \(m\rightarrow \infty \). In numerical practice, the error comes from truncating the memory to a finite number of terms in Eq. (52).
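The Wiener–Khinchin step used above, which identifies the ACV with the inverse Fourier transform of the power spectrum, can be checked on a simple case. The sketch below uses a discrete-time AR(1) process as a stand-in (an assumption for illustration, not the multiscale model of this appendix), for which both the spectrum and the ACV are known in closed form.

```python
import numpy as np

a, sigma2 = 0.8, 1.0                          # assumed AR(1) coefficient and noise variance
n_freq = 1024                                 # number of frequency grid points

# Power spectrum of x_{t+1} = a x_t + noise: sigma2 / |1 - a exp(-i omega)|^2.
omega = 2.0 * np.pi * np.arange(n_freq) / n_freq
spectrum = sigma2 / np.abs(1.0 - a * np.exp(-1j * omega)) ** 2

# Wiener-Khinchin: the ACV is the inverse discrete Fourier transform of the spectrum.
acv_from_spectrum = np.fft.ifft(spectrum).real

# Closed-form ACV of the AR(1) process for comparison.
lags = np.arange(20)
acv_exact = sigma2 * a**lags / (1.0 - a**2)
```

On a finite frequency grid the inverse transform returns a periodized ACV, so agreement at small lags relies on the ACV having decayed well before lag `n_freq`, which holds here.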
Cite this article
Jiang, S.W., Harlim, J. Modeling of missing dynamical systems: deriving parametric models using a nonparametric framework. Res Math Sci 7, 16 (2020). https://doi.org/10.1007/s40687-020-00217-4
DOI: https://doi.org/10.1007/s40687-020-00217-4