1 Introduction

A nonlinear functional is a map from a space of functions into the real line or the complex plane. Such a map, although a rather abstract mathematical concept, plays a fundamental role in many areas of mathematical physics and applied sciences. For example, nonlinear functionals were used by Wiener to describe Brownian motion mathematically [100], by Hohenberg and Kohn [47] to reduce the dimensionality of the Schrödinger equation in many-body quantum systems [56, 67], by Hopf to describe the statistical properties of turbulence [2, 48, 63], and by Bogoliubov to model systems of interacting bosons in superfluid liquid helium [12, 84]. Applications of nonlinear functionals to other areas of mathematical physics can be found in [3, 37, 52, 54, 59, 92].

Nonlinear functionals have also appeared in evolution equations known as functional differential equations (FDEs) [93]. A classical example in fluid dynamics is the Hopf equation [48, 66, 77]

$$\begin{aligned} \frac{\partial \Phi ([ \theta ],t)}{\partial t}= \sum _{k=1}^3\int _V\theta _k( x)\left( i \sum _{j=1}^3\frac{\partial }{\partial x_j} \frac{\delta ^2 \Phi ([ \theta ],t)}{\delta \theta _k( x)\delta \theta _j( x)} +\nu \nabla ^2\frac{\delta \Phi ([ \theta ],t)}{\delta \theta _k( x)}\right) \mathrm {d}x, \end{aligned}$$
(1)

which governs the dynamics of the characteristic functional

$$\begin{aligned} \Phi ([ \theta ],t)={\mathbb {E}}\left\{ \exp \left( i\int _{V} u (x,t)\cdot \theta (x)\mathrm {d}x\right) \right\} . \end{aligned}$$
(2)

Here, u(x, t) represents a stochastic solution to the Navier–Stokes equation corresponding to a random initial state, and \({\mathbb {E}}\{\cdot \}\) is the expectation over the probability measure of such random initial state. Remarkably, the complex-valued nonlinear functional (2) encodes all statistical information of the stochastic solution to the Navier–Stokes equation. For this reason, Monin and Yaglom [63, Ch. 10] deemed Eq. (1) “the most compact formulation of the general turbulence problem”, i.e., the problem of determining the statistical properties of the velocity and pressure fields of the Navier–Stokes equations given statistical information on the initial state.Footnote 1 Another well-known example of a functional differential equation is the Schwinger–Dyson equation of quantum field theory [68, 105]. This equation describes the dynamics of the generating functional of the Green functions of a quantum field theory, allowing us to propagate field interactions in a perturbative setting (e.g., with Feynman diagrams) or in a strong coupling regime. The Schwinger–Dyson functional formalism is also useful in studying the statistical dynamics of classical systems described in terms of stochastic ordinary or partial differential equationsFootnote 2 [52, 59, 69]. More recently, FDEs have appeared in mean-field games [17] and mean-field optimal control [31, 79]. Mean-field games are optimization problems involving a very large (potentially infinite) number of interacting players. In some cases, it is possible to reformulate such optimization problems in terms of a nonlinear Hamilton–Jacobi FDE in probability density space. The standard form of this equation is [20]

$$\begin{aligned} \frac{\partial F([\rho ],t)}{\partial t}+ {\mathcal {H}}\left( [\rho ],\left[ \frac{\delta F([\rho ],t)}{\delta \rho ( x)}\right] \right) =0, \qquad F([\rho ],0)=F_0([\rho ]), \end{aligned}$$
(3)

where \(\rho (x)\) is an n-dimensional probability density function supported on \(\Omega \subseteq {\mathbb {R}}^n\), \(\delta F /\delta \rho (x)\) is the first-order functional derivative of F with respect to \(\rho (x)\), and \({\mathcal {H}}\) is the Hamilton functional

$$\begin{aligned} {\mathcal {H}}\left( [\rho ],\left[ \frac{\delta F([\rho ],t)}{\delta \rho ( x)}\right] \right) = \int _{\Omega } {\Psi }\left( x,\nabla \frac{\delta F([\rho ],t)}{\delta \rho (x)} \right) \rho (x)\mathrm{d}x + G([\rho ]). \end{aligned}$$
(4)

Here, \(\Psi \) is a Hamilton function, and \(G([\rho ])\) is an interaction potential. More general FDEs of the type (3) have recently been derived in the context of unnormalized optimal PDF transport [40]. Mean-field theory is also useful in optimal feedback control of nonlinear stochastic dynamical systems and in deep learning. For instance, recent work of W. E and collaborators [31] laid the mathematical foundations of the population risk minimization problem in deep learning by formulating it as a mean-field optimal control problem. This mean-field optimal control problem yields a generalized version of the Hamilton–Jacobi–Bellman equation in a Wasserstein space, which is a nonlinear FDE (see Eq. (20) in [31]).

Computing accurate approximations of the solution to FDEs such as (1) or (3) is a long-standing problem in mathematical physics. In a recent Physics Report [93], we reviewed state-of-the-art methods to approximate nonlinear functionals and FDEs. In particular, we discussed an approximation method, known as “cylindrical approximation”, in which nonlinear functionals and FDEs defined on function spaces admitting a basis are approximated by multivariate functions and multivariate partial differential equations (PDEs), respectively. The idea is that if a function space admits a basis, then any function in the space can be represented uniquely by its projection coefficients onto the basis. Accordingly, nonlinear functionals defined on such a function space can be represented as multivariate functions of the coefficients. The objective of this paper is to provide a rigorous mathematical foundation for cylindrical approximations to nonlinear functionals, functional derivatives, and FDEs defined on Banach spaces admitting a basis. The analysis yields two main results. First, we prove that cylindrical approximations converge uniformly on compact subsets of real Banach spaces admitting a basis. Second, we prove that the convergence rate can be exponential in the number of projection coefficients. We also provide necessary and sufficient conditions for consistency, stability and convergence of cylindrical approximations to FDEs based on the Trotter–Kato approximation theorem [33, 41].
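The idea of a cylindrical approximation can be illustrated numerically. The Python sketch below is not the construction analyzed in this paper, only a minimal illustration under stated assumptions: the function space is \(L^2_p([0,2\pi ])\) with the orthonormal trigonometric basis, integrals are computed with the trapezoid rule on a uniform periodic grid, and both the functional \(F([\theta ])=\int _0^{2\pi }\cos (\theta (x))\mathrm{d}x\) and the test function \(\theta (x)=e^{\sin x}\) are hypothetical choices. The function `f` is the multivariate function of the \(2K+1\) projection coefficients that stands in for F.

```python
import math

N = 128                          # points of a uniform periodic grid on [0, 2*pi)
H = 2 * math.pi / N              # grid spacing (trapezoid weight)
XS = [j * H for j in range(N)]
SQ2PI, SQPI = math.sqrt(2 * math.pi), math.sqrt(math.pi)


def F(theta_vals):
    """Hypothetical functional F([theta]) = int_0^{2pi} cos(theta(x)) dx (trapezoid rule)."""
    return H * sum(math.cos(t) for t in theta_vals)


def project(theta_vals, K):
    """Coefficients on the orthonormal basis 1/sqrt(2pi), cos(kx)/sqrt(pi), sin(kx)/sqrt(pi), k <= K."""
    a0 = H * sum(theta_vals) / SQ2PI
    a = [H * sum(t * math.cos(k * x) for t, x in zip(theta_vals, XS)) / SQPI for k in range(1, K + 1)]
    b = [H * sum(t * math.sin(k * x) for t, x in zip(theta_vals, XS)) / SQPI for k in range(1, K + 1)]
    return a0, a, b


def synthesize(a0, a, b):
    """Grid values of a0/sqrt(2pi) + sum_k (a_k cos(kx) + b_k sin(kx)) / sqrt(pi)."""
    return [a0 / SQ2PI + sum((ak * math.cos(k * x) + bk * math.sin(k * x)) / SQPI
                             for k, (ak, bk) in enumerate(zip(a, b), start=1))
            for x in XS]


def f(a0, a, b):
    """Cylindrical approximation: F restricted to a (2K+1)-dimensional span, as a function of the coefficients."""
    return F(synthesize(a0, a, b))


theta = [math.exp(math.sin(x)) for x in XS]   # hypothetical smooth periodic test function
reference = F(theta)
errors = {K: abs(f(*project(theta, K)) - reference) for K in (1, 2, 4, 8)}
for K, err in errors.items():
    print(f"K = {K}: |f - F| = {err:.2e}")
```

For this smooth test function the discrepancy drops rapidly as K grows, which is consistent with (but of course does not prove) the exponential convergence rates mentioned above.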

This paper is organized as follows. In Sect. 2 we briefly review the theory of nonlinear functionals defined on a Banach space, and recall the notions of continuity, compactness and differentiability. In Sect. 3 we specialize these concepts to nonlinear functionals defined on a real separable Hilbert space H. In Sect. 4 we introduce cylindrical approximations of nonlinear functionals and functional derivatives defined on a Hilbert space. Uniform convergence for both approximations is established in Sects. 5 and 6, provided the functional (or functional derivative) is defined on a compact subset of H. We also show that cylindrical approximations can converge exponentially fast for Fréchet differentiable functionals. In Sect. 7 we develop a self-consistent convergence analysis of cylindrical approximations to linear FDEs in compact subsets of real separable Hilbert spaces. In Sect. 8 we outline the extension of the functional approximation theory developed for Hilbert spaces to compact subsets of real Banach spaces admitting a basis. In Sect. 9 we provide numerical examples demonstrating convergence of cylindrical approximations of nonlinear functionals and a linear FDE. In particular, we study the Hopf equation corresponding to a linear advection problem evolving from a random initial state. The main findings are summarized in Sect. 10. We also include two brief appendices in which we discuss cylindrical approximations of functional integrals in real separable Hilbert spaces, and the notion of distance between function spaces.

2 Nonlinear functionals in Banach spaces

Let X be a Banach space. A nonlinear functional on X is a map F from X into a field \({\mathbb {F}}\). In this paper, \({\mathbb {F}}\) will be either the real line (\({\mathbb {R}}\)) or the complex plane (\({\mathbb {C}}\)). In general, the functional F does not operate on the entire Banach space X but rather on a subset of X, which we denote by \(D(F) \subseteq X\) and call the domain of the functional,

$$\begin{aligned} F: D(F)\subseteq X \rightarrow {\mathbb {F}}. \end{aligned}$$
(5)

As an example, consider

$$\begin{aligned} F([\theta ])=\int _0^1 x^3 e^{\theta (x)+\theta '(x)}\mathrm{d}x, \qquad \theta \in D(F)=C^{(1)}([0,1]), \end{aligned}$$
(6)

where \(C^{(1)}([0,1])\) is the space of continuously differentiable real-valued functions defined on [0, 1]. The map (6) associates to each function \(\theta \in C^{(1)}([0,1])\) the real number \(F([\theta ])\). Analysis of nonlinear functionals in Banach spaces is a well-developed subject [35, 49, 65, 83, 90]. In particular, classical definitions of continuity and differentiability that hold for real-valued functions can be extended to functionals. For instance,

Definition 2.1

(Pointwise continuity of functionals) A nonlinear functional \(F: D(F)\subseteq X \rightarrow {\mathbb {F}}\) is continuous at a point \(\theta \in D(F)\) if for any sequence \(\{\theta _1, \theta _2, \ldots \}\) in D(F) converging to \(\theta \) (in the metric of X) the sequence \(\{F([\theta _1]), F([\theta _2]), \ldots \}\) converges to \(F([\theta ])\) (in the metric of \({\mathbb {F}}\)), i.e.,

$$\begin{aligned} \lim _{n\rightarrow \infty } \left\| \theta _n(x)-\theta (x)\right\| _X=0 \quad \Rightarrow \quad \lim _{n\rightarrow \infty } \left| F([\theta _n])-F([\theta ])\right| = 0. \end{aligned}$$
(7)

Example 1 The functional (6) is continuous at every point \(\theta \in C^{(1)}([0,1])\) relative to the norm

$$\begin{aligned} \Vert \theta \Vert = \sup _{x\in [0,1]}|\theta (x)|+ \sup _{x\in [0,1]}|\theta '(x)|. \end{aligned}$$
(8)
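As a quick numerical check of Example 1, the sketch below evaluates the functional (6) with a composite Simpson rule and perturbs \(\theta =0\) by \(\eta _n(x)=\sin (2\pi x)/n\), whose norm (8) is \((1+2\pi )/n\). The quadrature rule, the sample perturbation, and the helper names (`simpson`, `F`, `norm8`) are illustrative assumptions; a finite sequence of this kind is consistent with continuity but does not prove it.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3


def F(theta, dtheta):
    """Functional (6); the derivative theta' is supplied explicitly as dtheta."""
    return simpson(lambda x: x**3 * math.exp(theta(x) + dtheta(x)), 0.0, 1.0)


def norm8(eta, deta, samples=2001):
    """Norm (8), with both suprema estimated on a uniform sample of [0, 1]."""
    xs = [i / (samples - 1) for i in range(samples)]
    return max(abs(eta(x)) for x in xs) + max(abs(deta(x)) for x in xs)


zero = lambda x: 0.0
base = F(zero, zero)                 # exact value: int_0^1 x^3 dx = 1/4

norms, diffs = [], []
for n in (1, 10, 100, 1000):
    # Perturbation eta_n = sin(2*pi*x)/n, which tends to 0 in the norm (8)
    eta = lambda x, n=n: math.sin(2 * math.pi * x) / n
    deta = lambda x, n=n: 2 * math.pi * math.cos(2 * math.pi * x) / n
    norms.append(norm8(eta, deta))
    diffs.append(abs(F(eta, deta) - base))
    print(n, norms[-1], diffs[-1])
```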

Definition 2.2

(Uniform continuity of functionals) The functional \(F: D(F)\subseteq X \rightarrow {\mathbb {F}}\) is said to be uniformly continuous on the domain D(F) if for every \(\epsilon >0\) there exists \(\delta >0\) such that the inequality \(| F([\theta _1])-F([\theta _2])|<\epsilon \) holds for all points \(\theta _1,\theta _2\in D(F)\) satisfying \(\left\| \theta _1-\theta _2\right\| _X\le \delta \).

Note that the definitions of continuity and uniform continuity of a functional depend on how we measure the distance between elements of the Banach space X.

Definition 2.3

(Compactness and complete continuity of functionals) The functional \(F: D(F)\subseteq X \rightarrow {\mathbb {F}}\) is said to be compact on the domain D(F) if it maps every bounded subset of D(F) into a pre-compact subset of \({\mathbb {F}}\), i.e., a subset whose closure is compact. The functional \(F([\theta ])\) is called completely continuous on D(F) if it is continuous and compact.

It is clear that a continuous functional \(F([\theta ])\) is completely continuous if and only if for every bounded sequence \(\{\theta _n\}\) in D(F) we have that the sequence \(\{F([\theta _n])\}\) has a convergent sub-sequence. As is well known, continuous functions defined on a closed and bounded subset of \({\mathbb {R}}^n\) are always uniformly continuous and bounded. This is not the case with functionals defined on Banach spaces. In fact, uniform continuity of a functional \(F([\theta ])\) on a closed and bounded set \(S\subseteq D(F)\), say the unit sphere \(S=\{\theta \in C^{(1)}([0,1]): \left\| \theta \right\| \le 1\}\),Footnote 3 is not sufficient to guarantee that the functional is bounded [90, p. 18]. However, if the functional \(F([\theta ])\) is uniformly continuous on D(F), then it maps compact sets into compact sets. Moreover, if F is completely continuous on a bounded set \(K\subseteq D(F)\) (open or closed), then F is bounded on K. This is obvious since the definition of complete continuity of F implies that bounded sets are mapped into relatively compact sets, which are bounded.

Example 2 Consider the nonlinear functional

$$\begin{aligned} F([\theta ]) = \sin (\theta (\pi )) \end{aligned}$$
(9)

in the Banach space of Lipschitz continuous periodic functions in \([0,2\pi ]\),

$$\begin{aligned} D(F)= C_{\text {Lip}}^{(0)}([0,2\pi ]). \end{aligned}$$
(10)

As is well known, the Fourier series of any element \(\theta \in D(F)\) defines a sequence of partial sums \(\{\theta _m\}\subset D(F)\) that converges uniformly to \(\theta \) (see [50]). Thanks to this uniform convergence result, we have

$$\begin{aligned} \lim _{m\rightarrow \infty } \left| F([\theta _m])-F([\theta ])\right|&= \lim _{m\rightarrow \infty } \left| \sin (\theta _m(\pi ))-\sin (\theta (\pi )) \right| \nonumber \\&\le \lim _{m\rightarrow \infty } \sup _{x\in [0,2\pi ]} \left| \sin (\theta _m(x))-\sin (\theta (x))\right| \nonumber \\&= 0, \end{aligned}$$
(11)

where the last equality follows from [7, Theorem 10]. Hence, the functional (9) is continuous on \(C^{(0)}_{\text {Lip}}([0,2\pi ])\). Moreover, since \(|F([\theta ])|\le 1\) for every \(\theta \), F sends any bounded subset of this function space into a bounded, and hence pre-compact, subset of the real line. Therefore, F is completely continuous.
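The convergence argument in (11) can be observed on a concrete element of (10). The sketch assumes the hypothetical choice \(\theta (x)=|x-\pi |\), which is Lipschitz continuous and periodic on \([0,2\pi ]\) and has the Fourier series \(\pi /2+(4/\pi )\sum _{k\ \text {odd}}\cos (kx)/k^2\). Since \(\theta (\pi )=0\), we have \(F([\theta ])=0\), and the printed values are the errors \(|F([\theta _m])-F([\theta ])|\).

```python
import math


def theta(x):
    """Hypothetical element of (10): theta(x) = |x - pi| on [0, 2*pi], Lipschitz and periodic."""
    return abs(x - math.pi)


def theta_m(x, m):
    """Fourier partial sum of theta: pi/2 + (4/pi) * sum over odd k <= m of cos(kx)/k^2."""
    return math.pi / 2 + (4 / math.pi) * sum(math.cos(k * x) / k**2 for k in range(1, m + 1, 2))


def F(func):
    """Functional (9): F([theta]) = sin(theta(pi))."""
    return math.sin(func(math.pi))


errors = []
for m in (10, 100, 1000, 10000):
    errors.append(abs(F(lambda x: theta_m(x, m)) - F(theta)))
    print(m, errors[-1])
```

For this particular function the error decays roughly like \(2/(\pi m)\), reflecting the slow \(1/k^2\) decay of its Fourier coefficients.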

2.1 Differentials and derivatives of nonlinear functionals

Let us consider a nonlinear functional \(F: D(F)\subseteq X \rightarrow {\mathbb {F}}\), where D(F) is an open set. We say that F is Gâteaux differentiable at a point \(\theta \in D(F)\) if the limit

$$\begin{aligned} \mathrm{d}F_\eta ([\theta ]) = \lim _{\epsilon \rightarrow 0}\frac{F([\theta +\epsilon \eta ]) -F([\theta ])}{\epsilon } \end{aligned}$$
(12)

exists and is finite for all \(\eta \in D(F)\). The quantity \(\mathrm{d}F_\eta ([\theta ])\) is known as the Gâteaux differential of F in the direction of \(\eta \) [83, 90]. Under rather mild conditions (see, e.g., [90, p. 37]), this differential can be represented as a linear operator acting on \(\eta \) [65, 92]. This linear operator is known as the Gâteaux derivative of F at \(\theta \), and it will be denoted by \(F'([\theta ])\)

$$\begin{aligned} \mathrm{d}F_\eta ([\theta ]) = F'([\theta ]) \eta . \end{aligned}$$
(13)
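The limit (12) can be approximated by finite differences. The sketch below assumes the functional (6), the sample point \(\theta (x)=x^2\) and the direction \(\eta (x)=\sin x\), and compares a central difference (a common second-order variant of the one-sided quotient in (12)) with the differential obtained by differentiating (6) in \(\epsilon \) under the integral sign, \(\int _0^1 x^3 e^{\theta +\theta '}(\eta +\eta ')\mathrm{d}x\). The helper names are hypothetical.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3


def F(theta, dtheta):
    """Functional (6); the derivative theta' is supplied explicitly as dtheta."""
    return simpson(lambda x: x**3 * math.exp(theta(x) + dtheta(x)), 0.0, 1.0)


def gateaux_fd(theta, dtheta, eta, deta, eps=1e-5):
    """Central-difference estimate of the limit (12) along the direction eta."""
    plus = F(lambda x: theta(x) + eps * eta(x), lambda x: dtheta(x) + eps * deta(x))
    minus = F(lambda x: theta(x) - eps * eta(x), lambda x: dtheta(x) - eps * deta(x))
    return (plus - minus) / (2 * eps)


def gateaux_exact(theta, dtheta, eta, deta):
    """Differential of (6): int_0^1 x^3 exp(theta + theta') (eta + eta') dx."""
    return simpson(lambda x: x**3 * math.exp(theta(x) + dtheta(x)) * (eta(x) + deta(x)), 0.0, 1.0)


theta, dtheta = (lambda x: x**2), (lambda x: 2 * x)    # sample point theta(x) = x^2
eta, deta = math.sin, math.cos                         # sample direction eta(x) = sin(x)
fd = gateaux_fd(theta, dtheta, eta, deta)
ex = gateaux_exact(theta, dtheta, eta, deta)
print(fd, ex)
```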

The Fréchet differential, on the other hand, is defined as the term \(\mathrm{d}F([\theta ],\eta )\), linear and bounded in \(\eta \), in the expansion

$$\begin{aligned} F([\theta +\eta ])=F([\theta ])+\mathrm{d}F([\theta ], \eta ) + R([\theta ],[\eta ]),\qquad \lim _{\left\| \eta \right\| \rightarrow 0} \frac{|R([\theta ],[\eta ])|}{\left\| \eta \right\| }=0. \end{aligned}$$
(14)
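To see the expansion (14) at work, the sketch below computes the remainder \(R([0],[\eta ])\) for the functional (6) at \(\theta =0\), where \(\mathrm{d}F([0],\eta )=\int _0^1 x^3(\eta +\eta ')\mathrm{d}x\), along \(\eta _t(x)=t\sin (2\pi x)\), whose norm (8) is \(t(1+2\pi )\). The functional, base point, and direction are assumptions made for illustration; the ratio \(|R|/\Vert \eta _t\Vert \) should shrink roughly in proportion to t.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3


def F(eta, deta):
    """Functional (6) evaluated at eta, with eta' supplied explicitly as deta."""
    return simpson(lambda x: x**3 * math.exp(eta(x) + deta(x)), 0.0, 1.0)


def dF0(eta, deta):
    """Frechet differential of (6) at theta = 0: int_0^1 x^3 (eta + eta') dx."""
    return simpson(lambda x: x**3 * (eta(x) + deta(x)), 0.0, 1.0)


F0 = F(lambda x: 0.0, lambda x: 0.0)
ratios = []
for t in (1e-1, 1e-2, 1e-3):
    eta = lambda x, t=t: t * math.sin(2 * math.pi * x)
    deta = lambda x, t=t: 2 * math.pi * t * math.cos(2 * math.pi * x)
    remainder = F(eta, deta) - F0 - dF0(eta, deta)   # R([0], [eta]) in (14)
    norm = t * (1 + 2 * math.pi)                      # norm (8) of eta, computed by hand
    ratios.append(abs(remainder) / norm)
    print(f"t = {t:.0e}: |R|/||eta|| = {ratios[-1]:.2e}")
```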

It is well known that if \(F([\theta ])\) has a continuous Gâteaux derivative on D(F), then F is Fréchet differentiable on D(F), and these two derivatives coincide [90, p. 41]. In this paper, we consider nonlinear functionals F that are continuously Gâteaux differentiable in D(F). Hence, we will not need to distinguish between Fréchet and Gâteaux derivatives.

There has been significant research activity on identifying the minimal conditions under which a nonlinear functional is Gâteaux or Fréchet differentiable. It turns out that there are reasonably satisfactory results on Gâteaux differentiability of Lipschitz functionals. For instance,

Theorem 2.1

(Gâteaux differentiability of Lipschitz functionals [4, 57, 58]) Let X be a separable Banach space. Then, every real- or complex-valued Lipschitz functional F defined on an open set \(D(F)\subseteq X\) is Gâteaux differentiable outside a Gauss-null set.Footnote 4

Results on Fréchet differentiability are rarer, and usually much harder to prove [57]. For instance, we have

Theorem 2.2

(Fréchet derivatives of Lipschitz functionals [71]) Let K be a compact subset of a Hilbert space H. If \(F([\theta ])\) is real-valued and locally Lipschitz on K, then F is Fréchet differentiable on a dense subset of K.

Note that Theorem 2.2 does not imply that F is Fréchet differentiable everywhere on K. On the other hand, a continuously differentiable functional F, which is also completely continuous in the sense of Definition 2.3, has completely continuous Gâteaux and Fréchet derivatives [90, p. 51]. As we will see in Sect. 3, continuously differentiable nonlinear functionals on compact metric spaces are also compact, and have compact Fréchet derivatives. We emphasize that it is also possible to define Gâteaux and Fréchet differentiability directly in terms of bounded linear operators. For instance, Lindenstrauss and Preiss [57, p. 258] define F as Gâteaux differentiable if there exists a bounded linear operator \(F':D(F)\rightarrow {\mathbb {F}}\) such that for every \(\eta \in D(F)\)

$$\begin{aligned} F'([\theta ])\eta = \lim _{\epsilon \rightarrow 0}\frac{F([\theta +\epsilon \eta ]) -F([\theta ])}{\epsilon }. \end{aligned}$$

Clearly, this definition is stricter than (12)–(13), since it does not allow for unbounded derivative operators \(F'([\theta ])\).

In the context of nonlinear functionals defined on spaces of functions, it is convenient to define another type of functional derivative, namely

$$\begin{aligned} \frac{\delta F([\theta ]) }{\delta \theta (x)} = \lim _{\epsilon \rightarrow 0} \frac{F([\theta (y)+\epsilon \delta (x-y)]) -F([\theta (y)])}{\epsilon }, \end{aligned}$$
(15)

provided the limit exists. The quantity \(\delta F([\theta ])/\delta \theta (x)\) is known as the first-order functional derivative of \(F([\theta ])\) with respect to \(\theta (x)\) [45, p. 309]. Functional derivatives are used extensively in many areas of mathematical physics, e.g., in stochastic dynamics [37, 44, 52, 95], turbulence modeling [30, 63, 66], and quantum field theory [67, 105].
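A common numerical reading of the definition (15) is to discretize \(\theta \) on a uniform grid with spacing h and replace \(\delta (x-y)\) by a spike of height 1/h at a single node, so that the functional derivative at \(x_i\) is approximated by \(h^{-1}\partial f/\partial \theta _i\), where f is the discretized functional. The sketch below assumes this discretization, a Riemann sum on a periodic grid, and the functional \(\int _0^{2\pi }\sin (x)\sin ^2(\theta (x))\mathrm{d}x\) (the same functional that appears later as (38)), whose functional derivative is \(\sin (x)\sin (2\theta (x))\).

```python
import math

N = 64
h = 2 * math.pi / N                    # grid spacing
xs = [j * h for j in range(N)]


def f(theta):
    """Riemann-sum discretization of F([theta]) = int_0^{2pi} sin(x) sin^2(theta(x)) dx."""
    return h * sum(math.sin(x) * math.sin(t) ** 2 for x, t in zip(xs, theta))


def functional_derivative(f, theta, eps=1e-5):
    """Estimate (1/h) * df/dtheta_i at every node by central differences.

    This is the difference quotient in (15) with delta(x - y) replaced
    by a spike of height 1/h at the node x_i.
    """
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps * h))
    return grad


theta = [math.cos(x) for x in xs]                  # hypothetical sample point
approx = functional_derivative(f, theta)
exact = [math.sin(x) * math.sin(2 * t) for x, t in zip(xs, theta)]
err = max(abs(a - e) for a, e in zip(approx, exact))
print(f"max nodal error: {err:.1e}")
```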

If the Fréchet derivative \(F'([\theta ])\) admits an integral representation,Footnote 5 then it is possible to establish a one-to-one correspondence between \(\delta F([\theta ])/\delta \theta (x)\) and \(F'([\theta ])\). For instance, if \(\ell (\eta ) = F'([\theta ])\eta \) is a bounded linear functional in a Hilbert space H, and \(F'([\theta ])\) is continuous in \(D(F)\subseteq H\), then the Riesz representation theorem guarantees that there exists a unique element \(\delta F([\theta ])/\delta \theta (x)\in H\) such that

$$\begin{aligned} F'([\theta ])\eta =\left( \frac{\delta F([\theta ]) }{\delta \theta (x)},\eta (x) \right) _H \qquad \forall \eta \in H, \qquad \forall \theta \in D(F). \end{aligned}$$
(16)

Here, \((\cdot ,\cdot )_H\) denotes the inner product in H. As we will see in Sect. 3, the Fréchet derivative of a continuous nonlinear functional defined on a compact subset of a real separable Hilbert space is a completely continuous linear operator, i.e., continuous and compact. In this case, the Riesz representation (16) holds and we have a one-to-one correspondence between \(F'([\theta ])\) and \(\delta F([\theta ])/\delta \theta (x)\).

More generally, if \(\Omega \) is a locally compact Hausdorff space, e.g., an open or closed subset of \({\mathbb {R}}^n\), and \(F'([\theta ])\) is a bounded linear operator (with respect to the supremum norm) from \(C_c^{(0)}(\Omega )\) (space of compactly supported continuous functions on \(\Omega \)) into \({\mathbb {R}}\), then there exists a unique finite regular signed measureFootnote 6 \(\mu ([\theta ],x)\) on the Borel subsets of \(\Omega \) such that

$$\begin{aligned} F'([\theta ])\eta =\int _{\Omega } \eta (x) \mathrm{d}\mu ([\theta ],x), \qquad \eta \in C_c^{(0)}(\Omega ) \end{aligned}$$
(17)

(see [85, p. 324], [28, p. 4] or [35, 103]). In addition, if \(\mu ([\theta ],x)\) is absolutely continuous with respect to the Lebesgue measure, then there exists a Radon–Nikodym derivative, i.e., a functional density \(\delta F([\theta ])/\delta \theta (x)\), such that

$$\begin{aligned} \mathrm{d}\mu ([\theta ],x) = \frac{\delta F([\theta ]) }{\delta \theta (x)} \mathrm{d}x. \end{aligned}$$
(18)

Under these conditions, the Fréchet derivative of F admits the Lebesgue integral representation

$$\begin{aligned} F'([\theta ])\eta =\left( \frac{\delta F([\theta ]) }{\delta \theta (x)},\eta (x)\right) _{L^2(\Omega )}. \end{aligned}$$
(19)

We emphasize that (16) and (19) can be considered as infinite-dimensional generalizations of the directional derivative of a multivariate function f(x), in which the dot product between the gradient \(\nabla f\) and a vector \(\eta \) is now replaced by the inner product of \(\delta F([\theta ])/\delta \theta (x)\) and \(\eta (x)\). By analogy, the quantity \(\delta F([\theta ])/\delta \theta (x)\) can be thought of as an infinite-dimensional gradient. Note that this gradient is a nonlinear functional of \(\theta \) and a function of x. Higher-order Fréchet and functional derivatives can be defined similarly [44, 93].

Example 1 The Fréchet derivative of the nonlinear functional (9) is obtained as

$$\begin{aligned} F'([\theta ])\eta = \cos (\theta (\pi ))\eta (\pi ), \end{aligned}$$
(20)

where both \(\theta \) and \(\eta \) are in the space (10) of Lipschitz continuous periodic functions in \([0,2\pi ]\). Formally, using the Dirac delta function, equation (20) can be written as

$$\begin{aligned} F'([\theta ])\eta = \int _0^{2\pi } \cos (\theta (x))\delta (x-\pi ) \eta (x) \mathrm{d}x. \end{aligned}$$
(21)

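From this expression we see that the signed measure \(\mathrm{d}\mu ([\theta ],x)\) appearing in equation (17) is, in this case, the Dirac measure at \(x=\pi \) weighted by \(\cos (\theta (\pi ))\). Its density, understood in the sense of distributions, coincides with the first-order functional derivative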

$$\begin{aligned} \frac{\delta F([\theta ])}{\delta \theta (x)} = \cos (\theta (x))\delta (x-\pi ). \end{aligned}$$
(22)

This derivative is a distribution in x and a continuous functional of \(\theta \). The Fréchet differential (20) is a linear functional in \(\eta \), bounded in the \(C^{(0)}\) norm. In fact,

$$\begin{aligned} \left| \cos (\theta (\pi ))\eta (\pi )\right| \le \left| \cos (\theta (\pi ))\right| \sup _{x\in [0,2\pi ]}\left| \eta (x)\right| . \end{aligned}$$
(23)

However, this functional is unbounded in \(L^2([0,2\pi ])\), and therefore, (21) is not based on the Riesz representation (16), but rather on the measure representation (17). To show this, it suffices to prove that (20) admits an integral representation with a kernel that is not in \(L^2([0,2\pi ])\). To this end, let us represent \(\eta \) relative to any orthonormal trigonometric basis \(\{\varphi _0,\varphi _1,\ldots \}\) of \(L_p^2([0,2\pi ])\) (space of square integrable periodic functions in \([0,2\pi ]\))

$$\begin{aligned} \eta (x)=\sum _{k=0}^\infty a_k\varphi _k(x), \qquad a_k=(\eta ,\varphi _k)_{L_p^2([0,2\pi ])}. \end{aligned}$$
(24)

The series (24) converges in the \(L_p^2([0,2\pi ])\) sense, and also pointwise since \(\eta \) is Lipschitz continuous [50]. A substitution of (24) into (20) yields

$$\begin{aligned} \cos (\theta (\pi ))\eta (\pi ) = \left( \cos (\theta (\pi )) v(x),\eta (x)\right) _{L_p^2([0,2\pi ])}, \qquad v(x)=\sum _{k=0}^\infty \varphi _k(\pi )\varphi _k(x). \end{aligned}$$
(25)

It is straightforward to show that

$$\begin{aligned} \sum _{k=0}^\infty \varphi _k(\pi )^2 = \infty , \end{aligned}$$
(26)

and therefore, the function v(x) in (25) is not an element of \(L_p^2([0,2\pi ])\). Indeed, v(x) is the trigonometric series expansion of the Dirac delta function \(\delta (x-\pi )\), which is not in \(L_p^2([0,2\pi ])\).
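The divergence (26) and the identification of v with \(\delta (x-\pi )\) can be checked numerically. The sketch assumes the ordering \(\varphi _0=1/\sqrt{2\pi }\), \(\varphi _{2k-1}=\cos (kx)/\sqrt{\pi }\), \(\varphi _{2k}=\sin (kx)/\sqrt{\pi }\). With this choice \(\sum _{j=0}^{2m}\varphi _j(\pi )^2=(1+2m)/(2\pi )\) grows linearly in m, and by orthonormality this sum is also \(\Vert v_m\Vert ^2\) for the truncated series \(v_m\). Pairing \(v_m\) with the hypothetical test function \(\eta (x)=e^{\sin x}\) returns, approximately, \(\eta (\pi )=1\), as a delta function at \(\pi \) would.

```python
import math


def phi(j, x):
    """Orthonormal basis of L^2_p([0, 2pi]): 1/sqrt(2pi), then cos(kx)/sqrt(pi), sin(kx)/sqrt(pi)."""
    if j == 0:
        return 1.0 / math.sqrt(2 * math.pi)
    k = (j + 1) // 2
    trig = math.cos if j % 2 == 1 else math.sin
    return trig(k * x) / math.sqrt(math.pi)


def partial_sum_sq(m):
    """sum_{j=0}^{2m} phi_j(pi)^2, which also equals ||v_m||^2 by orthonormality."""
    return sum(phi(j, math.pi) ** 2 for j in range(2 * m + 1))


sums = {m: partial_sum_sq(m) for m in (10, 100, 1000)}
for m, s in sums.items():
    print(m, s, (1 + 2 * m) / (2 * math.pi))

# Pairing v_m with a smooth periodic eta recovers eta(pi), as a delta at x = pi should.
N = 256
h = 2 * math.pi / N
xs = [i * h for i in range(N)]
eta = [math.exp(math.sin(x)) for x in xs]          # eta(pi) = exp(0) = 1


def pairing(m):
    """(v_m, eta) = sum_j phi_j(pi) (eta, phi_j), inner products by the trapezoid rule."""
    total = 0.0
    for j in range(2 * m + 1):
        coeff = h * sum(e * phi(j, x) for e, x in zip(eta, xs))
        total += phi(j, math.pi) * coeff
    return total


pair = pairing(20)
print(pair)
```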

3 Nonlinear functionals defined on compact subsets of real separable Hilbert spaces

In this section, we study the mathematical properties of nonlinear functionals defined on compact subsets of real separable Hilbert spaces. As we will see in subsequent sections, this is a very important class of functionals which allows us to build an effective approximation theory based on orthogonal projections.

Before presenting this theory, let us briefly review the notions of bounded, closed, compact and pre-compact sets. Consider a metric space of functions X, e.g., a Hilbert or a Banach space, and a subset \(K\subseteq X\). We say that K is bounded if there exists a finite real number M such that \(\left\| \theta \right\| _X \le M\) for all \(\theta \in K\), where \(\Vert \cdot \Vert _X\) is the norm in X. The set K is said to be closed if every sequence in K that converges in X has its limit in K. An example of a closed and bounded subset of the Hilbert space \(L^2([0,1])\) is the unit sphere \(S=\{\theta \in L^2([0,1]): \left\| \theta \right\| _{L^2([0,1])} \le 1\}\). We say that the subset \(K\subseteq X\) is compact if every open cover of K has a finite subcover.

There are several equivalent characterizations of compactness in metric spaces. For instance, a subset K of a metric space X is compact if and only if every sequence in K has a convergent subsequence whose limit is in K [49, §1.7]. The set K is said to be pre-compact if its closure is compact, meaning that every sequence in K has a convergent sub-sequence whose limit is in X (but not necessarily in K). Closed and bounded subsets of infinite-dimensional function spaces are not necessarily compact, since we can define sequences that do not have convergent sub-sequences. An example is the unit sphere S mentioned above: any orthonormal sequence \(\{\varphi _k\}\) in \(L^2([0,1])\) lies in S and satisfies \(\Vert \varphi _j-\varphi _k\Vert _{L^2([0,1])}=\sqrt{2}\) for \(j\ne k\), so it has no convergent sub-sequence. On the other hand, a compact set is always bounded and closed. A useful characterization of pre-compactness in real separable Hilbert spaces is the following:

Theorem 3.1

(Compact subsets of real separable Hilbert spaces) A subset E of a real separable Hilbert space H is pre-compact if and only if it is bounded and, for any (one) orthonormal basis \(\{\varphi _1,\varphi _2,\ldots \}\) of H and any \(\epsilon >0\), there exists a natural number m such thatFootnote 7

$$\begin{aligned} \sum _{k=m+1}^\infty |(\theta ,\varphi _k)_H|^2 \le \epsilon \end{aligned}$$
(27)

for all \(\theta \in E\). The closure of E in H, which we denote by \({\overline{E}}\), is compact.

The proof of Theorem 3.1 can be found in [62, p. 76] (Proposition 3.8). Note that a pre-compact or compact subset of a vector space cannot be a nontrivial vector subspace, since pre-compactness implies boundedness. Next, we provide two simple examples of compact subsets of real separable Hilbert spaces.

Example 1 Let \(H_p^{s}([0,2\pi ])\) be the Sobolev space of periodic functions in \([0,2\pi ]\) that are weakly differentiable up to order \(s\ge 1\). The set

$$\begin{aligned} K = \overline{\left\{ \theta \in H_p^{s}([0,2\pi ])\,:\, \left\| \frac{\mathrm{d}^q\theta }{\mathrm{d}x^q}\right\| _{L_p^{2}([0,2\pi ])}\le \rho \right\} }\subseteq L_p^{2}([0,2\pi ]) \qquad 1\le q\le s, \end{aligned}$$
(28)

where \(L_p^{2}([0,2\pi ])\) is the Lebesgue space of square-integrable periodic functions in \([0,2\pi ]\) and \(\rho >0\) is the radius of the Sobolev sphere, is a compact subset of \(L_p^{2}([0,2\pi ])\). Indeed, by expanding an arbitrary element \(\theta \in K\) in a Fourier series we obtain, for any \(1\le q\le s\) (see [46, p. 35])

$$\begin{aligned} \sum _{|k|>m}|(\theta ,e^{ikx})_{L_p^2([0,2\pi ])}|^2 \le \frac{C_1}{m^{2q}}\left\| \frac{\mathrm{d}^q\theta }{\mathrm{d}x^q}\right\| ^2_{L_p^2([0,2\pi ])}\le \frac{C_1\rho ^2}{m^{2q}} \qquad \forall \theta \in K. \end{aligned}$$
(29)

At this point it is clear that for any given \(\epsilon >0\) there exists a natural number m such that the right-hand side of (29) is smaller than \(\epsilon \) for all \(\theta \in K\). In other words, the equi-small tail condition (27) is satisfied. Moreover, in (28) we take the closure of the Sobolev sphere in \(L_p^2([0,2\pi ])\), which makes K a compact subset of \(L_p^2([0,2\pi ])\).
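The tail estimate (29) is easy to verify numerically for a given trigonometric polynomial. The sketch assumes Fourier coefficients \(c_k\) taken relative to the orthonormal basis \(e^{ikx}/\sqrt{2\pi }\), for which Parseval's identity gives \(\Vert \mathrm{d}^q\theta /\mathrm{d}x^q\Vert ^2=\sum _k|k|^{2q}|c_k|^2\) and (29) holds with \(C_1=1\), because \(|k|/m>1\) on the tail. The coefficients \(c_k=(1+|k|)^{-3}\), \(|k|\le 200\), are a hypothetical example, and `smallest_m` (also hypothetical) returns the m required by (27) for a given \(\epsilon \) when \(\rho \) equals this function's own seminorm.

```python
K_MAX = 200
# Hypothetical trigonometric polynomial: c_k relative to the orthonormal basis e^{ikx}/sqrt(2*pi)
coeffs = {k: 1.0 / (1 + abs(k)) ** 3 for k in range(-K_MAX, K_MAX + 1)}


def tail(m):
    """Left-hand side of (29) in the orthonormal normalization: sum_{|k|>m} |c_k|^2."""
    return sum(c * c for k, c in coeffs.items() if abs(k) > m)


def seminorm_sq(q):
    """||d^q theta / dx^q||^2 = sum_k |k|^(2q) |c_k|^2 (Parseval)."""
    return sum(abs(k) ** (2 * q) * c * c for k, c in coeffs.items())


def smallest_m(eps, rho_sq, q):
    """Smallest m with rho^2 / m^(2q) <= eps, so that the tail in (27) is at most eps."""
    m = 1
    while rho_sq / m ** (2 * q) > eps:
        m += 1
    return m


for q in (1, 2):
    rho_sq = seminorm_sq(q)            # take rho equal to this function's own seminorm
    ok = all(tail(m) <= rho_sq / m ** (2 * q) for m in range(1, 51))
    print(q, ok, smallest_m(1e-6, rho_sq, q))
```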

Example 2 A closed sphere with radius \(\rho \) in \(H_w^s([-1,1])\) (weighted Sobolev space of degree s) is a pre-compact subset of \(L_w^2([-1,1])\) (weighted Lebesgue space of functions in \([-1,1]\)). Hence,

$$\begin{aligned} K=\overline{\left\{ \theta \in H_w^s([-1,1]) \,:\, \left\| \theta \right\| _{H_w^s([-1,1])}\le \rho \right\} } \subseteq L_w^2([-1,1]) \end{aligned}$$
(30)

is a compact subset of \(L_w^2([-1,1])\). This claim is based on the following well-known spectral convergence result [46, p. 109]

$$\begin{aligned} \sum _{k=m+1}^{\infty }|(\theta ,\varphi _k)_{L^2_w([-1,1])}|^2 \le \frac{C_2}{m^{s}}\left\| \theta \right\| ^2_{H_w^s([-1,1])}, \end{aligned}$$
(31)

where the \(\varphi _k(x)\) are ultraspherical polynomials. By combining (31) with (30), we see that the equi-small tail condition

$$\begin{aligned} \sum _{k=m+1}^{\infty }|(\theta ,\varphi _k)_{L^2_w([-1,1])}|^2 \le \frac{C_2\rho ^2}{m^{s}} \end{aligned}$$
(32)

is satisfied \(\forall \theta \in K\).

The compact subsets we discussed in Example 1 and Example 2 are particular instances of a general compact embedding result known as the Rellich–Kondrachov theorem [1, §6]. This theorem states that the Sobolev space \(W^{k,p}(\Omega )\) defined on a bounded domain \(\Omega \subseteq {\mathbb {R}}^n\) with sufficiently regular (e.g., Lipschitz) boundary is compactly embedded in \(W^{l,q}(\Omega )\), provided \(k>l\) and \(k-n/p>l-n/q\). This means that the embedding (inclusion) operator \(T:W^{k,p}(\Omega )\rightarrow W^{l,q}(\Omega )\) is compact, i.e., it maps bounded subsets of \(W^{k,p}(\Omega )\) into pre-compact subsets of \(W^{l,q}(\Omega )\). In Example 1 and Example 2, we have that \(H^s = W^{s,2}\) and \(L^2= W^{0,2}\). Hence, the Rellich–Kondrachov embedding in this case reduces to the statement that the Sobolev space \(H^s\) is compactly embedded in \(L^2\) for all \(s>0\).
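The exponent condition of the Rellich–Kondrachov theorem is simple arithmetic. The sketch below encodes the standard sufficient condition \(k>l\), \(k-n/p>l-n/q\) (bounded Lipschitz domain, \(1\le p,q<\infty \)); the function name is hypothetical, and it only checks the hypothesis of the theorem, not compactness itself.

```python
def compactly_embedded(k, p, l, q, n):
    """Rellich-Kondrachov hypothesis for W^{k,p}(Omega) to embed compactly in W^{l,q}(Omega):
    k > l and k - n/p > l - n/q (Omega bounded and Lipschitz in R^n, 1 <= p, q < infinity)."""
    return k > l and k - n / p > l - n / q


# H^s = W^{s,2} into L^2 = W^{0,2}: the condition holds for s = 1, 2, 3 in dimensions 1, 2, 3
for n in (1, 2, 3):
    for s in (1, 2, 3):
        print(n, s, compactly_embedded(s, 2, 0, 2, n))

# In three dimensions, H^1 into L^q: the condition holds for q = 5 and fails at the critical q = 6
print(compactly_embedded(1, 2, 0, 5, 3), compactly_embedded(1, 2, 0, 6, 3))
```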

Continuous nonlinear functionals defined on a compact subset K of a metric space have nice mathematical properties. First of all, they are bounded, since they map compact sets into compact, hence closed and bounded, subsets of \({\mathbb {R}}\) or \({\mathbb {C}}\). Moreover, by the Heine–Cantor theorem, any continuous functional defined on a compact set is necessarily uniformly continuous [49]. If the functional is real-valued, its maximum and minimum are attained at points of K. We also recall that closed subsets of compact sets are necessarily compact. Hence, a continuous functional on K maps any closed subset of K into a closed and bounded subset of \({\mathbb {R}}\) or \({\mathbb {C}}\). Such a functional is necessarily compact,Footnote 8 i.e., completely continuous (see Definition 2.3).

3.1 Fréchet and functional derivatives

Next, we show that the Fréchet derivative of a continuous nonlinear functional defined on a compact subset of a real separable Hilbert space is a compact linear operator.

Lemma 3.1

(Compactness of the Fréchet derivative) Let K be a compact subset of a real separable Hilbert space H, and let \(F([\theta ])\) be a continuous real- or complex-valued functional on H. If the Fréchet derivative \(F'([\theta ^*])\) exists at \(\theta ^*\in K\), then it is a compact linear operator.Footnote 9

Proof

Continuous functionals on compact metric spaces are necessarily completely continuous (see Definition 2.3). To prove the Lemma, we proceed by contradiction. To this end, suppose that \(F'([\theta ^*])\) is not compact. Then, it is possible to find \(\epsilon > 0\) and a sequence \(\{\theta _k\}\in K\subseteq H\) such that \(\Vert \theta _k\Vert _H\le 1\) and

$$\begin{aligned} \left| F'([\theta ^*])\theta _k -F'([\theta ^*])\theta _j \right| \ge \epsilon \end{aligned}$$
(33)

for all \(k\ne j\). By definition of Fréchet derivative at \(\theta ^*\), we have

$$\begin{aligned} \left| F([\theta ^*+\eta ]) - F([\theta ^*]) - F'([\theta ^*])\eta \right| = {o(\left\| \eta \right\| _H)}, \end{aligned}$$
(34)

for all \(\eta \in K\) with sufficiently small norm, say \(\left\| \eta \right\| _H\le \delta \). In particular, we can choose \(\delta \) such that

$$\begin{aligned} \left| F([\theta ^*+\eta ]) - F([\theta ^*]) - F'([\theta ^*])\eta \right| \le \frac{\epsilon }{4} \left\| \eta \right\| _{H}. \end{aligned}$$
(35)

Next, choose \(\tau \) small enough so that \((\theta ^*+\tau \theta _k)\in K\) and \(\left\| \tau \theta _k\right\| \le \delta \) for all \(k\in {\mathbb {N}}\). For such functions, we have

$$\begin{aligned} \left| F([\theta ^*+\tau \theta _k])-F([\theta ^*+\tau \theta _j]) \right|&\ge \left| \tau F'([\theta ^*])(\theta _k-\theta _j)\right| \nonumber \\&\quad -\left| F([\theta ^*+\tau \theta _k])-F([\theta ^*])- \tau F'([\theta ^*])\theta _k \right| \nonumber \\&\quad -\left| F([\theta ^*+\tau \theta _j])-F([\theta ^*])- \tau F'([\theta ^*])\theta _j\right| \nonumber \\&\ge \epsilon \tau -\frac{\tau \epsilon }{4}-\frac{\tau \epsilon }{4} \nonumber \\&= \frac{\epsilon \tau }{2}. \end{aligned}$$
(36)

This contradicts the complete continuity of F: inequality (36) implies that no convergent subsequence can be extracted from the sequence \(\{F([\theta ^*+\tau \theta _k])\}\), although \(\{\tau \theta _k\}\) is bounded. This proves the lemma. \(\square \)

Lemma 3.2

(Representation of the Fréchet derivative) Let K be a compact subset of a real separable Hilbert space H, and let \(F([\theta ])\) be a continuous real- or complex-valued functional on K. If the Fréchet derivative of \(F([\theta ])\) exists at \(\theta ^*\in K\), then \(F'([\theta ^*])\) admits the unique integral representation

$$\begin{aligned} F'([\theta ^*])\eta = \left( \frac{\delta F([\theta ^*])}{\delta \theta (x)},\eta (x)\right) _H\qquad \forall \eta \in H, \end{aligned}$$
(37)

where \(\delta F([\theta ^*])/\delta \theta (x)\in H\) is the first-order functional derivative (15).

Proof

We have seen in Lemma 3.1 that the Fréchet derivative of a continuous nonlinear functional defined on a compact subset of a real separable Hilbert space is a compact linear operator. Hence, \(\ell ([\eta ])=F'([\theta ^*])\eta \) is a bounded linear functional on H. By applying the Riesz representation theorem to \(\ell \), we conclude that there exists a unique element \(\delta F([\theta ^*])/\delta \theta (x)\in H\) such that (37) holds. This proves the lemma. \(\square \)

Example 1 Let \(H=L^2_p([0,2\pi ])\) and let K be the Sobolev sphere (28) together with its closure in \(L^2_p([0,2\pi ])\). Consider the nonlinear functional

$$\begin{aligned} F([\theta ]) = \int _0^{2\pi } \sin (x) \sin ^2(\theta (x))\mathrm{d}x, \qquad \theta \in K. \end{aligned}$$
(38)

The Fréchet derivative of (38) is

$$\begin{aligned} F'([\theta ]) \eta = \int _{0}^{2\pi } \sin (x) \sin (2\theta (x))\eta (x)\mathrm{d}x, \qquad \eta \in H. \end{aligned}$$
(39)

For fixed \(\theta \in K\), the linear functional \(F'([\theta ]) \eta \) is bounded on \(L^2_p([0,2\pi ])\). Indeed, by the Cauchy–Schwarz inequality we have

$$\begin{aligned} \left| F'([\theta ]) \eta \right| \le&\underbrace{\left\| \sin (x) \sin (2\theta (x)) \right\| _{L^2_p([0,2\pi ])}}_{\left\| F'([\theta ])\right\| } \left\| \eta \right\| _{L^2_p([0,2\pi ])}, \qquad \left\| F'([\theta ])\right\| \le \sqrt{\pi }, \end{aligned}$$
(40)

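for all \(\eta \in H\), where the bound on \(\left\| F'([\theta ])\right\| \) follows from \(|\sin (2\theta (x))|\le 1\) and \(\int _0^{2\pi }\sin ^2(x)\,\mathrm{d}x=\pi \). Therefore, Lemma 3.2 applies for each \(\theta \in K\), i.e., there exists a unique first-order functional derivative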

$$\begin{aligned} \frac{\delta F([\theta ])}{\delta \theta (x)}=\sin (x) \sin (2\theta (x)), \end{aligned}$$
(41)

which is an element of \(L^2_p([0,2\pi ])\) (as a function of x).
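The computation in this example is easy to check numerically. The following Python sketch is an illustrative addition: the grid size, the function \(\theta \) and the direction \(\eta \) are arbitrary choices of ours. It discretizes (38) with the rectangle rule on a uniform periodic grid, compares a central finite-difference approximation of \(F'([\theta ])\eta \) with the inner product in (37) built from the derivative (41), and checks the bound \(\left\| F'([\theta ])\right\| \le \sqrt{\pi }\) in (40).

```python
import numpy as np

# Uniform grid on the periodic interval [0, 2*pi); for smooth periodic
# integrands the rule h * sum(...) is spectrally accurate.
N = 512
x = 2 * np.pi * np.arange(N) / N
h = 2 * np.pi / N


def inner(u, v):
    """Discrete inner product of L^2_p([0, 2*pi])."""
    return h * np.sum(u * v)


def F(theta):
    """Functional (38): integral of sin(x) * sin(theta(x))**2."""
    return inner(np.sin(x), np.sin(theta) ** 2)


def dF(theta):
    """First-order functional derivative (41): sin(x) * sin(2 theta(x))."""
    return np.sin(x) * np.sin(2 * theta)


# Hypothetical test function theta and direction eta.
theta = np.cos(x) + 0.5 * np.sin(3 * x)
eta = np.exp(np.sin(2 * x)) - 1.0

eps = 1e-6
fd = (F(theta + eps * eta) - F(theta - eps * eta)) / (2 * eps)  # F'([theta]) eta
riesz = inner(dF(theta), eta)                    # Riesz representation (37)
norm_dF = np.sqrt(inner(dF(theta), dF(theta)))   # operator norm ||F'([theta])||

print(f"finite difference: {fd:.10f}")
print(f"inner product:     {riesz:.10f}")
print(f"||F'|| = {norm_dF:.4f} <= sqrt(pi) = {np.sqrt(np.pi):.4f}")
```

The two values of \(F'([\theta ])\eta \) should agree to roughly the accuracy of the finite-difference step. On the uniform grid the discrete bound holds exactly, because the rectangle rule integrates \(\sin ^2(x)\) exactly.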

4 Cylindrical approximation of nonlinear functionals in real separable Hilbert spaces

Let H be a real separable Hilbert space with inner product \((\cdot ,\cdot )_H\). Any element \(\theta \in H\) can be represented uniquely in terms of an orthonormal basis \(\{\varphi _1,\varphi _2,\ldots \}\) as

$$\begin{aligned} \theta (x)=\sum _{k=1}^\infty a_k\varphi _k(x),\qquad a_k=(\theta ,\varphi _k)_H, \end{aligned}$$
(42)

where the series converges in the norm \(\Vert \cdot \Vert _H\) induced by the inner product \((\cdot ,\cdot )_H\). We introduce the projection operator \(P_m\), which truncates the series expansion (42) to its first m terms:

$$\begin{aligned} P_m \theta = \sum _{k=1}^m (\theta ,\varphi _k)_H\varphi _k. \end{aligned}$$
(43)

Clearly, \(P_m\) is an operator from H into the finite-dimensional space

$$\begin{aligned} D_m = \text {span}\{\varphi _1,\ldots ,\varphi _m\}. \end{aligned}$$
(44)

With this notation, we can represent any nonlinear functional \(F([\theta ])\) on H as a function of countably many variables. To this end, we substitute (42) into \(F([\theta ])\) to obtain

$$\begin{aligned} f(a_1,a_2,\ldots ) = F\left( \left[ \sum _{k=1}^\infty a_k\varphi _k(x)\right] \right) , \qquad {a_k=(\theta ,\varphi _k)_H, \qquad \theta \in H.} \end{aligned}$$
(45)

A simple way to approximate this functional is to restrict its domain to the range of the projection \(P_m\), which is the finite-dimensional space \(D_m\) in (44). This reduces \(F([\theta ])\) to a multivariate function \(f(a_1,\ldots ,a_m)\), which depends on as many variables as the number of basis elements of \(D_m\). Specifically, we have

$$\begin{aligned} f(a_1,\ldots ,a_m)=F\left( \left[ \sum _{k=1}^m a_k \varphi _k \right] \right) , \qquad {a_k=(\theta ,\varphi _k)_H, \qquad \theta \in H.} \end{aligned}$$
(46)
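A minimal sketch of the reduction (46), assuming \(H=L^2_p([0,2\pi ])\) with the real orthonormal Fourier basis ordered as \(\varphi _1=1/\sqrt{2\pi }\), \(\varphi _{2n}=\cos (nx)/\sqrt{\pi }\), \(\varphi _{2n+1}=\sin (nx)/\sqrt{\pi }\). The ordering, the grid, and the helper names (`coefficients`, `project`, `cylindrical`) are our own choices, and the functional (38) serves as a concrete F. The helper `cylindrical` turns a functional acting on grid values into the multivariate function \(f(a_1,\ldots ,a_m)\).

```python
import numpy as np

N = 512                                  # grid points on [0, 2*pi)
x = 2 * np.pi * np.arange(N) / N
h = 2 * np.pi / N


def inner(u, v):
    """Discrete L^2_p([0, 2*pi]) inner product (rectangle rule)."""
    return h * np.sum(u * v)


def phi(k):
    """k-th real orthonormal Fourier basis function (k = 1, 2, ...)."""
    if k == 1:
        return np.full_like(x, 1.0 / np.sqrt(2 * np.pi))
    trig = np.cos if k % 2 == 0 else np.sin
    return trig((k // 2) * x) / np.sqrt(np.pi)


def coefficients(theta, m):
    """a_k = (theta, phi_k)_H for k = 1, ..., m, cf. (42)."""
    return np.array([inner(theta, phi(k)) for k in range(1, m + 1)])


def synthesize(a):
    """sum_k a_k phi_k, an element of D_m in (44)."""
    return sum(ak * phi(k) for k, ak in enumerate(a, start=1))


def project(theta, m):
    """Projection P_m theta of Eq. (43)."""
    return synthesize(coefficients(theta, m))


def cylindrical(F):
    """Multivariate function f(a_1, ..., a_m) = F([sum_k a_k phi_k]) of Eq. (46)."""
    return lambda a: F(synthesize(np.asarray(a, dtype=float)))


def F(theta):
    """Functional (38), used here as a concrete example."""
    return inner(np.sin(x), np.sin(theta) ** 2)


m = 9
theta = np.abs(np.sin(x)) ** 3           # illustrative element of H
a = coefficients(theta, m)
f = cylindrical(F)
print(f(a), F(project(theta, m)), F(theta))
```

Evaluating f at the coefficients \(a_k=(\theta ,\varphi _k)_H\) reproduces \(F([P_m\theta ])\), which is the content of (46) and (48).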

In the theory of stochastic processes, the set

$$\begin{aligned} \left\{ \theta \in H:\, ((\theta ,\varphi _1)_H,\ldots ,(\theta ,\varphi _m)_H)\in B\right\} , \end{aligned}$$
(47)

where B is a Borel set of \({\mathbb {R}}^m\), is known as a cylindrical set (see [91, p. 55] or [88, p. 45]). Therefore, functionals of the form (46), i.e.,

$$\begin{aligned} f\left( (\theta ,\varphi _1)_H, (\theta ,\varphi _2)_H, \ldots , (\theta ,\varphi _m)_H \right) =F([P_m\theta ]), \qquad \theta \in H, \end{aligned}$$
(48)

where f is a multivariate function, are often referred to as cylindrical (or cylinder) functionalsFootnote 10 [6, 39]. Such functionals play a fundamental role, e.g., in the approximation of functional integrals arising in quantum field theory [23, 105] (see also “Appendix A”).

Definition 4.1

(Cylindrical approximation of nonlinear functionals) Let H be a real separable Hilbert space, \(P_m\) the projection operator (43), and F a nonlinear functional on H. We call \(F([P_m\theta ])\) the cylindrical approximation of \(F([\theta ])\).

We will see in Sect. 5 that, for continuous F, \(F([P_m \theta ])\) (Eq. (48)) converges uniformly to \(F([\theta ])\) (Eq. (45)) on any compact subset of a real separable Hilbert space H as m goes to infinity.

Next, we study the representation of the Fréchet and the first-order functional derivatives. If \(F([\theta ])\) is Fréchet differentiable at \(\theta \in H\) with continuous Fréchet derivative \(F'([\theta ])\), then \(\ell (\eta )=F'([\theta ])\eta \) is a bounded linear functional. Hence, by Riesz’s representation theorem, there exists a unique element of H, which we denote by \(\delta F([\theta ])/\delta \theta (x)\), such that

$$\begin{aligned} F'([\theta ])\eta =\left( \frac{ \delta F\left( \left[ \theta \right] \right) }{\delta \theta (x)},\eta \right) _{H}. \end{aligned}$$
(49)

As we pointed out in Lemma 3.2, \(\delta F([\theta ])/\delta \theta (x)\) coincides with the first-order functional derivative (15). Such derivative is an element of H, and therefore, it can be represented in terms of the orthonormal basis \(\{\varphi _1,\varphi _2,\ldots \}\) as

$$\begin{aligned} \frac{\delta F([\theta ])}{\delta \theta (x)}= \sum _{k=1}^\infty \left( \frac{\delta F([\theta ])}{\delta \theta (x)}, \varphi _k\right) _H \varphi _k(x). \end{aligned}$$
(50)

A differentiation of (45) with respect to \(a_k\) yieldsFootnote 11

$$\begin{aligned} \frac{\partial f}{\partial a_k} = \left( \frac{ \delta F\left( \left[ \theta \right] \right) }{\delta \theta (x)},\varphi _k\right) _{H}, \qquad k=1,2,\ldots \end{aligned}$$
(52)

This means that the partial derivative of \(f(a_1,a_2,\ldots )\) with respect to \(a_k=(\theta ,\varphi _k)_H\) is the projection of the first-order functional derivative of F onto the basis element \(\varphi _k\). By substituting (52) into (50) we obtain

$$\begin{aligned} \frac{\delta F([\theta ])}{\delta \theta (x)}= \sum _{k=1}^\infty \frac{\partial f}{\partial a_k} \varphi _k(x). \end{aligned}$$
(53)

This expression emphasizes that the first-order functional derivative (53) is essentially a “dot product” between the (infinite-dimensional) gradient of f and the (infinite-dimensional) vector of basis elements. Evaluating (50) on the finite-dimensional function space \(D_m\) yields the cylindrical approximation

$$\begin{aligned} \frac{\delta F([P_m \theta ])}{\delta \theta (x)}= \sum _{k=1}^m \frac{\partial f}{\partial a_k} \varphi _k(x) + \sum _{k=m+1}^\infty \left( \frac{ \delta F\left( \left[ P_m\theta \right] \right) }{\delta \theta (x)},\varphi _k\right) _{H} \varphi _k(x). \end{aligned}$$
(54)

Here, f depends solely on the variables \((a_1,\ldots , a_m)\). If the functional derivative \(\delta F([P_m \theta ])/\delta \theta (x)\) is an element of \( D_m\) (as a function of x), then the second term on the right-hand side of (54) vanishes, since it collects the components along \(\varphi _k\) with \(k>m\), which are orthogonal to \(D_m\).

Example 1 Let \(F([\theta ])\) be a continuously differentiable functional on a real separable Hilbert space H, and let \(P_m\) be the projection operator (43). Then, from Eqs. (49) and (52) it follows that

$$\begin{aligned} F'([P_m\theta ])P_m\theta = \sum _{k=1}^m a_k \frac{\partial f}{\partial a_k}, \end{aligned}$$
(55)

where f is the multivariate function defined in (46).
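The identities (52), (54) and (55) can be checked numerically. The sketch below assumes \(H=L^2_p([0,2\pi ])\), a real orthonormal Fourier basis ordered as \(1,\cos x,\sin x,\cos 2x,\ldots \) (suitably normalized), the functional (38) with functional derivative (41), \(m=9\), and an arbitrary \(\theta \); all of these are illustrative choices. It compares a finite-difference gradient of \(f(a_1,\ldots ,a_m)\) with the projections \((\delta F([P_m\theta ])/\delta \theta (x),\varphi _k)_H\), computes the norm of the second sum in (54), which is nonzero here because \(\delta F([P_m\theta ])/\delta \theta (x)\) does not belong to \(D_m\), and evaluates both sides of (55).

```python
import numpy as np

N = 512
x = 2 * np.pi * np.arange(N) / N
h = 2 * np.pi / N


def inner(u, v):
    """Discrete L^2_p([0, 2*pi]) inner product (rectangle rule)."""
    return h * np.sum(u * v)


def phi(k):
    """Real orthonormal Fourier basis, assumed ordering 1, cos x, sin x, cos 2x, ..."""
    if k == 1:
        return np.full_like(x, 1.0 / np.sqrt(2 * np.pi))
    trig = np.cos if k % 2 == 0 else np.sin
    return trig((k // 2) * x) / np.sqrt(np.pi)


def synthesize(a):
    """sum_k a_k phi_k."""
    return sum(ak * phi(k) for k, ak in enumerate(a, start=1))


def F(theta):
    """Functional (38)."""
    return inner(np.sin(x), np.sin(theta) ** 2)


def dF(theta):
    """Functional derivative (41)."""
    return np.sin(x) * np.sin(2 * theta)


m = 9
theta = np.abs(np.sin(x)) ** 3                        # illustrative theta
a = np.array([inner(theta, phi(k)) for k in range(1, m + 1)])


def f(b):
    """Cylindrical approximation (46): f(b_1, ..., b_m) = F([sum_k b_k phi_k])."""
    return F(synthesize(b))


# Left-hand side of (52): gradient of f by central differences.
eps = 1e-6
grad_f = np.array([(f(a + eps * e) - f(a - eps * e)) / (2 * eps) for e in np.eye(m)])

# Right-hand side of (52): projections of dF/dtheta evaluated at P_m theta.
g = dF(synthesize(a))
proj = np.array([inner(g, phi(k)) for k in range(1, m + 1)])

# Second sum in (54): the component of dF([P_m theta]) orthogonal to D_m.
tail = g - synthesize(proj)

# Both sides of (55).
lhs_55 = inner(g, synthesize(a))
rhs_55 = a @ grad_f

print(np.max(np.abs(grad_f - proj)), np.sqrt(inner(tail, tail)), lhs_55, rhs_55)
```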

5 Convergence analysis of cylindrical approximations: continuous nonlinear functionals

In this section, we perform a convergence analysis for nonlinear functional approximations of the form (46) in compact subsets of real separable Hilbert spaces. We begin by recalling an approximation result first obtained by Prenter in [72].

Lemma 5.1

(Uniform convergence of cylindrical functional approximations [72, Lemma 5.3]) Let H be a real separable Hilbert space, K a compact subset of H, and \(P_m : H \rightarrow D_m\) the projection operator (43). If F is a continuous functional on H, then the sequence \(F([P_m\theta ])\) converges uniformly to \(F([\theta ])\) on K, i.e., for all \(\epsilon >0\) there exists \(m_\epsilon \in {\mathbb {N}}\) such that

$$\begin{aligned} \left| F([\theta ])-F([P_m\theta ])\right| \le \epsilon , \qquad \end{aligned}$$
(56)

for all \(m\ge m_\epsilon \) and for all \(\theta \in K\).

The compactness hypothesis of the subset K in Lemma 5.1 can be replaced by the weaker assumption that \(K\subseteq H\) is bounded (e.g., a sphere), and F is uniformly continuous with respect to the so-called S-topology (see [9] for details).
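Lemma 5.1 requires only continuity of F. To see it at work, the sketch below uses the Lipschitz (hence continuous) but not Fréchet-differentiable functional \(F([\theta ])=\int _0^{2\pi }|\theta (x)|\,\mathrm{d}x\), which is our own illustrative choice, and replaces K by a finite-dimensional stand-in: zero-mean trigonometric polynomials of degree at most 100 whose cosine and sine coefficients satisfy \(|c_n|,|d_n|\le n^{-2}\) (also an assumption). With the Fourier ordering \(1,\cos x,\sin x,\ldots \), truncation to frequencies \(\le M\) corresponds to \(P_m\) with \(m=2M+1\). The code reports the maximum of \(|F([\theta ])-F([P_m\theta ])|\) over random samples of this set, together with the bound \(\sqrt{2\pi }\max \|\theta -P_m\theta \|_H\), which follows from \(\big |\,|u|-|v|\,\big |\le |u-v|\) and the Cauchy–Schwarz inequality.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_max, s = 1024, 100, 2.0             # grid size, highest frequency, decay exponent
x = 2 * np.pi * np.arange(N) / N
h = 2 * np.pi / N
n = np.arange(1, n_max + 1)
C = np.cos(np.outer(x, n))               # C[i, n-1] = cos(n x_i)
S = np.sin(np.outer(x, n))


def sample_K(num):
    """Random zero-mean trigonometric polynomials with |c_n|, |d_n| <= n^(-s)."""
    bound = n ** (-s)
    c = rng.uniform(-1.0, 1.0, (num, n_max)) * bound
    d = rng.uniform(-1.0, 1.0, (num, n_max)) * bound
    return c, d


def synth(c, d, M):
    """Grid values of the truncation to frequencies <= M, i.e. P_m with m = 2M + 1."""
    keep = n <= M
    return (c * keep) @ C.T + (d * keep) @ S.T


def F(theta):
    """Continuous, but not Frechet differentiable: integral of |theta(x)|."""
    return h * np.sum(np.abs(theta), axis=-1)


c, d = sample_K(200)
theta = synth(c, d, n_max)
Ms = [1, 2, 4, 8, 16, 32]
errors, bounds = [], []
for M in Ms:
    theta_m = synth(c, d, M)
    errors.append(np.max(np.abs(F(theta) - F(theta_m))))
    resid = np.sqrt(h * np.sum((theta - theta_m) ** 2, axis=-1))
    bounds.append(np.sqrt(2 * np.pi) * np.max(resid))

for M, e, b in zip(Ms, errors, bounds):
    print(f"m = {2 * M + 1:3d}   max error = {e:.2e}   bound = {b:.2e}")
```

Sampling only explores K, so the printed maxima are lower estimates of the supremum in (56); the bound column, however, controls every sample.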

5.1 Convergence rate

Lemma 5.1 establishes uniform convergence of the functional \(F([P_m\theta ])\) to \(F([\theta ])\) on compact subsets of real separable Hilbert spaces. We now address how fast the approximation \(F([P_m\theta ])\) converges to \(F([\theta ])\). We will show that for continuously differentiable functionals (functionals with continuous Fréchet derivative) defined on compact convex subsets of real separable Hilbert spaces, \(F([P_m \theta ])\) converges to \(F([\theta ])\) at the same rate at which \(P_m \theta \) converges to \(\theta \) in H. This result follows from the well-known mean value theorem.

Theorem 5.1

(Mean value theorem) Let F be a real-valued continuously differentiable functional on a compact convex subset K of a real separable Hilbert space H. Then, for all \(\theta _1, \theta _2\in K\) the following estimate holds:

$$\begin{aligned} \left| F([\theta _1])-F([\theta _2])\right| \le \sup _{\eta \in K}\left\| F'([\eta ])\right\| \left\| \theta _1-\theta _2\right\| _H, \end{aligned}$$
(57)

where \(F'([\eta ])\) denotes the first-order Fréchet derivative of F.

We omit the proof, as this is a well-known result. We simply recall that, by Lemma 3.1, \(F'([\eta ])\) is a compact, and hence bounded, linear operator for each \(\eta \in K\). Moreover, since F is continuously differentiable, the map \(\eta \mapsto \left\| F'([\eta ])\right\| \) is continuous on the compact set K and therefore attains a finite maximum there. Hence,

$$\begin{aligned} \sup _{\eta \in K}\left\| F'([\eta ])\right\| =M<\infty . \end{aligned}$$

We also emphasize that the assumptions of Theorem 5.1 can be relaxed. For instance, one can drop the requirement that F be continuously differentiable and leverage the fact that for each \(\epsilon >0\) and any pair of points \(\theta _1,\theta _2\in K\) there exists a point \(\theta ^* \in K\) at which F is Fréchet differentiable, and

$$\begin{aligned} F([\theta _1])-F([\theta _2])< F'([\theta ^*])(\theta _1-\theta _2)+\epsilon , \end{aligned}$$
(58)

provided the line \(\theta _t= t\theta _1+(1-t)\theta _2\) is in K for all \(t\in [0,1]\). However, for the purpose of the present paper we shall simply restrict the class of nonlinear functionals we study to continuously differentiable nonlinear functionals. This allows us to obtain the following convergence rate result using the mean value Theorem 5.1.

Lemma 5.2

(Convergence rate of cylindrical functional approximations) Let F be a real-valued, continuously differentiable functional on a compact and convex subset K of a real separable Hilbert space H. Then, for all \(\theta \in K\) and for any finite-dimensional projection \(P_m\) of the form (43), we have

$$\begin{aligned} \left| F([\theta ])-F([P_m\theta ])\right| \le \sup _{\eta \in K}\left\| F'([\eta ])\right\| \left\| \theta -P_m\theta \right\| _H. \end{aligned}$$
(59)

In particular, \(F([P_m\theta ])\) converges to \(F([\theta ])\) for all \(\theta \in K\) at the same rate as \(P_m \theta \) converges to \(\theta \) in H.

The proof follows directly from the mean value Theorem 5.1 by setting \(\theta _1=\theta \) and \(\theta _2=P_m\theta \).
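For the functional (38), the bound (40) holds for every \(\theta \in H\), so applying the mean value inequality along the segment joining \(\theta \) and \(P_m\theta \) turns (59) into \(|F([\theta ])-F([P_m\theta ])|\le \sqrt{\pi }\left\| \theta -P_m\theta \right\| _H\). The sketch below checks this numerically under illustrative assumptions: a uniform grid, Fourier truncation via NumPy's real FFT (equivalent to \(P_m\) with \(m=2M+1\) in the basis ordering \(1,\cos x,\sin x,\ldots \)), and a \(\theta \) with kinks, whose Fourier coefficients decay only like \(n^{-2}\).

```python
import numpy as np

N = 4096                                  # uniform grid on [0, 2*pi)
x = 2 * np.pi * np.arange(N) / N
h = 2 * np.pi / N


def norm(u):
    """Discrete L^2_p([0, 2*pi]) norm."""
    return np.sqrt(h * np.sum(u ** 2))


def project(theta, M):
    """P_m theta with m = 2M + 1: keep the Fourier modes of frequency <= M."""
    coeffs = np.fft.rfft(theta)
    coeffs[M + 1:] = 0.0
    return np.fft.irfft(coeffs, n=N)


def F(theta):
    """Functional (38): integral of sin(x) * sin(theta(x))**2."""
    return h * np.sum(np.sin(x) * np.sin(theta) ** 2)


theta = np.abs(np.sin(x - 1.0)) + np.cos(x)   # illustrative theta with two kinks

Ms = [2, 4, 8, 16, 32, 64]
errors = [abs(F(theta) - F(project(theta, M))) for M in Ms]
bounds = [np.sqrt(np.pi) * norm(theta - project(theta, M)) for M in Ms]
for M, e, b in zip(Ms, errors, bounds):
    print(f"m = {2 * M + 1:4d}   |F - F_m| = {e:.3e}   bound (59) = {b:.3e}")
```

Since (59) is only an upper bound, the observed error may decay faster than \(\left\| \theta -P_m\theta \right\| _H\).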

Example 1 Consider the compact subset \(K\subseteq L^2_w([-1,1])\) defined in equation (30). Then, for any continuously differentiable functional F on \(L^2_w([-1,1])\), we have

$$\begin{aligned} \left| F([\theta ])-F([P_m \theta ])\right| \le \frac{C}{m^{s}},\qquad \forall \theta \in K, \end{aligned}$$
(60)

where \(P_m\) is a projection onto ultra-spherical polynomials in \([-1,1]\), and C is the (finite) constant

$$\begin{aligned} C=C_2\rho ^2\sup _{\eta \in K}\left\| F'([\eta ])\right\| . \end{aligned}$$
(61)

Here, \(C_2\) and \(\rho ^2\) are defined in (32). If \(\theta \) is infinitely differentiable, then \(F([P_m \theta ])\) converges to \(F([\theta ])\) faster than any power of m, and exponentially fast in m if \(\theta \) is analytic.

6 Convergence analysis of cylindrical approximations: Fréchet and functional derivatives

In this section, we study convergence of cylindrical approximations of \(F'([\theta ])\) and \(\delta F([\theta ])/\delta \theta (x)\) on a compact subset K of a real separable Hilbert space H. We begin with the following result.

Theorem 6.1

(Uniform approximation of first-order Fréchet derivatives) Let H be a real separable Hilbert space, K a compact subset of H, and \(P_m:H \rightarrow D_m\) the projection operator (43). If F is continuously differentiable on K with Fréchet derivative \(F'([\theta ])\), then the sequence of operators \(F'([P_m \theta ])\) converges uniformly to \(F'([\theta ])\). In other words, for all \(\epsilon >0\) there exists \(m_\epsilon \in {\mathbb {N}}\) such that

$$\begin{aligned} \left\| F'([\theta ]) - F'([P_m\theta ]) \right\| =\sup _{\begin{array}{c} \eta \in H\\ \eta \ne 0 \end{array}} \frac{\left| F'([\theta ])\eta -F'([P_m\theta ])\eta \right| }{\left\| \eta \right\| _H}<\epsilon , \end{aligned}$$
(62)

for all \(m\ge m_\epsilon \), and for all \(\theta \in K\).

Proof

Let us define the functional \(G_\eta ([\theta ])=F'([\theta ])\eta \). For each fixed \(\eta \in H\), \(G_\eta ([\theta ])\) is nonlinear and continuous in \(\theta \). Hence, we can apply Lemma 5.1 to conclude that \(G_\eta ([P_m \theta ])\) converges uniformly to \(G_\eta ([\theta ])\), i.e., that for each \(\epsilon _\eta >0\) there exists \(m_\eta \in {\mathbb {N}}\) such that for all \(m \ge m_\eta \)

$$\begin{aligned} \left| G_\eta ([\theta ]) - G_\eta ([P_m \theta ])\right| <\epsilon _\eta , \qquad \forall \theta \in K. \end{aligned}$$
(63)

Since F is a continuously differentiable functional on a compact metric space, the Fréchet derivative \(F'([\theta ])\) is a compact linear operator on H for each \(\theta \in K\) (Lemma 3.1). This means that, for each fixed \(\theta \in K\) and \(m\in {\mathbb {N}}\), the linear functional \(\eta \mapsto G_\eta ([\theta ]) - G_\eta ([P_m \theta ])\) is bounded:

$$\begin{aligned} \left| G_\eta ([\theta ]) - G_\eta ([P_m \theta ])\right| \le \gamma (m)\left\| \eta \right\| _H. \end{aligned}$$
(64)

By combining (63) and (64), we conclude that for each \(\epsilon >0\) there exists \(m_\epsilon \in {\mathbb {N}}\) such that

$$\begin{aligned} \left\| F'([\theta ]) - F'([P_m \theta ]) \right\| = \sup _{\begin{array}{c} {\eta \in H}\\ \eta \ne 0 \end{array}} \frac{\left| \left( F'([\theta ])-F'([P_m \theta ])\right) \eta \right| }{\left\| \eta \right\| _H}< \epsilon , \end{aligned}$$
(65)

for all \(m\ge m_\epsilon \) and for all \(\theta \in K\). This proves the theorem. \(\square \)

Next, we study convergence of the first-order functional derivative (15). This is relatively straightforward given the convergence result we just obtained in Theorem 6.1. In fact, the linear functional \(F'([\theta ])\eta \) is bounded for each \(\theta \) in the compact set \(K\subseteq H\), and therefore, it admits the Riesz integral representation

$$\begin{aligned} F'([\theta ])\eta = \left( \frac{\delta F([\theta ])}{\delta \theta (x)},\eta \right) _H \qquad \theta \in K, \qquad \eta \in H, \end{aligned}$$
(66)

where \((\cdot ,\cdot )_H\) is the inner product in H. Uniform convergence of \(F'([P_m\theta ])\) to \(F'([\theta ])\) for all \(\theta \) in the compact set \(K\subseteq H\) implies that for every \(\epsilon >0\) there exists \(m_\epsilon \in {\mathbb {N}}\) such that

$$\begin{aligned} \left| \left( \frac{\delta F([\theta ])}{\delta \theta (x)}- \frac{\delta F([P_m\theta ])}{\delta \theta (x)},\eta \right) _H\right| < \epsilon {\left\| \eta \right\| _H}, \qquad \forall \theta \in K,\qquad \forall \eta \in H{\setminus }\{0\}, \end{aligned}$$
(67)

and for all \(m\ge m_\epsilon \).

Lemma 6.1

(Uniform approximation of first-order functional derivatives) Let H be a real separable Hilbert space, K a compact subset of H, \(\theta \in K\) and \(P_m: H \rightarrow D_m\) the projection (43). If F is continuously differentiable on K with Fréchet derivative \(F'([\theta ])\), then the sequence \(\delta F([P_m\theta ])/\delta \theta (x)\) converges uniformly to \(\delta F([\theta ])/\delta \theta (x)\). In other words, for all \(\epsilon >0\) there exists \(m_\epsilon \in {\mathbb {N}}\) such that for all \(m\ge m_\epsilon \)

$$\begin{aligned} \left\| \frac{\delta F([\theta ])}{\delta \theta (x)}- \frac{\delta F([P_m\theta ])}{\delta \theta (x)}\right\| _H <\epsilon , \qquad \forall \theta \in K. \end{aligned}$$
(68)

Proof

Consider the linear functional of \(\eta \in H\)

$$\begin{aligned} \left( F'([\theta ])- F'([P_m \theta ])\right) \eta = \left( \frac{\delta F([\theta ])}{\delta \theta (x)}-\frac{\delta F([P_m\theta ])}{\delta \theta (x)},\eta \right) _H. \end{aligned}$$
(69)

By the Riesz representation theorem, the norm of the linear functional (69) is

$$\begin{aligned} M([\theta ])=\left\| \frac{\delta F([\theta ])}{\delta \theta (x)}-\frac{\delta F([P_m\theta ])}{\delta \theta (x)}\right\| _{H}. \end{aligned}$$
(70)

By definition, \(M([\theta ])\) is the smallest number such that

$$\begin{aligned} \left| \left( F'([\theta ])-F'([P_m \theta ])\right) \eta \right| \le M([\theta ])\left\| \eta \right\| _{H}\qquad \forall \eta \in H. \end{aligned}$$
(71)

This observation, together with (67), allows us to conclude that \(M([\theta ])<\epsilon \) for all \(m\ge m_\epsilon \) and all \(\theta \in K\). This proves the lemma. \(\square \)

6.1 Convergence rate

Let us assume that \(G_\eta ([\theta ])=F'([\theta ])\eta \) is continuously Fréchet differentiable with respect to \(\theta \) in H. Denote by \(G'_\eta ([\theta ])\) the first-order Fréchet derivative and let K be a compact convex subset of H. By applying the mean value Theorem 5.1, we obtain

$$\begin{aligned} \left| G_\eta ([\theta ])-G_\eta ([P_m\theta ])\right| \le {\sup _{\zeta \in K}} \left\| G'_\eta ([\zeta ])\right\| \left\| \theta -P_m\theta \right\| _H,\qquad {\forall \eta \in H, \quad \forall \theta \in K}. \end{aligned}$$
(72)

The Fréchet derivative of \(G_\eta ([\theta ])\) can be written asFootnote 12

$$\begin{aligned} G'_\eta ([\theta ]) \xi =F''([\theta ])\eta \xi . \end{aligned}$$
(73)

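Since \(\left| G'_\eta ([\zeta ])\xi \right| =\left| F''([\zeta ])\eta \xi \right| \le \left\| F''([\zeta ])\right\| \left\| \eta \right\| _H\left\| \xi \right\| _H\), with \(\left\| F''([\zeta ])\right\| \) defined in (75), we have \(\left\| G'_\eta ([\zeta ])\right\| \le \left\| F''([\zeta ])\right\| \left\| \eta \right\| _H\). Dividing (72) by \(\left\| \eta \right\| _H\) (\(\eta \ne 0\)) and taking the supremum over \(\eta \in H\), we obtain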

$$\begin{aligned} \left\| F'([\theta ])-F'([P_m\theta ])\right\| \le {\sup _{\zeta \in K}} \left\| F'' ([\zeta ])\right\| \left\| \theta -P_m\theta \right\| _H, \qquad {\forall \theta \in K}, \end{aligned}$$
(74)

where

$$\begin{aligned} \left\| F'' ([\zeta ])\right\| =\sup _{\begin{array}{c} {\eta ,\xi \in H}\\ \eta ,\xi \ne 0 \end{array}} \frac{\left| F''([\zeta ])\eta \xi \right| }{\left\| \eta \right\| _H\left\| \xi \right\| _H}. \end{aligned}$$
(75)

The symmetric bilinear form \(F'' ([\theta ])\) is continuous on \(H \times H\), and therefore, it is bounded. Moreover, \(\left\| F'' ([\theta ])\right\| \) is continuous in \(\theta \) and attains its minimum and maximum values in any compact set \(K\subseteq H\). By equation (74) this implies that the Fréchet derivative \(F'([P_m\theta ])\) converges to \(F'([\theta ])\) in K at the same rate as \(P_m\theta \) converges to \(\theta \) in H. We summarize these results in the following lemma.

Lemma 6.2

(Convergence rate of first-order Fréchet derivatives) Let H be a real separable Hilbert space, and let \(F([\theta ])\) be a nonlinear functional with continuous first- and second-order Fréchet derivatives. Then, for all \(\theta \) in a compact convex subset K of H and for any finite-dimensional projection \(P_m\) of the form (43), we have

$$\begin{aligned} \left\| F'([\theta ])-F'([P_m\theta ])\right\| \le {\sup _{\zeta \in K}}\left\| F'' ([\zeta ])\right\| \left\| \theta -P_m\theta \right\| _H. \end{aligned}$$
(76)

In particular, \(F'([P_m\theta ])\) converges uniformly to \(F'([\theta ])\) in K at the same rate as \(P_m \theta \) converges to \(\theta \) in H.
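For the functional (38), differentiating (39) once more gives \(F''([\zeta ])\eta \xi =\int _0^{2\pi }2\sin (x)\cos (2\zeta (x))\eta (x)\xi (x)\,\mathrm{d}x\), so \(\left\| F''([\zeta ])\right\| \le 2\) in the norm (75), and (76) becomes \(\left\| F'([\theta ])-F'([P_m\theta ])\right\| \le 2\left\| \theta -P_m\theta \right\| _H\). By the Riesz representation (69) to (71), the left-hand side equals the H-norm of the difference of the functional derivatives (41). The sketch below checks this inequality numerically; the uniform grid, the Fourier truncation via NumPy's real FFT (\(m=2M+1\)), and the choice of \(\theta \) are illustrative assumptions.

```python
import numpy as np

N = 4096                                  # uniform grid on [0, 2*pi)
x = 2 * np.pi * np.arange(N) / N
h = 2 * np.pi / N


def norm(u):
    """Discrete L^2_p([0, 2*pi]) norm."""
    return np.sqrt(h * np.sum(u ** 2))


def project(theta, M):
    """P_m theta with m = 2M + 1: keep the Fourier modes of frequency <= M."""
    coeffs = np.fft.rfft(theta)
    coeffs[M + 1:] = 0.0
    return np.fft.irfft(coeffs, n=N)


def dF(theta):
    """Functional derivative (41), the Riesz representer of F'([theta])."""
    return np.sin(x) * np.sin(2 * theta)


sup_F2 = 2.0        # ||F''([zeta])|| <= 2, since |2 sin(x) cos(2 zeta(x))| <= 2

theta = np.abs(np.sin(x - 1.0)) + np.cos(x)   # illustrative theta with two kinks
Ms = [2, 4, 8, 16, 32, 64]
# ||F'([theta]) - F'([P_m theta])|| equals the H-norm of the difference in (70)
gaps = [norm(dF(theta) - dF(project(theta, M))) for M in Ms]
bounds = [sup_F2 * norm(theta - project(theta, M)) for M in Ms]
for M, g, b in zip(Ms, gaps, bounds):
    print(f"m = {2 * M + 1:4d}   ||F' - F'_m|| = {g:.3e}   bound (76) = {b:.3e}")
```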

Convergence rate results for higher-order Fréchet derivatives can be obtained in a similar way.

7 Approximation of linear functional differential equations

Let \({\mathcal {F}}(H)\) denote a Banach space of nonlinear functionals from a real separable Hilbert space H into \({\mathbb {R}}\) or \({\mathbb {C}}\). In this section, we develop necessary and sufficient conditions which guarantee that the solution to linear functional differential equations (FDEs) of the form

$$\begin{aligned} \frac{\partial F([\theta ],t)}{\partial t} = {\mathcal {L}}([\theta ]) F([\theta ],t), \qquad F([\theta ],0)=F_0([\theta ]) \end{aligned}$$
(77)

can be approximated by the solution of suitable finite-dimensional linear partial differential equations. Equation (77) is a linear abstract evolution equation (Cauchy problem) in the Banach space \({\mathcal {F}}(H)\) [41]. The linear operator \({\mathcal {L}}([\theta ])\) is assumed to be in \({\mathcal {C}}({\mathcal {F}})\), which is the set of closed, densely defined linear operators on \({\mathcal {F}}(H)\); in particular, \({\mathcal {L}}([\theta ])\) can be unbounded. To construct the approximation scheme for the FDE (77), we consider the following cylindrical approximation of the solution functional F

$$\begin{aligned} F_m([\theta ],t) = F([P_m \theta ],t), \qquad \theta \in H, \end{aligned}$$
(78)

where \(P_m\) is the projection operator (43). We have seen in Sects. 4 and 5 that \(F_m([\theta ],t)= f(a_1,\ldots ,a_m,t)\) is a multivariate function in the m variables \(a_k=(\theta ,\varphi _k)_H\) (\(k=1,\ldots , m\)) which converges uniformly to \(F([\theta ],t)\) on every compact subset of H, for any fixed time t. From a functional analysis perspective, \(F_m\) is an element of a Banach space of functionals on H, which we denote by \({\mathcal {F}}_m(H)\). With this notation, we see that the functional approximation (78) is essentially induced by the application of a continuous linear operator \(B_m:{\mathcal {F}}(H)\rightarrow {\mathcal {F}}_m(H)\) defined as

$$\begin{aligned} B_m F([\theta ],t) = F([P_m\theta ],t). \end{aligned}$$
(79)

Using the operator \(B_m\), we perform the following decomposition of the right-hand side of (77)

$$\begin{aligned} B_m \left( {\mathcal {L}}([\theta ]) F([\theta ],t)\right) = {\mathcal {L}}_m([\theta ]) F_m([\theta ],t) + R_m([\theta ],t), \end{aligned}$$
(80)

where \({\mathcal {L}}_m([\theta ])\) is a linear operator acting on the m-dimensional function \(F_m([\theta ],t)=f(a_1,\ldots ,a_m,t)\), and \(R_m\) is a functional residual. The operator \({\mathcal {L}}_m([\theta ])\) can be unbounded. As an example, let \(H=L_p^2([0,2\pi ])\) (space of square integrable periodic functions in \([0,2\pi ]\)) and consider

$$\begin{aligned} {\mathcal {L}}([\theta ])F([\theta ],t) = \int _{0}^{2\pi } \theta (x) \frac{\partial }{\partial x} \frac{\delta F([\theta ],t)}{\delta \theta (x)} \mathrm{d}x, \qquad \theta \in H. \end{aligned}$$
(81)

A substitution of (46) and (54) into (81) yields

$$\begin{aligned} B_m \left( {\mathcal {L}}([\theta ])F([\theta ],t)\right)= & {} \underbrace{\sum _{k,j=1}^m a_j\frac{\partial f}{\partial a_k} \int _{0}^{2\pi } \frac{\partial \varphi _k}{\partial x} \varphi _j\,\mathrm{d}x}_{{\mathcal {L}}_m([\theta ]) F_m([\theta ],t)}\nonumber \\&+ \underbrace{\sum _{k=m+1}^\infty \left( \frac{\delta F([P_m\theta ],t)}{\delta \theta (x)},\varphi _k\right) _{H}\int _{0}^{2\pi } \frac{\partial \varphi _k}{\partial x} P_m\theta \,\mathrm{d}x}_{R_m([\theta ],t)}, \end{aligned}$$
(82)

where \(a_j=(\theta ,\varphi _j)_{H}\) (\(j=1,\ldots ,m\)). Note that \({\mathcal {L}}_m([\theta ])\) in (82) is a linear first-order partial differential operator with non-constant coefficients.
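The coefficients of \({\mathcal {L}}_m([\theta ])\) in (82) are determined by the \(m\times m\) matrix \(A_{kj}=\int _0^{2\pi }(\partial \varphi _k/\partial x)\varphi _j\,\mathrm{d}x\), so that \({\mathcal {L}}_m([\theta ])F_m=\sum _{k}(Aa)_k\,\partial f/\partial a_k\) with \(a=(a_1,\ldots ,a_m)\). The sketch below assembles A by quadrature for a real orthonormal Fourier basis ordered as \(1,\cos x,\sin x,\cos 2x,\ldots \) (an assumed ordering); the helper `apply_Lm` is hypothetical. Integration by parts with periodic boundary conditions shows that A is skew-symmetric, which the code confirms; as a consequence, \({\mathcal {L}}_m\) annihilates \(f(a)=\tfrac{1}{2}\sum _k a_k^2\), the cylindrical approximation of \(\tfrac{1}{2}\left\| \theta \right\| _H^2\).

```python
import numpy as np

N = 256                                   # quadrature points on [0, 2*pi)
x = 2 * np.pi * np.arange(N) / N
h = 2 * np.pi / N


def inner(u, v):
    """Discrete L^2_p([0, 2*pi]) inner product (exact for low-degree trig products)."""
    return h * np.sum(u * v)


def phi_and_dphi(k):
    """Basis function phi_k and its x-derivative; assumed ordering 1, cos x, sin x, ..."""
    if k == 1:
        return np.full_like(x, 1.0 / np.sqrt(2 * np.pi)), np.zeros_like(x)
    n = k // 2
    if k % 2 == 0:
        return np.cos(n * x) / np.sqrt(np.pi), -n * np.sin(n * x) / np.sqrt(np.pi)
    return np.sin(n * x) / np.sqrt(np.pi), n * np.cos(n * x) / np.sqrt(np.pi)


m = 7
basis = [phi_and_dphi(k) for k in range(1, m + 1)]
# A[k, j] = integral of (d phi_k / dx) * phi_j, the coefficients appearing in (82)
A = np.array([[inner(dphi_k, phi_j) for (phi_j, _) in basis] for (_, dphi_k) in basis])


def apply_Lm(grad_f, a):
    """Value of L_m F_m at the point a, given grad f(a) = (df/da_1, ..., df/da_m)."""
    return grad_f @ (A @ a)


a = np.random.default_rng(0).normal(size=m)
print(np.round(A, 12))
print(apply_Lm(a, a))                    # f(a) = |a|^2 / 2 has grad f(a) = a
```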

Definition 7.1

(Consistency) A sequence of linear operators \(\{{\mathcal {L}}_m\}\) with \({\mathcal {L}}_m\in {\mathcal {C}}({\mathcal {F}}_m)\) is said to be consistent (or compatible) with a linear operator \({\mathcal {L}}\in {\mathcal {C}}({\mathcal {F}})\) if for every \(F\in D({\mathcal {L}})\)Footnote 13 there exists a sequence \(F_m\in D({\mathcal {L}}_m)\) such that

$$\begin{aligned} \left\| F-F_m\right\| \rightarrow 0\quad \text {and} \quad \left\| {\mathcal {L}} F-{\mathcal {L}}_m F_m \right\| \rightarrow 0 \end{aligned}$$
(83)

as \(m\rightarrow \infty \). Moreover, if \(\left\| {\mathcal {L}}_m F_m - {\mathcal {L}} F\right\| ={\mathcal {O}}(m^{-p})\), then we say that the sequence \(\{{\mathcal {L}}_m\}\) is consistent with \({\mathcal {L}}\) to order p.

Lemma 7.1

(Consistency of cylindrical approximations to FDEs) Let H be a real separable Hilbert space. Consider a functional \(F\in {\mathcal {F}}(H)\) and a densely defined closed linear operator \({\mathcal {L}}\in {\mathcal {C}}({\mathcal {F}})\). If \({\mathcal {L}}([\theta ])F([\theta ])\) is continuous in \(\theta \), then the sequence of operators \(\{{\mathcal {L}}_m\}\) defined in (80) is consistent with \({\mathcal {L}}\) on every compact subset K of H, provided \( \left\| R_m([\theta ])\right\| \rightarrow 0\) as \(m\rightarrow \infty \) for all \(\theta \in K\).

Proof

Equation (80) implies that

$$\begin{aligned} \left| {\mathcal {L}}([\theta ])F([\theta ]) - {\mathcal {L}}_m([\theta ])F_m([\theta ])\right| =&\left| {\mathcal {L}}([\theta ])F([\theta ]) - B_m\left( {\mathcal {L}}([\theta ])F([\theta ])\right) +R_m([\theta ]) \right| . \nonumber \\ \end{aligned}$$
(84)

Since the functional \({\mathcal {L}}([\theta ])F([\theta ]) \) is continuous in \(\theta \), we can now use the uniform approximation Lemma 5.1 and claim that for any \(\epsilon >0\) there exists \(m_\epsilon \in {\mathbb {N}}\) such that

$$\begin{aligned} \left| {\mathcal {L}}([\theta ])F([\theta ]) - B_m\left( {\mathcal {L}}([\theta ])F([\theta ])\right) \right| \le \epsilon , \qquad \forall m\ge m_\epsilon ,\quad \forall \theta \in K, \end{aligned}$$
(85)

where K is a compact subset of H. A substitution of (85) into (84) yields

$$\begin{aligned} \left| {\mathcal {L}}([\theta ])F([\theta ]) - {\mathcal {L}}_m([\theta ])F_m([\theta ])\right| \le&\epsilon +\left| R_m([\theta ]) \right| , \qquad \theta \in K. \end{aligned}$$
(86)

Hence, if \(\left| R_m([\theta ]) \right| \rightarrow 0\) for all \(\theta \in K\) as \(m\rightarrow \infty \), then \({\mathcal {L}}_m([\theta ])F_m([\theta ]) \rightarrow {\mathcal {L}}([\theta ])F([\theta ])\) for all \(\theta \) in K. By Lemma 5.1 we also have that \(F_m\rightarrow F\) on K. Hence, the sequence \(\{{\mathcal {L}}_m\}\) is a consistent approximation of \({\mathcal {L}}\). \(\square \)

Corollary 7.1

Under the same assumptions of Lemma 7.1, if, in addition, K is convex, \({\mathcal {L}}([\theta ])F([\theta ])\) is continuously Fréchet differentiable in K, and \(\left\| R_m([\theta ])\right\| ={\mathcal {O}}( \left\| \theta -P_m\theta \right\| _H)\), then \(\{{\mathcal {L}}_m\}\) is consistent with \({\mathcal {L}}\) to the same order as \(P_m\theta \) converges to \(\theta \) in H.

Proof

By using the mean value Theorem 5.1 and Eq. (84), we immediately conclude that

$$\begin{aligned} \left\| {\mathcal {L}}([\theta ])F([\theta ]) - {\mathcal {L}}_m([\theta ])F_m([\theta ])\right\| = {\mathcal {O}}(\left\| \theta -P_m\theta \right\| _H). \end{aligned}$$
(87)

Hence, \({\mathcal {L}}_m\) is consistent with \({\mathcal {L}}\) to the same order as \(P_m\theta \) converges to \(\theta \) in H. \(\square \)

Example 1 Let \(H=L_p^2([0,2\pi ])\) be the space of square integrable periodic functions in \([0,2\pi ]\), \(\{\varphi _k\}\) an orthonormal Fourier basis in H, and \(K\subseteq H\) the Sobolev sphere (28) (together with its closure in H). We have seen in Sect. 3 that K is a compact subset of H. We now show that the operator \({\mathcal {L}}_m([\theta ])\) defined in (82) is a consistent approximation of the operator (81), in the compact set K. For all m larger than some fixed \(m_0\in {\mathbb {N}}\), we have

$$\begin{aligned} \left| R_m([\theta ],t)\right|&=\left| \sum _{k=m+1}^\infty \left( \frac{\delta F([P_m\theta ],t)}{\delta \theta (x)},\varphi _k\right) _{H}\int _{0}^{2\pi } \frac{\partial \varphi _k}{\partial x} P_m\theta \mathrm{d}x \right| \nonumber \\&\le \sum _{k=m+1}^\infty \left| \left( \frac{\delta F([P_m\theta ],t)}{\delta \theta (x)},\varphi _k\right) _{H}\right| \left\| \varphi _k \right\| _{H} \left\| \frac{\partial (P_m \theta )}{\partial x}\right\| _{H}\nonumber \\&\le \gamma \sum _{k=m+1}^\infty \left| \left( \frac{\delta F([P_m\theta ],t)}{\delta \theta (x)},\varphi _k\right) _{H}\right| , \end{aligned}$$
(88)

where \(\gamma \) is a constant independent of m. To obtain the second line, we integrated by parts (the boundary terms vanish by periodicity) and applied the Cauchy–Schwarz inequality. To obtain the last inequality, we used the fact that the basis \(\{\varphi _k\}\) is orthonormal in H (\(\left\| \varphi _k\right\| _{H}=1\)), and that \(\left\| \partial (P_m \theta )/\partial x\right\| _H\) converges to \(\left\| \partial \theta /\partial x\right\| _H\) uniformly in \(\theta \in K\). The proof of this statement is based on the following inequalities [46, p. 38]

$$\begin{aligned} \left| \left\| \frac{\partial \theta }{\partial x}\right\| _H - \left\| \frac{\partial (P_m \theta )}{\partial x}\right\| _H\right| \le \left\| \frac{\partial \theta }{\partial x}-\frac{\partial (P_m \theta )}{\partial x}\right\| _H \le \frac{C}{m^{s-1}} \left\| \frac{\mathrm{d}^s\theta }{\mathrm{d}x^s}\right\| _H\le \frac{\rho C}{m^{s-1}}. \end{aligned}$$
(89)

In the last inequality, we used the fact that \(\theta \) is in the Sobolev sphere (28). From (89) it follows that

$$\begin{aligned} \left\| \frac{\partial (P_m \theta )}{\partial x}\right\| _H\le \left\| \frac{\partial \theta }{\partial x}\right\| _H+\frac{\rho C}{m^{s-1}}\le \kappa \rho + \frac{\rho C}{m^{s-1}} \le \underbrace{\kappa \rho + \frac{\rho C}{m_0^{s-1}}}_{\gamma } \qquad \forall m\ge m_0, \end{aligned}$$
(90)

where we repeatedly applied the Poincaré inequality \(\left\| f\right\| _{H}\le g \left\| \partial f/\partial x\right\| _{H}\) to obtain the constant \(\kappa \). Equation (90) defines the constant \(\gamma \) appearing in (88). At this point we recall that the functional derivative \(\delta F([P_m\theta ],t)/\delta \theta (x)\) converges strongly in H to \(\delta F([\theta ],t)/\delta \theta (x)\) as m goes to infinity for all \(\theta \in K\) (Lemma 6.1). This implies that (88) goes to zero for all \(\theta \in K\) as m goes to infinity,Footnote 14 i.e.,

$$\begin{aligned} \max _{\theta \in K}\left| R_m([\theta ],t)\right| \rightarrow 0 \qquad \forall t\in [0,T]. \end{aligned}$$
(92)

The rate of convergence depends on the regularity of the first-order functional derivative \(\delta F([\theta ],t)/\delta \theta (x)\) as a function of x. In particular, if \(\delta F([\theta ],t)/\delta \theta (x)\) is infinitely differentiable in x, then (92) goes to zero faster than any power of 1/m, and exponentially fast with m if it is analytic in x [46, p. 36].

7.1 Cylindrical approximation of FDEs: stability and convergence

Let us now consider the m-dimensional linear PDE

$$\begin{aligned} \frac{\partial F_m([\theta ],t)}{\partial t}= {\mathcal {L}}_m([\theta ]) F_m([\theta ],t), \qquad F_m([\theta ],0)= B_m F_0([\theta ]), \end{aligned}$$
(93)

where \(B_m\) and \(F_0\) are defined in (79) and (77), respectively. If the conditions of Lemma 7.1 are satisfied, then we say that the PDE (93) is a consistent approximation of the FDE (77). Moreover, if \(\{{\mathcal {L}}_m\}\) in (93) is consistent with \({\mathcal {L}}\) to order p, then we say that the PDE (93) is consistent with the FDE (77) to order p.

A fundamental question at this point is whether the solution of (93) converges to the solution of the FDE (77) as we send m to infinity. The Trotter–Kato approximation theorem for abstract evolution equations in Banach spaces [33, p. 209] states that this is indeed the case, provided the initial value problem (93) is “stable” in the following sense.

Definition 7.2

(Stability) Suppose that the linear operator \({\mathcal {L}}_m\) in (93) generates a strongly continuous semigroup \(e^{t {\mathcal {L}}_m}\). We say that the FDE approximation (93) is stable if there are two constants M and \(\omega \) independent of m such that \(\left\| e^{t {\mathcal {L}}_m}\right\| \le Me^{\omega t}\) for all \(t\ge 0\).

We now have all the elements needed to state a version of the Trotter–Kato theorem [41, p. 8] that holds for cylindrical approximations of functional differential equations.

Theorem 7.1

(Convergence of cylindrical approximations to FDEs) Suppose that the initial value problem (77) is well posed in the time interval [0, T] (T finite), and that \({\mathcal {L}}([\theta ])\in {\mathcal {C}}({\mathcal {F}})\) generates a strongly continuous semigroup in [0, T]. Then, the FDE approximation (93) is stable and consistent in a compact subset K of a real separable Hilbert space H if and only if it is convergent, i.e.,

$$\begin{aligned} \max _{t\in [0,T]}\max _{\theta \in K} \left| F_m([\theta ],t) - F([\theta ],t) \right| \rightarrow 0 \end{aligned}$$
(94)

as \(m\rightarrow \infty \), provided \(F_m([\theta ],0)\rightarrow F_0([\theta ])\).

The proof of this theorem can be found in [33, p. 210]. In summary, to prove that a cylindrical approximation of an FDE is convergent, we can proceed as follows:

  (a) Construct the multivariate PDE (93) and show that such a PDE is a consistent approximation to the FDE (77) (Lemma 7.1).

  (b) Study the stability of (93). This is a PDE-specific result stating that it is possible to control some norm of the solution of (93) by a constant multiple of a suitable norm of the initial condition, where neither the norms involved nor the constant depend on m. The simplest stability results arise from energy inequalities, e.g., for PDEs with continuous and coercive linear operators \({\mathcal {L}}_m\).

  (c) Apply Theorem 7.1 to conclude that if (a) and (b) are satisfied, then the solution of the PDE (93) converges uniformly to the solution of the FDE (77) as the number of independent variables m goes to infinity.

Example 2 Consider the initial value problem

$$\begin{aligned} \frac{\partial F([\theta ],t)}{\partial t}= \int _{0}^{2\pi } \theta (x) \frac{\partial }{\partial x} \frac{\delta F([\theta ],t)}{\delta \theta (x)}\mathrm{d}x, \qquad F([\theta ],0)=F_0([\theta ]). \end{aligned}$$
(95)

The FDE (95) is the Hopf equation corresponding to the linear PDE

$$\begin{aligned} \frac{\partial u}{\partial t}= \frac{\partial u}{\partial x}, \qquad u(x,0)=u_0(x), \end{aligned}$$
(96)

where \(u_0(x)\) is random and periodic in \([0,2\pi ]\). To show this, let

$$\begin{aligned} F([\theta ],t)= {\mathbb {E}}\left\{ \exp \left( i\int _{0}^{2\pi }u(x,t)\theta (x)\mathrm{d}x\right) \right\} \end{aligned}$$
(97)

be the Hopf functional associated with the solution to (96). The expectation operator \({\mathbb {E}}\left\{ \cdot \right\} \) in (97) is an integral over the probability measure of \(u_0(x)\). Differentiation of (97) with respect to time yields

$$\begin{aligned} \frac{\partial F([\theta ],t)}{\partial t}&= {\mathbb {E}}\left\{ \exp \left( i\int _{0}^{2\pi } u(x,t)\theta (x)\mathrm{d}x\right) i\int _{0}^{2\pi } \frac{\partial u(x,t)}{\partial t}\theta (x)\mathrm{d}x\right\} \nonumber \\&= {\mathbb {E}}\left\{ \exp \left( i\int _{0}^{2\pi } u(x,t)\theta (x)\mathrm{d}x\right) i\int _{0}^{2\pi } \frac{\partial u(x,t)}{\partial x}\theta (x)\mathrm{d}x\right\} \nonumber \\&= \int _{0}^{2\pi } \frac{\partial }{\partial x} \left( \frac{\delta F([\theta ],t)}{\delta \theta (x)}\right) \theta (x)\mathrm{d}x. \end{aligned}$$
(98)
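The identity (98) can be checked numerically for an empirical (finite-sample) probability measure, for which the expectation is a sample average and (98) holds up to discretization error. Since \(\delta F([\theta ],t)/\delta \theta (x)={\mathbb {E}}\{i u(x,t)\exp (i\int u\theta \,\mathrm{d}x)\}\), the right-hand side \(\int \theta \,\partial _x(\delta F/\delta \theta (x))\,\mathrm{d}x\) equals \({\mathbb {E}}\{\exp (i\int u\theta \,\mathrm{d}x)\, i\int \partial _x u\,\theta \,\mathrm{d}x\}\). The Python sketch below assumes a hypothetical random initial state \(u_0(x)=\sum _{k=1}^5 A_k\cos (kx+p_k)\) with Gaussian \(A_k\) and uniform phases \(p_k\), uses the exact solution \(u(x,t)=u_0(x+t)\) of (96), and compares a central finite-difference approximation of \(\partial F/\partial t\) with that expression.

```python
import numpy as np

M = 256                                   # periodic grid on [0, 2*pi)
x = 2 * np.pi * np.arange(M) / M
dx = 2 * np.pi / M
rng = np.random.default_rng(0)

# Hypothetical random initial state u0(x) = sum_k A_k cos(k x + p_k), k = 1..5,
# represented by K independent samples (an empirical probability measure).
K, modes = 200, np.arange(1, 6)
A = rng.normal(size=(K, modes.size))
P = rng.uniform(0, 2 * np.pi, size=(K, modes.size))

def _arg(t):
    # k (x + t) + p_k, shape (K, modes, M)
    return modes[None, :, None] * (x[None, None, :] + t) + P[:, :, None]

def u(t):
    """Samples of u(x, t) = u0(x + t), the exact solution of (96); shape (K, M)."""
    return (A[:, :, None] * np.cos(_arg(t))).sum(axis=1)

def u_x(t):
    """x-derivative of the samples of u(x, t)."""
    return (-A[:, :, None] * modes[None, :, None] * np.sin(_arg(t))).sum(axis=1)

theta = np.exp(np.sin(x))                 # fixed smooth periodic test function

def hopf(t):
    """Sample average approximating the Hopf functional (97)."""
    return np.mean(np.exp(1j * (u(t) @ theta) * dx))

t, h = 0.7, 1e-4
lhs = (hopf(t + h) - hopf(t - h)) / (2 * h)            # dF/dt, central differences
phase = np.exp(1j * (u(t) @ theta) * dx)
# E{exp(i <u, theta>) i <u_x, theta>} = int theta d/dx (dF/dtheta(x)) dx
rhs = np.mean(phase * 1j * (u_x(t) @ theta) * dx)
print(abs(lhs - rhs))
```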

We assume that \(\theta \) is in the compact set \(K\subseteq L^2_p([0,2\pi ])\) defined in (28). This is the domain on which we solve the FDE (95). Let \(\{\varphi _1,\varphi _2,\ldots \}\) be an orthonormal basis of \(L^2_p([0,2\pi ])\). By equations (82) and (92), we have that the m-dimensional PDE

$$\begin{aligned} \frac{\partial f}{\partial t}=\sum _{k,j=1}^m a_j\frac{\partial f}{\partial a_k}\int _{0}^{2\pi } \frac{\partial \varphi _k}{\partial x}\varphi _j\,\mathrm{d}x, \qquad f(a_1,\ldots ,a_m,0) = F_0([P_m \theta ]), \end{aligned}$$
(99)

where \(a_k=(\theta ,\varphi _k)_{L^2_p([0,2\pi ])}\), is a consistent cylindrical approximation to the FDE (95). Next, we show that such approximation is stable in the sense of Definition 7.2. By using the method of characteristics [75], it is straightforward to show that the solution of (99) can be bounded as

$$\begin{aligned} |f(a_1,\ldots , a_m,t)| \le \left\| f_0\right\| _{L^\infty ({\mathbb {R}}^m)}, \end{aligned}$$
(100)

where \(f_0 = f(a_1,\ldots , a_m,0)\). Hence, if the \(L^\infty \) norm of \(f_0\) is bounded by a constant \(\kappa \) that is independent of m, then (99) is stable in the \(L^\infty ({\mathbb {R}}^m)\) norm.Footnote 15 Such a strong bound implies that the solution of (99) is also bounded in the \(L^2_\mu \) norm, where \(\mu \) is the measure defined in (A.2). In fact, we have

$$\begin{aligned} \left\| f\right\| _{L^2_\mu }\le \left\| f_0\right\| _{L^\infty }\le \kappa ,\qquad \forall m\in {\mathbb {N}}. \end{aligned}$$
(102)

Note that this also implies that the functional integral defined in (A.1)–(A.3) converges, as it is bounded by the same constant \(\kappa \) independently of m. By using Theorem 7.1, we conclude that the solution of the PDE (99) converges uniformly to the solution of the FDE (95) in K, as the number of independent variables m goes to infinity.
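The stability argument can be made concrete with a short Python sketch. Since \(P_m\theta =\sum _j a_j\varphi _j\), the transport velocity of the cylindrical approximation is \(\sum _j a_j D_{kj}\) with \(D_{kj}=\int _0^{2\pi }\varphi _k'\varphi _j\,\mathrm{d}x\). Integration by parts makes D antisymmetric, so the characteristics are \(a\mapsto e^{tD}a\) with \(e^{tD}\) orthogonal, and \(f(a,t)=f_0(e^{tD}a)\) takes exactly the values of \(f_0\), which is the bound (100). The sketch assumes the trigonometric basis (125) of Sect. 9 (indexed \(k=0,\ldots ,m\)), the initial condition (138) and the hypothetical test function \(\theta (x)=e^{\sin x}\). It also checks that \(\sum _k (e^{tD}a)_k\varphi _k(x)=(P_m\theta )(x-t)\), i.e., the cylindrical solution is the exact solution (136) evaluated at \(P_m\theta \).

```python
import numpy as np

def basis_and_derivative(m, x):
    """phi_k(x) of (125) and its x-derivative, k = 0..m (m even), evaluated via the
    Dirichlet sum 1 + 2*sum_{j=1}^{m/2} cos(j (x - x_k)) to avoid 0/0 at x = x_k."""
    assert m % 2 == 0, "(125) requires m even"
    xk = 2 * np.pi * np.arange(m + 1) / (m + 1)
    d = x[None, :] - xk[:, None]                      # shape (m + 1, len(x))
    phi, dphi = np.ones_like(d), np.zeros_like(d)
    for j in range(1, m // 2 + 1):
        phi += 2 * np.cos(j * d)
        dphi -= 2 * j * np.sin(j * d)
    c = 1.0 / np.sqrt(2 * np.pi * (m + 1))
    return c * phi, c * dphi

m, M = 10, 64
x = 2 * np.pi * np.arange(M) / M
dx = 2 * np.pi / M
phi, dphi = basis_and_derivative(m, x)

# D[k, j] = int phi_k'(x) phi_j(x) dx (rectangle rule, exact for these trigonometric
# polynomials). Integration by parts on the periodic interval makes D antisymmetric.
D = dphi @ phi.T * dx
asym = np.abs(D + D.T).max()
D = 0.5 * (D - D.T)                                   # remove round-off asymmetry

def flow(t):
    """exp(t D), computed from the eigendecomposition of the Hermitian matrix i D."""
    w, V = np.linalg.eigh(1j * D)
    return ((V * np.exp(-1j * w * t)) @ V.conj().T).real

theta = np.exp(np.sin(x))                             # hypothetical test function
a = phi @ theta * dx                                  # a_k = (theta, phi_k)
t = 0.9
Q = flow(t)

# Characteristic solution of df/dt = sum_{k,j} a_j D[k, j] df/da_k: f(a, t) = f0(Q a).
# Q is orthogonal, so f(., t) takes exactly the values of f0 (the bound (100)).
g_t = (Q @ a) @ phi                                   # sum_k (Q a)_k phi_k(x)
pm_theta_shifted = a @ basis_and_derivative(m, x - t)[0]   # (P_m theta)(x - t)

def f0(coeffs):
    """f0(a) = F0([sum_k a_k phi_k]) with F0 from (138), rectangle rule."""
    return np.sum(np.sin(x) * np.sin(coeffs @ phi) ** 2) * dx

f_t = f0(Q @ a)
F_exact = np.sum(np.sin(x) * np.sin(np.exp(np.sin(x - t))) ** 2) * dx   # (139)
print(asym, np.abs(Q.T @ Q - np.eye(m + 1)).max(), np.abs(g_t - pm_theta_shifted).max())
print(f_t, F_exact)
```

The remaining gap between `f_t` and `F_exact` is the cylindrical approximation error of the initial condition, which here is bounded by \(\sqrt{\pi }\left\| \theta -P_m\theta \right\| \).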

We now have the main tools to study convergence of cylindrical approximations to FDEs. The main result is Theorem 7.1, which is based on the Trotter–Kato approximation theorem for abstract evolution equations in Banach spaces [53, 89]. The theorem states that stable consistent approximations of FDEs are convergent, but it does not provide an estimate of the rate of convergence, i.e., of how fast \(F_m\) converges to F. Estimates of the convergence rate are available in rather general cases (e.g., [15]), but a thorough analysis for cylindrical approximations of FDEs is lacking. Nevertheless, in Sect. 9.3 we will show that the convergence rate of the cylindrical approximation to a prototype FDE can be exponential.

8 Approximation of nonlinear functionals and FDEs in real Banach spaces admitting a basis

In this section, we outline the extension of the functional approximation theory we developed in real separable Hilbert spaces to nonlinear functionals and FDEs defined on compact subsets of real Banach spaces admitting a basis.Footnote 16 Well-known examples of such Banach spaces are:

  • \(C^{(0)}([0,1])\) (space of continuous functions on [0, 1]) [64, §5.2];

  • \(L^p(\Omega )\) for \(1<p<\infty \) (Lebesgue space defined on a compact domain \(\Omega \subseteq {\mathbb {R}}^n\)) [8, Theorem 2.1];

  • \(W^{k,p}(\Omega )\) for \(1<p<\infty \) (Sobolev space defined on a compact domain \(\Omega \subseteq {\mathbb {R}}^n\) with smooth or Lipschitz boundary [21, 60]).

Before we present the main results, let us briefly recall the definition and the basic properties of Schauder bases in Banach spaces.

Definition 8.1

(Schauder basis) A Schauder basis of a Banach space X is a sequence of linearly independent elements \(\varphi _k\in X\) such that every \(\theta \in X\) can be uniquely represented as

$$\begin{aligned} \theta =\sum _{k=1}^\infty a_k([\theta ]) \varphi _k, \end{aligned}$$
(103)

where \(a_k: X \rightarrow {\mathbb {R}}\) is a sequence of bounded linear functionalsFootnote 17 uniquely determined by the basis \(\{\varphi _k\}\).

As is well known, every basis in a Banach space is a Schauder basis (see, e.g., [86, p. 20] or [64, Proposition 5.3]). Hence, hereafter we will drop the adjective “Schauder” when referring to a basis in a Banach space.

8.1 Compact subsets of real separable Banach spaces

Just like in the case of functional approximation in real separable Hilbert spaces, all approximation results we present hereafter hold in compact subsets of Banach spaces with a basis. Characterizing such compact subsets is not as straightforward as in the case of Hilbert spaces (see the introduction of Sect. 3 and Theorem 3.1). Nevertheless, compactness results are available in rather general cases. For instance, the Arzelà–Ascoli theorem [78] provides necessary and sufficient conditions for a set \(K\subseteq C^{(0)}(\Omega )\) (\(\Omega \) compact subset of \({\mathbb {R}}^n\)) to be pre-compact. Specifically, the theorem states that K is pre-compact in the topology induced by the uniform norm if and only if K is equicontinuous and pointwise bounded. By using the Arzelà–Ascoli theorem, it is straightforward to prove, e.g., that the set of Lipschitz-continuous probability density functions on \(\Omega \) with a common Lipschitz constant is pre-compact in \(C^{(0)}(\Omega )\). A similar compactness result, known as the Kolmogorov–Riesz theorem [42, 43], can be obtained in \(L^p({\mathbb {R}}^n)\) and \(W^{1,p}({\mathbb {R}}^n)\) (\(1\le p<\infty \)). This theorem can be stated as follows.

Theorem 8.1

(Compact subsets of \(L^p\) [43]) A subset K of \(L^p({\mathbb {R}}^n)\) (\(1\le p<\infty \)) is pre-compact if and only if

$$\begin{aligned} \lim _{|h|\rightarrow 0}\, \sup _{f\in K} \left\| f(x+h)-f(x)\right\| _{L^{p}({\mathbb {R}}^n)}=0\qquad \text {and}\qquad \lim _{r\rightarrow \infty }\, \sup _{f\in K} \int _{|x|\ge r} |f(x)|^p\,\mathrm{d}x = 0, \end{aligned}$$
(104)

i.e., both limits hold uniformly with respect to \(f\in K\).

The two conditions in (104) are known as the equicontinuity and equitightness conditions. Theorem 8.1 also holds in \(L^p(\Omega )\), where \(\Omega \) is a compact subset of \({\mathbb {R}}^n\). More generally, one can use well-known compact embedding results such as the Rellich–Kondrachov theorem [1, §6]. This theorem states that the Sobolev space \(W^{k,p}(\Omega )\) defined on a compact domain \(\Omega \subseteq {\mathbb {R}}^n\) with differentiable boundary is compactly embedded in \(W^{l,q}(\Omega )\), provided \(k>l\) and \(k-n/p>l-n/q\). This means, for example, that a closed sphere in \(W^{k,p}(\Omega )\) (\(k\ge 1\)) is pre-compact in \(L^q(\Omega )\) if \(k>n/p-n/q\).
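As a worked instance of the compact embedding condition, written in its standard form \(k-n/p>l-n/q\) with \(l=0\), consider the one-dimensional setting used in Sect. 9 (\(n=1\), \(p=q=2\)) and, for comparison, the three-dimensional case \(n=3\), \(p=2\), \(k=1\):

$$\begin{aligned} n=1,\ p=q=2:&\qquad k-\tfrac{1}{2}>-\tfrac{1}{2}\quad \Longleftrightarrow \quad k>0,\\ n=3,\ p=2,\ k=1:&\qquad 1-\tfrac{3}{2}>-\tfrac{3}{q}\quad \Longleftrightarrow \quad q<6. \end{aligned}$$

The first line shows that, in one dimension, every closed Sobolev sphere with \(k\ge 1\) is pre-compact in \(L^2\); the second recovers the familiar compact embedding of \(W^{1,2}(\Omega )\) into \(L^q(\Omega )\) for \(1\le q<6\) when \(\Omega \subseteq {\mathbb {R}}^3\).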

8.2 Approximation results for nonlinear functionals, functional derivatives and FDEs

Let X be a Banach space with basis \(\{\varphi _k\}\) and let \(D_m=\text {span}\{\varphi _1,\ldots ,\varphi _m\}\). Define the linear projection operator \(P_m: X\rightarrow D_m\) as

$$\begin{aligned} P_m \theta = \sum _{k=1}^m a_k([\theta ]) \varphi _k. \end{aligned}$$
(105)

It is straightforward to show that \(P_m\) is bounded and that \(P_m \theta \) converges uniformly to \(\theta \) in every compact subset of X as m goes to infinity. In fact, we have the following

Lemma 8.1

Let X be a Banach space with basis \(\{\varphi _k\}\), and let K be a compact subset of X. Then, for each \(\epsilon >0\) there exists \(m_\epsilon \in {\mathbb {N}}\) such that

$$\begin{aligned} \left\| \theta - P_m \theta \right\| _X< \epsilon , \qquad \forall m\ge m_\epsilon \qquad \forall \theta \in K. \end{aligned}$$
(106)

The uniform convergence result (106) is known as the “approximation property” in Banach space theory [61] (see [51, p. 638] for a proof). Hence, Lemma 8.1 shows that every Banach space with a basis has the approximation property. We remark that in a real separable Hilbert space the uniform convergence (106) follows immediately from Dini’s theorem: each \(f_m([\theta ])=\left\| \theta - P_m\theta \right\| _H\) is continuous in \(\theta \), the sequence is monotone (Parseval’s identity implies \(f_{m+1}([\theta ])\le f_{m}([\theta ])\)), and \(f_m([\theta ])\rightarrow 0\) pointwise.
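The Dini argument in the Hilbert space case can be illustrated with a short Python sketch. It assumes a hypothetical finite sample from a compact set (smooth periodic functions with geometrically decaying random Fourier amplitudes) and the real Fourier basis of \(L^2_p([0,2\pi ])\) ordered by frequency. For each sampled \(\theta \), \(f_m([\theta ])=\left\| \theta -P_m\theta \right\| \) is nonincreasing in m, and its maximum over the sample decreases to zero. A finite sample can only illustrate, not prove, uniform convergence on K.

```python
import numpy as np

M = 512
x = 2 * np.pi * np.arange(M) / M
dx = 2 * np.pi / M
rng = np.random.default_rng(1)

def real_fourier_basis(n):
    """First n elements of the orthonormal basis of L^2_p([0, 2pi]):
    1/sqrt(2pi), cos(x)/sqrt(pi), sin(x)/sqrt(pi), cos(2x)/sqrt(pi), ..."""
    rows, k = [np.full(M, 1.0 / np.sqrt(2 * np.pi))], 1
    while len(rows) < n:
        rows += [np.cos(k * x) / np.sqrt(np.pi), np.sin(k * x) / np.sqrt(np.pi)]
        k += 1
    return np.array(rows[:n])

# Hypothetical finite sample of a compact set: smooth periodic functions whose
# random Fourier amplitudes decay like 1.5^-k (k = 1..39).
kk = np.arange(1, 40)
thetas = np.array([
    rng.uniform(-1, 1)
    + np.cos(np.outer(x, kk) + rng.uniform(0, 2 * np.pi, kk.size))
    @ (rng.uniform(0, 1, kk.size) / 1.5 ** kk)
    for _ in range(50)
])

n_max = 60
Phi = real_fourier_basis(n_max)              # shape (n_max, M)
coef = thetas @ Phi.T * dx                   # a_k([theta]) for each sample
f = np.empty((len(thetas), n_max))           # f[i, m - 1] = ||theta_i - P_m theta_i||
for m in range(1, n_max + 1):
    resid = thetas - coef[:, :m] @ Phi[:m]
    f[:, m - 1] = np.sqrt(np.sum(resid ** 2, axis=1) * dx)

sup_err = f.max(axis=0)                      # sup over the sample, for each m
print(sup_err[[0, 9, 29, 59]])
```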

Lemma 8.2

(Uniform convergence of functional approximations) Let X be a real Banach space with a basis, K a compact subset of X, and \(P_m\) the projection operator (105). If F is a continuous functional on X, then the sequence \(F([P_m\theta ])\) converges uniformly to \(F([\theta ])\) on K, i.e., for all \(\epsilon >0\) there exists \(m_\epsilon \in {\mathbb {N}}\) such that

$$\begin{aligned} \left| F([\theta ])-F([P_m\theta ])\right| \le \epsilon \qquad \forall m\ge m_\epsilon ,\qquad \forall \theta \in K. \end{aligned}$$
(107)

The proof of this lemma is essentially the same as the proof of Lemma 5.1 in [72]. We only need to replace the first equation on page 380 in [72] with (106). As before, we will refer to \(F([P_m \theta ])\) as the “cylindrical approximation”Footnote 18 of \(F([\theta ])\).

Most of the approximation results we obtained for nonlinear functionals, Fréchet derivatives, functional derivatives and FDEs in compact subsets of real separable Hilbert spaces also hold in compact subsets of Banach spaces admitting a basis. Hereafter, we list the most important ones. The proofs are the same as in the case of real separable Hilbert spaces and are therefore omitted.

Lemma 8.3

(Compactness of first-order Fréchet derivatives) Let K be a compact subset of a real Banach space X admitting a basis, and let \(F([\theta ])\) be a continuous real- or complex-valued functional on X. If the Fréchet derivative \(F'([\theta ^*])\) exists at \(\theta ^*\in K\), then it is a compact linear operator.

Lemma 8.4

(Convergence rate of functional approximations) Let F be a real-valued, continuously differentiable functional on a compact and convex subset K of a real Banach space X admitting a basis. Then, for all \(\theta \in K\) and for any finite-dimensional projection \(P_m\) of the form (105), we have

$$\begin{aligned} \left| F([\theta ])-F([P_m\theta ])\right| \le \sup _{\eta \in K}\left\| F'([\eta ])\right\| \left\| \theta -P_m\theta \right\| _X. \end{aligned}$$
(109)

In particular, \(F([P_m\theta ])\) converges uniformly to \(F([\theta ])\) in K at the same rate as \(P_m \theta \) converges to \(\theta \) in X.

Convergence rate estimates for \(\left\| \theta -P_m\theta \right\| _X\) are available, e.g., in [21, 27, 60].

Theorem 8.2

(Uniform approximation of first-order Fréchet derivatives) Let X be a real Banach space admitting a basis, K a compact subset of X, \(\theta \in K\) and \(P_m\) the projection operator (105). If F is continuously differentiable on K with Fréchet derivative \(F'([\theta ])\), then the sequence of operators \(F'([P_m \theta ])\) converges uniformly to \(F'([\theta ])\). In other words, for all \(\epsilon >0\) there exists \(m_\epsilon \in {\mathbb {N}}\) such that

$$\begin{aligned} \left\| F'([\theta ]) - F'([P_m\theta ]) \right\| = \sup _{\begin{array}{c} \eta \in X\\ \eta \ne 0 \end{array}} \frac{\left| F'([\theta ])\eta -F'([P_m\theta ])\eta \right| }{\left\| \eta \right\| _X}<\epsilon , \end{aligned}$$
(110)

for all \(m\ge m_\epsilon \), and for all \(\theta \in K\).

Lemma 8.5

(Convergence rate of first-order Fréchet derivatives) Let X be a real Banach space admitting a basis, and let \(F([\theta ])\) be a nonlinear functional with continuous first- and second-order Fréchet derivatives. Then, for all \(\theta \) in a convex compact subset K of X and for any projection \(P_m\) of the form (105), we have

$$\begin{aligned} \left\| F'([\theta ])-F'([P_m\theta ])\right\| \le {\sup _{\zeta \in K}}\left\| F'' ([\zeta ])\right\| \left\| \theta -P_m\theta \right\| _X. \end{aligned}$$
(111)

In particular, \(F'([P_m\theta ])\) converges uniformly to \(F'([\theta ])\) in K at the same rate as \(P_m \theta \) converges to \(\theta \).

Regarding the extension of the approximation result for the first-order functional derivative, i.e., Lemma 6.1, we can leverage various generalizations of the Riesz representation theorem (16) to specific Banach spaces. Hereafter, we consider the \(L^p\) generalization. More general versions may involve measure theory, e.g., in the case of spaces of continuous functions defined on compact subsets of \({\mathbb {R}}^n\) (see Eq. (17)).

Lemma 8.6

(Uniform approximation of first-order functional derivatives) Let \(\Omega \) be a compact subset of \({\mathbb {R}}^n\), K a compact subset of \(L^p(\Omega )\) (\(1< p<\infty \)), and \(P_m\) the projection operator (105). If F is continuously differentiable on K with Fréchet derivative \(F'([\theta ])\), then the sequence \(\delta F([P_m\theta ])/\delta \theta (x)\) converges uniformly to \(\delta F([\theta ])/\delta \theta (x)\) in \(L^q(\Omega )\), where \(1/p +1/q=1\). In other words, for all \(\epsilon >0\) there exists \(m_\epsilon \in {\mathbb {N}}\) such that for all \(m\ge m_\epsilon \)

$$\begin{aligned} \left\| \frac{\delta F([\theta ])}{\delta \theta (x)}-\frac{\delta F([P_m\theta ])}{\delta \theta (x)}\right\| _{L^q(\Omega )} <\epsilon \qquad \forall \theta \in K. \end{aligned}$$
(112)

Proof

By Lemma 8.3 the Fréchet derivative \(F'([\theta ])\) is a compact linear operator in \(L^p(\Omega )\) for each \(\theta \in K\). Hence, \(F'([\theta ])\eta \) is a bounded linear functional of \(\eta \in L^{p}(\Omega )\) for each \(\theta \in K\). By using the Riesz representation theorem, we conclude that there exists a unique function \(\delta F([\theta ])/\delta \theta (x)\in L^q(\Omega )\) with \(1/q =1- 1/p\) such that

$$\begin{aligned} F'([\theta ])\eta =\int _{\Omega } \frac{\delta F([\theta ])}{\delta \theta (x)}\eta (x)\mathrm{d}x. \end{aligned}$$
(113)

By applying Theorem 8.2, we obtain, for all \(m\ge m_\epsilon \),

$$\begin{aligned} \left| \int _{\Omega } \left( \frac{\delta F([\theta ])}{\delta \theta (x)}-\frac{\delta F([P_m\theta ])}{\delta \theta (x)}\right) \eta (x)\mathrm{d}x\right| <\epsilon \left\| \eta \right\| _{L^p(\Omega )}\qquad \forall \eta \in L^p(\Omega ){\setminus }\{0\}, \qquad \forall \theta \in K. \end{aligned}$$
(114)

As is well known, the norm of the linear functional \(\left( F'([\theta ])- F'([P_m \theta ])\right) \eta \) (linear functional of \(\eta \in L^p(\Omega )\)) is

$$\begin{aligned} M([\theta ])=\left\| \frac{\delta F([\theta ])}{\delta \theta (x)}-\frac{\delta F([P_m\theta ])}{\delta \theta (x)}\right\| _{L^q(\Omega )}. \end{aligned}$$
(115)

By definition, \(M([\theta ])\) is the smallest number such that

$$\begin{aligned} \left| \left( F'([\theta ])-F'([P_m \theta ])\right) \eta \right| \le M([\theta ])\left\| \eta \right\| _{L^p(\Omega )}. \end{aligned}$$
(116)

This fact, together with (114), allows us to conclude that \(M([\theta ])\le \epsilon \) for all \(m\ge m_\epsilon \) and all \(\theta \in K\). Since \(\epsilon >0\) is arbitrary, this proves the lemma. \(\square \)
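The identification of the norm of the functional in (115) with an \(L^q\) norm rests on Hölder duality: the supremum of \(|\int g\eta \,\mathrm{d}x|/\left\| \eta \right\| _{L^p}\) equals \(\left\| g\right\| _{L^q}\) and is attained at \(\eta =\mathrm{sign}(g)|g|^{q-1}\). A Python sketch of this fact on \(\Omega =[0,1]\) with \(p=3\), midpoint quadrature, and a hypothetical function g standing in for the difference of functional derivatives in (115):

```python
import numpy as np

p = 3.0
q = p / (p - 1.0)                       # conjugate exponent: 1/p + 1/q = 1
M = 1000
x = (np.arange(M) + 0.5) / M            # midpoint grid on Omega = [0, 1]
w = 1.0 / M                             # quadrature weight

def lp_norm(f, r):
    return (np.sum(np.abs(f) ** r) * w) ** (1.0 / r)

# Hypothetical stand-in for the difference of functional derivatives in (115).
g = np.sin(7 * x) * np.exp(-x) - 0.3 * x

# Extremal element for Holder's inequality: eta* = sign(g) |g|^(q-1).
eta_star = np.sign(g) * np.abs(g) ** (q - 1.0)
ratio_star = abs(np.sum(g * eta_star) * w) / lp_norm(eta_star, p)

# Random directions never exceed ||g||_q (Holder's inequality).
rng = np.random.default_rng(2)
ratios = [abs(np.sum(g * e) * w) / lp_norm(e, p) for e in rng.normal(size=(500, M))]
print(lp_norm(g, q), ratio_star, max(ratios))
```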

A few comments on Lemma 8.6 are necessary at this point. First, compact subsets of \(L^p(\Omega )\) are identified by the equicontinuity and the equitight conditions in Theorem 8.1. Second, we excluded the case \(p=1\), in agreement with the range \(1<p<\infty \) considered for \(L^p(\Omega )\) at the beginning of this section (note that \(L^1(\Omega )\) does admit a basis, e.g., the Haar system when \(\Omega =[0,1]\)). Regarding approximation of FDEs in real Banach spaces with a basis, we have the following results.

Lemma 8.7

(Consistency of cylindrical approximations to FDEs) Let X be a real Banach space with a basis. Consider a functional \(F\in {\mathcal {F}}(X)\) and a densely defined closed linear operator \({\mathcal {L}}\in {\mathcal {C}}({\mathcal {F}})\). If \({\mathcal {L}}([\theta ])F([\theta ])\) is continuous in \(\theta \), then the sequence of operators \(\{{\mathcal {L}}_m\}\) defined in (80) is consistent with \({\mathcal {L}}\) on every compact subset K of X, provided \( \left\| R_m([\theta ])\right\| \rightarrow 0\) as \(m\rightarrow \infty \) for all \(\theta \in K\).

Theorem 8.3

(Convergence of cylindrical approximations to FDEs) Suppose that the initial value problem (77) is well posed in the time interval [0, T] (T finite), and that \({\mathcal {L}}([\theta ])\in {\mathcal {C}}({\mathcal {F}})\) generates a strongly continuous semigroup in [0, T]. Then, the FDE approximation (93) is stable and consistent (in the sense of Definitions 7.1 and 7.2) in a compact subset K of a real Banach space X admitting a basis if and only if it is convergent, i.e.,

$$\begin{aligned} \max _{t\in [0,T]}\max _{\theta \in K} \left| F_m([\theta ],t) - F([\theta ],t) \right| \rightarrow 0 \end{aligned}$$
(117)

as \(m\rightarrow \infty \), provided \(F_m([\theta ],0)\rightarrow F_0([\theta ])\).

The proof of this theorem can be found in [33, p. 210]. We emphasize that the sequence of steps to prove convergence of functional approximations to FDEs in Banach spaces admitting a basis is the same as steps (a)–(c) listed after Theorem 7.1.

9 Numerical examples

In this section, we provide numerical demonstrations of the approximation theorems we developed for nonlinear functionals and functional differential equations. To this end, we consider the function space defined by the following closure of a Sobolev sphere with radius \(\rho \):

$$\begin{aligned} K=\overline{\left\{ \theta \in H_p^{s}([0,2\pi ]): \left\| \theta \right\| _{H_p^s} \le \rho \right\} }\subseteq L_p^2([0,2\pi ]). \end{aligned}$$
(118)

We have seen in Sect. 3 that K is a convex compact subset of \(L_p^2([0,2\pi ])\). Hence, any real-valued continuous functional \(F([\theta ])\) defined on K can be represented as the limit of a uniformly convergent sequence of functionals of the form \(F([P_m \theta ])\), where \(P_m\) is the projection operator (43). We can sample elements from (118) by taking a truncated Fourier series of the form

$$\begin{aligned} \theta (x)=\sum _{k=-N}^N c_ke^{ikx},\qquad c_k=c_{-k}^*, \end{aligned}$$
(119)

and then choosing the modulus of the complex numbers \(\{c_0,\ldots ,c_N\}\) within an ellipsoid in \({\mathbb {R}}^{N+1}\). In fact, we have

$$\begin{aligned} \left\| \theta \right\| ^2_{H_p^s}&= \sum _{k=-N}^N (1+k^2+k^4+\cdots +k^{2s}) \left| c_k \right| ^2 \nonumber \\&= c_0^2 + 2\sum _{k=1}^N (1+k^2+k^4+\cdots +k^{2s}) \left| c_k\right| ^2. \end{aligned}$$
(120)

Hence, the condition \(\left\| \theta \right\| ^2_{H_p^s}\le \rho ^2\) implies that

$$\begin{aligned} c_0^2 + 2\sum _{k=1}^N (1+k^2+k^4+\cdots +k^{2s}) \left| c_k\right| ^2\le \rho ^2, \end{aligned}$$
(121)

which defines the closure of an ellipsoid in the variables \(\{|c_0|,\ldots ,|c_N|\}\).
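The constraint (121) translates directly into a membership test on the Fourier amplitudes. The Python sketch below uses hypothetical helpers (`sobolev_norm_sq`, `in_sobolev_ball`, integer s only) that evaluate (120) from \(c_0\) and \(|c_1|,\ldots ,|c_N|\), and cross-checks the result for \(s=1\) against grid derivatives. Note that (120) corresponds to the normalization \(\left\| \theta \right\| ^2_{H_p^s}=\frac{1}{2\pi }\sum _{j=0}^s\int _0^{2\pi }|\theta ^{(j)}(x)|^2\,\mathrm{d}x\).

```python
import numpy as np

def sobolev_norm_sq(c0, c_abs, s):
    """||theta||_{H_p^s}^2 as in (120) for integer s:
    c0^2 + 2 * sum_k (1 + k^2 + ... + k^(2s)) |c_k|^2, with c_abs[k-1] = |c_k|."""
    c_abs = np.asarray(c_abs, dtype=float)
    k = np.arange(1, c_abs.size + 1, dtype=float)
    weights = sum(k ** (2 * j) for j in range(s + 1))   # 1 + k^2 + ... + k^(2s)
    return c0 ** 2 + 2 * np.sum(weights * c_abs ** 2)

def in_sobolev_ball(c0, c_abs, s, rho):
    """Membership test (121) for the closed Sobolev sphere of radius rho."""
    return sobolev_norm_sq(c0, c_abs, s) <= rho ** 2

# Cross-check for s = 1 against (1/2pi) * int (theta^2 + theta'^2) dx on a grid.
M = 128
x = 2 * np.pi * np.arange(M) / M
c0, c_abs, phases = 1.0, np.array([0.5, 0.25]), np.array([0.3, -1.1])
k = np.arange(1, 3)
theta = c0 + 2 * np.cos(np.outer(x, k) + phases) @ c_abs
dtheta = -2 * (k * np.sin(np.outer(x, k) + phases)) @ c_abs
grid_val = np.mean(theta ** 2 + dtheta ** 2)   # (1/2pi) times the rectangle-rule integral
print(sobolev_norm_sq(c0, c_abs, 1), grid_val)
```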

9.1 Generation of test functions with prescribed Fourier spectrum

The decay rate of the modulus of the Fourier coefficients \(\left| c_k\right| \) in the series expansion (119) is related to the degree of smoothness of \(\theta \), i.e., the value of s in (118) (see [46, §2]). Hence, by sampling \(\theta \) from a space of periodic functions with a prescribed spectral decay \(\left| c_k\right| \), we can study the effects of the regularity parameter s in (118) on the rate of convergence of the nonlinear functional approximations we developed in Sects. 4, 5 and 6. To sample test functions from (118), we represent \(c_k\) in (119) in polar form, prescribe the decay of the amplitudes \(|c_k|\) (\(k\ge 0\)) and introduce uniformly distributed random phases \(\vartheta _k\in [0,2\pi ]\) subject to the constraint \(\vartheta _k=-\vartheta _{-k}\). This yields

$$\begin{aligned} \theta (x) = c_0 + \sum _{\begin{array}{c} k=-N\\ k\ne 0 \end{array}}^N |c_k| e^{i(kx+\vartheta _k)}, \qquad c_k=c_{-k}^*. \end{aligned}$$
(122)

We study two types of decay rates of the Fourier spectrum. The first is a power-law decay of the form

$$\begin{aligned} c_0=a(0), \qquad |c_k| = \frac{a(k)}{k^\alpha } \qquad \text {(algebraic decay)}, \end{aligned}$$
(123)

where \(\alpha \ge 1\) and \(k=1,\ldots , N\). In equation (123) a(0) is a uniformly distributed random variable in \([-10,10]\) and \(\{a(1),\ldots , a(N)\}\) is a sequence of i.i.d. uniformly distributed random variables in [0, 10]. The algebraic decay (123) defines functions in a Sobolev sphere (118) with index \(s=\alpha \). The radius of such a sphere can be computed by substituting (123) into (120) and taking the supremum over the random amplitudes a(k). The second power spectrum we consider has an exponential decay of the form

$$\begin{aligned} c_0=b(0), \qquad |c_k| = \frac{b(k)}{\beta ^k} \qquad \text {(exponential decay)}, \end{aligned}$$
(124)

where \(\beta > 1\) and \(k=1,\ldots , N\). The random sequence \(\{b(0),b(1),\ldots , b(N)\}\) has the same properties as the sequence \(\{a(0),a(1),\ldots , a(N)\}\) in (123). The spectrum (124) defines functions in a Sobolev sphere (118) with index \(s\rightarrow \infty \).

In Fig. 1 we plot one sample of the random spectra (123) and (124), together with the corresponding sample functions (122) for \(N=500\), \(\alpha \in \{1.5, 2, 2.5, 3\}\) and \(\beta \in \{1.2, 1.5, 2, 3\}\). In the numerical examples presented hereafter, we choose N large enough so that the contribution of the tail of the spectrum is negligible in the series expansion (122). This allows us to generate highly accurate approximations of \(\theta \) in the space (118), which will then be projected onto a lower-dimensional subspace generated by a second trigonometric basis.
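The sampling procedure (122)–(124) can be sketched in Python as follows. Pairing the modes k and −k in (122) gives \(\theta (x)=c_0+2\sum _{k=1}^N|c_k|\cos (kx+\vartheta _k)\), which is how the sketch evaluates it. The function name `sample_theta`, the grid size and the random seed are our own choices; the distributions of a(k), b(k) and \(\vartheta _k\) follow the text. The FFT check at the end confirms that the sampled function carries the prescribed spectrum.

```python
import numpy as np

def sample_theta(x, N, decay, rng):
    """One random sample of (122) on the grid x.

    decay(k) is the deterministic factor in (123), k**-alpha, or in (124), beta**-k.
    Returns the grid values together with c0 and the amplitudes |c_k|, k = 1..N."""
    k = np.arange(1, N + 1)
    c0 = rng.uniform(-10, 10)                       # a(0) (or b(0))
    amp = rng.uniform(0, 10, N) * decay(k)          # |c_k| = a(k) k^-alpha or b(k) beta^-k
    phase = rng.uniform(0, 2 * np.pi, N)            # vartheta_k; vartheta_{-k} = -vartheta_k
    # The +k and -k terms of (122) combine into 2 |c_k| cos(k x + vartheta_k).
    theta = c0 + 2 * np.cos(np.outer(x, k) + phase) @ amp
    return theta, c0, amp

M, N = 2048, 500
x = 2 * np.pi * np.arange(M) / M
rng = np.random.default_rng(5)
theta_alg, c0_alg, amp_alg = sample_theta(x, N, lambda k: k ** -2.0, rng)   # alpha = 2
theta_exp, c0_exp, amp_exp = sample_theta(x, N, lambda k: 1.5 ** -k, rng)   # beta = 1.5

# The FFT recovers the prescribed spectrum: rfft(theta)[k] / M = |c_k| exp(i vartheta_k).
spec = np.fft.rfft(theta_alg) / M
print(abs(spec[0].real - c0_alg), np.abs(np.abs(spec[1:N + 1]) - amp_alg).max())
```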

Fig. 1

Random spectra (123) and (124) and corresponding sample functions (122) for \(N=500\)

Specifically, we chose the following orthonormal basis consisting of discrete trigonometric polynomials [46, p. 29]

$$\begin{aligned} \displaystyle \varphi _k(x)= & {} \displaystyle \frac{1}{\sqrt{2\pi (m+1)}} \frac{\displaystyle \sin \left( (m+1)\frac{x-x_{k}}{2}\right) }{\displaystyle \sin \left( \frac{x-x_{k}}{2}\right) },\nonumber \\ x_k= & {} \frac{2\pi }{m+1}k,\qquad k=0,\ldots , m\quad \mathrm{(m \,\, even)}, \end{aligned}$$
(125)

which yields the projection operator

$$\begin{aligned} P_{m}\theta = \sum _{k=0}^{m} a_k \varphi _k(x), \qquad a_k = (\theta , \varphi _k)_{L_p^2([0,2\pi ])}, \qquad k = 0, 1, \ldots , m. \end{aligned}$$
(126)

As is well known, if the first \((s-1)\) derivatives of \(\theta \) are all continuous, and if the s-th derivative is in \(L_p^2([0,2\pi ])\), then the \(L_p^2([0,2\pi ])\) distance between \(\theta \) and \(P_m\theta \) as defined in (126) decays as \(1/m^s\). On the other hand, if \(\theta \) is of class \(C^{\infty }\), then \(P_m\theta \) converges to \(\theta \) faster than any power of 1/m, and exponentially fast if \(\theta \) is analytic (see [46, §2.3]).
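The following Python sketch evaluates the basis (125) through the equivalent Dirichlet sum \(1+2\sum _{j=1}^{m/2}\cos (j(x-x_k))\) (which avoids the removable 0/0 at \(x=x_k\)), checks orthonormality with the rectangle rule, and compares the projection error of (126) for two hypothetical test functions: one with cosine coefficients \(1/k^3\) (finite smoothness) and one with coefficients \(2^{-k}\) (analytic). Since \(\text {span}\{\varphi _0,\ldots ,\varphi _m\}\) is the space of trigonometric polynomials of degree at most m/2, the error equals the norm of the Fourier tail, which the test uses as a reference.

```python
import numpy as np

def dirichlet_basis(m, x):
    """phi_k(x) of (125), k = 0..m (m even), via
    sin((m+1)y/2)/sin(y/2) = 1 + 2*sum_{j=1}^{m/2} cos(j*y), which avoids 0/0 at y = 0."""
    assert m % 2 == 0
    xk = 2 * np.pi * np.arange(m + 1) / (m + 1)
    d = x[None, :] - xk[:, None]
    s = np.ones_like(d)
    for j in range(1, m // 2 + 1):
        s += 2 * np.cos(j * d)
    return s / np.sqrt(2 * np.pi * (m + 1))

M = 1024                                   # fine periodic grid on [0, 2*pi)
x = 2 * np.pi * np.arange(M) / M
dx = 2 * np.pi / M

def project(theta, m):
    """P_m theta as in (126); inner products computed with the rectangle rule."""
    phi = dirichlet_basis(m, x)
    a = phi @ theta * dx
    return a @ phi

def l2_norm(f):
    return np.sqrt(np.sum(f ** 2) * dx)

# Orthonormality check for m = 16.
phi16 = dirichlet_basis(16, x)
gram_err = np.abs(phi16 @ phi16.T * dx - np.eye(17)).max()

# Hypothetical test functions: cosine coefficients 1/k^3 (finite smoothness)
# and 2^-k (analytic), truncated at k = 299.
k = np.arange(1, 300)
theta_alg = np.cos(np.outer(x, k)) @ (1.0 / k ** 3)
theta_exp = np.cos(np.outer(x, k)) @ (2.0 ** -k)

ms = (4, 8, 16, 32)
e_alg = [l2_norm(theta_alg - project(theta_alg, m)) for m in ms]
e_exp = [l2_norm(theta_exp - project(theta_exp, m)) for m in ms]
print(gram_err)
for m, ea, ee in zip(ms, e_alg, e_exp):
    print(f"m = {m:2d}   algebraic: {ea:.2e}   exponential: {ee:.2e}")
```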

9.2 Approximation of nonlinear functionals

Consider the nonlinear functional

$$\begin{aligned} F([\theta ])=\int _0^{2\pi } \sin (x)\sin (\theta (x))^2 \mathrm{d}x. \end{aligned}$$
(127)

The Fréchet differential of \(F([\theta ])\) is given by

$$\begin{aligned} F'([\theta ])\eta = \int _0^{2\pi } \sin (x)\sin (2\theta (x))\eta (x) \mathrm{d}x, \end{aligned}$$
(128)

which is a linear operator in \(\eta \). We have shown in Sect. 3 that \(F'([\theta ])\) is compact in the function space (118), and therefore, it is bounded and continuous.Footnote 19 In fact, for all \(\theta \in K\) and \(\eta \in L^2_p([0,2\pi ])\) it follows from (128) that

$$\begin{aligned} \left| F'([\theta ]) \eta \right|\le & {} \left\| \sin (x)\sin (2\theta )\right\| _{L_p^2([0,2\pi ])} \left\| \eta \right\| _{L_p^2([0,2\pi ])}\nonumber \\&\Rightarrow \quad \left\| F'([\theta ])\right\| \le \sqrt{\pi }. \end{aligned}$$
(129)

Plugging this result into the mean value Theorem 5.1 yields the spectral convergence result

$$\begin{aligned} \left| F([\theta ])-F([P_m\theta ])\right|\le & {} \sqrt{\pi } \left\| \theta -P_m\theta \right\| _{L_p^2([0,2\pi ])}\le \frac{\sqrt{\pi } C}{m^s}\left\| \theta \right\| _{H_p^s([0,2\pi ])} \nonumber \\\le & {} \frac{\sqrt{\pi } C\rho }{m^s} \qquad \forall \theta \in K. \end{aligned}$$
(130)

The last two inequalities follow from well-known Fourier series approximation theory [46, p. 42], and from the fact that \(\theta \) is in the closure of a Sobolev sphere with radius bounded by \(\rho \).

Next, we determine the convergence rate of the first-order Fréchet and functional derivative approximations. To this end, we notice that the second-order Fréchet derivative of (127), i.e., the Fréchet derivative of (128),

$$\begin{aligned} F''([\theta ])\eta \psi = \int _{0}^{2\pi } 2\sin (x)\cos (2\theta (x)) \psi (x)\eta (x) \mathrm{d}x \end{aligned}$$
(131)

is a continuous bilinear operator on the compact set \(K \times K\). Therefore, by equation (74), the first-order Fréchet derivative must converge at the same rate as (130). The first-order functional derivative of F, i.e., the kernel of the integral operator (128), is

$$\begin{aligned} \frac{\delta F([\theta ])}{\delta \theta (x)}= \sin (x)\sin (2\theta (x)). \end{aligned}$$
(132)

Evaluating (132) at \(P_m\theta \) yields the approximate functional derivative

$$\begin{aligned} \frac{\delta F([P_m\theta ])}{\delta \theta (x)}= \sin (x)\sin \left( 2 \sum _{k=0}^m a_k\varphi _k(x)\right) ,\qquad a_k=(\theta ,\varphi _k)_{L_p^2([0,2\pi ])}, \end{aligned}$$
(133)

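which is an element of \(L_p^2([0,2\pi ])\) that converges to (132) in the \(L_p^2([0,2\pi ])\) norm, uniformly in \(\theta \in K\). This is because bounded and continuous functions such as \(\sin (x)\) preserve \(L^2\) convergence under composition (see [7, Theorem 7]). More directly, since \(|\sin (x)|\le 1\) and \(|\sin (2a)-\sin (2b)|\le 2|a-b|\), the \(L_p^2([0,2\pi ])\) distance between (132) and (133) is bounded by \(2\left\| \theta -P_m\theta \right\| _{L_p^2([0,2\pi ])}\), so (133) converges at the same rate as \(P_m\theta \). This result is also in agreement with Lemma 6.1. Hereafter, we provide a numerical verification of the convergence rate we just predicted. To this end, in Fig. 2 we plot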

$$\begin{aligned} \epsilon _0(m) = \sup _{\theta \in K} |F([\theta ])-F([P_m\theta ])| \end{aligned}$$
(134)

versus m. The error \(\epsilon _0(m)\) is computed numerically for each m by taking the maximum over \(10^3\) sample functions of the form (122), with \(N=1000\), and different spectra of the form (123) and (124) (see Fig. 1).
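A reduced version of this computation can be sketched in Python. As assumptions made to keep the sketch fast, it uses 100 samples with N = 200 per spectrum (instead of \(10^3\) samples with N = 1000), evaluates (127) with the rectangle rule, and computes \(P_m\theta \) by truncating the FFT; this is equivalent to (126) because \(\text {span}\{\varphi _0,\ldots ,\varphi _m\}\) is the space of trigonometric polynomials of degree at most m/2. The sketch also checks the Lipschitz bound \(|F([\theta ])-F([P_m\theta ])|\le \sqrt{\pi }\left\| \theta -P_m\theta \right\| \) underlying (130).

```python
import numpy as np

M = 2048
x = 2 * np.pi * np.arange(M) / M
dx = 2 * np.pi / M
rng = np.random.default_rng(3)

def F(theta):
    """Functional (127), rectangle rule on the periodic grid."""
    return np.sum(np.sin(x) * np.sin(theta) ** 2) * dx

def project(theta, m):
    """P_m theta for the basis (125), m even, computed as FFT truncation to
    trigonometric polynomials of degree <= m/2 (same subspace as (126))."""
    c = np.fft.rfft(theta)
    c[m // 2 + 1:] = 0
    return np.fft.irfft(c, n=M)

def sample_theta(N, decay):
    """One sample of (122) with spectrum (123) or (124)."""
    k = np.arange(1, N + 1)
    amp = rng.uniform(0, 10, N) * decay(k)
    phase = rng.uniform(0, 2 * np.pi, N)
    return rng.uniform(-10, 10) + 2 * np.cos(np.outer(x, k) + phase) @ amp

spectra = {"algebraic, alpha = 3": lambda k: k ** -3.0,
           "exponential, beta = 1.5": lambda k: 1.5 ** -k}
ms = [8, 16, 32, 64]
eps0, worst_ratio = {}, 0.0
for name, decay in spectra.items():
    thetas = [sample_theta(200, decay) for _ in range(100)]
    eps0[name] = []
    for m in ms:
        e_max = 0.0
        for th in thetas:
            pth = project(th, m)
            err = abs(F(th) - F(pth))
            dist = np.sqrt(np.sum((th - pth) ** 2) * dx)
            e_max = max(e_max, err)
            # Lipschitz bound behind (130): |F - F_m| <= sqrt(pi) ||theta - P_m theta||
            worst_ratio = max(worst_ratio, err / (np.sqrt(np.pi) * dist))
        eps0[name].append(e_max)
    print(name, ["%.1e" % e for e in eps0[name]])
print("max of |F - F_m| / (sqrt(pi) ||theta - P_m theta||):", worst_ratio)
```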

Fig. 2

Functional approximation error (134) versus the number of Fourier modes in (126) for functions \(\theta \) with spectra (123) (power law decay) and (124) (exponential decay). The functional (127) is continuously Fréchet differentiable. Therefore, by Lemma 5.2 we have that \(F([P_m\theta ])\) converges to \(F([\theta ])\) at the same rate as \(P_m\theta \) converges to \(\theta \)

The error in the Fréchet derivative is defined as

$$\begin{aligned} \epsilon _1(m) = \sup _{\begin{array}{c} \eta ,\theta \in K\\ \eta \ne 0 \end{array}} \frac{\left| F'([\theta ])\eta - F'([P_m \theta ])\eta \right| }{\left\| \eta \right\| _{L_p^2([0,2\pi ])}}, \end{aligned}$$
(135)

and is computed as follows: for each given \(\theta \), we determine \(P_m\theta \) and then approximate the supremum over \(\eta \) using \(10^3\) sample functions \(\eta \). This is done for \(10^3\) functions \(\theta \) sampled from K as before. Since \(\eta \in K\) has the same form as \(\theta \), it is sampled from the same ensemble. The results of our calculations are shown in Fig. 3. As expected, the approximate Fréchet derivative \(F'([P_m\theta ])\) converges to \(F'([\theta ])\) at the same rate as \(P_m\theta \) converges to \(\theta \) in \(L^2_p([0,2\pi ])\). The reason is that the Fréchet derivative (128) is continuously Fréchet differentiable,Footnote 20 and therefore the mean value formula (74) holds.
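The following Python sketch estimates (135) in a reduced setting. Because the ratio in (135) is invariant under rescaling of \(\eta \), and \(H_p^s\) is dense in \(L_p^2([0,2\pi ])\), its supremum over \(\eta \) equals the \(L_p^2\) norm of the difference between (132) and (133) (by Cauchy–Schwarz, attained when \(\eta \) is that difference); sampling \(\eta \) therefore gives a lower bound. The sketch compares the sampled value, the exact value, and the bound \(2\left\| \theta -P_m\theta \right\| \). The exponential spectrum with \(\beta =1.5\), N = 64, 20 functions \(\theta \) and 1000 functions \(\eta \) are assumptions chosen for speed.

```python
import numpy as np

M, N = 512, 64
x = 2 * np.pi * np.arange(M) / M
dx = 2 * np.pi / M
rng = np.random.default_rng(4)

def sample(n, decay):
    """n samples of (122) at once: rfft bin 0 holds M*c0 and bin k holds
    M*|c_k|*exp(i*vartheta_k), so irfft returns c0 + 2*sum_k |c_k| cos(kx + vartheta_k)."""
    k = np.arange(1, N + 1)
    C = np.zeros((n, M // 2 + 1), dtype=complex)
    C[:, 0] = M * rng.uniform(-10, 10, n)
    C[:, 1:N + 1] = (M * rng.uniform(0, 10, (n, N)) * decay(k)
                     * np.exp(1j * rng.uniform(0, 2 * np.pi, (n, N))))
    return np.fft.irfft(C, n=M, axis=-1)

def project(theta, m):
    """P_m theta (m even) as truncation to trigonometric polynomials of degree <= m/2."""
    c = np.fft.rfft(theta)
    c[m // 2 + 1:] = 0
    return np.fft.irfft(c, n=M)

def dF(theta):
    """Functional derivative (132)."""
    return np.sin(x) * np.sin(2 * theta)

def l2(f, axis=-1):
    return np.sqrt(np.sum(f ** 2, axis=axis) * dx)

decay = lambda k: 1.5 ** -k                 # exponential spectrum (124), beta = 1.5
thetas = sample(20, decay)
etas = sample(1000, decay)                  # eta drawn from the same ensemble as theta
eta_norms = l2(etas)

rows = []
for m in (8, 16, 32):
    sampled = exact = bound = 0.0
    for th in thetas:
        g = dF(th) - dF(project(th, m))     # kernel of F'([theta]) - F'([P_m theta])
        sampled = max(sampled, np.max(np.abs(etas @ g) * dx / eta_norms))
        exact = max(exact, l2(g))           # supremum over all eta (Cauchy-Schwarz)
        bound = max(bound, 2 * l2(th - project(th, m)))
    rows.append((m, sampled, exact, bound))
    print("m = %2d  sampled %.2e  exact %.2e  bound %.2e" % rows[-1])
```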

Fig. 3

Fréchet derivative approximation error (135) versus the number of Fourier modes in (126), and for test functions \(\theta \) with spectra (123) (power law decay) and (124) (exponential decay). Note that, as expected, the approximated Fréchet derivative \(F'([P_m \theta ])\) converges to \(F'([\theta ])\) at the same rate as \(P_m\theta \) converges to \(\theta \). The reason is that the functional \(F([\theta ])\) is continuously Fréchet differentiable to any desired order. Hence, the mean value formula (74) holds. This is also the reason why the convergence plots are nearly identical to those in Fig. 2 (compare (74) with (59))

9.3 Approximation of functional differential equations

In this section, we provide a simple example of convergence analysis that shows how fast the solution of the multivariate PDE (99) converges to the solution of the FDE (95) as we send m to infinity. To this end, we first examine the analytical solution of the FDE (95).

9.3.1 Analytical solution

It was shown in [93, p. 76] that the analytical solution of the FDE (95) in the function space (118) is

$$\begin{aligned} F([\theta ],t)= F_{0}([\theta (x-t)]). \end{aligned}$$
(136)

Clearly, if \(F_0\) is invariant under translations, i.e., if \(F_0([\theta (x-t)])=F_0([\theta (x)])\) for all t, then \(F([\theta ],t)= F_{0}([\theta ])\), i.e., the solution remains equal to the initial condition \(F_0([\theta ])\) for all times. Examples of such translation-invariant functionals are

$$\begin{aligned} {\widehat{F}}_{0}([\theta ]) = \exp \left[ -\int _{0}^{2\pi }\theta (x)^2\mathrm{d}x\right] , \quad \text {and}\quad {\widetilde{F}}_{0}([\theta ]) = \int _{0}^{2\pi } \sin \left( \theta (x)\right) ^2\mathrm{d}x. \end{aligned}$$
(137)
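A quick numerical check of translation invariance can be written in a few lines. The sketch evaluates both functionals in (137) at shifted copies \(\theta (x-t)\) of a hypothetical test function \(\theta (x)=\sin (x)+1/2\). For contrast, it also evaluates the functional (138) introduced next. Integrals are approximated with the periodic trapezoidal rule; the grid size is an arbitrary choice.

```python
import numpy as np

N = 512
x = 2 * np.pi * np.arange(N) / N
dx = 2 * np.pi / N

def integrate(values):
    """Periodic trapezoidal rule on [0, 2*pi); spectrally accurate for smooth periodic integrands."""
    return dx * np.sum(values)

def F_hat(th):    # first functional in (137)
    return np.exp(-integrate(th ** 2))

def F_tilde(th):  # second functional in (137)
    return integrate(np.sin(th) ** 2)

def F_0(th):      # functional (138), which is not translation-invariant
    return integrate(np.sin(x) * np.sin(th) ** 2)

theta = lambda y: np.sin(y) + 0.5          # hypothetical test function
values = {t: (F_hat(theta(x - t)), F_tilde(theta(x - t)), F_0(theta(x - t))) for t in (0.0, 0.5, 1.3)}
for t, (fh, ft, f0) in values.items():
    print(f"t = {t:.1f}: F_hat = {fh:.12f}, F_tilde = {ft:.12f}, F_0 = {f0:.6f}")
```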

On the other hand, the initial condition

$$\begin{aligned} F_{0}([\theta ]) = \int _{0}^{2\pi }\sin (x) \sin \left( \theta (x)\right) ^2\mathrm{d}x \end{aligned}$$
(138)

is not translation-invariant. The solution to the initial value problem (95), with \(F_0\) given in (138), is

$$\begin{aligned} F([\theta ],t) = \int _{0}^{2\pi }\sin (x) \sin \left( \theta (x-t)\right) ^2\mathrm{d}x, \end{aligned}$$
(139)

which is periodic in t with period \(2\pi \). It is easy to verify by direct calculation that (139) is indeed a solution to (95). To this end, let us define \(\partial _x=\partial /\partial x\). We begin by noting that

$$\begin{aligned} \theta (x-t)=e^{-t\partial _x}\theta (x). \end{aligned}$$
(140)
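Relation (140) states that \(e^{-t\partial _x}\) acts as a translation. On the Fourier side, it multiplies the mode \(e^{ikx}\) by \(e^{-ikt}\). The minimal NumPy sketch below checks this on a uniform grid. It also checks the adjoint relation between \(e^{-t\partial _x}\) and \(e^{t\partial _x}\) that is used in (141). The test functions are hypothetical smooth \(2\pi \)-periodic functions, and `shift_semigroup` is an illustrative helper name.

```python
import numpy as np

def shift_semigroup(values, t):
    """Apply exp(-t d/dx) to samples of a 2*pi-periodic function: mode e^{ikx} is multiplied by e^{-ikt}."""
    n = values.size
    k = np.fft.fftfreq(n, d=1.0 / n)                 # integer wavenumbers 0, 1, ..., -1
    return np.real(np.fft.ifft(np.fft.fft(values) * np.exp(-1j * k * t)))

N = 128
x = 2 * np.pi * np.arange(N) / N
dx = 2 * np.pi / N
theta = lambda y: np.exp(np.sin(y)) + 0.5 * np.cos(3 * y)   # hypothetical smooth test function
eta = lambda y: np.cos(y) ** 3 - np.sin(2 * y)               # hypothetical second function

t = 0.7
shift_error = np.max(np.abs(shift_semigroup(theta(x), t) - theta(x - t)))

# Adjoint check: (exp(-t d/dx) u, v) = (u, exp(t d/dx) v) in the L^2 inner product.
lhs = dx * np.sum(shift_semigroup(theta(x), t) * eta(x))
rhs = dx * np.sum(theta(x) * shift_semigroup(eta(x), -t))
print(shift_error, abs(lhs - rhs))
```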

The first-order functional derivative of (139) is obtained by analyzing its Fréchet differential

$$\begin{aligned} \mathrm{d}F_\eta ([\theta ],t)&=\int _{0}^{2\pi } 2\sin (x)\sin (e^{-t\partial _x}\theta ) \cos (e^{-t\partial _x}\theta ) e^{-t\partial _x}\eta \,\mathrm{d}x \nonumber \\&= \int _{0}^{2\pi } e^{t\partial _x} \left[ \sin (x)\sin (2e^{-t\partial _x}\theta ) \right] \eta \,\mathrm{d}x. \end{aligned}$$
(141)

Here, we used the fact that the adjoint of the semigroup operator \(e^{-t\partial _x}\) relative to the standard \(L_p^2([0,2\pi ])\) inner product is \(e^{t\partial _x}\). Hence, the first-order functional derivative of (139) is

$$\begin{aligned} \frac{\delta F([\theta ],t)}{\delta \theta (x)} = e^{t\partial _x}\left[ \sin (x)\sin (2e^{-t\partial _x}\theta )\right] . \end{aligned}$$
(142)
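Since \(e^{t\partial _x}g(x)=g(x+t)\), the derivative (142) reduces to \(\sin (x+t)\sin (2\theta (x))\). The following sketch uses this kernel to compute the Fréchet differential (141) and compares the result with a central finite difference of (139) in the direction \(\eta \). It assumes hypothetical test functions \(\theta \) and \(\eta \). The step size `eps` is an arbitrary choice.

```python
import numpy as np

N = 256
x = 2 * np.pi * np.arange(N) / N
dx = 2 * np.pi / N

theta = lambda y: np.sin(y) + 0.3 * np.cos(2 * y)   # hypothetical test function
eta = lambda y: np.exp(np.cos(y)) - 1.0              # hypothetical perturbation direction

def F(th, t):
    """Functional (139): integral of sin(x) sin^2(th(x - t)) over [0, 2*pi); th is a callable."""
    return dx * np.sum(np.sin(x) * np.sin(th(x - t)) ** 2)

def functional_derivative(th, t):
    """Kernel (142) on the grid; exp(t d/dx) is a translation, so it equals sin(x + t) sin(2 th(x))."""
    return np.sin(x + t) * np.sin(2 * th(x))

t, eps = 0.9, 1e-5
fd = (F(lambda y: theta(y) + eps * eta(y), t) - F(lambda y: theta(y) - eps * eta(y), t)) / (2 * eps)
exact = dx * np.sum(functional_derivative(theta, t) * eta(x))   # Frechet differential (141)
print(fd, exact)
```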

Using again the fact that \(\partial _x\) is skew-symmetric relative to the \(L_p^2([0,2\pi ])\) inner product, we obtain

$$\begin{aligned} \int _{0}^{2\pi } \frac{\partial }{\partial x} \left( \frac{\delta F([\theta ],t)}{\delta \theta (x)}\right) \theta (x)\mathrm{d}x&= \int _{0}^{2\pi } \partial _x e^{t\partial _x} \left[ \sin (x)\sin (2e^{-t\partial _x}\theta )\right] \theta \,\mathrm{d}x \nonumber \\&= -\int _{0}^{2\pi } \sin (x)\sin (2e^{-t\partial _x}\theta )e^{-t\partial _x} \partial _x\theta \,\mathrm{d}x \nonumber \\&= -\int _{0}^{2\pi } \sin (x)\sin (2\theta (x-t)) \partial _x\theta (x-t)\mathrm{d}x. \end{aligned}$$
(143)

On the other hand, a temporal differentiation of (139) yields

$$\begin{aligned} \frac{\partial F([\theta ],t)}{\partial t} = \int _{0}^{2\pi } \sin (x)\sin (2\theta (x-t))\partial _t \theta (x-t) \mathrm{d}x. \end{aligned}$$
(144)

Equating (143) and (144), we conclude that (139) is a solution to (95) provided that

$$\begin{aligned} \frac{\partial \theta (x-t)}{\partial t}+ \frac{\partial \theta (x-t)}{\partial x}=0, \end{aligned}$$
(145)

which is clearly an identity, given (140). This proof can be generalized to arbitrary Fréchet differentiable initial conditions \(F_0\).
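The verification above can also be checked numerically. The sketch assumes that (95) has the form \(\partial F/\partial t=\int _0^{2\pi }\theta (x)\,\partial _x\left( \delta F([\theta ],t)/\delta \theta (x)\right) \mathrm{d}x\), which is the form that equating (143) and (144) implies. It uses a hypothetical trigonometric \(\theta \) with a known derivative and approximates \(\partial F/\partial t\) by a central difference in t.

```python
import numpy as np

N = 256
x = 2 * np.pi * np.arange(N) / N
dx = 2 * np.pi / N

# Hypothetical test function and its exact derivative.
theta = lambda y: np.sin(y) + 0.3 * np.cos(2 * y)
dtheta = lambda y: np.cos(y) - 0.6 * np.sin(2 * y)

def F(t):
    """FDE solution (139) evaluated at the fixed function theta."""
    return dx * np.sum(np.sin(x) * np.sin(theta(x - t)) ** 2)

def rhs(t):
    """Integral of theta(x) d/dx[dF/dtheta(x)], with (142) written as sin(x + t) sin(2 theta(x))."""
    d_kernel = np.cos(x + t) * np.sin(2 * theta(x)) + 2 * np.sin(x + t) * np.cos(2 * theta(x)) * dtheta(x)
    return dx * np.sum(theta(x) * d_kernel)

h = 1e-5
residuals = []
for t in (0.0, 1.0, 2.5):
    dFdt = (F(t + h) - F(t - h)) / (2 * h)     # central difference in time
    residuals.append(abs(dFdt - rhs(t)))
    print(f"t = {t}: dF/dt = {dFdt:.10f}, right-hand side = {rhs(t):.10f}")
```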

9.3.2 FDE approximation and convergence analysis

We have seen in Sect. 7 that the cylindrical approximation of the FDE (95) yields the multivariate PDE (99). By using integration by parts, it can be shown that the matrix of coefficients

$$\begin{aligned} C_{jk}=\int _0^{2\pi } \varphi _j(x) \frac{\partial \varphi _k(x)}{\partial x}\mathrm{d}x \end{aligned}$$
(146)

is skew-symmetric since the basis functions \(\varphi _j\) are periodic. The initial condition appearing in (99) is obtained by evaluating (138) on the range of \(P_m\). This yields the cylindrical functional

$$\begin{aligned} f_0(a) = \int _{0}^{2\pi }\sin (x) \sin \left( \sum _{j=0}^m a_j\varphi _j(x)\right) ^2\mathrm{d}x,\qquad a_j=(\theta ,\varphi _j)_{L_p^2([0,2\pi ])}. \end{aligned}$$
(147)
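The skew-symmetry of (146) and the convergence of the cylindrical functional (147) to (138) can be illustrated with a hypothetical real orthonormal trigonometric basis \(1/\sqrt{2\pi }\), \(\cos (kx)/\sqrt{\pi }\), \(\sin (kx)/\sqrt{\pi }\), \(k=1,\ldots ,M\), which has \(2M+1\) functions. The basis (126) may be normalized or ordered differently. The helper `fourier_basis` is illustrative, inner products use the periodic trapezoidal rule, and the test function \(\theta (x)=e^{\sin (x)}-1\) is an assumption.

```python
import numpy as np

N = 512
x = 2 * np.pi * np.arange(N) / N
dx = 2 * np.pi / N

def fourier_basis(M):
    """Hypothetical orthonormal basis [1/sqrt(2pi), cos(kx)/sqrt(pi), sin(kx)/sqrt(pi)], k = 1..M.
    Returns grid values and x-derivatives, each of shape (2M + 1, N)."""
    phi, dphi = [np.full(N, 1 / np.sqrt(2 * np.pi))], [np.zeros(N)]
    for k in range(1, M + 1):
        phi += [np.cos(k * x) / np.sqrt(np.pi), np.sin(k * x) / np.sqrt(np.pi)]
        dphi += [-k * np.sin(k * x) / np.sqrt(np.pi), k * np.cos(k * x) / np.sqrt(np.pi)]
    return np.array(phi), np.array(dphi)

phi, dphi = fourier_basis(8)
C = dx * phi @ dphi.T                       # C_jk = (phi_j, d phi_k/dx), as in (146)
skew_defect = np.max(np.abs(C + C.T))       # zero up to round-off for a periodic basis

def F0(values):
    """Functional (138) applied to grid values of a function."""
    return dx * np.sum(np.sin(x) * np.sin(values) ** 2)

theta = np.exp(np.sin(x)) - 1.0             # hypothetical test function on the grid
errors = []
for M in (1, 2, 4, 8):
    phi_M, _ = fourier_basis(M)
    a = dx * phi_M @ theta                  # coefficients a_j = (theta, phi_j)
    errors.append(abs(F0(theta) - F0(a @ phi_M)))   # |F_0([theta]) - f_0(a)|, cf. (147)
print(skew_defect, errors)
```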

The solution to the initial value problem (99) with initial condition given in (147) is obtained as

$$\begin{aligned} f(a,t) = f_0\left( e^{tC} a\right) , \qquad a=[a_0,\ldots , a_m]^T. \end{aligned}$$
(148)
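For the same hypothetical basis, ordered as \([1,\cos x,\sin x,\ldots ,\cos Mx,\sin Mx]\), definition (146) gives C in closed form. It consists of \(2\times 2\) blocks with entries \(\pm k\). Because C is real and skew-symmetric, \(e^{tC}\) is orthogonal, so \(\Vert e^{tC}a\Vert =\Vert a\Vert \). Because its eigenvalues are \(0\) and \(\pm ik\) with integer k, \(e^{2\pi C}=I\), which is consistent with the \(2\pi \)-periodicity of (139). The sketch below evaluates (148) under these assumptions. It computes \(e^{tC}\) from the eigendecomposition of the Hermitian matrix iC rather than calling a matrix-exponential routine. The helper names are illustrative.

```python
import numpy as np

def galerkin_matrix(M):
    """C_jk = (phi_j, d phi_k/dx) from (146) for the hypothetical orthonormal basis
    [1, cos x, sin x, ..., cos Mx, sin Mx] on [0, 2*pi)."""
    C = np.zeros((2 * M + 1, 2 * M + 1))
    for k in range(1, M + 1):
        c, s = 2 * k - 1, 2 * k
        C[c, s] = k       # (cos kx, (sin kx)') / pi = k
        C[s, c] = -k      # (sin kx, (cos kx)') / pi = -k
    return C

def expm_skew(C, t):
    """exp(tC) for real skew-symmetric C, via the eigendecomposition of the Hermitian matrix iC."""
    w, V = np.linalg.eigh(1j * C)                       # iC = V diag(w) V^H, w real
    return np.real(V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T)

N, M = 512, 4
x = 2 * np.pi * np.arange(N) / N
dx = 2 * np.pi / N
phi = [np.full(N, 1 / np.sqrt(2 * np.pi))]
for k in range(1, M + 1):
    phi += [np.cos(k * x) / np.sqrt(np.pi), np.sin(k * x) / np.sqrt(np.pi)]
phi = np.array(phi)

def f0(a):
    """Cylindrical initial condition (147) for the coefficient vector a."""
    return dx * np.sum(np.sin(x) * np.sin(a @ phi) ** 2)

def f(a, t):
    """Solution (148): f(a, t) = f0(exp(tC) a)."""
    return f0(expm_skew(galerkin_matrix(M), t) @ a)

a = np.random.default_rng(0).standard_normal(2 * M + 1)
E = expm_skew(galerkin_matrix(M), 1.3)
print(np.linalg.norm(E @ a) - np.linalg.norm(a))     # orthogonality of exp(tC)
print(f(a, 2 * np.pi) - f(a, 0.0))                   # 2*pi-periodicity in t
```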

We have seen in Sect. 9.2 that (147) converges uniformly to \(F_0([\theta ])\) as m goes to infinity, at the same rate as \(\left\| \theta - P_m\theta \right\| _{L_p^2([0,2\pi ])}\) goes to zero. We also know that the residual of the finite-dimensional PDE approximation (99) goes to zero as we send m to infinity (Example 1 in Sect. 7), and that (99)–(147) is stable in the \(L^{\infty }\) norm (Example 2 in Sect. 7.1). By Theorem 7.1, this is sufficient to guarantee that (148) converges uniformly in \(\theta \) to the FDE solution (139) as we increase m. Hereafter, we calculate the convergence rate of such an approximation and show that it can be exponential, depending on the degree of smoothness of \(\theta \in K\), which is measured by the index s in (118). To this end, we begin with

$$\begin{aligned}&\left| F([\theta ],t) - f(a_0,\ldots , a_m,t)\right| ^2 \nonumber \\&\quad = \left| F_0([e^{-t\partial _x}\theta ]) - f_0(e^{tC}a)\right| ^2\nonumber \\&\quad =\left| \int _{0}^{2\pi } \sin (x)\left[ \sin ^2\left( e^{-t\partial _x}\theta (x)\right) - \sin ^2\left( \sum _{j=0}^m [e^{tC}a]_j\varphi _j(x)\right) \right] \mathrm{d}x \right| ^2. \end{aligned}$$
(149)

Recall that for any \(a,b\in {\mathbb {R}}\) we have

$$\begin{aligned} \sin ^2(a)-\sin ^2(b)= \sin (a+b)\sin (a-b). \end{aligned}$$
(150)
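Identity (150) follows from the sum-to-product formulas and from \(2\sin u\cos u=\sin (2u)\):

$$\begin{aligned} \sin ^2(a)-\sin ^2(b)&=\left( \sin a-\sin b\right) \left( \sin a+\sin b\right) \nonumber \\&=\left[ 2\cos \tfrac{a+b}{2}\sin \tfrac{a-b}{2}\right] \left[ 2\sin \tfrac{a+b}{2}\cos \tfrac{a-b}{2}\right] \nonumber \\&=\left[ 2\sin \tfrac{a+b}{2}\cos \tfrac{a+b}{2}\right] \left[ 2\sin \tfrac{a-b}{2}\cos \tfrac{a-b}{2}\right] =\sin (a+b)\sin (a-b). \end{aligned}$$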

Hence, substituting (150) into (149), applying the Cauchy–Schwarz inequality, and using \(\int _{0}^{2\pi }\sin ^2(x)\,\mathrm{d}x=\pi \) together with \(|\sin (y)|\le 1\) and \(|\sin (y)|\le |y|\), we obtain

$$\begin{aligned} \left| F([\theta ],t) - f(a_0,\ldots , a_m,t)\right| ^2&\le \pi \int _{0}^{2\pi }\left| \sin \left( e^{-t\partial _x}\theta (x)-\sum _{j=0}^m [e^{tC}a]_j\varphi _j(x)\right) \right| ^2\mathrm{d}x\nonumber \\&\le \pi \int _{0}^{2\pi }\left| e^{-t\partial _x}\theta (x)-\sum _{j=0}^m [e^{tC}a]_j\varphi _j(x)\right| ^2\mathrm{d}x\nonumber \\&=\pi \left\| e^{-t\partial _x}\theta (x)-\sum _{j=0}^m [e^{tC}a]_j\varphi _j(x)\right\| ^2_{L_p^2([0,2\pi ])}. \end{aligned}$$
(151)

At this point, we recall that \(\theta (x,t)=e^{-t\partial _x}\theta (x)\) is the exact solution to the advection equation (145), while the function \(\theta _m(x,t)=\sum _{j=0}^m \left[ e^{tC}a\right] _j\varphi _j(x)\) is the solution to the Fourier–Galerkin discretization of (145)

$$\begin{aligned} \frac{\mathrm{d}a_j(t)}{\mathrm{d}t}-\sum _{k=0}^m a_k C_{jk}=0. \end{aligned}$$
(152)

It is well known that the Galerkin scheme (152) is stable in the \(L_p^2([0,2\pi ])\) norm (see, e.g., [16, §6.1.1]), and that the solution \(\theta _m(x,t)\) converges to \(\theta (x,t)\) at a rate that depends only on the smoothness of the initial condition \(\theta (x,0)\). Combined with (151), this implies that

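$$\begin{aligned} \left| F([\theta ],t) - f(a_0,\ldots , a_m,t)\right| \le \frac{c}{m^s}\left\| \theta \right\| _{H_p^s} \le \frac{c\rho }{m^s}, \end{aligned}$$
(153)

where \(c>0\) is a constant independent of m and \(\theta \), and the parameter s measures the regularity of \(\theta \). If \(\theta \) is infinitely differentiable, then (153) holds for every s, i.e., \(f(a_0,\ldots , a_m,t)\) converges to \(F([\theta ],t)\) faster than any inverse power of m; if the Fourier coefficients of \(\theta \) decay exponentially, as in (124), the convergence is exponential in m. To validate (153) numerically, in Fig. 4 we plot the error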

$$\begin{aligned} \epsilon _0(m,t) = \sup _{\theta \in K} \left| F([\theta ],t)- f(a_0,\ldots ,a_m,t)\right| \end{aligned}$$
(154)

at time \(t=\pi \) for functions \(\theta \) with power-law or exponentially decaying Fourier coefficients. The cylindrical approximation \(f(a_0,\ldots ,a_m,t)\) indeed converges to \(F([\theta ],t)\) at the same rate at which \(P_m\theta \) converges to \(\theta \), which depends on the smoothness of \(\theta \in K\). It is worth noting that the convergence plots in Figs. 2, 3 and 4 are essentially rescaled versions of the same plot. The reason is that the FDE solution has continuous Fréchet derivatives up to any desired order. Hence, by the mean value theorem (Theorem 5.1), the convergence slopes are determined by the rate at which \(\left\| \theta -P_m\theta \right\| _{L_p^2([0,2\pi ])}\) goes to zero.
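The error (154) can be estimated by sampling, in the same spirit as (135). The sketch below uses hypothetical ensembles with power-law (\(k^{-2}\)) and exponential (\(e^{-k/2}\)) coefficient decay as stand-ins for (123) and (124). For this constant-coefficient advection problem, a Fourier–Galerkin scheme propagates each retained mode exactly. Consequently, \(\theta _m(x,t)\) coincides with the truncated series \(P_m\theta \) evaluated at \(x-t\), and the sketch uses this shortcut instead of forming \(e^{tC}\). The names and sample sizes are illustrative.

```python
import numpy as np

N = 512
x = 2 * np.pi * np.arange(N) / N
dx = 2 * np.pi / N
K = 64                                   # highest mode in the hypothetical ensembles
k = np.arange(1, K + 1)

def sample(rng, n, spectrum):
    """Hypothetical ensemble: theta = sum_k s_k (xi_k cos kx + zeta_k sin kx), xi, zeta standard normal."""
    s = spectrum(k)
    return rng.standard_normal((n, K)) * s, rng.standard_normal((n, K)) * s

def evaluate(a, b, y, m=K):
    """Evaluate the series truncated to modes k <= m (i.e. P_m theta) at the points y."""
    return a[:, :m] @ np.cos(np.outer(k[:m], y)) + b[:, :m] @ np.sin(np.outer(k[:m], y))

def F(values):
    """Integral of sin(x) sin^2(values) over [0, 2*pi); this is (139) when values = theta(x - t)."""
    return dx * np.sum(np.sin(x) * np.sin(values) ** 2, axis=-1)

def epsilon_0(m, spectrum, t=np.pi, n=200, seed=1):
    """Sampled approximation of the error (154)."""
    a, b = sample(np.random.default_rng(seed), n, spectrum)
    exact = F(evaluate(a, b, x - t))          # F([theta], t) = F_0([theta(x - t)])
    approx = F(evaluate(a, b, x - t, m))      # Galerkin solution theta_m(x, t) = (P_m theta)(x - t)
    return np.max(np.abs(exact - approx))     # maximum over the samples approximates the sup

power_law = lambda kk: kk ** -2.0
exponential = lambda kk: np.exp(-kk / 2.0)
for m in (2, 4, 8, 16, 32):
    print(m, epsilon_0(m, power_law), epsilon_0(m, exponential))
```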

Fig. 4

Convergence of the PDE solution (148) to the FDE solution (139) as we increase the number of variables m. Note that the convergence rate of the PDE (99) to the FDE (95) is, as before, the same as the convergence rate of \(P_m\theta \) to \(\theta \) in the \(L_p^2([0,2\pi ])\) norm

10 Conclusions

We established rigorous convergence results for cylindrical approximations of nonlinear functionals, functional derivatives, and functional differential equations (FDEs) defined on a compact subset of a real Banach space X admitting a basis. Such approximations are constructed by restricting the domain of the functionals and the FDEs to the range of a finite-dimensional projection acting on X. In this setting, we proved that continuous functionals and FDEs can be approximated by multivariate functions and multidimensional partial differential equations (PDEs), respectively. The convergence rate of such functional approximations can be exponential, depending on the regularity of the functional (in particular, its Fréchet differentiability) and of its domain. Rapidly converging approximations allow us to represent nonlinear functionals and FDEs in terms of multivariate functions and PDEs involving fewer independent variables. We also provided necessary and sufficient conditions for consistency, stability and convergence of functional approximation schemes to compute the solution of linear FDEs. The main results are Theorem 7.1 and Theorem 8.3, which are based on the Trotter–Kato approximation theorem for abstract evolution equations in Banach spaces. The results presented in this paper open the possibility of using techniques for high-dimensional function representation, such as deep neural networks [73, 74, 104] and numerical tensor methods [5, 10, 24–26, 55, 76, 82], to approximate nonlinear functionals in terms of high-dimensional functions, and to compute approximate solutions of functional differential equations by solving high-dimensional PDEs.