1 Introduction

In this paper we study maximal inequalities for the mild solutions of time-dependent stochastic evolution equations of the form

$$\begin{aligned} {\left\{ \begin{array}{ll} \,{\mathrm{d}}u_t &{}= A(t)u_t \,{\mathrm{d}}t + g_t\,{\mathrm{d}}W_t, \quad t\in [0,T], \\ u_0 &{} = 0. \end{array}\right. } \end{aligned}$$
(1.1)

Here, \((A(t))_{t\in [0,T]}\) is a family of closed operators acting in a Banach space X generating a \(C_0\)-evolution family \((S(t,s))_{0\le s\le t\le T}\), \((W_t)_{t\in [0,T]}\) is a Brownian motion defined on a probability space \((\Omega ,{\mathscr {F}},{\mathbb {P}})\), adapted to some given filtration \(({\mathscr {F}}_t)_{t\in [0,T]}\), and \((g_t)_{t\in [0,T]}\) is a progressively measurable stochastic process with values in X. Under these assumptions the mild solution is given, at least formally, by the X-valued stochastic convolution-type integral

$$\begin{aligned} u_t := \int _0^t S(t,s)g_s\,{\mathrm{d}}W_s, \quad t\in [0,T]. \end{aligned}$$
(1.2)

An important special case of (1.1) is the time-independent case where \(A(t)\equiv A\) generates a \(C_0\)-semigroup \((S(t))_{t\ge 0}\) on X and \(S(t,s) = S(t-s)\). More generally we will consider stochastic convolutions driven by cylindrical Brownian motions and assume that g is operator-valued; this extension is mostly routine and for ease of presentation will not be considered in this introduction.

In order to give a rigorous meaning to the stochastic integral in (1.2) one needs to impose suitable measurability and integrability assumptions on g and geometrical properties on X, such as 2-smoothness [2, 7, 8, 22, 23, 73, 75, 76] or the UMD property [96, 97]. The UMD theory is in some sense the definitive theory, in that it features a two-sided Burkholder inequality and completely natural extensions of the martingale representation theorem [96, 97] and the Clark–Ocone theorem [67]; from the point of view of applications to SPDE it provides stochastic maximal \(L^p\)-regularity for parabolic problems [84, 98,99,100] which in turn can be used to study quasi- and semi-linear PDEs [1]. The 2-smooth theory only allows for limited versions of these results, but it is easier in its basic constructions and adequate for many other purposes, and will provide the setting for this paper.

Instrumental in proving pathwise continuity of mild solutions to (1.1) is the availability of suitable estimates for the maximal function \(u^\star : \Omega \rightarrow [0,\infty )\),

$$\begin{aligned} u^\star := \sup _{t\in [0,T]} \Vert u_t\Vert , \end{aligned}$$

where \((u_t)_{t\in [0,T]}\) is the process defined by (1.2); norms are taken in X pointwise on \(\Omega \). The first such estimate was obtained by Kotelenez [56, 57] who showed that if \((A(t))_{t\in [0,T]}\) generates a contractive evolution family \((S(t,s))_{0\le s \le t\le T}\) on a Hilbert space X, then the process \((u_t)_{t\in [0,T]}\) defined by (1.2) has a continuous modification which satisfies the maximal inequality

$$\begin{aligned} {\mathbb {E}}\sup _{t\in [0,T]} \Vert u_t\Vert ^2 \le C^2 {\mathbb {E}}\int _0^T \Vert g_t\Vert ^2\,{\mathrm{d}}t, \end{aligned}$$
(1.3)

where C is some absolute constant. The extension of (1.3) to 2-smooth Banach spaces and general exponents \(0<p<\infty \) has been investigated by many authors [10, 43, 44, 52, 92, 94] who all limited themselves to the special case of contraction semigroups. This development is surveyed in [95], where also some extensions to evolution families are discussed. The more general case of stochastic convolutions driven by Lévy processes has been studied in the 2-smooth setting in [110, 111].

For Brownian motion as the driving process, the best result available to date is due to Zhu and the first author in [94], where it was shown that if \((S(t))_{t\ge 0}\) is a \(C_0\)-semigroup of contractions on a 2-smooth Banach space X and \((g_t)_{t\in [0,T]}\) is a progressively measurable process with values in X, then the process \((u_t)_{t\in [0,T]}\) defined by the stochastic convolution

$$\begin{aligned} u_t := \int _0^t S(t-s)g_s\,{\mathrm{d}}W_s, \quad t\in [0,T], \end{aligned}$$

has a continuous modification which satisfies, for every \(0<p<\infty \),

$$\begin{aligned} {\mathbb {E}}\sup _{t\in [0,T]} \Vert u_t\Vert ^p \le C_{p,X}^p {\mathbb {E}}\Bigl (\int _0^T \Vert g_t\Vert ^2\,{\mathrm{d}}t\Bigr )^{p/2}, \end{aligned}$$
(1.4)

where \(C_{p,X}\) is a constant depending only on p and X. In certain applications it is important to have explicit information on the constant in the asymptotic regime \(p\rightarrow \infty \). In the special case \(S(t) \equiv I\) the estimate (1.4) reduces to the Burkholder inequality for 2-smooth Banach spaces, for which the asymptotic dependence of \(C_{p,X}\) is known to be of order \(O(\sqrt{p})\) as \(p\rightarrow \infty \) [90]. For Hilbert spaces X and \(C_0\)-semigroups of contractions, (1.4) is known to hold with a constant of order \(O(\sqrt{p})\) as \(p\rightarrow \infty \) [43, 44]. In that setting the Sz.-Nagy dilation theorem can be used to reduce matters to the Burkholder inequality. Constants of order \(O(\sqrt{p})\) can be used to derive exponential estimates, which in turn can be used to study large deviations (see [13] and [79]). Inspecting the proof of (1.4) in [94] in the 2-smooth case, one sees that the asymptotic p-dependence of the constant in that paper is non-optimal.

The aim of the present paper is to simultaneously improve the results cited above in two directions:

  • To extend (1.4) to arbitrary \(C_0\)-evolution families of contractions on 2-smooth Banach spaces X (not even assuming the existence of a generating family \((A(t))_{t\in [0,T]}\));

  • To show that the constant \(C_{p,X}\) in the resulting maximal inequality is of order \(O(\sqrt{p})\) as \(p\rightarrow \infty \).

The precise statement of our main result, which corresponds to the special case of Theorem 4.1 for Brownian motion, is as follows.

Theorem 1.1

Let \((S(t,s))_{0\leqslant s\leqslant t\leqslant T}\) be a \(C_0\)-evolution family of contractions on a 2-smooth Banach space X. Let \((W_t)_{t\in [0,T]}\) be an adapted Brownian motion on a probability space \((\Omega ,{\mathbb {P}})\), and let \((g_t)_{t\in [0,T]}\) be a progressively measurable process with values in X. Then the X-valued process \((u_t)_{t\in [0,T]}\) defined by

$$\begin{aligned} u_t:= \int _0^t S(t,s)g_s\,{\mathrm{d}}W_s, \qquad t\in [0,T], \end{aligned}$$

has a continuous modification which satisfies

$$\begin{aligned}{\mathbb {E}}\sup _{t\in [0,T]}\Vert u_t \Vert ^p\leqslant C_{p,X}^p {\mathbb {E}}\Bigl (\int _0^T \Vert g_t\Vert ^2\,{\mathrm{d}}t\Bigr )^{p/2},\end{aligned}$$

where the constant \(C_{p,X}\) only depends on p and the constant D in the definition of 2-smoothness for X. For \(2\le p<\infty \) the inequality holds with

$$\begin{aligned} C_{p,X} = 10D\sqrt{p}. \end{aligned}$$
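Since the constant in Theorem 1.1 is explicit, it can be sanity-checked numerically in the simplest setting. The following sketch (our illustration, not part of the paper; all numerical parameters are arbitrary choices) takes \(X={{\mathbb {R}}}\) with \(D=1\), \(S(t,s)=e^{-(t-s)}\), \(g\equiv 1\) and \(T=1\), and compares a Monte Carlo estimate of \({\mathbb {E}}\sup _t |u_t|^p\) with the bound \((10D\sqrt{p})^p\bigl (\int _0^T \Vert g_t\Vert ^2\,{\mathrm{d}}t\bigr )^{p/2}\):

```python
import numpy as np

# Numerical sanity check of Theorem 1.1 (our illustration, not from the paper):
# X = R (Hilbert, so D = 1), S(t, s) = exp(-(t - s)) is a contractive evolution
# family, g = 1 and T = 1, so the claimed bound reads
#     E sup_t |u_t|^p  <=  (10 * sqrt(p))^p * T^(p/2).
rng = np.random.default_rng(0)
T, N, n_paths, p = 1.0, 2000, 500, 4.0
dt = T / N
s = dt * np.arange(N)                      # left endpoints of the time grid
sup_p = np.empty(n_paths)
for k in range(n_paths):
    dW = rng.normal(0.0, np.sqrt(dt), N)
    # u_{t_i} ~ sum_{j <= i} exp(-(t_i - s_j)) dW_j  (left-point Ito sums)
    u = np.exp(-(s + dt)) * np.cumsum(np.exp(s) * dW)
    sup_p[k] = np.max(np.abs(u)) ** p
lhs = sup_p.mean()                         # Monte Carlo estimate of E sup |u_t|^p
rhs = (10.0 * np.sqrt(p)) ** p             # Theorem 1.1 with D = 1, T = 1
assert lhs < rhs
```

As expected, the bound holds with a very large margin in this toy case; Theorem 1.1 is about the growth rate in p, not about sharp constants for a fixed equation.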

Theorem 4.1 considers the more general situation of a cylindrical Brownian motion with covariance given by the inner product of a Hilbert space H and progressively measurable processes g with values in the space \(\gamma (H,X)\) of \(\gamma \)-radonifying operators from H to X (the definition of this space is recalled in Sect. 2).

For evolution families, Theorem 1.1 is new even for Hilbert spaces X. In the 2-smooth case it completely settles the asymptotic optimality problem; this is new even in the semigroup case. The proof of the theorem is very different from [43, 44] and [94] and combines ideas of Kotelenez [56] and Seidler [90]. Seidler’s proof of the \(O(\sqrt{p})\) bound for the constant in the Burkholder inequality in 2-smooth Banach spaces is based on a clever modification of the Burkholder–Rosenthal inequality due to Pinelis [81]. We further extend Pinelis’s inequality by accommodating additional predictable contraction operators in it which enable us to merge the inequality with a splitting technique already used by Kotelenez.

Theorems 1.1 and 4.1 are also applicable in the setting where the evolution family S itself is not contractive, but admits a dilation to a contractive evolution family on a 2-smooth Banach space. In the semigroup case, the boundedness of the \(H^\infty \)-calculus of the generator A of angle \(<\frac{1}{2}\pi \) implies that the semigroup has a dilation to an isometric \(C_0\)-group (see [33, 50, 90, 95, 101]). In this case, however, there is no need to use Theorem 1.1 since one can apply the simpler method of [43, 44].

Our method can be used quite naturally to prove the stability (uniformly in time) of certain numerical schemes associated with (1.1). This is pursued in Sect. 5, where we prove that if \((S(t))_{t\geqslant 0}\) is a \(C_0\)-semigroup of contractions on a (2, D)-smooth Banach space X with generator A, and u is a continuous modification of the process \((\int _0^t S(t-s) g_s\,{\mathrm{d}}W_s)_{t\in [0,T]}\), then for any contractive approximation scheme R which approximates \((S(t))_{t\geqslant 0}\) to some order \(\alpha \in (0,1]\) on the domain \({\mathsf {D}}(A)\) one has

$$\begin{aligned} {\mathbb {E}}\sup _{j=0,\ldots ,n}\Vert u_{t_j^{(n)}} - u_j^{(n)}\Vert ^p\rightarrow 0 \ \ \hbox {as} \ \ n\rightarrow \infty , \end{aligned}$$
(1.5)

where

$$\begin{aligned} {\left\{ \begin{array}{ll} u_0^{(n)} &{} :=0, \\ u_j^{(n)} &{}:= R(T/n) \Bigl (u_{j-1}^{(n)} + \displaystyle \int _{t_{j-1}^{(n)}}^{t_{j}^{(n)}} g_s \,{\mathrm{d}}W_s\Bigr ), \quad j=1,\dots ,n. \end{array}\right. } \end{aligned}$$
(1.6)

The crucial observation underlying (1.5) is that the sequence \((u_j^{(n)})_{j=0}^n\) defined by (1.6) is precisely of the right format to apply our extension of Pinelis’s inequality. For \(C_0\)-semigroups which are not necessarily contractive and functions \(g\in L^p(\Omega ;L^2(0,T;{\mathsf {D}}(A)))\), we show that convergence holds with the following explicit rate:

$$\begin{aligned} \Bigl ({\mathbb {E}}\sup _{j=0,\ldots ,n}\Vert u_{t_j^{(n)}} - u_j^{(n)}\Vert ^p\Bigr )^{1/p} \le C \frac{\sqrt{\log (n+1)}}{n^\alpha } \Vert g\Vert _{L^p(\Omega ;L^2(0,T;{\mathsf {D}}(A)))}, \end{aligned}$$
(1.7)

where C is a constant independent of n and g. The proof of this estimate is somewhat simpler, in that it directly uses Seidler’s version of the Burkholder inequality of Proposition 2.6 in combination with a simple trick, in Proposition 2.7, involving switching back and forth between \(\ell _n^\infty (X)\) and \(\ell _n^{q(n)}(X)\) for a clever choice of q(n). This can be done at the expense of a constant \(n^{1/q(n)}\), exploiting the fact, proven in Proposition 2.2, that \(\ell ^{q}(X)\) is 2-smooth for \(2\le q<\infty \) with a constant of order \(\sqrt{q}\). This appears to be a new technique whose potential deserves further investigation.
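To see where the logarithmic factor in the rate may come from, one can balance the penalty \(n^{1/q}\) against a 2-smoothness constant of order \(\sqrt{q}\). The following back-of-the-envelope optimisation is our sketch, with a hypothetical choice of q(n) that is merely consistent with the stated rate; the actual choice is made in Sect. 5:

$$\begin{aligned} \frac{{\mathrm{d}}}{{\mathrm{d}}q}\log \bigl (n^{1/q}\sqrt{q}\bigr ) = -\frac{\log n}{q^2} + \frac{1}{2q} = 0 \quad \Longleftrightarrow \quad q = 2\log n, \end{aligned}$$

so that with \(q(n) = 2\log n\) one gets \(n^{1/q(n)}\sqrt{q(n)} = e^{1/2}\sqrt{2\log n}\), which is of the order \(\sqrt{\log (n+1)}\) appearing in the rate above.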

Examples of numerical schemes to which our abstract results can be applied include the splitting method (with \(R(t) = S(t)\) and \(\alpha =1\)), the implicit Euler method (with \(R(t) = (I-tA)^{-1}\) and \(\alpha =1/2\)), and the Crank–Nicolson method (with \(R(t) = (2+tA)(2-tA)^{-1}\) and \(\alpha =2/3\)). Moreover, if g takes values in suitable intermediate spaces between X and \({\mathsf {D}}(A^m)\) with \(m\geqslant 1\), appropriate rates of convergence can be obtained for each of these methods.
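For a concrete feel of (1.5) and (1.6), the following toy computation (ours, not from the paper; the scalar equation, the choice \(A=-1\) and all numerical parameters are illustrative assumptions) runs the implicit Euler scheme \(R(t) = (1-tA)^{-1}\) against a fine-grid approximation of the mild solution of \({\mathrm{d}}u_t = -u_t\,{\mathrm{d}}t + {\mathrm{d}}W_t\):

```python
import numpy as np

# Toy illustration of the scheme (1.6) (ours, not from the paper): scalar case
# A = -1, S(t) = exp(-t) contractive, g = 1, implicit Euler R(t) = 1/(1 + t).
rng = np.random.default_rng(1)
T, N_fine, n_paths = 1.0, 1024, 300

def mean_sup_error(n):
    """Monte Carlo estimate of E max_j |u_{t_j} - u_j^{(n)}|."""
    assert N_fine % n == 0
    dt_fine, m = T / N_fine, N_fine // n
    s = dt_fine * np.arange(N_fine)          # left endpoints of fine cells
    R = 1.0 / (1.0 + T / n)                  # implicit Euler step operator
    errs = np.empty(n_paths)
    for k in range(n_paths):
        dW = rng.normal(0.0, np.sqrt(dt_fine), N_fine)
        u_exact = np.zeros(n + 1)            # fine-grid proxy for the mild solution
        u_num = np.zeros(n + 1)              # the scheme (1.6)
        for j in range(1, n + 1):
            t_j = j * T / n
            u_exact[j] = np.sum(np.exp(-(t_j - s[: j * m])) * dW[: j * m])
            u_num[j] = R * (u_num[j - 1] + dW[(j - 1) * m : j * m].sum())
        errs[k] = np.max(np.abs(u_exact - u_num))
    return errs.mean()

e_coarse, e_fine = mean_sup_error(8), mean_sup_error(128)
assert e_fine < e_coarse   # uniform-in-time error decreases under refinement
```

Note that the error is measured uniformly over the grid points, in the spirit of (1.5), rather than at the final time only.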

We expect that the new results in the simple linear setting will also provide new insights for the approximation of nonlinear SPDEs by adapted time discretisation schemes; we plan to address this in future work.

To illustrate the main result we consider the stochastic heat equation with the implicit Euler scheme (cf. Example 5.15). For simplicity, here we state the result in terms of Sobolev spaces. In Example 5.15, the use of Bessel potential spaces allows us to take the smoothness exponent m fractional and also negative. Further examples can be found in Sect. 5.3.

Example 1.2

(Heat equation, implicit Euler scheme) Consider the inhomogeneous stochastic heat equation on \({{\mathbb {R}}}^d\):

$$\begin{aligned} {\left\{ \begin{array}{ll} \,{\mathrm{d}}u_t &{}= \Delta u_t\,{\mathrm{d}}t + \sum _{k\geqslant 1} g_{t}^{k}\,{\mathrm{d}}W_t^k, \quad t\in [0,T], \\ u_0 &{} = 0. \end{array}\right. } \end{aligned}$$
(1.8)

Here, \(W=(W^k)_{k\geqslant 1}\) is a sequence of independent standard Brownian motions. We further assume that each \(g^k:\Omega \times [0,T]\times {{\mathbb {R}}}^d\rightarrow {{\mathbb {R}}}\) is progressively measurable and that \(p\in (0,\infty )\), \(q\in [2, \infty )\), and \(m\in {{\mathbb {N}}}=\{0,1, \ldots \}\) are such that

$$\begin{aligned} \Vert g\Vert _{{\mathbb {W}}^{m,q,p}}^p:=\sum _{i=1}^d \sum _{j=0}^{m}{\mathbb {E}}\Big \{\int _0^T \Big (\int _{{{\mathbb {R}}}^d}\Big (\sum _{k\geqslant 1}|\partial _i^j g^k(t,x)|^2\Big )^{q/2} \,{\mathrm{d}}x\Big )^{2/q}\,{\mathrm{d}}t\Big \}^{p/2}\end{aligned}$$

is finite. For \(n=1,2,\dots \) set \(t_j^{(n)} := jT/n\) and consider the partition \(\pi ^{(n)} := \{t_j^{(n)}: j=0,\ldots , n\}\). Let \((S(t))_{t\geqslant 0}\) denote the heat semigroup on \(L^q({{\mathbb {R}}}^d)\) and set

$$\begin{aligned} u_t:= \int _0^t S(t-s) g_s\,{\mathrm{d}}W_s, \quad t\in [0,T]. \end{aligned}$$

This stochastic integral is well defined as an \(L^q({{\mathbb {R}}}^d)\)-valued Itô integral by Proposition 2.6 and (2.8).

Define the discrete approximation by \(u_0^{(n)} :=0\), and

$$\begin{aligned} u_j^{(n)} := (1-\tfrac{T}{n}\Delta )^{-1} \Big (u_{j-1}^{(n)} + \int _{t_{j-1}^{(n)}}^{t_{j}^{(n)}} g_s \,{\mathrm{d}}W_s\Big ), \quad j=1,\dots ,n. \end{aligned}$$

Let \(W^{j,q}({{\mathbb {R}}}^d)\) be the Sobolev space of smoothness j and integrability q. Then the following results hold:

$$\begin{aligned} {\mathbb {E}}\sup _{j=0,\ldots ,n}\Vert u_{t_j^{(n)}} - u_j^{(n)}\Vert ^p_{W^{m-2,q}({{\mathbb {R}}}^d)}&\le \Bigl ( C_{p,q,d,m} \frac{\sqrt{\log (n+1)}}{n}\Bigr )^p \Vert g\Vert _{{\mathbb {W}}^{m,q,p}}^p, \quad m\geqslant 2, \\ {\mathbb {E}}\sup _{j=0,\ldots ,n}\Vert u_{t_j^{(n)}} - u_j^{(n)}\Vert ^p_{W^{m-1,q}({{\mathbb {R}}}^d)}&\le \Bigl (C_{p,q,d,m} \frac{\sqrt{\log (n+1)}}{n^{1/2}}\Bigr )^p \Vert g\Vert _{{\mathbb {W}}^{m,q,p}}^p, \quad m\geqslant 1, \\ \lim _{n\rightarrow \infty }{\mathbb {E}}\sup _{j=0,\ldots ,n}\Vert u_{t_j^{(n)}} - u_j^{(n)}\Vert ^p_{W^{m,q}({{\mathbb {R}}}^d)}&= 0, \quad m\geqslant 0. \end{aligned}$$

This follows from Theorems 5.13 and 5.14.

In the final Sect. 6 we extend some of our results to stochastic convolutions involving random evolution families, which arise naturally if the operator family \((A(t))_{t\in [0,T]}\) in (1.1) depends on a random parameter in an adapted way. That this is possible at all in the abstract setting of evolution equations in infinite dimensions is quite remarkable. It requires replacing the Itô integral with the forward integral of [88] in order to avoid adaptedness problems. Stochastic convolution in the forward sense is known to still give the weak solution to (1.1) (see [62, Proposition 5.3], [85, Theorem 4.9] and Theorem 6.6 below). In the parabolic setting, space-time regularity results have been derived by Pronk and the second-named author in [85] using so-called pathwise mild solutions (see Proposition 6.2) and a simple integration by parts trick. Pathwise mild solutions have been recently used to study quasilinear PDEs in [30, 59, 70] and random attractors in [60]. The new maximal estimates proved in our current paper are expected to have implications for these results as well.

For adapted families \((A(t))_{t\in [0,T]}\), maximal inequalities can alternatively be derived via Itô’s formula (see [95] and references therein). In contrast to the results obtained here, however, this does not lead to constants of order \(O(\sqrt{p})\) as \(p\rightarrow \infty \). In the setting of monotone (possibly nonlinear) operators and \(p=2\), the Itô formula argument is applicable in a wider setting (see [63]). Some extensions to \(p>2\) have been obtained recently in [72].

2 Preliminaries

Throughout this paper we work over the real scalar field. Unless otherwise stated, random variables and stochastic processes are defined on a probability space \((\Omega ,{\mathscr {F}},{\mathbb {P}})\) which we consider to be fixed throughout. On this probability space we fix a filtration \(({\mathscr {F}}_t)_{t\in [0,T]}\) once and for all. Standard notions from the theory of stochastic processes always refer to this filtration. Whenever we consider stochastic integrals with respect to a (cylindrical) Brownian motion or a more general type of driving process, it is always assumed that it is adapted with respect to this filtration. The conditional expectation of a random variable \(\xi \) with respect to a sub-\(\sigma \)-algebra \({\mathscr {G}}\subseteq {\mathscr {F}}\) will be denoted by \({\mathbb {E}}_{\mathscr {G}}(\xi )\). The progressive \(\sigma \)-algebra associated with \(({\mathscr {F}}_t)_{t\in [0,T]}\), i.e., the \(\sigma \)-algebra generated by sets of the form \(B\times A\) with \(B\in {\mathscr {B}}([0,t])\) and \(A\in {\mathscr {F}}_t\), where t ranges over [0, T], is denoted by \({\mathscr {P}}\). We will use the subscript \({\mathscr {P}}\) to denote the closed subspace of all progressively measurable processes in a given space of processes.

When X is a Banach space, under an X-valued random variable we understand a strongly measurable function (i.e., a function which is the pointwise limit of a sequence of simple functions) from \(\Omega \) into X; for details the reader is referred to [48, 49]. For the purposes of this article, an X-valued process is a family of X-valued random variables indexed by [0, T]. Two processes \((g_t)_{t\in [0,T]}\) and \((h_t)_{t\in [0,T]}\) are said to be modifications of each other if for all \(t\in [0,T]\) we have \(g_t = h_t\) almost surely (with exceptional set that may depend on t). A process \((g_t)_{t\in [0,T]}\) with values in X is said to be progressively measurable if g is strongly measurable as an X-valued function on the measurable space \(([0,T]\times \Omega ,{\mathscr {P}})\). It is a deep result in the theory of stochastic processes that every adapted and strongly measurable X-valued stochastic process admits a progressively measurable modification; an elementary proof is offered in [77].

2.1 2-Smooth Banach spaces

A Banach space X is said to have martingale type \(p\in [1,2]\) if there exists a constant \(C\ge 1\) such that

$$\begin{aligned} {\mathbb {E}}\Vert f_N\Vert ^{p} \leqslant C^p \Bigl ({\mathbb {E}}\Vert f_0\Vert ^p+\sum _{n=1}^N{\mathbb {E}}\Vert f_n - f_{n-1}\Vert ^p\Bigr ) \end{aligned}$$

for all X-valued \(L^p\)-martingales \((f_n)_{n=0}^N\). A Banach space X is called \((p, D)\)-smooth, where \(p\in [1,2]\) and \(D\ge 1\), if for all \(x,y\in X\) we have

$$\begin{aligned} \Vert x+y\Vert ^p+\Vert x-y\Vert ^p\leqslant 2\Vert x\Vert ^p+2D^p\Vert y\Vert ^p. \end{aligned}$$

A Banach space is called p-smooth if it is \((p, D)\)-smooth for some \(D\ge 1\).
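The defining inequality can be tested numerically in the two model cases discussed below (Hilbert spaces with \(D=1\) and the \(\ell ^p\) norms with \(D=\sqrt{p-1}\)). The following check is our illustration only; dimensions and sample sizes are arbitrary:

```python
import numpy as np

# Random check of the (2, D)-smoothness inequality (our illustration):
# for the Euclidean norm (D = 1) it holds with equality (parallelogram law),
# and the l^p norm with 2 <= p < infinity satisfies it with D = sqrt(p - 1).
rng = np.random.default_rng(2)

def defect(x, y, D, p):
    """lhs - rhs of the (2, D)-smoothness inequality; <= 0 means it holds."""
    nrm = lambda v: np.linalg.norm(v, ord=p)
    return nrm(x + y)**2 + nrm(x - y)**2 - 2*nrm(x)**2 - 2*D**2*nrm(y)**2

for _ in range(1000):
    x, y = rng.standard_normal(8), rng.standard_normal(8)
    assert abs(defect(x, y, 1.0, 2)) < 1e-9            # equality for p = 2
    for p in (3.0, 4.0, 10.0):
        assert defect(x, y, np.sqrt(p - 1), p) < 1e-9  # (2, sqrt(p-1))-smooth
```

Of course a random search proves nothing; it merely makes the role of the constant D tangible before the rigorous statements below.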

By a fundamental result due to Pisier [82] every p-smooth Banach space has martingale type p, and conversely every Banach space with martingale type p admits an equivalent p-smooth norm. Moreover, if X has martingale type p with constant C, an equivalent \((p, C)\)-smooth norm can be found; if X is \((p, D)\)-smooth, then X has martingale type p with constant at most 2D (and the constant 2 can be omitted for \(p=2\), see Remark 2.5). Detailed proofs of these facts can be found in [83, 104, 106].

The class of 2-smooth Banach spaces is of particular interest from the point of view of stochastic analysis. It includes all Hilbert spaces (with \(D=1\), by the parallelogram identity) and the spaces \(L^p(\mu )\) with \(2\le p<\infty \) (with \(D=\sqrt{p-1}\), see [81, Proposition 2.1] and Proposition 2.2 below). The reason for being interested in 2-smooth spaces rather than spaces with martingale type 2 is as follows. Martingale type 2 is preserved under passing to equivalent norms, but this is not the case for 2-smoothness. In the results to follow, semigroups and evolution families of contractions (i.e., operators of norm \(\le 1\)) play a distinguished role. Since contractivity need not be preserved under passing to equivalent norms, such a distinguished role cannot be expected in the setting of martingale type 2 spaces. In this connection the following interesting question seems to be open: if X has martingale type 2 and supports a \(C_0\)-semigroup (or \(C_0\)-evolution family), does there exist an equivalent (2, D)-smooth norm with respect to which the semigroup (or evolution family) is contractive?

In what follows we recall some useful properties of 2-smooth Banach spaces that will be needed in this paper.

If X is (2, D)-smooth, then by [94, Lemma 2.1] and its proof the function

$$\begin{aligned}\rho (x):= \Vert x\Vert ^2\end{aligned}$$

is Fréchet differentiable on X and its derivative is Lipschitz continuous. Conversely, if \(\rho \) is twice Fréchet differentiable and \(\rho ''(x)(y,y)\leqslant 2D^2\Vert y\Vert ^2\) at every \(x\in X\), then X is (2, D)-smooth (see [81] for a more general version of this converse). Unlike in finite dimensions, Lipschitz continuity does not imply almost everywhere differentiability (the latter even being meaningless in the absence of a reference measure). One way to get around this is to consider the functions

$$\begin{aligned}\rho _{x,y}(t):= \rho (x+ty) = \Vert x+ ty\Vert ^2. \end{aligned}$$

The following lemma is implicit in [81]. For the reader’s convenience we include a proof.

Proposition 2.1

For any Banach space X and constant \(D\ge 1\) the following assertions are equivalent:

  (1) X is \((2, D)\)-smooth;

  (2) For all \(x,y\in X\) the function \(\rho _{x,y}(t):= \rho (x+ty) = \Vert x+ty\Vert ^2\) is differentiable on \({{\mathbb {R}}}\), its derivative is Lipschitz continuous, and satisfies

    $$\begin{aligned} \rho _{x,y}'(t) - \rho _{x,y}'(s) \le 2D^2(t-s)\Vert y\Vert ^2, \quad s,t\in {{\mathbb {R}}}, \ t\ge s. \end{aligned}$$

Proof

(1)\(\Rightarrow \)(2): Fix \(x,y\in X\). The differentiability of \(\rho _{x,y}(t)= \Vert x+ty\Vert ^2\) follows from the Fréchet differentiability of \(\rho \), and by the chain rule we have \(\rho _{x,y}'(t) = \langle y,\rho '(x+ty)\rangle \). Lipschitz continuity of \(\rho '\) follows from [24, Lemma V.3.5] and implies the Lipschitz continuity of \(\rho _{x,y}'\). It follows that the second derivative \(\rho _{x,y}''(t)\) exists for almost every \(t\in {{\mathbb {R}}}\), and in the points where it exists it is given by

$$\begin{aligned} \rho _{x,y}''(t)&= \lim _{h\rightarrow 0} \frac{1}{h^2}(\rho _{x,y}(t+h) + \rho _{x,y}(t-h) - 2\rho _{x,y}(t)) \\&= \lim _{h\rightarrow 0} \frac{1}{h^2}( \Vert (x + ty)+hy\Vert ^2 + \Vert (x + ty)-hy\Vert ^2 - 2\Vert x+ty\Vert ^2). \end{aligned}$$

Therefore, by 2-smoothness, \( \rho _{x,y}''(t) \le 2D^2 \Vert y\Vert ^2\) in these points. This implies that \(\rho _{x,y}'(t) - \rho _{x,y}'(s) \le 2D^2(t-s)\Vert y\Vert ^2\) for all \(t\ge s\).

(2)\(\Rightarrow \)(1): For all \(x,y\in X\) we have

$$\begin{aligned} \Vert x + y\Vert ^2 + \Vert x -y\Vert ^2 - 2\Vert x\Vert ^2&= \int _0^1 \rho _{x,y}'(t)\,{\mathrm{d}}t - \int _{-1}^0 \rho _{x,y}'(t)\,{\mathrm{d}}t \\&= \int _0^1 \rho _{x,y}'(t)- \rho _{x,y}'(t-1)\,{\mathrm{d}}t \le 2D^2\Vert y\Vert ^2. \end{aligned}$$

\(\square \)

As an application we prove the following vector-valued analogue of [81, Proposition 2.1]. It will be needed in the proof of Proposition 2.7, which in turn is applied in Sect. 5.

Proposition 2.2

Let \((S,{\mathscr {A}},\mu )\) be a measure space and X be a (2, D)-smooth Banach space. Then for all \(2\le p<\infty \) the space \(L^p(S;X)\) is \((2,\sqrt{p-2+D^2})\)-smooth.

Notice that \(D\ge 1\) implies \(p-2+D^2\le D^2(p-1)\), so in particular \(L^p(S;X)\) is \((2,D\sqrt{p-1})\)-smooth.

Proof

The proof is based on the equivalence in Proposition 2.1. For Banach spaces X with the property that \(\Vert \cdot \Vert ^2\) is twice continuously Fréchet differentiable the proof can be somewhat simplified.

Throughout the proof we use \(\Vert \cdot \Vert \) and \(\Vert \cdot \Vert _p\) to denote the norms of X and \(L^p(S;X)\), respectively. Thus if \(f\in L^p(S;X)\), then \(\Vert f\Vert \) is the function \(s\mapsto \Vert f(s)\Vert \) in \(L^p(S)\).

As in [24, Theorem V.1.1] one checks that the functions

$$\begin{aligned} \psi _p(x):= \Vert x\Vert ^p, \quad \Psi _{p}(g):= \Vert g\Vert _p^p, \end{aligned}$$

are Fréchet differentiable and

$$\begin{aligned} \langle f,\Psi _{p}'(g)\rangle = \int _S \langle f,\psi _p'(g)\rangle \,{\mathrm{d}}\mu , \qquad f,g\in L^p(S;X), \end{aligned}$$
(2.1)

where the duality \(\langle \cdot ,\cdot \rangle \) between X and \(X^*\) is applied pointwise on S. For \(q\in {{\mathbb {R}}}\) let

$$\begin{aligned} w_{q;x,y}(t)&:= \Vert x+ty\Vert ^q, \qquad x,y\in X; \\ W_{q;f,g}(t)&:= \Vert f+tg\Vert _p^q, \qquad f,g\in L^p(S;X). \end{aligned}$$

The Fréchet differentiability of \(\psi _p\) and \(\Psi _p\) implies the differentiability of \(w_{q;x,y}\) and \(W_{q;f,g}\) (except possibly at \(t=0\) when \(x=0\) and \(y\not =0\), respectively \(f=0\) and \(g\not =0\)). Denoting derivatives with respect to t by \(\partial _t\), for \(q\not =0\) the chain rule gives

$$\begin{aligned} \partial _t w_{q;x,y}(t)&= \frac{q}{2} \Vert x+ty\Vert ^{q-2}\partial _t w_{2;x,y}(t) = q \Vert x+ty\Vert ^{q-2}\langle y,\psi _{2}'(x+ty)\rangle \\ \partial _t W_{q;f,g}(t)&= \frac{q}{2} \Vert f+tg\Vert _p^{q-2}\partial _t W_{2;f,g}(t) = q \Vert f+tg\Vert _p^{q-2}\langle g,\Psi _{2}'(f+tg)\rangle , \end{aligned}$$

where \(\Psi _{2}(g):= \Vert g\Vert _p^2\). Also,

$$\begin{aligned} \psi _p'(x) = \frac{p}{2}\Vert x\Vert ^{p-2}\psi _2'(x), \quad \Psi _p'(f) = \frac{p}{2}\Vert f\Vert _p^{p-2}\Psi _2'(f). \end{aligned}$$

Combining these identities with (2.1), we obtain

$$\begin{aligned} \begin{aligned} \frac{1}{2}\partial _t W_{2;f,g}(t)&= \langle g,\Psi _2'(f+tg)\rangle \\&= \frac{2}{p} \Vert f+tg\Vert _p^{2-p} \langle g,\Psi _p'(f+tg)\rangle \\&= \frac{2}{p} \Vert f+tg\Vert _p^{2-p} \int _S \langle g, \psi _p'(f+tg)\rangle \,{\mathrm{d}}\mu \\&= \Vert f+tg\Vert _p^{2-p} \int _S \Vert f+tg\Vert ^{p-2} \langle g, \psi _2'(f+tg)\rangle \,{\mathrm{d}}\mu \\&= \frac{1}{p} \Vert f+tg\Vert _p^{2-p} \int _S \partial _t w_{p;f,g}(t)\,{\mathrm{d}}\mu . \end{aligned} \end{aligned}$$
(2.2)

Since X is 2-smooth and Lipschitz functions are almost everywhere differentiable, for all \(x,y\in X\) the function \(w_{2;x,y}\) is twice differentiable almost everywhere by Proposition 2.1. The exceptional set may depend on the pair (x, y), however, so in order to be able to differentiate the right-hand side of (2.2) under the integral we will consider simple functions \(f,g\in L^p(S;X)\) from this point onward. Then the right-hand side of (2.2) is differentiable for almost all \(t\in {{\mathbb {R}}}\) and

$$\begin{aligned} \partial _t^2 W_{2;f,g}(t)&= \frac{2}{p} \partial _t (\Vert f+tg\Vert _p^{2-p}) \int _S \partial _t w_{p;f,g}(t)\,{\mathrm{d}}\mu + \frac{2}{p} \Vert f+tg\Vert _p^{2-p} \,\partial _t \int _S \partial _t w_{p;f,g}(t)\,{\mathrm{d}}\mu \\&= \frac{2}{p} \partial _t ((W_{2;f,g}(t))^{1-\frac{p}{2}}) \int _S \partial _t w_{p;f,g}(t)\,{\mathrm{d}}\mu + \frac{2}{p} \Vert f+tg\Vert _p^{2-p}\! \int _S \partial _t^2 w_{p;f,g}(t)\,{\mathrm{d}}\mu \\&= \Big (\frac{2}{p}-1\Big )\Vert f+tg\Vert _p^{-p}\,\partial _t W_{2;f,g}(t)\int _S \partial _t w_{p;f,g}(t)\,{\mathrm{d}}\mu + \frac{2}{p}\Vert f+tg\Vert _p^{2-p}\int _S \partial _t^2 w_{p;f,g}(t)\,{\mathrm{d}}\mu \\&{\mathop {=}\limits ^{(*)}} \frac{4}{p}\Big (\frac{1}{p}-\frac{1}{2}\Big )\Vert f+tg\Vert _p^{2-2p}\Bigl (\int _S \partial _t w_{p;f,g}(t)\,{\mathrm{d}}\mu \Bigr )^2 + \frac{2}{p}\Vert f+tg\Vert _p^{2-p}\int _S \partial _t^2 w_{p;f,g}(t)\,{\mathrm{d}}\mu \\&{\mathop {\le }\limits ^{(**)}} \frac{2}{p}\Vert f+tg\Vert _p^{2-p}\int _S \partial _t^2 w_{p;f,g}(t)\,{\mathrm{d}}\mu , \end{aligned}$$

where \((*)\) follows from (2.2) and \((**)\) from the assumption \(2\le p<\infty \). Now

$$\begin{aligned} \partial _t^2 w_{p;x,y}(t)&= p\, \partial _t [\Vert x+ty\Vert ^{p-2} \langle y, \psi _2'(x+ty)\rangle ] \\&= p\,\partial _t(\Vert x+ty\Vert ^{p-2})\langle y, \psi _2'(x+ty)\rangle + p\Vert x+ty\Vert ^{p-2}\, \partial _t \langle y, \psi _2'(x+ty)\rangle \\&= 2p\Big (\frac{p}{2}-1\Big )\Vert x+ty\Vert ^{p-4}\langle y, \psi _2'(x+ty)\rangle ^2 + \frac{p}{2}\Vert x+ty\Vert ^{p-2}\, \partial _t^2 w_{2;x,y}(t) \\&\le 2p\Big (\frac{p}{2}-1\Big )\Vert x+ty\Vert ^{p-2}\Vert y\Vert ^2 + pD^2\Vert x+ty\Vert ^{p-2} \Vert y\Vert ^2. \end{aligned}$$

Applying this with \(x = f(\cdot )\) and \(y = g(\cdot )\) we obtain

$$\begin{aligned} \partial _t^2 W_{2;f,g}(t)&\le 2(p-2 +D^2) \Vert f+tg\Vert _p^{2-p}\int _S \Vert f+tg\Vert ^{p-2} \Vert g\Vert ^2\,{\mathrm{d}}\mu . \end{aligned}$$

By Hölder’s inequality with \(r = p/(p-2)\) and \(r' = p/2\) we obtain that \(W_{2;f,g}\) is twice differentiable almost everywhere and

$$\begin{aligned} \partial _t^2 W_{2;f,g}(t) \le 2(p-2+D^2)\Vert g\Vert _p^2. \end{aligned}$$
(2.3)

Since f and g are simple, the 2-smoothness of X and Proposition 2.1 imply that \(t\mapsto \partial _t W_{2;f,g}\) is Lipschitz continuous. Therefore it follows from (2.3) that \(t\mapsto \partial _t W_{2;f,g}\) is Lipschitz continuous with Lipschitz constant \(2(D^2+p-2)\Vert g\Vert _p^2\). The proof of the implication (2)\(\Rightarrow \)(1) of Proposition 2.1 then gives the inequality

$$\begin{aligned} \Vert f+g\Vert _p^2+\Vert f-g\Vert _p^2\leqslant 2\Vert f\Vert _p^2+ 2(D^2+p-2)\Vert g\Vert _p^2 \end{aligned}$$

for simple \(f,g\in L^p(S;X)\). The inequality for general \(f,g\in L^p(S;X)\) follows by approximation. \(\square \)

Remark 2.3

By Pisier’s characterisation of 2-smoothness in terms of the modulus of uniform smoothness [82], the fact that 2-smoothness of X implies the 2-smoothness of \(L^p(\mu ;X)\) for all \(2\le p<\infty \) follows from [31]. A quantitative version is proved in [71, Corollary 2.3] where it is shown that if the modulus of uniform smoothness of a Banach space satisfies \( \varrho _X(\tau ) \le s \tau ^2\) for all \(\tau >0\), then the modulus of uniform smoothness of \(L^p(\mu ;X)\) satisfies

$$\begin{aligned} \varrho _{L^p(\mu ;X)}(\tau )\le (4s+4p)\tau ^2, \quad \tau >0. \end{aligned}$$
(2.4)

By Pisier’s result, this implies that \(L^p(\mu ;X)\) is (2, E)-smooth for some \(E\ge 1\), but the bound for E obtained this way is worse than ours. We will show this by demonstrating that our Proposition 2.2 gives a slight improvement of the constant in (2.4). Indeed, by [106, Proposition 3.1.2], the bound \(\varrho _X(\tau ) \le s \tau ^2\) for \(\tau >0\) implies that X is \((2,\sqrt{1+4s})\)-smooth. Consequently Proposition 2.2 implies that \(L^p(\mu ;X)\) is \((2,\sqrt{p-1+4s})\)-smooth. Another application of [106, Proposition 3.1.2] then gives that

$$\begin{aligned} \varrho _{L^p(\mu ;X)}(\tau ) \le (4s+p-1)\tau ^2, \quad \tau >0. \end{aligned}$$

Following [81] we will use Proposition 2.1 to derive some further useful inequalities for the function

$$\begin{aligned} w (t):= w_{x,y}(t):= (\rho (x+ty))^{1/2} = \Vert x+ ty\Vert , \end{aligned}$$

where x and y are fixed elements in a (2, D)-smooth Banach space. Evidently w is Lipschitz continuous with \(|w(t) - w(s)| \le |t-s|\Vert y\Vert \), so w is almost everywhere differentiable with

$$\begin{aligned} |w'(t)| \le \Vert y\Vert . \end{aligned}$$
(2.5)

We start from the elementary observation that \(\sinh a \le a\cosh a\) for \(a\ge 0\). Hence when \(w''w \ge 0\), Proposition 2.1 implies the almost everywhere inequalities

$$\begin{aligned} (\cosh w)''&= (w')^2\cosh w + w''\sinh w \\&\le ((w')^2 + w''w)\cosh w = \tfrac{1}{2} (w^2)''\cosh w \le D^2 \Vert y\Vert ^2\cosh w, \end{aligned}$$

whereas if \(w''w<0\), then (2.5) implies

$$\begin{aligned} (\cosh w)'' = (w')^2\cosh w + w''\sinh w \le (w')^2\cosh w \le \Vert y\Vert ^2\cosh w. \end{aligned}$$

Combining these inequalities we obtain the almost everywhere inequality

$$\begin{aligned} (\cosh w)'' \le D^2 \Vert y\Vert ^2\cosh w. \end{aligned}$$
(2.6)
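As an illustration only (not part of the argument), the differential inequality (2.6) can be checked by finite differences for the Euclidean norm on \({{\mathbb {R}}}^2\), which is (2, 1)-smooth, so that \(D=1\); the sample vectors below are arbitrary:

```python
import math

# Finite-difference check of (cosh w)'' <= D^2 * ||y||^2 * cosh w, inequality
# (2.6), for the Euclidean norm on R^2 (a (2,1)-smooth space, so D = 1).
x, y = (1.0, -0.5), (0.3, 2.0)   # fixed elements; x + t*y never vanishes here
norm_y = math.hypot(*y)

def w(t: float) -> float:
    # w(t) = ||x + t*y||, as in the text.
    return math.hypot(x[0] + t * y[0], x[1] + t * y[1])

h = 1e-4
for t in [k / 10 - 2.0 for k in range(41)]:  # grid on [-2, 2]
    second_diff = (math.cosh(w(t + h)) - 2 * math.cosh(w(t)) + math.cosh(w(t - h))) / h ** 2
    bound = norm_y ** 2 * math.cosh(w(t))
    assert second_diff <= bound * (1 + 1e-6)
```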

The next lemma was obtained in [81, Proposition 2.5 and the proof of Theorem 3.2]. We present a more direct argument which avoids the smoothing procedure and the reduction to the finite-dimensional setting used in [81, Lemma 2.2, Lemma 2.3, and Remark 2.4].

Lemma 2.4

Let X be a (2, D)-smooth Banach space and let \(\xi ,\eta \in L^2(\Omega ;X)\). Let \({\mathscr {G}}\subseteq {\mathscr {F}}\) be a sub-\(\sigma \)-algebra. If \(\xi \) is strongly \({\mathscr {G}}\)-measurable and \({\mathbb {E}}_{{\mathscr {G}}}{\eta }= 0\), then

$$\begin{aligned} {\mathbb {E}}_{{\mathscr {G}}}(\Vert \xi +\eta \Vert ^2)&\leqslant \Vert \xi \Vert ^2 + D^2 {\mathbb {E}}_{{\mathscr {G}}}(\Vert \eta \Vert ^2). \end{aligned}$$

If, moreover, \(\xi ,\eta \in L^\infty (\Omega ;X)\), then

$$\begin{aligned} {\mathbb {E}}_{{\mathscr {G}}}(\cosh (\Vert \xi +\eta \Vert ))\leqslant \big (1+D^2 {\mathbb {E}}_{{\mathscr {G}}}(e^{\Vert \eta \Vert }-1 - \Vert \eta \Vert )\big ) \cosh (\Vert \xi \Vert ).\end{aligned}$$

Proof

Fix \(x,y\in X\). As before we let \(\rho _{x,y}(t): = (w_{x,y}(t))^2= \Vert x+ty\Vert ^2 = \rho (x+ty)\) for \(t\in {{\mathbb {R}}}\). Then \(\rho _{x,y}\) is continuously differentiable and \(\rho _{x,y}'\) is Lipschitz continuous with constant \(2D^2\Vert y\Vert ^2\) by Proposition 2.1. Taylor’s formula then gives

$$\begin{aligned}\Vert x+ty\Vert ^2 = \rho _{x,y}(t)&= \rho _{x,y}(0) + t \rho _{x,y}'(0) + \int _0^t \big (\rho _{x,y}'(s)-\rho _{x,y}'(0)\big ) \,{\mathrm{d}}s \\&\leqslant \Vert x\Vert ^2 + t \langle \rho '(x), y\rangle + t^2 D^2\Vert y\Vert ^2. \end{aligned}$$

Setting \(x= \xi (\omega )\), \(y=\eta (\omega )\), \(t=1\), and taking conditional expectations, we obtain

$$\begin{aligned}{\mathbb {E}}_{{\mathscr {G}}}(\Vert \xi +\eta \Vert ^2)\leqslant \Vert \xi \Vert ^2 + {\mathbb {E}}_{{\mathscr {G}}}(\langle \rho '(\xi ), \eta \rangle )+ D^2{\mathbb {E}}_{{\mathscr {G}}}(\Vert \eta \Vert ^2).\end{aligned}$$

It remains to note that \({\mathbb {E}}_{{\mathscr {G}}}\langle \rho '(\xi ), \eta \rangle = \langle \rho '(\xi ), {\mathbb {E}}_{{\mathscr {G}}}\eta \rangle = 0\).

For the second assertion note that the function \(\zeta :{{\mathbb {C}}}\rightarrow {{\mathbb {C}}}\) defined by \(\zeta (z) := \cosh (z^{1/2})\) is entire. Let \(u(t) : = \cosh (\Vert x+ty\Vert ) = \cosh (w(t))= \zeta (\rho _{x,y}(t))\). By (2.6),

$$\begin{aligned} u(t)&= u(0) + t u'(0) + \int _0^t (t-s) u''(s) \,{\mathrm{d}}s \\&\leqslant \cosh (\Vert x\Vert ) + t \zeta '(\Vert x\Vert ^2) \langle \rho '(x), y\rangle \\&\quad + D^2 \Vert y\Vert ^2\!\!\int _0^t (t-s)\cosh (\Vert x+sy\Vert )\,{\mathrm{d}}s. \end{aligned}$$

Since \(\cosh (\Vert x+sy\Vert ) \leqslant \cosh (\Vert x\Vert +s\Vert y\Vert )\leqslant e^{s\Vert y\Vert }\cosh (\Vert x\Vert )\) for \(s\ge 0\), the integral on the right-hand side satisfies

$$\begin{aligned} \Vert y\Vert ^2\int _0^t (t-s) \cosh (\Vert x+sy\Vert ) \,{\mathrm{d}}s&\leqslant \cosh (\Vert x\Vert ) \Vert y\Vert ^2\int _0^t (t-s) e^{s\Vert y\Vert }\,{\mathrm{d}}s \\&= \cosh (\Vert x\Vert ) (e^{t\Vert y\Vert } - 1 - t\Vert y\Vert ). \end{aligned}$$

Combining the estimates with \(x= \xi (\omega )\), \(y=\eta (\omega )\), \(t=1\), and taking conditional expectations, we obtain

$$\begin{aligned}&{\mathbb {E}}_{{\mathscr {G}}}(\cosh (\Vert \xi +\eta \Vert )) \\&\quad \leqslant \cosh (\Vert \xi \Vert ) + \zeta '(\Vert \xi \Vert ^2) {\mathbb {E}}_{{\mathscr {G}}}(\langle \rho '(\xi ), \eta \rangle ) \\&\qquad + D^2 {\mathbb {E}}_{{\mathscr {G}}}(e^{\Vert \eta \Vert }-1 - \Vert \eta \Vert ) \cosh (\Vert \xi \Vert ). \end{aligned}$$

The result follows from this by using once more that \({\mathbb {E}}_{{\mathscr {G}}}(\langle \rho '(\xi ), \eta \rangle ) = 0\). \(\square \)
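The closed-form evaluation \(\Vert y\Vert ^2\int _0^t (t-s) e^{s\Vert y\Vert }\,{\mathrm{d}}s = e^{t\Vert y\Vert } - 1 - t\Vert y\Vert \) used in the proof can be confirmed by quadrature; the following sketch (ours, with \(c = \Vert y\Vert \)) is only an illustration:

```python
import math

# Numerical check of the identity used in the proof of Lemma 2.4:
#   c^2 * int_0^t (t - s) e^{c s} ds = e^{c t} - 1 - c t,   with c = ||y||.
def lhs(c: float, t: float, n: int = 20_000) -> float:
    # Midpoint rule for c^2 * int_0^t (t - s) e^{c s} ds.
    h = t / n
    total = sum((t - (k + 0.5) * h) * math.exp(c * (k + 0.5) * h) for k in range(n))
    return c * c * h * total

def rhs(c: float, t: float) -> float:
    return math.exp(c * t) - 1 - c * t

for c, t in [(0.5, 1.0), (2.0, 1.0), (1.0, 3.0)]:
    assert abs(lhs(c, t) - rhs(c, t)) < 1e-6 * (1 + rhs(c, t))
```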

Remark 2.5

Applying the first part of this lemma iteratively to Rademacher sums, we obtain the folklore result that (2, D)-smoothness implies martingale type 2 with constant D.

2.2 Stochastic integration in 2-smooth Banach spaces

Let \({\mathscr {H}}\) be a Hilbert space. An \({\mathscr {H}}\)-isonormal process on \(\Omega \) is a mapping \({\mathscr {W}}: {\mathscr {H}}\rightarrow L^2(\Omega )\) with the following two properties:

  1. (i)

    For all \(h\in {\mathscr {H}}\) the random variable \({\mathscr {W}}h\) is Gaussian;

  2. (ii)

    For all \(h_1,h_2\in {\mathscr {H}}\) we have \({\mathbb {E}}({\mathscr {W}}h_1 \cdot {\mathscr {W}}h_2) = (h_1|h_2)\).

It is easy to see that every \({\mathscr {H}}\)-isonormal process is linear and that for all \(h_1,\dots ,h_N\in {\mathscr {H}}\) the \({{\mathbb {R}}}^N\)-valued random variable \(({\mathscr {W}}h_1,\dots , {\mathscr {W}}h_N)\) is jointly Gaussian. For more details the reader is referred to [51, 74].

If H is another Hilbert space, an H-cylindrical Brownian motion indexed by [0, T] is an isonormal process \(W: L^2(0,T;H) \rightarrow L^2(\Omega )\). Following common practice we write

$$\begin{aligned} W_t h:= W({\mathbf{1}}_{(0,t)}\otimes h), \qquad t\in [0,T], \ h\in H. \end{aligned}$$

For each \(h\in H\), the scalar-valued process \(Wh= (W_th)_{t\in [0,T]}\) is then a Brownian motion, which is standard if and only if h has norm one. Two such Brownian motions \(Wh_1\) and \(Wh_2\) are independent if and only if \(h_1\) and \(h_2\) are orthogonal in H. We say that W is adapted to the filtration \(({\mathscr {F}}_t)_{t\in [0,T]}\) on \(\Omega \) if \(W(f\otimes h)\in L^2(\Omega ,{\mathscr {F}}_t)\) for all \(f\in L^2(0,T)\) supported in (0, t) and all \(h\in H\). In what follows we always assume that H-cylindrical Brownian motions are adapted to \(({\mathscr {F}}_t)_{t\in [0,T]}\).

The space of finite rank operators from a Hilbert space H into a Banach space X is denoted by \(H\otimes X\). Every finite rank operator \(T\in H\otimes X\) can be represented in the form \(T = \sum _{n=1}^N h_n\otimes x_n\) with \((h_n)_{n=1}^N\) orthonormal in H and \((x_n)_{n=1}^N\) a sequence in X. We then define

$$\begin{aligned} \Vert T\Vert _{\gamma (H,X)}^2 = {\mathbb {E}}\Big \Vert \sum _{n=1}^N \gamma _n x_n\Big \Vert ^2, \end{aligned}$$
(2.7)

where \((\gamma _n)_{n=1}^N\) is a sequence of independent standard Gaussian random variables. It is an easy consequence of the preservation of joint Gaussianity under orthogonal transformations that the norm \(\Vert \cdot \Vert _{\gamma (H,X)}\) is well defined. The completion of \(H\otimes X\) with respect to this norm is denoted by \(\gamma (H,X)\). The natural inclusion mapping from \(H\otimes X\) into \({\mathscr {L}}(H,X)\) extends to a contractive inclusion mapping \(\gamma (H,X)\subseteq {\mathscr {L}}(H,X)\). A linear operator \(T\in {\mathscr {L}}(H, X)\) is said to be \(\gamma \)-radonifying if it belongs to \(\gamma (H,X)\). For \(1\le p<\infty \) the Kahane–Khintchine inequalities guarantee that replacing \(L^2\)-norms by \(L^p\)-norms in (2.7) gives an equivalent norm on \(\gamma (H,X)\). The space \(\gamma (H,X)\), when endowed with this equivalent norm, will be denoted by \(\gamma _p(H,X)\).
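When X itself is a Hilbert space, expanding the square in (2.7) shows that the \(\gamma \)-norm of a finite rank operator is just its Hilbert–Schmidt norm (in line with the identity \(\gamma (H,K) = {\mathscr {L}}_2(H,K)\) recorded below). The following sketch, ours and with ad hoc names, checks this for \(X = {{\mathbb {R}}}^d\), both by the exact coordinatewise expansion and by seeded Monte Carlo sampling of (2.7):

```python
import math
import random

# For X = R^d (Euclidean norm), independence of the gamma_n gives
#   E || sum_n gamma_n x_n ||^2 = sum_k sum_n x_n[k]^2 = sum_n ||x_n||^2,
# i.e. the gamma-norm of T = sum_n h_n (x) x_n is its Hilbert-Schmidt norm.
def gamma_norm_sq_exact(xs):
    return sum(c ** 2 for x in xs for c in x)

def gamma_norm_sq_mc(xs, n_samples=50_000):
    # Direct Monte Carlo evaluation of the expectation in (2.7).
    d = len(xs[0])
    acc = 0.0
    for _ in range(n_samples):
        g = [random.gauss(0.0, 1.0) for _ in xs]
        v = [sum(gn * x[k] for gn, x in zip(g, xs)) for k in range(d)]
        acc += sum(c ** 2 for c in v)
    return acc / n_samples

random.seed(0)
xs = [[1.0, 2.0, 0.0], [0.5, -1.0, 3.0]]  # x_1, x_2 against orthonormal h_1, h_2
exact = gamma_norm_sq_exact(xs)           # = 15.25, squared Hilbert-Schmidt norm
assert math.isclose(gamma_norm_sq_mc(xs), exact, rel_tol=0.05)
```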

For Hilbert spaces K we have

$$\begin{aligned} \gamma (H,K) ={\mathscr {L}}_2(H,K) \end{aligned}$$

isometrically, where \({\mathscr {L}}_2(H,K)\) is the space of Hilbert–Schmidt operators from H to K. For \(1\le p<\infty \) the identity mapping on \(H\otimes L^p(\mu )\) extends to an isometric isomorphism of Banach spaces

$$\begin{aligned} \gamma _p(H, L^p(\mu )) \simeq L^p(\mu ;H). \end{aligned}$$
(2.8)

For \(H = L^2(\nu )\) this identifies \(\gamma (L^2(\nu ), L^p(\mu ))\) with the space \(L^p(\mu ;L^2(\nu ))\) of ‘square functions’ using terminology from harmonic analysis. For more details the reader is referred to [49, Chapter 9].

A stochastic process \(\Phi :[0,T]\times \Omega \rightarrow {\mathscr {L}}(H,X)\) is called an adapted finite rank step process if there exist \(0=s_0<s_1<\ldots <s_n=T\), random variables \(\xi _{ij}\in L^\infty (\Omega ,{\mathscr {F}}_{s_{j-1}})\otimes X\) (the subspace of \(L^\infty (\Omega ;X)\) of strongly \({\mathscr {F}}_{s_{j-1}}\)-measurable random variables taking values in a finite-dimensional subspace of X) for \(i=1, \ldots , m\) and \(j=1, \ldots , n\), and an orthonormal system \(h_1,\dots , h_m\) in H such that

$$\begin{aligned} \Phi = \sum _{j=1}^{n} {\mathbf{1}}_{(s_{j-1},s_{j}]} \sum _{i=1}^m h_i\otimes \xi _{ij}. \end{aligned}$$
(2.9)

For such processes the stochastic integral with respect to the H-cylindrical Brownian motion W is defined by

$$\begin{aligned}\int _0^t \Phi _s \,{\mathrm{d}}W_s := \sum _{j=1}^n \sum _{i=1}^m (W_{s_{j}\wedge t}-W_{s_{j-1}\wedge t})h_i \otimes \xi _{ij}, \ \ t\in [0,T].\end{aligned}$$

Since \(t\mapsto W_th\), being a Brownian motion, has a continuous modification, it follows that \(t\mapsto \int _0^t\Phi _s \,{\mathrm{d}}W_s\) has a continuous modification. Such modifications will always be used in the sequel. It was shown by Neidhardt in his PhD thesis [73] (see also [22, 100]) that if \(\Phi \) is an adapted finite rank step process, then

$$\begin{aligned} {\mathbb {E}}\Big \Vert \int _0^T \Phi _t \,{\mathrm{d}}W_t\Big \Vert ^2\le D^2 \Vert \Phi \Vert _{L^2(\Omega ;L^2(0,T;\gamma (H,X)))}^2. \end{aligned}$$
(2.10)

By (2.10), standard localisation arguments, and Doob’s inequality, the stochastic integral can be extended to arbitrary progressively measurable processes \(\Phi : [0,T]\times \Omega \rightarrow \gamma (H,X)\) whose \(L^2(0,T;\gamma (H,X))\)-norm is finite almost surely; the resulting stochastic integral process \((\int _0^t \Phi _s \,{\mathrm{d}}W_s)_{t\in [0,T]}\) again has a continuous modification. At this juncture it is useful to observe that a process \(\Phi : [0,T]\times \Omega \rightarrow \gamma (H,X)\) is progressively measurable (as a process with values in the Banach space \(\gamma (H,X)\)) if and only if \(\Phi h: [0,T]\times \Omega \rightarrow X\) is progressively measurable (as a process with values in X) for all \(h\in H\); this follows from [49, Example 9.1.16].

The following version of the classical Burkholder inequality is the result of contributions of many authors [2, 8, 9, 22, 23, 75].

Proposition 2.6

Let X be a (2, D)-smooth Banach space, let W be an adapted H-cylindrical Brownian motion on \(\Omega \), and let \(0<p<\infty \). For all adapted finite rank step processes \(\Phi : [0,T]\times \Omega \rightarrow \gamma (H,X)\) we have

$$\begin{aligned} {\mathbb {E}}\sup _{t\in [0,T]}\Big \Vert \int _0^t \Phi _s \,{\mathrm{d}}W_s\Big \Vert ^p\le C_{p,D}^p \Vert \Phi \Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X)))}^p, \end{aligned}$$

where \(C_{p,D}\) is a constant depending only on p and D.

By using Pinelis’s version of the Burkholder–Rosenthal inequalities [81], Seidler [90] has shown that the constant \(C_{p,D}\) has the same asymptotic behaviour as \(p\rightarrow \infty \) as in the scalar-valued setting, i.e.,

$$\begin{aligned} C_{p,D} = C_D O(\sqrt{p}) \ \ \hbox { as }p\rightarrow \infty . \end{aligned}$$

As a special case of our main result we will recover Seidler’s result, with \(C_{p,D} = 10 D \sqrt{p}\) if \(2\le p<\infty \), by setting \(S(t,s) \equiv I\) in Theorem 4.1.

As a consequence of Proposition 2.6 we obtain the following result, which will be useful in the error analysis of numerical schemes for SPDEs in Sect. 5.

Proposition 2.7

Let X be a (2, D)-smooth Banach space and let \(0< p< \infty \). Let \(\Phi := (\Phi ^{(k)})_{k=1}^n\) be a finite sequence in \( L^p_{{\mathscr {P}}}(\Omega ;L^2(0,T;\gamma (H,X)))\) and set

$$\begin{aligned}I^{\Phi }_n := \Big ({\mathbb {E}}\sup _{t\in [0,T], k\in \{1,\ldots ,n\}}\Big \Vert \int _0^t \Phi _s^{(k)} \,{\mathrm{d}}W_s\Big \Vert ^p\Big )^{1/p}.\end{aligned}$$

Then

$$\begin{aligned} I^{\Phi }_n&\leqslant C_{p,D} \sqrt{\log n}\, \Vert \Phi \Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,\ell ^\infty _n(X))))} \quad \text {if } n\geqslant 3, \end{aligned}$$
(2.11)
$$\begin{aligned} I^{\Phi }_n&\leqslant K_{p,D} \log n\, \Vert \Phi \Vert _{L^p(\Omega ;L^2(0,T;\ell ^\infty _n(\gamma (H,X))))} \quad \text {if } n\geqslant 8. \end{aligned}$$
(2.12)

If \(2\le p<\infty \), these estimates hold with \(C_{p,D} = 10D\sqrt{2e p}\) and \(K_{p,D} = 10D e \sqrt{p}\).

The bound (2.12) is simpler to use, but (2.11) will give a better result in the applications later on.

Proof

The method of proof is inspired by [28]. The idea is to view the sequence \(\Phi = (\Phi ^{(k)})_{k=1}^n\) as an \(\ell ^q_n(X)\)-valued process for a clever choice of \(q= q(n) \in [2,\infty )\).

We begin with the proof of (2.11). Since \(\ell ^q_n(X)\) is \((2,D\sqrt{q})\)-smooth by Proposition 2.2, by Proposition 2.6 we have

$$\begin{aligned} I^{\Phi }_n&\leqslant \Big ({\mathbb {E}}\sup _{t\in [0,T]}\Big \Vert \int _0^t \Phi _s \,{\mathrm{d}}W_s\Big \Vert ^p_{\ell ^q_n(X)} \Big )^{1/p} \leqslant C_{p,q,D} \Vert \Phi \Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,\ell ^q_n(X))))} \\&\leqslant C_{p,q,D} n^{1/q} \Vert \Phi \Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,\ell ^\infty _n(X))))}, \end{aligned}$$

and if \(2\le p<\infty \) we may take \(C_{p,q,D} = 10D\sqrt{pq}\). The estimate (2.11) follows from this by taking \(q=2\log n\), which belongs to the interval \([2, \infty )\) if \(n\ge 3\).

To prove (2.12) we argue in the same way, but this time we use that for a sequence \(\Gamma :=(\Gamma _k)_{k=1}^n\) with \(\Gamma _k\in \gamma (H,X)\),

$$\begin{aligned} \Vert \Gamma \Vert _{\gamma (H,\ell ^q_n(X))}&\leqslant \Vert \Gamma \Vert _{\gamma _q(H,\ell ^q_n(X))} \\&= \Vert \Gamma \Vert _{\ell ^q_n(\gamma _q(H,X))}\leqslant n^{1/q} \Vert \Gamma \Vert _{\ell ^\infty _n(\gamma _q(H,X))}\leqslant n^{1/q} \sqrt{q} \Vert \Gamma \Vert _{\ell ^\infty _n(\gamma (H,X))}, \end{aligned}$$

applying the Kahane–Khintchine inequalities (see [49, Theorem 6.2.6]) in the last step. Now (2.12) follows by taking \(q= \log n\). \(\square \)
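The choice \(q = 2\log n\) in the proof of (2.11) is the natural one: the prefactor in the estimate is proportional to \(\sqrt{q}\,n^{1/q}\), and elementary calculus shows this is minimised exactly at \(q = 2\log n\), with minimum \(\sqrt{2e\log n}\). A quick numerical confirmation of this calculus fact (our illustration only):

```python
import math

# The prefactor in the proof of (2.11) is proportional to sqrt(q) * n^(1/q);
# differentiating log of it gives the minimiser q = 2*log(n), with minimal
# value sqrt(2*e*log(n)) -- which is where the constant 10*D*sqrt(2*e*p) comes from.
def prefactor(n: int, q: float) -> float:
    return math.sqrt(q) * n ** (1.0 / q)

n = 1000
q_star = 2 * math.log(n)
assert math.isclose(prefactor(n, q_star), math.sqrt(2 * math.e * math.log(n)), rel_tol=1e-12)
# q_star beats every q on a coarse grid of [2, 32):
assert all(prefactor(n, q_star) <= prefactor(n, 2 + 0.5 * k) + 1e-12 for k in range(60))
```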

Remark 2.8

The same method of proof can be used to show that if X is (2, D)-smooth, then \(\ell ^\infty _n(X)\) has martingale type 2 with constant \(\sqrt{D^2 -2 + 2\log n}\) if \(n\geqslant 3\).

3 Extending Pinelis’s Burkholder–Rosenthal inequality

On the probability space \((\Omega ,{\mathscr {F}},{\mathbb {P}})\) we consider a finite filtration \(({\mathscr {F}}_j)_{j= 0}^k\) and denote by \({\mathbb {E}}_j:= {\mathbb {E}}_{{\mathscr {F}}_j}\) the conditional expectation with respect to \({\mathscr {F}}_{j}\). When \( (f_j)_{j= 0}^k\) is an X-valued martingale with respect to \(({\mathscr {F}}_j)_{j= 0}^k\), we denote by \((df_j)_{j=1}^k\) its difference sequence, i.e., \(df_j := f_j - f_{j-1}\). We further define the non-negative random variables \(f_j^\star \) (for \(0\le j\le k\)) and \(df_j^\star \) and \(s_j(f)\) (for \(1\le j\le k)\) by

$$\begin{aligned} f_j^\star := \max _{0\le i\le j} \Vert f_i\Vert , \quad df_j^\star := \max _{1\le i\le j} \Vert df_i\Vert , \quad s_j(f):= \Bigl (\sum _{i=1}^j {\mathbb {E}}_{i-1} \Vert df_{i}\Vert ^2\Bigr )^{1/2}, \end{aligned}$$

and we set \(f^\star := f^\star _k\), \(df^\star := df^\star _k\), and \(s(f) := s_k(f)\).

If \({\mathscr {G}}\) is a sub-\(\sigma \)-algebra of \({\mathscr {F}}\), we call the X-valued random variables \(\xi \) and \(\eta \) conditionally equi-distributed given \({\mathscr {G}}\) if for all Borel sets \(B\subseteq X\) we have

$$\begin{aligned} {\mathbb {E}}_{\mathscr {G}}{\mathbf{1}}_{\{\xi \in B\}} = {\mathbb {E}}_{\mathscr {G}}{\mathbf{1}}_{\{\eta \in B\}}. \end{aligned}$$

As in [48, Lemma 4.4.5] one sees that this is equivalent to the requirement that

$$\begin{aligned} {\mathbb {E}}(f(\xi )|{\mathscr {G}}) = {\mathbb {E}}(f(\eta )|{\mathscr {G}}) \end{aligned}$$
(3.1)

for all measurable functions \(f:X \rightarrow X\) such that \(f(\xi ), f(\eta )\in L^1(\Omega ;X)\).

An adapted X-valued sequence \((\xi _j)_{j=1}^k\) is called conditionally symmetric given \(({\mathscr {F}}_j)_{j=0}^k\) if for all Borel sets \(B\subseteq X\) and \(1\le j\le k\) the random variables \(\xi _j\) and \(-\xi _j\) are conditionally equi-distributed given \({\mathscr {F}}_{j-1}\). Taking \(f(x) = {\mathbf{1}}_{\{\Vert x\Vert \leqslant r\}}x\) in (3.1), it follows that for conditionally symmetric sequences we have \( {\mathbb {E}}_{j-1} ({\mathbf{1}}_{\{\Vert \xi _j\Vert \leqslant r\}}\xi _j) = - {\mathbb {E}}_{j-1} ({\mathbf{1}}_{\{\Vert \xi _j\Vert \leqslant r\}}\xi _j) \), i.e.,

$$\begin{aligned} {\mathbb {E}}_{j-1} ({\mathbf{1}}_{\{\Vert \xi _j\Vert \leqslant r\}}\xi _j) = 0. \end{aligned}$$
(3.2)

A random operator on X is a mapping \(V:\Omega \rightarrow {\mathscr {L}}(X)\) such that \(\omega \mapsto V(\omega )x\) is strongly measurable for all \(x\in X\), and a random contraction on X is a random operator on X whose range consists of contractions.

The main result of this section is the following extension of Pinelis’s version of the Burkholder–Rosenthal inequality [81]. Recently, other extensions of some of Pinelis’s estimates for p-smooth Banach spaces have been obtained in [66].

Theorem 3.1

Let X be a (2, D)-smooth Banach space. Suppose that \( (f_j)_{j= 0}^k\) is an adapted sequence of X-valued random variables, \( (g_j)_{j=0}^k\) is an X-valued martingale, \((V_j)_{j=1}^k\) is a sequence of random contractions on X which is strongly predictable (i.e., \(V_jx\) is strongly \({\mathscr {F}}_{j-1}\)-measurable for all \(x\in X\)), and assume that we have \(f_0=g_0=0\) and

$$\begin{aligned}f_j = V_{j} f_{j-1} + dg_j, \qquad j=1, \ldots , k.\end{aligned}$$

Then for all \(2\le p<\infty \) we have

$$\begin{aligned} \Vert f^\star \Vert _p \le 30p \Vert dg^\star \Vert _p + 40D\sqrt{p} \Vert s(g)\Vert _p.\end{aligned}$$

If, moreover, \((g_j)_{j=0}^k\) has conditionally symmetric increments, then

$$\begin{aligned} \Vert f^\star \Vert _p \le 5p \Vert dg^\star \Vert _p + 10D\sqrt{p} \Vert s(g)\Vert _p.\end{aligned}$$

Here and in the rest of the paper, \(\Vert \cdot \Vert _p\) is the norm of \(L^p(\Omega )\). The proof of Theorem 3.1 closely follows that of [81, Theorem 4.1] (which, up to the value of the constants, corresponds to taking \(V_j = I\) and \(g_j = f_j\)). We point out that even in the case \(p=2\), Theorem 3.1 is not obvious because the additional predictable sequence \((V_j)_{j=1}^{k}\) destroys the martingale structure of f.

The proof in [81] is written up rather concisely and therefore we shall present the proof of Theorem 3.1 in full detail. At the same time this provides the opportunity to give more precise information on the constants.

We need some auxiliary results, the first of which is a classical ‘good \(\lambda \)’ inequality (see [11, Lemma 7.1]).

Lemma 3.2

Suppose that g and h are non-negative random variables and suppose that \(\beta > 1\), \(\delta > 0\), and \(\varepsilon > 0\) are such that for all \(\lambda >0\) we have

$$\begin{aligned} {\mathbb {P}}(g> \beta \lambda ,\, h< \delta \lambda ) < \varepsilon {\mathbb {P}}(g > \lambda ). \end{aligned}$$

If \(1\le p<\infty \) and \(\beta ^p\varepsilon <1\), then

$$\begin{aligned} {\mathbb {E}}g^p \le \frac{(\beta /\delta )^p}{1-\beta ^p \varepsilon } {\mathbb {E}}h^p. \end{aligned}$$

The next lemma is a minor extension of [81, Theorem 3.4].

Lemma 3.3

Suppose that \((g_j)_{j=0}^k\) is a martingale with values in a (2, D)-smooth Banach space X with \(g_0=0\) and let \((h_j)_{j=0}^{k-1}\) be an adapted sequence of random variables with values in X. Set

$$\begin{aligned} f_0:=0, \qquad f_j := h_{j-1} + dg_j, \quad 1\le j\le k, \end{aligned}$$

and assume that \(\Vert h_{j}\Vert \leqslant \Vert f_{j}\Vert \) almost surely for all \(0\le j\le k-1\). Suppose further that \(\Vert dg^\star \Vert _\infty \le a\) and \(\Vert s(g)\Vert _\infty \le b/D\) for some \(a>0\) and \(b>0\). Then for all \(r> 0\) we have

$$\begin{aligned} {\mathbb {P}}(f^\star \ge r) \le 2\Bigl (\frac{eb^2}{ra}\Bigr )^{r/a}. \end{aligned}$$

Proof

We begin by noting that the almost sure conditions \(f_0 = 0\), \(\Vert h_{j-1}\Vert \le \Vert f_{j-1}\Vert \), \(f_j := dg_j + h_{j-1}\), and \(\Vert dg_j\Vert \le a\) imply that the random variables \(h_{j-1}\) and \(f_j\), \(j=1,\dots ,k\), are essentially bounded and that \(h_0=0\) almost surely.

Fix \(\lambda >0\) and \(1\le j\le k\). By Lemma 2.4,

$$\begin{aligned} {\mathbb {E}}_{j-1}\cosh (\lambda \Vert f_j\Vert )&= {\mathbb {E}}_{j-1}\cosh (\lambda \Vert h_{j-1} + dg_j\Vert ) \\&\leqslant \Bigl (1+ D^2 {\mathbb {E}}_{j-1} (e^{\lambda \Vert dg_j\Vert } -1 -\lambda \Vert dg_j\Vert ) \Bigr )\cosh (\lambda \Vert h_{j-1}\Vert ) \\&\leqslant \Bigl (1+ D^2 {\mathbb {E}}_{j-1} (e^{\lambda \Vert dg_j\Vert } -1 -\lambda \Vert dg_j\Vert ) \Bigr )\cosh (\lambda \Vert f_{j-1}\Vert ) \\&=:(1+e_j)\cosh (\lambda \Vert f_{j-1}\Vert ). \end{aligned}$$

Note that the random variables \(e_j\) are non-negative. This means that the sequence \((G_j)_{j=0}^k\) defined by

$$\begin{aligned} G_0=1, \quad G_j:= \Bigl (\prod _{i=1}^j(1+e_i)\Bigr )^{-1}\cosh (\lambda \Vert f_{j}\Vert ), \quad j=1,\dots ,k, \end{aligned}$$

is a positive supermartingale. Fix \(r>0\) and set \(\tau := \min \{1\le j\le k: \, \Vert f_j\Vert \ge r\}\) on the set \(\{f^\star \ge r\} = \{\max _{1\le j\le k} \Vert f_j\Vert \ge r\}\) and \(\tau := \infty \) on its complement. By the optional sampling theorem, the sequence \((G_{\tau \wedge j})_{j=0}^k\) is a positive supermartingale. It follows that \( {\mathbb {E}}{\mathbf{1}}_{\{\tau \le k\}}G_\tau \le {\mathbb {E}}G_{\tau \wedge k} \le {\mathbb {E}}G_0 = 1.\) Therefore, by the inequality \(\cosh u > \frac{1}{2}e^u\) and Chebyshev’s inequality,

$$\begin{aligned} {\mathbb {P}}(f^\star \ge r) = {\mathbb {P}}(\tau \le k)&={\mathbb {P}}\Bigl (\tau \le k,\, G_\tau \ge \Bigl \Vert \prod _{j=1}^k (1+e_j)\Bigr \Vert _\infty ^{-1}\cosh (\lambda r)\Bigr ) \\&\le {\mathbb {P}}\Bigl (\tau \le k,\,G_\tau \ge \frac{1}{2}\Bigl \Vert \prod _{j=1}^k (1+e_j)\Bigr \Vert _\infty ^{-1}e^{\lambda r}\Bigr ) \\&\le 2 \exp (-\lambda r) \Big \Vert \prod _{j=1}^k (1+e_j)\Bigr \Vert _\infty {\mathbb {E}}{\mathbf{1}}_{\{\tau \le k\}}G_\tau \\&\le 2 \exp (-\lambda r) \Big \Vert \prod _{j=1}^k (1+e_j)\Bigr \Vert _\infty \le 2 \exp \Bigl (-\lambda r + \Bigl \Vert \sum _{j=1}^k e_j\Bigr \Vert _\infty \Bigr ), \end{aligned}$$

the last inequality being elementary.

The function defined by \(\psi (0) := \frac{1}{2}\) and \(\psi (u) := (e^u -1-u)/u^2\) for \(u\not =0\) is increasing, and therefore for all \(\lambda >0\) we have

$$\begin{aligned} {\mathbb {E}}_{j-1} (e^{\lambda \Vert dg_j\Vert } -1 - \lambda \Vert dg_j\Vert ) \le \frac{1}{a^2} (e^{\lambda a} -1 - \lambda a){\mathbb {E}}_{j-1}\Vert dg_j\Vert ^2 . \end{aligned}$$

Combining this with the definition of the random variables \(e_j\) and the assumption \(\Vert s(g)\Vert _\infty \le b/D\), we obtain the pointwise inequalities

$$\begin{aligned} \sum _{j=1}^k e_j&= D^2\sum _{j=1}^k {\mathbb {E}}_{j-1} (e^{\lambda \Vert dg_j\Vert } -1 -\lambda \Vert dg_j\Vert ) \\&\le \frac{D^2}{a^2} (e^{\lambda a} -1 - \lambda a) \sum _{j=1}^k {\mathbb {E}}_{j-1}\Vert dg_j\Vert ^2 \le \frac{b^2}{a^2}(e^{\lambda a} -1 - \lambda a). \end{aligned}$$

Taking the supremum norm and substituting the result into the above tail estimate for \(f^\star \), we arrive at

$$\begin{aligned} {\mathbb {P}}(f^\star \ge r)&\le 2 \exp \Bigl (-\lambda r + \frac{b^2}{ a^2}(e^{\lambda a} -1 - \lambda a)\Bigr ). \end{aligned}$$

Up to this point the choice of \(\lambda >0\) was arbitrary. Optimising the choice of \(\lambda >0\) leads to the estimate

$$\begin{aligned} {\mathbb {P}}(f^\star \ge r)&\le 2 \exp \Bigl (\frac{r}{a} - \Bigl (\frac{r}{a}+ \frac{b^2}{a^2}\Bigr )\ln \Bigl (1+\frac{ra}{b^2}\Bigr )\Bigr ) \end{aligned}$$

which, by elementary estimates, implies the inequality in the statement of the lemma. \(\square \)
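The optimisation step and the final elementary estimate can be verified numerically; in this illustration (ours) we use the optimiser \(\lambda ^* = a^{-1}\log (1+ra/b^2)\), which one finds by calculus:

```python
import math

# Illustration of the optimisation in the proof of Lemma 3.3.  Setting the
# derivative of  F(lam) = -lam*r + (b^2/a^2)*(e^{lam*a} - 1 - lam*a)  to zero
# gives lam* = (1/a)*log(1 + r*a/b^2)  (our computation), and then
#   F(lam*) = r/a - (r/a + b^2/a^2)*log(1 + r*a/b^2).
def F(lam: float, r: float, a: float, b: float) -> float:
    return -lam * r + (b / a) ** 2 * (math.exp(lam * a) - 1 - lam * a)

def F_at_optimum(r: float, a: float, b: float) -> float:
    return r / a - (r / a + (b / a) ** 2) * math.log(1 + r * a / b ** 2)

for r, a, b in [(4.0, 1.0, 1.0), (10.0, 0.5, 2.0)]:
    lam_star = math.log(1 + r * a / b ** 2) / a
    assert math.isclose(F(lam_star, r, a, b), F_at_optimum(r, a, b), rel_tol=1e-12)
    # lam_star minimises F on a grid (F is convex in lam):
    assert all(F(lam_star, r, a, b) <= F(0.01 * k, r, a, b) + 1e-12 for k in range(1, 500))
    # The elementary estimate: 2*exp(F(lam*)) <= 2*(e*b^2/(r*a))**(r/a).
    assert 2 * math.exp(F_at_optimum(r, a, b)) <= 2 * (math.e * b ** 2 / (r * a)) ** (r / a)
```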

The next lemma gives a sufficient condition in order that Lemma 3.2 can be applied and extends [81, Lemma 4.2]. Terminology is as in Theorem 3.1.

Lemma 3.4

Let X be a (2, D)-smooth Banach space. Suppose that \((g_j)_{j=0}^k\) is a martingale with values in X with \(g_0=0\) such that each \(dg_j\) is \({\mathscr {F}}_{j-1}\)-conditionally symmetric, and that the sequence of random operators \((V_j)_{j=1}^k\) on X is strongly predictable and contractive. Let \((f_j)_{j=0}^k\) be the sequence of random variables defined by

$$\begin{aligned} f_0 := 0, \qquad f_j := V_{j} f_{j-1} + dg_j, \quad j= 1, \ldots , k. \end{aligned}$$

Then for all \(\lambda ,\delta _1, \delta _2>0\) and \(\beta > 1+\delta _2\) we have

$$\begin{aligned} {\mathbb {P}}( f^\star> \beta \lambda , \, w \le \lambda ) \le \varepsilon {\mathbb {P}}(f^\star >\lambda ), \end{aligned}$$

where

$$\begin{aligned} w = (\delta _2^{-1}dg^\star ) \vee (\delta _1^{-1}Ds(g)), \quad \varepsilon = 2\Bigg (\frac{e\delta _1^2}{N\delta _2^2}\Bigg )^N, \quad N = \frac{\beta -1-\delta _2}{\delta _2}. \end{aligned}$$

Proof

Fix \(\lambda ,\delta _1, \delta _2>0\) and \(\beta > 1+\delta _2\). Setting \(\overline{g}_0:= 0\) and

$$\begin{aligned}\overline{g}_j := \sum _{i=1}^j {\mathbf{1}}_{\{\Vert dg_i\Vert \le \delta _2 \lambda \}} dg_i,\quad j=1,\dots ,k,\end{aligned}$$

by (3.2) we have \({\mathbb {E}}_{j-1}d\overline{g}_j = 0\). Set \({\overline{f}}_0:= f_0=0\), \(h_0:=0\), and

$$\begin{aligned} {\overline{f}}_j := V_{j}{\overline{f}}_{j-1} + d{\overline{g}}_j , \quad h_j := V_{j} h_{j-1}+{\mathbf{1}}_{\{\mu <j\le \tau \wedge \nu \}}d{\overline{g}}_j, \quad j=1,\dots ,k, \end{aligned}$$

where the stopping times \(\mu \), \(\nu \), and \(\tau \) are defined by

$$\begin{aligned} \mu&:= \inf \{0\le j\le k:\, \Vert \overline{f}_j\Vert>\lambda \}, \\ \nu&:= \inf \{0\le j\le k:\, \Vert \overline{f}_j\Vert>\beta \lambda \}, \\ \tau&:= \inf \{0\le j\le k-1:\, s_{j+1}(\overline{g}) > \delta _1 D^{-1} \lambda \}; \end{aligned}$$

we set \(\mu := \infty \), \(\nu :=\infty \), and \(\tau :=\infty \) if the respective sets over which the infima are taken are empty. Note that the sequence \((h_j)_{j=0}^k\) is adapted. Moreover, \(h_j = 0\) on the set \(\{j\le \mu \}\); in particular \(h_\mu = 0\).

On the set \(\{w \le \lambda \}\) we have \(dg^\star \le \delta _2\lambda \), and in particular \(\Vert dg_i\Vert \le \delta _2\lambda \) and therefore \(dg_i =d\overline{g}_i\) for all \(i=1,\dots ,k\), so \(\overline{f}_j = f_j\) for all \(j=0,\dots ,k\). It follows that

$$\begin{aligned} {\mathbb {P}}( f^\star> \beta \lambda , \, w \le \lambda ) = {\mathbb {P}}(\overline{f}^\star > \beta \lambda , \, w \le \lambda ). \end{aligned}$$

It also follows that \(s(\overline{g}) \le \delta _1D^{-1}\lambda \), so \(\tau = \infty \).

On the set \(\{\overline{f}^\star > \beta \lambda \}\) we have \(\mu \le \nu \le k\), \(\Vert \overline{f}_{\mu -1}\Vert \le \lambda \), and \(\Vert \overline{f}_{\nu }\Vert > \beta \lambda \). Consequently, for any contraction S on X, on the set \( \{\overline{f}^\star > \beta \lambda , \,w \le \lambda \}\) we have

$$\begin{aligned} \Vert \overline{f}_{\nu } - S \overline{f}_{\mu }\Vert \geqslant \Vert \overline{f}_{\nu }\Vert - \Vert \overline{f}_{\mu }\Vert \geqslant \Vert \overline{f}_{\nu }\Vert - \Vert \overline{f}_{\mu -1}\Vert -\Vert d\overline{g}_{\mu }\Vert > \beta \lambda - \lambda - \delta _2\lambda . \end{aligned}$$

On this set we also have

$$\begin{aligned} \begin{aligned} h_{\nu }&= 0 \ \ \hbox {if }\mu =\nu , \\ h_{\nu }&= \overline{f}_{\nu } - V_{\nu ,\mu }\overline{f}_{\mu } \ \ \hbox {if } \mu <\nu ,\hbox { where} \ \ V_{\nu ,\mu }= V_{\nu }\circ \ldots \circ V_{\mu +1}. \end{aligned} \end{aligned}$$
(3.3)

The first identity in (3.3) follows from \( h_\mu = V_{\mu }h_{\mu -1}= \dots = V_{\mu }\circ \dots \circ V_1 h_0 = 0\), recalling that \(h_0=0\). The second identity follows from \(h_\mu =0\) and induction pointwise on \(\Omega \), noting that if \(\mu \leqslant n < n+1\le \nu \), then

$$\begin{aligned} h_{n+1} = V_{n+1} h_{n}+d{\overline{g}}_{n+1} = V_{n+1} (\overline{f}_{n} - V_{n,\mu }\overline{f}_{\mu })+d{\overline{g}}_{n+1} = \overline{f}_{n+1} - V_{n+1,\mu } \overline{f}_{\mu }, \end{aligned}$$

where we used the definitions of h and f, the linearity of \(V_{n+1}\), and the induction hypothesis. Therefore, on the set \( \{\overline{f}^\star > \beta \lambda , \,w \le \lambda \}\), we obtain

$$\begin{aligned} h^\star \ge \Vert h_{\nu }\Vert = \Vert \overline{f}_{\nu } - V_{\nu ,\mu } \overline{f}_{\mu }\Vert > (\beta -1-\delta _2)\lambda . \end{aligned}$$

We have shown that

$$\begin{aligned} {\mathbb {P}}(f^\star> \beta \lambda , \, w \le \lambda ) = {\mathbb {P}}(\overline{f}^\star> \beta \lambda , \, w \le \lambda ) \le {\mathbb {P}}(h^\star > (\beta -1-\delta _2)\lambda ). \end{aligned}$$

Let \(0\le n\le k\) be such that \({\mathbb {P}}(\Omega _n)>0\), with \(\Omega _n := \{\mu =n\}\). We claim that (a) the random variables \({\mathbf{1}}_{\{\mu <j\le \tau \wedge \nu \}}d{\overline{g}}_j\) form a martingale difference sequence on the probability space \((\Omega _n, {\mathscr {F}}|_{\Omega _n}, {\mathbb {P}}_n)\), where \({\mathscr {F}}|_{\Omega _n} := \{F\cap \Omega _n:\, F\in {\mathscr {F}}\}\) and \({\mathbb {P}}_n:={\mathbb {P}}/{\mathbb {P}}(\Omega _n)\), and (b) for this martingale difference sequence the conditions of Lemma 3.3 are satisfied on the probability space \(\Omega _n\), with \(f_j\), \(g_j\), and \(h_j\) replaced by the restrictions to \(\Omega _n\) of \(h_j\), \(\gamma _j:= {\mathbf{1}}_{\{\mu <j\le \tau \wedge \nu \}}d{\overline{g}}_j\), and \( V_{j+1} h_j\) respectively, and with \(a = \delta _2\lambda \) and \(b = \delta _1\lambda \).

Indeed, fix \(1\le j\le k\). If \(j\le n\), then \(j\le \mu \) on \(\Omega _n\) and therefore \(\gamma _j = {\mathbf{1}}_{\{\mu <j\le \tau \wedge \nu \}}d{\overline{g}}_j =0\) on \(\Omega _n\). If \(j>n\), then \(\{\mu <j\le \tau \wedge \nu \}\cap \Omega _n = \{j\le \tau \wedge \nu \}\cap \Omega _n\) is \({\mathscr {F}}_{j-1}\)-measurable as a subset of \(\Omega \) and \({\mathscr {F}}_{j-1}|_{\Omega _n}\)-measurable as a subset of \(\Omega _n\) and consequently for all \(F\in {\mathscr {F}}_{j-1}|_{\Omega _n}\subseteq {\mathscr {F}}_{j-1}\) we obtain

$$\begin{aligned}\int _{F} \gamma _j \,{\mathrm{d}}{\mathbb {P}}_n = \frac{1}{{\mathbb {P}}(\Omega _n)}\int _{F\cap \{j\le \tau \wedge \nu \}} d\overline{g}_j \,{\mathrm{d}}{\mathbb {P}}= 0\end{aligned}$$

since \({\mathbb {E}}_{j-1}d\overline{g}_j=0\). This proves part (a) of the claim.

Turning to part (b) of the claim, the condition \(d{\gamma }^\star \leqslant \delta _2\lambda \) of Lemma 3.3 is immediate from the definition, and the adaptedness of \(V_{j+1} h_{j}\) as well as the pointwise inequalities \(\Vert V_{j+1} h_{j}\Vert \leqslant \Vert h_{j}\Vert \) are also clear. The pointwise inequality \(s({\gamma })\leqslant \delta _1D^{-1}\lambda \) on \(\Omega _n = \{\mu = n\}\) follows from

$$\begin{aligned} s(\gamma )&= \Big (\sum _{j=1}^{k} {\mathbb {E}}_{j-1}^{n}({\mathbf{1}}_{\{\mu <j\leqslant \tau \wedge \nu \}} \Vert d\overline{g}_j\Vert ^2)\Big )^{1/2} \\&= \Big (\sum _{j=n+1}^{k} {\mathbb {E}}_{j-1}^{n}({\mathbf{1}}_{\{j\leqslant \tau \wedge \nu \}} \Vert d\overline{g}_j\Vert ^2)\Big )^{1/2} \\&{\mathop {=}\limits ^{(*)}} \Big (\sum _{j=n+1}^{k} {\mathbb {E}}_{j-1}({\mathbf{1}}_{\{j\leqslant \tau \wedge \nu \}} \Vert d\overline{g}_j\Vert ^2)\Big )^{1/2} \\&= \Big (\sum _{j=n+1}^{\tau \wedge \nu \wedge k} {\mathbb {E}}_{j-1}(\Vert d\overline{g}_j\Vert ^2)\Big )^{1/2}\leqslant s_{\tau \wedge k}(\overline{g})\leqslant \delta _1D^{-1}\lambda , \end{aligned}$$

where \((*)\) follows from the \({\mathscr {F}}_{j-1}|_{\Omega _n}\)-measurability of \(\{j\leqslant \tau \wedge \nu \}\cap \Omega _n\) for \(j>n\) and the last step uses the definition of \(\tau \).

Putting together the various inequalities and applying Lemma 3.3 on the space \(\Omega _n\) as indicated above, taking \(r = (\beta -1-\delta _2)\lambda \), and using that \(h^\star =0\) on \(\{\mu = \infty \}\), by definition of \(\varepsilon \) we arrive at

$$\begin{aligned} {\mathbb {P}}( f^\star> \beta \lambda , \, w \le \lambda )&\le {\mathbb {P}}(h^\star> (\beta -1-\delta _2)\lambda ) \\&= \sum _{n\geqslant 0} {\mathbb {P}}(\mu = n) {\mathbb {P}}_n(h^\star> (\beta -1-\delta _2)\lambda ) \\&\le \sum _{n\geqslant 0} {\mathbb {P}}(\mu = n) \cdot 2 \Bigl (\frac{e (\delta _1\lambda )^2}{(\beta -1-\delta _2)\lambda (\delta _2\lambda )}\Bigr )^{(\beta -1-\delta _2)/\delta _2} \\&= \sum _{n\geqslant 0} {\mathbb {P}}(\mu = n) \cdot 2 \Bigl (\frac{e\delta _1^2}{N\delta _2^2}\Bigr )^{N} = \varepsilon {\mathbb {P}}(\mu <\infty ) = \varepsilon {\mathbb {P}}(f^\star >\lambda ). \end{aligned}$$

\(\square \)

Proof of Theorem 3.1

Step 1. We first consider the conditionally symmetric case. Combining Lemmas 3.2 (with \(g = f^\star \) and \(h = w\)) and 3.4 (with the choices of \(\varepsilon \), N and w made there), we arrive at the estimate

$$\begin{aligned} \Vert f^\star \Vert _p&\le \frac{\beta }{(1-\beta ^p\varepsilon )^{1/p}} \Vert (\delta _2^{-1}dg^\star ) \vee (\delta _1^{-1}D s(g))\Vert _p, \end{aligned}$$

valid for all choices of \(\lambda >0\), \(\delta _1, \delta _2>0\), \(\beta > 1+\delta _2\) satisfying \( \beta ^p \varepsilon <1\).

With the choices

$$\begin{aligned} \delta _1 := \frac{1}{4\sqrt{p}}, \quad \delta _2 := \frac{1}{2p}, \quad \beta := 2 + \delta _2 = 2+\frac{1}{2p} \end{aligned}$$

we have \(N = \frac{\beta -1-\delta _2}{\delta _2} =2p\geqslant 4\), \(\beta ^{p/N} = (2+\frac{1}{2p})^{1/2} \le (\frac{9}{4})^{1/2} = \frac{3}{2}\) and \(\varepsilon = 2(e/8)^N\), so

$$\begin{aligned} (\beta ^p\varepsilon )^{1/N} = \beta ^{p/N} \cdot 2^{1/N} \frac{e}{8} \leqslant \frac{3}{2}\cdot 2^{1/4}\cdot \frac{e}{8}=:\theta \approx 0.60611\ldots <1, \end{aligned}$$

so Lemma 3.2 can be applied with these choices. This gives \(1-\beta ^p \varepsilon \ge 1-\theta ^N \ge 1-\theta ^4 \approx 0.8650\ldots \,\), so \(\beta /(1-\beta ^p \varepsilon )^{1/p} \le \frac{9}{4}\cdot (1-\theta ^4)^{-1/2} \approx 2.4192\ldots < \frac{5}{2}\) and consequently

$$\begin{aligned} \Vert f^\star \Vert _p\le & {} \frac{\beta }{(1-\beta ^p\varepsilon )^{1/p}}\Bigl ( 2p\Vert dg^\star \Vert _p + 4\sqrt{p} D\Vert s(g)\Vert _p\Bigr )\\\leqslant & {} 5p \Vert dg^\star \Vert _p + 10 D\sqrt{p} \Vert s(g)\Vert _p. \end{aligned}$$

This completes the proof in the conditionally symmetric case.
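The arithmetic behind these constants is easily verified. The following quick numerical check (a sanity check only, not part of the proof; \(p=2\) is the extremal case for the bounds used above) reproduces the stated values:

```python
import math

p = 2.0                                    # worst case: the bounds below hold for all p >= 2
delta1 = 1 / (4 * math.sqrt(p))
delta2 = 1 / (2 * p)
beta = 2 + delta2
N = (beta - 1 - delta2) / delta2           # equals 2p
assert N == 2 * p

# (beta^p * eps)^(1/N) <= (3/2) * 2^(1/4) * e/8 =: theta
theta = (3 / 2) * 2 ** 0.25 * math.e / 8
print(round(theta, 5))                     # ~ 0.60611

# beta / (1 - beta^p * eps)^(1/p) <= (9/4) * (1 - theta^4)^(-1/2) < 5/2,
# which turns the factors 2p and 4*sqrt(p)*D into the final constants 5p and 10*sqrt(p)*D
factor = (9 / 4) / (1 - theta ** 4) ** 0.5
print(round(factor, 4))                    # ~ 2.4192
assert factor < 2.5
```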

Step 2. The general case will be reduced to the conditionally symmetric case. This is a variation of a standard symmetrisation argument (cf. the proof of [47, Theorem 4.1]). In view of the rather intricate setting, and in order to obtain explicit constants, we present some details.

Using the terminology of [21, Chapter 6], let \((d\widetilde{g}_j)_{j=0}^k\) be the decoupled tangent sequence of \((dg_j)_{j=0}^k\) on a possibly enlarged probability space. There exists a \(\sigma \)-algebra \({\mathscr {G}}\) such that the sequence \((d\widetilde{g}_j)_{j=0}^k\) is \({\mathscr {G}}\)-conditionally independent and such that

$$\begin{aligned}{\mathbb {P}}(d\widetilde{g}_j\in \cdot |{\mathscr {G}}) ={\mathbb {P}}(d\widetilde{g}_j\in \cdot |{\mathscr {F}}_{j-1}) = {\mathbb {P}}(dg_j\in \cdot |{\mathscr {F}}_{j-1}).\end{aligned}$$

Moreover we may assume that \({\mathscr {G}} = {\mathscr {F}}_k\), trivially extending the latter \(\sigma \)-algebra to the larger probability space (see [21, p. 294]). Let \(\widetilde{f}_0 := f_0=0\) and \(\widetilde{f}_j := V_j \widetilde{f}_{j-1} + d\widetilde{g}_j\). Setting \(F_j := f_j-\widetilde{f}_j\) and \(G_j := g_j - \widetilde{g}_j\), we have \(F_0 = 0\) and \(F_j = V_j F_{j-1} + dG_j\). The differences \(dG_j\) are conditionally symmetric. Therefore, by the symmetric case of Theorem 3.1,

$$\begin{aligned} \Vert f^\star \Vert _p \leqslant \Vert F^\star \Vert _p + \Vert \widetilde{f}^\star \Vert _p \leqslant 5p \Vert dG^\star \Vert _p + 10D\sqrt{p} \Vert s(G)\Vert _p + \Vert \widetilde{f}^\star \Vert _p. \end{aligned}$$

We estimate each of the three terms on the right-hand side.

As in [46, Lemma 1 and p. 227],

$$\begin{aligned}\Vert dG^\star \Vert _p\leqslant \Vert dg^\star \Vert _p + \Vert d\widetilde{g}^\star \Vert _p\leqslant 3\Vert dg^\star \Vert _p.\end{aligned}$$

To estimate s(G) we note that \(s(G)\leqslant s(g) + s(\widetilde{g}) = 2s(g)\), where we used that \({\mathbb {E}}_{j-1} \Vert d\widetilde{g}_j\Vert ^2 ={\mathbb {E}}_{j-1} \Vert d{g}_j\Vert ^2\) (see [48, Lemma 4.4.5]). Thus

$$\begin{aligned} 5p \Vert dG^\star \Vert _p + 10D\sqrt{p} \Vert s(G)\Vert _p\leqslant 15p \Vert dg^\star \Vert _p + 20D\sqrt{p} \Vert s(g)\Vert _p. \end{aligned}$$
(3.4)

To estimate \(\Vert \widetilde{f}^\star \Vert _p\), let \((d{\overline{g}}_j)_{j=1}^k\) be yet another decoupled tangent sequence of \((dg_j)_{j=1}^k\) on a further enlarged probability space. This sequence can be chosen in such a way that \((d{\overline{g}}_j)_{j=1}^k\) and \((d\widetilde{g}_j)_{j=1}^k\) are \({{\mathscr {G}}}\)-conditionally independent with \({\mathscr {G}}\) as before. Let \({\overline{f}}_0 := f_0=0\) and \({\overline{f}}_j := V_j {\overline{f}}_{j-1} + d{\overline{g}}_j\). Then also \(({\overline{f}}_j)_{j=0}^k\) and \((\widetilde{f}_j)_{j=0}^k\) are \({{\mathscr {G}}}\)-conditionally independent. Therefore, by Jensen’s inequality and the fact that \({\mathbb {E}}_{{\mathscr {G}}} {\overline{f}}_j = 0\) (which follows by induction using \({\mathbb {E}}_{{\mathscr {G}}} d{\overline{g}}_j =0\)),

$$\begin{aligned} {\mathbb {E}}_{{{\mathscr {G}}}}\Vert \widetilde{f}^\star \Vert ^p = {\mathbb {E}}_{{{\mathscr {G}}}}\Vert (\widetilde{f}_j)_{j=0}^k\Vert _{\ell ^\infty _k(X)}^p \leqslant {\mathbb {E}}_{{{\mathscr {G}}}}\Vert (\widetilde{f}_j)_{j=0}^k - ({\overline{f}}_j)_{j=0}^k\Vert _{\ell ^\infty _k(X)}^p = {\mathbb {E}}_{{{\mathscr {G}}}}|{\overline{F}}^\star |^p, \end{aligned}$$

where \({\overline{F}}_j = \widetilde{f}_j - {\overline{f}}_j\) and \({\overline{G}}_j = \widetilde{g}_j - {\overline{g}}_j\). Then \({\overline{F}}_0 = 0\) and \({\overline{F}}_j = V_j {\overline{F}}_{j-1} + d{\overline{G}}_j\). As before, the differences \(d{\overline{G}}_j\) are conditionally symmetric and therefore, by the symmetric case of Theorem 3.1,

$$\begin{aligned} \Vert \widetilde{f}^\star \Vert _p \leqslant \Vert {\overline{F}}^\star \Vert _p&\leqslant 5p \Vert d{\overline{G}}^\star \Vert _p + 10D\sqrt{p} \Vert s({\overline{G}})\Vert _p \\&\leqslant 15p \Vert dg^\star \Vert _p + 20D\sqrt{p} \Vert s(g)\Vert _p, \end{aligned}$$

where the last step is proved in the same way as (3.4).

The desired inequality is obtained by combining all estimates. \(\square \)

Remark 3.5

Other choices of the parameters \(\beta \), \(\delta _1\) and \(\delta _2\) lead to related inequalities, with a different behaviour of the constants in p. In particular, as in [81, Theorem 4.1] one can prove that there exists a constant C such that for all \(p\in [2, \infty )\)

$$\begin{aligned} \Vert f^\star \Vert _p \le \frac{C p}{\log p}(\Vert dg^\star \Vert _p + D \Vert s(g)\Vert _p),\end{aligned}$$

and the latter growth is known to be optimal in the scalar case (see [47]).

The next result extrapolates Theorem 3.1 to exponents \(0<p<2\). Using a variation of the method in [11, pp. 38–39], an estimate is obtained without the term \(\Vert dg^\star \Vert _p\).

Corollary 3.6

Let X be a (2, D)-smooth Banach space. Suppose that \((f_j)_{j=0}^k\) is an adapted sequence of X-valued random variables, \((g_j)_{j=0}^k\) is an X-valued martingale, and \((V_j)_{j=1}^k\) is a sequence of random contractions on X which is strongly predictable (i.e., \(V_jx\) is strongly \({\mathscr {F}}_{j-1}\)-measurable for all \(x\in X\)). Assume furthermore that \(f_0=g_0=0\) and

$$\begin{aligned}f_j = V_{j} f_{j-1} + dg_j, \qquad j=1, \ldots , k.\end{aligned}$$

Then for all \(0<p<2\) we have

$$\begin{aligned} \Vert f^\star \Vert _p \le (300D)^{2/p}\Vert s(g)\Vert _p.\end{aligned}$$

If, moreover, \((g_j)_{j=0}^k\) has conditionally symmetric increments, then

$$\begin{aligned} \Vert f^\star \Vert _p \le (100D)^{2/p} \Vert s(g)\Vert _p.\end{aligned}$$

Proof

By Doob’s maximal inequality and the fact that X has martingale type 2 with constant D (by Remark 2.5),

$$\begin{aligned}\Vert dg^\star \Vert _2\leqslant 2 \Vert g^\star \Vert _2 \le 4\Vert g\Vert _2\leqslant 4D \Vert s(g)\Vert _2.\end{aligned}$$

Therefore, Theorem 3.1 implies

$$\begin{aligned} \Vert f^\star \Vert _2\leqslant ( 4A+B)D \Vert s(g)\Vert _2, \end{aligned}$$
(3.5)

where \((A,B) = (10, 10\sqrt{2})\) if g has conditionally symmetric increments and \((A,B) = (60, 40\sqrt{2})\) in the general case.
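With these values of A and B, the constant \(K := (4A+B)D+1\) introduced below is dominated by 100D in the conditionally symmetric case and by 300D in the general case, using that \(D\geqslant 1\) for any (2, D)-smooth space. A quick numerical sanity check (not part of the proof):

```python
import math

# K = (4A+B)*D + 1 <= (4A+B+1)*D for D >= 1; compare (4A+B+1) with the
# constants 100 and 300 appearing in Corollary 3.6
for A, B, bound in [(10, 10 * math.sqrt(2), 100),    # conditionally symmetric case
                    (60, 40 * math.sqrt(2), 300)]:   # general case
    print(round(4 * A + B + 1, 2))                   # ~ 55.14 and ~ 297.57
    assert 4 * A + B + 1 <= bound
```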

For non-negative random variables Z and exponents \(0<q<1\) we have the identity

$$\begin{aligned} {\mathbb {E}}|Z|^q = q(1-q) \int _0^\infty {\mathbb {E}}(Z\wedge \lambda ) \lambda ^{q-2} \,{\mathrm{d}}\lambda . \end{aligned}$$
(3.6)
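Identity (3.6) follows from Fubini's theorem together with the elementary computation \(\int _0^\infty (z\wedge \lambda )\lambda ^{q-2}\,{\mathrm{d}}\lambda = z^q/(q(1-q))\) for \(z\geqslant 0\) and \(0<q<1\), obtained by splitting the integral at \(\lambda = z\). A minimal check of this computation, with arbitrarily chosen test values of z and q:

```python
import math

z, q = 3.7, 0.4                       # arbitrary test values with 0 < q < 1

# Split the integral at lambda = z and evaluate each piece in closed form:
#   int_0^z   lambda^(q-1)        dlambda = z^q / q
#   int_z^inf z * lambda^(q-2)    dlambda = z^q / (1 - q)   (finite since q < 1)
integral = z ** q / q + z ** q / (1 - q)

# Multiplying by q(1-q) collapses the sum to z^q, which is (3.6) for Z = z
assert math.isclose(q * (1 - q) * integral, z ** q)
```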

Setting \(K = ( 4A+B)D+1\), we claim that

$$\begin{aligned}{\mathbb {E}}(|f^\star |^2\wedge \lambda ) \le K^2 {\mathbb {E}}(s(g)^2\wedge \lambda ), \ \ \ \ \lambda >0.\end{aligned}$$

Once this has been verified, upon taking \(q=p/2\), \(Z = |f^\star |^2\), and then \(Z= s(g)^2\) in (3.6), we obtain

$$\begin{aligned} {\mathbb {E}}|f^\star |^p&= q(1-q) \int _0^\infty {\mathbb {E}}(|f^\star |^2\wedge \lambda ) \lambda ^{q-2} \,{\mathrm{d}}\lambda \\&\leqslant K^2 q(1-q) \int _0^\infty {\mathbb {E}}(|s(g)|^2\wedge \lambda ) \lambda ^{q-2} \,{\mathrm{d}}\lambda =K^2 {\mathbb {E}}|s(g)|^p \end{aligned}$$

and the result follows.

To prove the claim, set \(\tau := \inf \{0\leqslant n\leqslant k-1: \sum _{j=1}^{n+1} {\mathbb {E}}_{j-1} \Vert dg_j\Vert ^2\geqslant \lambda \}\), with the convention that \(\tau := k\) if the set is empty. Let the adapted sequence of random variables \((F_{j})_{j=0}^k\) be defined by \(F_0 := 0\) and

$$\begin{aligned}F_j := W_j F_{j-1} + dG_j,\quad j=1,\dots , k,\end{aligned}$$

where \(W_j := V_j\) if \(1\le j\le \tau \), \(W_j := I\) if \(j>\tau \), and \(dG_j := {\mathbf{1}}_{\{j\le \tau \}} dg_j\). One checks that \(f_{j\wedge \tau } = F_j\) for all \(j=0,\ldots , k\). Applying (3.5) to F gives

$$\begin{aligned} {\mathbb {E}}\sup _{0\leqslant j\leqslant k}\Vert f_{j\wedge \tau }\Vert ^2 = {\mathbb {E}}|F^{\star }|^2&\leqslant (4A+B)^2D^2 {\mathbb {E}}s(G)^2 \\&= (4A+B)^2D^2 {\mathbb {E}}\sum _{j=1}^{k} {\mathbf{1}}_{\{j\le \tau \}} {\mathbb {E}}_{j-1} \Vert dg_j\Vert ^2 \\&\leqslant (4A+B)^2D^2{\mathbb {E}}(s(g)^2\wedge \lambda ). \end{aligned}$$

Since \(|f^{\star }|^2\wedge \lambda \leqslant \sup _{0\leqslant j\leqslant k}\Vert f_{j\wedge \tau }\Vert ^2 + \lambda {\mathbf{1}}_{\{\tau <k\}}\), we obtain

$$\begin{aligned} {\mathbb {E}}(|f^\star |^2\wedge \lambda ) \leqslant {\mathbb {E}}\sup _{0\leqslant j\leqslant k}\Vert f_{j\wedge \tau }\Vert ^2 + {\mathbb {E}}({\mathbf{1}}_{\{\tau <k\}}\lambda ) \leqslant K^2{\mathbb {E}}(s(g)^2\wedge \lambda ), \end{aligned}$$

which gives the claim. \(\square \)

4 Maximal inequalities for stochastic convolutions

A family \((S(t,s))_{0\le s\le t\le T}\) of bounded operators on a Banach space X is called a \(C_0\)-evolution family if:

  1. (1)

    \(S(t,t) = I\) for all \(t\in [0,T]\);

  2. (2)

    \(S(t,r) = S(t,s) S(s,r)\) for all \(0\le r\le s\le t\le T\);

  3. (3)

    The mapping \((t,s) \mapsto S(t,s)\) is strongly continuous on the set \(\{0\le s\le t\le T\}\).

\(C_0\)-evolution families typically arise as the solution operators for the linear time-dependent problem \(u'(t) = A(t)u(t)\), in much the same way as \(C_0\)-semigroups solve the time-independent problem \(u'(t) = Au(t)\). The reader is referred to [29, 78, 91] for systematic treatments. If \((S(t))_{t\ge 0}\) is a \(C_0\)-semigroup on X, then \(S(t,s):= S(t-s)\) defines a \(C_0\)-evolution family \((S(t,s))_{0\le s\le t\le T}\) for every \(0<T<\infty \).
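For a concrete finite-dimensional illustration (a sketch only; the matrix A below is an arbitrary choice, not an operator considered in this paper), the matrix semigroup \(S(t)=e^{tA}\) induces the evolution family \(S(t,s)=e^{(t-s)A}\), and properties (1) and (2) can be checked numerically:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])             # arbitrary generator matrix

def S(t, s):
    """Evolution family induced by the semigroup: S(t, s) = exp((t - s) A)."""
    return expm((t - s) * A)

r, s, t = 0.2, 0.5, 1.3                  # 0 <= r <= s <= t <= T

assert np.allclose(S(t, t), np.eye(2))            # property (1): S(t, t) = I
assert np.allclose(S(t, r), S(t, s) @ S(s, r))    # property (2): S(t, r) = S(t, s) S(s, r)
```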

4.1 The main result

The following theorem is the main result of this paper.

Theorem 4.1

Let \((S(t,s))_{0\leqslant s\leqslant t\leqslant T}\) be a \(C_0\)-evolution family of contractions on a (2, D)-smooth Banach space X and let W be an adapted H-cylindrical Brownian motion on \(\Omega \). Then for every \(g\in L_{{\mathscr {P}}}^0(\Omega ;L^2(0,T;\gamma (H,X)))\) the process \((\int _0^t S(t,s)g_s\,{\mathrm{d}}W_s)_{t\in [0,T]}\) has a continuous modification which satisfies, for all \(0<p<\infty \),

$$\begin{aligned}{\mathbb {E}}\sup _{t\in [0,T]}\Big \Vert \int _0^t S(t,s)g_s\,{\mathrm{d}}W_s\Big \Vert ^p\leqslant C_{p,D}^p \Vert g\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X)))}^p,\end{aligned}$$

with a constant \(C_{p,D}\) depending only on p and D. For \(2\le p<\infty \) the inequality holds with \(C_{p,D} = 10D\sqrt{p}\).

The stochastic integral is well defined by (2.10). By a rescaling argument, the contractivity assumption can more generally be relaxed to the existence of a \(\lambda \geqslant 0\) such that

$$\begin{aligned} \Vert S(t,s)\Vert \le e^{\lambda (t-s)}, \qquad 0\leqslant s\leqslant t\leqslant T. \end{aligned}$$

The estimate of the theorem then holds with the constant \(C_{p,D}\) replaced by \(e^{\lambda T} C_{p,D}\).

Proof

The proof is split into four steps. In the first two steps we prove the theorem for \(2\le p<\infty \), in the third step we consider the case \(0<p<2\), and in the fourth the pathwise continuity assertion for \(p=0\).

Step 1. Fix a partition \(\pi := \{r_0,\dots ,r_N\}\), where \(0= r_0<r_1<\ldots <r_N=T\), and let \((K(t,s))_{0\leqslant s\leqslant t\leqslant T}\) be a family of contractions on X with the following properties:

  1. (i)

    \(K(t,\cdot )\) is constant on \([r_{j-1}, r_j)\) for all \(t\in [0,T]\) and \(j = 1,\ldots , N\);

  2. (ii)

    \(K(\cdot , s)\) is strongly continuous for all \(s\in [0,T]\);

  3. (iii)

    \(S(t,r) K(r,s) = K(t,s)\) for all \(0\le s\le r\le t\le T\).

Let \(g\in L_{{\mathscr {P}}}^p(\Omega ;L^2(0,T;\gamma (H,X)))\) and define the process \((v_t)_{t\in [0,T]}\) by

$$\begin{aligned} v_t := \int _0^t K(t,s) g_s \,{\mathrm{d}}W_s, \quad t\in [0,T]. \end{aligned}$$

Properties (i) and (ii) imply that the process \((v_t)_{t\in [0,T]}\) is well defined and has a modification with continuous paths. Indeed, for \(t\in [r_{j-1}, r_j]\)

$$\begin{aligned} \int _0^t K(t,s) g_s \,{\mathrm{d}}W_s = \sum _{k=1}^{j-1} K(t,r_{k-1}) \int _{r_{k-1}}^{r_{k}} g_s\,{\mathrm{d}}W_s + K(t,r_{j-1})\int _{r_{j-1}}^{t} g_s\,{\mathrm{d}}W_s, \end{aligned}$$

which can be seen to have a continuous modification. Working with such a modification, we will first prove that for all \(2\le p<\infty \) we have

$$\begin{aligned} \Big \Vert \sup _{t\in [0,T]}\Vert v_t\Vert \Big \Vert _p\leqslant 10D\sqrt{p}\Vert g\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X)))}. \end{aligned}$$
(4.1)

By a limiting argument it suffices to consider \(p>2\).

For the proof of (4.1), by density we may assume that g is as in (2.9), i.e.,

$$\begin{aligned} g = \sum _{j=1}^{k} {\mathbf{1}}_{(s_{j-1},s_{j}]} \sum _{i=1}^\ell h_i\otimes \xi _{ij}, \end{aligned}$$

where \(0= s_0<s_1<\ldots <s_k=T\) and \(h_i\) and \(\xi _{ij}\) are as in (2.9). Refining \(\pi \) if necessary, we may assume that \(s_j\in \pi \) for all \(j=0,\dots ,k\). We prove (4.1) in two steps.

Step 1a. Let \(\pi ' = \{t_0, t_1, \ldots , t_m\}\subseteq [0,T]\) be another partition. It suffices to prove the bound

$$\begin{aligned} \Big \Vert \sup _{t\in \pi '}\Vert v_t\Vert \Big \Vert _p \le a_{\pi '} + 10D\sqrt{p} \Vert g\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X)))} \end{aligned}$$
(4.2)

with \(a_{\pi '} \rightarrow 0\) as mesh\((\pi ')\rightarrow 0\). Refining \(\pi '\) if necessary, we may assume that \(\pi \subseteq \pi '\).

For fixed \(j=1,\ldots ,m\) we have, by property (iii),

$$\begin{aligned}f_{j} := v_{t_j}&= S(t_j, t_{j-1}) v_{t_{j-1}} + \int _{t_{j-1}}^{t_j} K(t_j,s) g_s \,{\mathrm{d}}W_s \\&=: V_{j} f_{j-1} + dG_j, \end{aligned}$$

where we set \(V_{j} := S(t_j, t_{j-1})\) and \(dG_j:=\int _{t_{j-1}}^{t_j} K(t_j,s) g_s \,{\mathrm{d}}W_s\). We further set \(f_0:=0\) and \(G_0:=0\). By using the symmetry of normally distributed random variables as in [48, Proposition 4.4.6] it is seen that the difference sequence \((dG_j)_{j=1}^m\) is conditionally symmetric. Therefore, by Theorem 3.1,

$$\begin{aligned} \Vert f^\star \Vert _p \le 5p \Vert dG^\star \Vert _p + 10D\sqrt{p} \Vert s(G)\Vert _p, \end{aligned}$$
(4.3)

where \(f = (f_j)_{j=0}^m\) and \(G = (G_j)_{j=0}^m\).

Step 1b. For all \(q\in [2, \infty )\) and all \(1\le j\le m\), the independence of \(W_{t_{j}}-W_{t_{j-1}}\) and \({\mathscr {F}}_{t_{j-1}}\) implies (see [105, 9.10])

$$\begin{aligned} {\mathbb {E}}_{j-1}\Vert dG_j\Vert ^q&= {\mathbb {E}}_{j-1}\Big \Vert \sum _{i=1}^\ell (W_{t_{j}}-W_{t_{j-1}})h_i K(t_j, t_{j-1})g_{t_{j-1}} h_i\Big \Vert ^q \\&\leqslant {\mathbb {E}}_{j-1}\Big \Vert \sum _{i=1}^\ell (W_{t_{j}}-W_{t_{j-1}})h_i g_{t_{j-1}} h_i\Big \Vert ^q \\&= \widetilde{\mathbb {E}}\Big \Vert \sum _{i=1}^\ell (t_{j} - t_{j-1})^{1/2} \widetilde{\gamma }_{ij} g_{t_{j-1}} h_i \Big \Vert ^q \\&= (t_{j} - t_{j-1})^{q/2} \Vert g_{t_{j-1}}\Vert _{\gamma _q(H,X)}^q, \end{aligned}$$

where \((\widetilde{\gamma }_{ij})_{i\geqslant 1,j\geqslant 1}\) is a doubly indexed Gaussian sequence on an independent probability space \((\widetilde{\Omega },\widetilde{\mathscr {F}},\widetilde{\mathbb {P}})\) and \(\gamma _q(H,X)\) denotes the space \(\gamma (H,X)\) endowed with the equivalent \(L^q\)-norm as discussed in Sect. 2.2. We used that \(K(t_j, s) = K(t_j, t_{j-1})\) and \(g_s = g_{t_{j-1}}\) for \(s\in [t_{j-1}, t_j)\). Consequently,

$$\begin{aligned} \begin{aligned} \sum _{j=1}^m {\mathbb {E}}_{j-1}\Vert dG_j\Vert ^q&= \sum _{j=1}^m (t_{j} - t_{j-1})^{q/2} \Vert g_{t_{j-1}}\Vert _{\gamma _q(H,X)}^q \\&\leqslant (\text {mesh}(\pi '))^{\frac{q}{2}-1} \Vert g\Vert _{L^q(0,T;\gamma _q(H,X))}^q. \end{aligned} \end{aligned}$$
(4.4)
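The last inequality in (4.4) rests on the elementary termwise bound \((t_j-t_{j-1})^{q/2}\leqslant (\mathrm{mesh})^{\frac{q}{2}-1}(t_j-t_{j-1})\) for \(q\geqslant 2\), summed against the (constant-in-time) values of g. A quick numerical sanity check with arbitrary data:

```python
q = 3.0                                  # any exponent q >= 2
dts = [0.1, 0.25, 0.05, 0.3]             # arbitrary partition increments t_j - t_{j-1}
gvals = [1.0, 2.5, 0.7, 1.3]             # arbitrary values ||g_{t_{j-1}}||

mesh = max(dts)
# Left side: Riemann-type sum with increments raised to the power q/2;
# right side: mesh^(q/2 - 1) times the L^q Riemann sum of the step function g
lhs = sum(dt ** (q / 2) * g ** q for dt, g in zip(dts, gvals))
rhs = mesh ** (q / 2 - 1) * sum(dt * g ** q for dt, g in zip(dts, gvals))
assert lhs <= rhs
```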

Applying (4.4) with \(q=p\) and taking expectations, we obtain

$$\begin{aligned} \Vert dG^\star \Vert _p^p&\leqslant \Big \Vert \Big (\sum _{j=1}^m \Vert dG_j\Vert ^p \Big )^{1/p}\Big \Vert _p^p \\&= {\mathbb {E}}\sum _{j=1}^m \Vert dG_j\Vert ^p \leqslant (\text {mesh}(\pi '))^{\frac{p}{2}-1} {\mathbb {E}}\Vert g\Vert _{L^p(0,T;\gamma _p(H,X))}^p. \end{aligned}$$

Applying (4.4) with \(q=2\), we obtain

$$\begin{aligned}\Vert s(G)\Vert _p \leqslant \Vert g\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X)))}.\end{aligned}$$

Substituting these bounds into (4.3), we obtain

$$\begin{aligned} \Big \Vert \sup _{t\in \pi '}\Vert v_t\Vert \Big \Vert _p = \Vert f^\star \Vert _p&\leqslant 5p \Vert dG^\star \Vert _p + 10D\sqrt{p} \Vert s(G)\Vert _p \\&\leqslant 5p\, (\text {mesh}(\pi '))^{\frac{1}{2}-\frac{1}{p}} \Vert g\Vert _{L^p(\Omega ;L^p(0,T;\gamma _p(H,X)))} \\&\qquad + 10D\sqrt{p}\Vert g\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X)))}. \end{aligned}$$

Since \(p>2\), this proves (4.2) for finite rank adapted step processes g.

Step 2. Fix \(g\in L_{{\mathscr {P}}}^p(\Omega ;L^2(0,T;\gamma (H,X)))\) and \(n\in {{\mathbb {N}}}\). Set \(\sigma _n(s) := j 2^{-n}T\) for \(s\in [j2^{-n}T, (j+1)2^{-n}T)\) and define \(S_n(t,s) := S(t,\sigma _n(s))\) and

$$\begin{aligned}v^{(n)}_t := \int _0^t S_n(t,s) g_s \,{\mathrm{d}}W_s.\end{aligned}$$

The assumptions (i)–(iii) of Step 1 apply to \(K(t,s) = S_n(t,s)\) with \(N = 2^n\) and \(r_j = j2^{-n}T\). By what has been shown there, the process \(v^{(n)}\) has a continuous modification. Moreover, noting that for \(n\geqslant m\) we have

$$\begin{aligned} v^{(n)}_t - v^{(m)}_t = \int _0^t S_n(t,s)(I- S(\sigma _n(s),\sigma _m(s))) g_s \,{\mathrm{d}}W_s, \end{aligned}$$

from Step 1 we obtain

$$\begin{aligned} \Big \Vert \sup _{t\in [0,T]} \Vert v^{(n)}_t-v^{(m)}_t\Vert \Big \Vert _p&\leqslant 10D\sqrt{p} \big \Vert (I- S(\sigma _n(\cdot ),\sigma _m(\cdot )))g\big \Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X)))}. \end{aligned}$$

Since the right-hand side tends to zero by the dominated convergence theorem, \((v^{(n)})_{n\geqslant 1}\) is a Cauchy sequence in \(L^p(\Omega ;C([0,T];X))\) and hence converges to some \(\widetilde{v}\) in \(L^p(\Omega ;C([0,T];X))\). On the other hand, for all \(t\in [0,T]\) we have

$$\begin{aligned} v^{(n)}_t\rightarrow \int _0^t S(t,s) g_s \,{\mathrm{d}}W_s =: u_t \end{aligned}$$

with convergence in \(L^2(\Omega ;X)\). Therefore, \(\widetilde{v}\) is the required continuous modification of u. Applying Step 1 again we obtain

$$\begin{aligned} \Big \Vert \sup _{t\in [0,T]} \Vert u_t\Vert \Big \Vert _{p} = \lim _{n\rightarrow \infty } \Big \Vert \sup _{t\in [0,T]} \Vert v^{(n)}_t\Vert \Big \Vert _{p}\leqslant 10D\sqrt{p} \Vert g\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X)))}. \end{aligned}$$

Step 3. In the case \(0<p<2\) one can argue in the same way as in the previous steps, using Corollary 3.6 instead of Theorem 3.1. The estimate (4.3) simplifies, as the term \(\Vert dG^\star \Vert _p\) no longer appears. Alternatively, one could use a standard extrapolation argument involving Lenglart’s inequality [87, Proposition IV.4.7].

Step 4. The continuity assertion for \(p=0\) follows by a standard localisation argument. \(\square \)

As a consequence of Theorem 4.1, a simple optimisation argument in the exponent p gives the following exponential tail estimate (see [95, Corollary 4.4] for details).

Corollary 4.2

(Exponential tail estimate) If, in addition to the conditions of Theorem 4.1, we have \(g\in L^\infty (\Omega ;L^2(0,T;\gamma (H,X)))\), then

$$\begin{aligned}{\mathbb {P}}\Bigg (\sup _{t\in [0,T]}\Big \Vert \int _0^t S(t,s)g_s \,{\mathrm{d}}W_s\Big \Vert \geqslant r\Bigg ) \leqslant 2\exp \Bigg (-\frac{r^2}{2\sigma ^2}\Bigg ), \qquad r>0,\end{aligned}$$

where \(\sigma ^2 = 100eD^2\Vert g\Vert _{L^\infty (\Omega ;L^2(0,T;\gamma (H,X)))}^2\).

This method for deriving exponential tail estimates only uses that the constant \(C_{p,D}\) in the maximal estimate is of order \(O(\sqrt{p})\) as \(p\rightarrow \infty \). By the same method, similar exponential tail estimates can therefore be deduced from all other results in this paper in which the constant is of asymptotic order \(O(\sqrt{p})\).
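To indicate how the variance \(\sigma ^2 = 100eD^2\Vert g\Vert _{L^\infty (\Omega ;L^2(0,T;\gamma (H,X)))}^2\) arises, here is a sketch of the optimisation, writing \(\sigma _0\) for the \(L^\infty \)-norm of g. By Chebyshev's inequality and Theorem 4.1, for every \(p\geqslant 2\),

```latex
\mathbb{P}\Big(\sup_{t\in[0,T]}\Big\|\int_0^t S(t,s)g_s\,\mathrm{d}W_s\Big\| \geqslant r\Big)
  \leqslant \Big(\frac{10D\sqrt{p}\,\sigma_0}{r}\Big)^p
  = \exp\Big(\frac{p}{2}\log\frac{100\,p\,D^2\sigma_0^2}{r^2}\Big).
```

Minimising the exponent over p (its derivative in p is \(\tfrac12 \log (100pD^2\sigma _0^2/r^2)+\tfrac12 \)) gives \(p = r^2/(100eD^2\sigma _0^2)\), and substituting this value back yields \(\exp (-p/2) = \exp (-r^2/(2\sigma ^2))\) with \(\sigma ^2 = 100eD^2\sigma _0^2\); the prefactor 2 in the corollary absorbs the restriction \(p\geqslant 2\), which is relevant for small r.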

Remark 4.3

Under additional assumptions on the evolution family (which are satisfied in the case of \(C_0\)-semigroups of contractions), a variant of Itô’s formula can be used to give an alternative proof of the estimate of Corollary 4.2 with sharper variance \(\sigma ^2 = 2D^2\Vert g\Vert _{L^\infty (\Omega ;L^2(0,T;\gamma (H,X)))}^2\) (see [95, Theorem 5.6]).

4.2 The non-contractive case

We briefly discuss two sets of sufficient conditions for the existence of continuous versions and the validity of maximal estimates for general (i.e., not necessarily contractive) \(C_0\)-evolution families \((S(t,s))_{0\leqslant s\leqslant t\leqslant T}\). The first of these replaces the condition ‘\(g\in L_{{\mathscr {P}}}^0(\Omega ;L^2(0,T;\gamma (H,X)))\)’ by ‘\(g\in L_{{\mathscr {P}}}^0(\Omega ;L^q(0,T;\gamma (H,X)))\) for some \(q>2\)’. Under this stronger assumption, a maximal inequality for general \(C_0\)-semigroups on Hilbert spaces was obtained by Da Prato, Kwapień, and Zabczyk [20] by the so-called factorization method. It was extended to \(C_0\)-evolution families on Hilbert spaces by Seidler [89]. His proof extends mutatis mutandis to give the following result, which is taken from [95], where a further discussion can be found.

Proposition 4.4

(Additional time regularity) Let \((S(t,s))_{0\le s\le t\le T}\) be a \(C_0\)-evolution family on a (2, D)-smooth Banach space X and let \(2<q<\infty \). For all \(g\in L_{{\mathscr {P}}}^0(\Omega ;L^q(0,T;\gamma (H,X)))\) the process \((\int _0^t S(t,s)g_s \,{\mathrm{d}}W_s)_{t\in [0,T]}\) has a continuous modification which satisfies, for all \(0<p\le q\),

$$\begin{aligned} {\mathbb {E}}\sup _{t\in [0,T]}\Big \Vert \int _0^t S(t,s)g_s \,{\mathrm{d}}W_s\Big \Vert ^p \leqslant C_{p,q,D,T}^p C_{S,T}^p\Vert g\Vert _{L^p(\Omega ;L^q(0,T;\gamma (H,X)))}^p, \end{aligned}$$

where \(C_{S,T} := \sup _{0\le s\le t\le T}\Vert S(t,s)\Vert \).

In the second result we assume that g has additional space regularity. Although the result may not be surprising, we have not been able to find a reference for it in the literature, and for this reason we provide a detailed proof. The result will play a role in Theorem 5.13, where convergence rates for time discretisation schemes are studied under spatial regularity assumptions on g.

When A is the generator of a \(C_0\)-semigroup on the Banach space X, for \(\nu \in (0,1)\) we denote by \(X_{\nu ,\infty } := (X,{\mathsf {D}}(A))_{\nu ,\infty }\) the real interpolation space between X and \({\mathsf {D}}(A)\) (see [65] for more details).

Proposition 4.5

(Additional space regularity) Let A be the generator of a \(C_0\)-semigroup \(S=(S(t))_{t\geqslant 0}\) on a (2, D)-smooth Banach space X and let \(0<\nu <1\). For all \(g\in L_{{\mathscr {P}}}^0(\Omega ;L^2(0,T;\gamma (H,X_{\nu ,\infty })))\) the process \((\int _0^t S(t-s)g_s\,{\mathrm{d}}W_s)_{t\in [0,T]}\), as an X-valued process, has a continuous modification which satisfies, for all \(0<p<\infty \),

$$\begin{aligned}{\mathbb {E}}\sup _{t\in [0,T]}\Big \Vert \int _0^t S(t-s)g_s\,{\mathrm{d}}W_s\Big \Vert ^p\leqslant C_{p,D,T,\nu }^p C_{S,T}^p \Vert g\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X_{\nu ,\infty })))}^p,\end{aligned}$$

where \(C_{S,T} = \sup _{0\leqslant t\leqslant T} \Vert S(t)\Vert \).

Proof

By localisation and Lenglart’s inequality, it suffices to prove the continuity and the maximal estimate for \(p>\frac{1}{\nu }\).

We have

$$\begin{aligned}\int _0^t S(t-r)g_r\,{\mathrm{d}}W_r = \int _0^t (S(t-r)-I)g_r\,{\mathrm{d}}W_r + \int _0^t g_r\,{\mathrm{d}}W_r =:u_t+v_t.\end{aligned}$$

By Proposition 2.6, v has a continuous version satisfying the required maximal estimate, so it remains to prove the same for u. For this we will use the Kolmogorov–Chentsov continuity criterion [87, Theorem I.2.1].

For \(0\leqslant s\leqslant t\leqslant T\) we have

$$\begin{aligned} \Vert S(t) -S(s)\Vert _{{\mathscr {L}}(X,X)} \leqslant 2 C_{S,T}, \qquad \Vert S(t) -S(s)\Vert _{{\mathscr {L}}({\mathsf {D}}(A),X)} \leqslant C_{S,T} |t-s|. \end{aligned}$$

Therefore, by interpolation,

$$\begin{aligned} \begin{aligned} \Vert S(t)- S(s)\Vert _{{\mathscr {L}}(X_{\nu ,\infty },X)}&\leqslant \Vert S(t)- S(s)\Vert _{{\mathscr {L}}(X,X)}^{1-\nu }\, \Vert S(t)- S(s)\Vert _{{\mathscr {L}}({\mathsf {D}}(A),X)}^{\nu }\\&\leqslant 2C_{S,T} |t-s|^{\nu }. \end{aligned} \end{aligned}$$
(4.5)

Next, for \(0\leqslant s\leqslant t\leqslant T\) we have

$$\begin{aligned} u_t-u_s = \int _0^s (S(t-r)-S(s-r))g_r \,{\mathrm{d}}W_r + \int _s^t (S(t-r)-I)g_r \,{\mathrm{d}}W_r. \end{aligned}$$

Taking \(L^p(\Omega ;X)\)-norms, from Proposition 2.6 we obtain

$$\begin{aligned}&{\mathbb {E}}\Big \Vert \int _0^s (S(t-r)-S(s-r))g_r \,{\mathrm{d}}W_r\Big \Vert ^p\\&\quad \leqslant C_{p,D}^p {\mathbb {E}}\Vert (S(t-\cdot )-S(s-\cdot ))g\Vert _{L^2(0,s;\gamma (H,X))}^p \\&\quad \leqslant (K |t-s|^{\nu })^p {\mathbb {E}}\Vert g\Vert _{L^2(0,T;\gamma (H,X_{\nu ,\infty }))}^p, \end{aligned}$$

where \(K = 2C_{S,T} C_{p,D} \). Similarly,

$$\begin{aligned} {\mathbb {E}}\Big \Vert \int _s^t (S(t-r)-I)g_r \,{\mathrm{d}}W_r\Big \Vert ^p&\leqslant C_{p,D}^p {\mathbb {E}}\Vert (S(t-\cdot )-I) g\Vert _{L^2(s,t;\gamma (H,X))}^p \\&\leqslant (K|t-s|^{\nu })^p {\mathbb {E}}\Vert g\Vert _{L^2(0,T;\gamma (H,X_{\nu ,\infty }))}^p. \end{aligned}$$

It follows that

$$\begin{aligned} {\mathbb {E}}\Vert u_t-u_s\Vert ^p\leqslant K^p |t-s|^{\nu p} {\mathbb {E}}\Vert g\Vert _{L^2(0,T;\gamma (H,X_{\nu ,\infty }))}^p. \end{aligned}$$

Now we will use the assumption \(p>\frac{1}{\nu }\), which allows us to apply the Kolmogorov–Chentsov continuity criterion. It implies that for \(0<\delta <\nu -\frac{1}{p}\) the process u has a (\(\delta \)-Hölder) continuous version which satisfies

$$\begin{aligned} {\mathbb {E}}\Vert u\Vert _{C^{\delta }([0,T];X)}^p\leqslant K^p C_{p,T,\delta ,\nu }^p{\mathbb {E}}\Vert g\Vert _{L^2(0,T;\gamma (H,X_{\nu ,\infty }))}^p. \end{aligned}$$

Together with the bound \(\sup _{t\in [0,T]}\Vert u(t)\Vert \leqslant T^{\delta }\Vert u\Vert _{C^{\delta }([0,T];X)}\) and the estimate for v, this implies the maximal inequality in the statement of the proposition. \(\square \)

Remark 4.6

The same result holds if we replace \(X_{\nu ,\infty }\) by any Banach space which embeds continuously into \(X_{\nu ,\infty }\). In particular this applies to complex interpolation spaces and fractional domain spaces.

4.3 Martingales as integrators: Hilbert spaces

In the remainder of this section we consider stochastic convolutions driven by an \(L^2\)-martingale \((M_t)_{t\in [0,T]}\) with values in a separable Hilbert space H. For details on stochastic integration in this setting we refer to [68, 69] and the summary in [44]. We will use a couple of notions from the theory of stochastic processes that have not been introduced in Sect. 2 but are otherwise completely standard; see for instance [55, 87].

In the present subsection we also let X be a Hilbert space; the case where X is a 2-smooth Banach space is discussed in the next subsection. By a standard argument involving the essential separability of the ranges of strongly measurable functions, there is no loss of generality in assuming X to be separable. This is relevant as we cite some results from the literature which are stated for separable spaces.

For details on the concepts we introduce below we refer to [68, Chapter 4], where proofs of the various claims made below can be found. We denote by \(\langle M_t\rangle _{t\in [0,T]}\) the predictable quadratic variation of M, and by \(\langle \!\langle M_t\rangle \!\rangle _{t\in [0,T]}\) the predictable tensor quadratic variation of M taking values in the space of trace class operators \({\mathscr {L}}_1(H)\). The covariance process \((Q_{M,t})_{t\in [0,T]}\) is defined as the Radon–Nikodým derivative \(Q_M = \frac{\mathrm{d}\langle \!\langle M\rangle \!\rangle }{\mathrm{d}\langle M\rangle }\) (note that \({\mathscr {L}}_1(H)\) has the Radon–Nikodým property: this space is separable and is canonically isometric to the dual of the space of compact operators on H; see [48, Theorems 1.3.21, D.2.6]). Then \(Q_M\) is positive and trace class with \(\mathrm{tr}(Q_M)= 1\) almost everywhere on \([0,T]\times \Omega \). For processes \(g:[0,T]\times \Omega \rightarrow {\mathscr {L}}(H,X)\) which are predictable in the strong operator topology, one has

$$\begin{aligned} {\mathbb {E}}\Big \Vert \int _0^T g_t \,{\mathrm{d}}M_t\Big \Vert ^2 = {\mathbb {E}}\int _0^T \Vert g_t Q_{M,t}^{1/2}\Vert _{{\mathscr {L}}_2(H,X)}^2 \,{\mathrm{d}}\langle M\rangle _t, \end{aligned}$$
(4.6)

whenever the right-hand side of (4.6) is finite. Moreover, the predictable quadratic variation is given by

$$\begin{aligned} \Big \langle \int _0^\cdot g_s \,{\mathrm{d}}M_s\Big \rangle _t = \int _0^t \Vert g_s Q_{M,s}^{1/2}\Vert _{{\mathscr {L}}_2(H,X)}^2 \,{\mathrm{d}}\langle M\rangle _s. \end{aligned}$$
(4.7)

In these identities, \({\mathscr {L}}_2(H,X)\) denotes the space of Hilbert–Schmidt operators from H to X.

The following theorem shows that the main result of [56] also holds with a strong type estimate instead of a weak estimate. A similar result was obtained in [57] under additional assumptions on the evolution family \((S(t,s))_{0\leqslant s\leqslant t\leqslant T}\). The result also covers the Poisson case; this can be seen in the same way as in [44, Sect. 3].

Theorem 4.7

Let \((S(t,s))_{0\leqslant s\leqslant t\leqslant T}\) be a \(C_0\)-evolution family of contractions on a Hilbert space X and let M be a continuous (respectively, càdlàg) local \(L^2\)-martingale with values in H. Let \(g:[0,T]\times \Omega \rightarrow {\mathscr {L}}(H,X)\) be a process such that g(h) is predictable for all \(h\in H\) and

$$\begin{aligned} \int _0^T \Vert g_t Q_{M,t}^{1/2}\Vert _{{\mathscr {L}}_2(H,X)}^2 \,{\mathrm{d}}\langle M\rangle _t<\infty \ \ \hbox {almost surely.} \end{aligned}$$

Then the process \((\int _0^t S(t,s)g_s\,{\mathrm{d}}M_s)_{t\in [0,T]}\) has a continuous (respectively, càdlàg) modification. Moreover, if \(0<p\le 2\), then

$$\begin{aligned}{\mathbb {E}}\sup _{t\in [0,T]}\Big \Vert \int _0^t S(t,s)g_s\,{\mathrm{d}}M_s\Big \Vert ^p\leqslant C^p_p {\mathbb {E}}\Big (\int _0^T \Vert g_t Q_{M,t}^{1/2}\Vert _{{\mathscr {L}}_2(H,X)}^2 \,{\mathrm{d}}\langle M\rangle _t\Big )^{p/2},\end{aligned}$$

where \(C_p\) is a constant depending only on p. For \(p=2\) the inequality holds with \(C_2=300\).

This result can be extended to a larger class of processes g by a density argument, but the description of the space is quite technical. The interested reader is referred to [44, 68].

Proof

By Lenglart’s theorem and a localisation argument as in Theorem 4.1 it suffices to consider \(p=2\). Moreover, by localisation we may assume that M is a continuous (respectively, càdlàg) \(L^2\)-martingale. By approximation it furthermore suffices to consider adapted step processes g. We will focus on the continuous case, the càdlàg case being similar. Only the required changes in the proof of Theorem 4.1 will be indicated.

First of all, \(\Vert g\Vert _{L^2(0,T;\gamma (H,X))}\) must be replaced by \(\int _0^T \Vert g_t Q_{M,t}^{1/2}\Vert _{{\mathscr {L}}_2(H,X)}^2 \,{\mathrm{d}}\langle M\rangle _t\) throughout. With this adjustment, up to (4.3) the proof is verbatim the same. By Theorem 3.1 with \(p=2\) we find that

$$\begin{aligned} \Vert f^\star \Vert _2&\le 60 \Vert dG^\star \Vert _2 + 40\sqrt{2} \Vert s(G)\Vert _2. \end{aligned}$$

Noting that \(\Vert dG^\star \Vert _2\leqslant 2\Vert G^\star \Vert _2 \leqslant 4\Vert G\Vert _2 = 4 \Vert s(G)\Vert _2\) by Doob’s maximal inequality and combining the above with (4.6) and the bound \(\Vert K(t_j, s)\Vert \leqslant 1\), we obtain

$$\begin{aligned} \Vert f^\star \Vert _2^2 \leqslant C^2\Vert s(G)\Vert _2^2&= C^2\sum _{j=1}^m {\mathbb {E}}\Vert dG_j\Vert ^2 \\&= C^2 \sum _{j=1}^m {\mathbb {E}}\int _{t_{j-1}}^{t_j}\Vert K(t_j,s) g(s)Q_{M,s}^{1/2}\Vert _{{\mathscr {L}}_2(H,X)}^2 \,{\mathrm{d}}\langle M\rangle _s \\ {}&\leqslant C^2{\mathbb {E}}\int _{0}^{T}\Vert g(s)Q_{M,s}^{1/2}\Vert _{{\mathscr {L}}_2(H,X)}^2 \,{\mathrm{d}}\langle M\rangle _s, \end{aligned}$$

where \(C = 240+40\sqrt{2} < 300\). \(\square \)

Remark 4.8

Let us explain how to extend Theorem 4.7 to arbitrary \(2\le p< \infty \) in the case of continuous local martingales. In particular this extends [44, (1.13)] to the case of evolution families.

If M is a continuous local martingale with values in H, then Theorem 4.7 extends to exponents \(2\le p<\infty \) with \(C_p = 40\sqrt{p}\). As an immediate consequence, Corollary 4.2 holds with W replaced by M and with

$$\begin{aligned}\sigma ^2 = 1600 e \Vert gQ_{M}^{1/2}\Vert _{L^\infty (\Omega ;L^2(0,T;{\mathscr {L}}_2(H,X)))}^2. \end{aligned}$$

The proof is similar to those of Theorems 4.1 and 4.7, but some modifications are required which we sketch below.

By a stopping time argument we may assume that \(\Vert M\Vert \) and \(\langle M\rangle \) are uniformly bounded on \([0,T]\times \Omega \). By approximation it can be assumed that g is an adapted finite rank step process. Then up to (4.3) the proof is the same. Theorem 3.1 gives that

$$\begin{aligned} \Vert f^\star \Vert _p&\le 30p \Vert dG^\star \Vert _p + 40\sqrt{p} \Vert s(G)\Vert _p. \end{aligned}$$

Moreover the following extension of (4.6) holds:

$$\begin{aligned}{\mathbb {E}}\sup _{t\in [0,T]}\Big \Vert \int _0^t g_t \,{\mathrm{d}}M_t\Big \Vert ^p \eqsim _p {\mathbb {E}}\Big (\int _0^T \Vert g_t Q_{M,t}^{1/2}\Vert _{{\mathscr {L}}_2(H,X)}^2 \,{\mathrm{d}}\langle M\rangle _t\Big )^{p/2}.\end{aligned}$$

Since g is uniformly bounded it follows that

$$\begin{aligned}&\Vert dG^\star \Vert ^p_p \leqslant \sum _{j=1}^m {\mathbb {E}}\Vert dG_j\Vert ^p \\&\quad \leqslant C_g^p \sum _{j=1}^m {\mathbb {E}}|\langle M\rangle _{t_j} - \langle M\rangle _{t_{j-1}}|^{{p}/{2}} \leqslant C_g^p {\mathbb {E}}\Big (\sup _{j}|\langle M\rangle _{t_j} - \langle M\rangle _{t_{j-1}}|^{\frac{p-2}{2}} \langle M\rangle _{T}\Big ). \end{aligned}$$

By dominated convergence the right-hand side tends to zero as the mesh size tends to 0. The result follows once we have shown that

$$\begin{aligned}s(G)^2 \rightarrow \int _0^T \Vert g_t Q_{M,t}^{1/2}\Vert _{{\mathscr {L}}_2(H,X)}^2 \,{\mathrm{d}}\langle M\rangle _t\end{aligned}$$

with convergence in \(L^{p/2}(\Omega )\). If we replace \(s(G)^2\) by \({\widetilde{s}}(G)^2 := \sum _{j=1}^m \Vert dG_j\Vert ^2\) this follows from (4.7) (as explained in [12, Sect. 4], the scalar case considered in [27] extends to the Hilbert space setting). The proof will be completed by showing that

$$\begin{aligned} {\mathbb {E}}|{\widetilde{s}}(G)^2 - s(G)^2|^{q}\rightarrow 0 \end{aligned}$$

for any \(q\in [1, \infty )\). Without loss of generality we may take \(q\geqslant 2\); recall that g is an adapted finite rank step process. To prove the convergence in \(L^q(\Omega )\) we note that by the scalar case of Theorem 3.1, applied with \(V_j = I\) and martingale differences \(dL_j = \Vert dG_j\Vert ^2 - {\mathbb {E}}_{j-1}(\Vert dG_j\Vert ^2)\), for all \(2\le q<\infty \) we have

$$\begin{aligned} \Vert {\widetilde{s}}(G)^2 - s(G)^2\Vert _q&\leqslant 30 q\Vert dL^\star \Vert _q + 40\sqrt{q}\Vert s(L)\Vert _q \\&\leqslant 60q \Vert d G^\star \Vert _{2q}^2 + 80\sqrt{q} \Big \Vert \Big (\sum _{j=1}^m {\mathbb {E}}_{j-1} \Vert dG_j\Vert ^4\Big )^{1/2}\Big \Vert _q. \end{aligned}$$

We have already seen that the first term tends to 0 as the mesh size tends to zero. For the second term we use [48, Proposition 3.2.8] and Hölder’s inequality to find that

$$\begin{aligned} \Big \Vert \Big (\sum _{j=1}^m {\mathbb {E}}_{j-1} \Vert dG_j\Vert ^4\Big )^{1/2}\Big \Vert _q&\leqslant \frac{q^2}{4} \Big \Vert \Big (\sum _{j=1}^m \Vert dG_j\Vert ^4\Big )^{1/2}\Big \Vert _q \!\!\leqslant \frac{q^2}{4}\Vert dG^\star \Vert _{2q} \Vert s(G)\Vert _{2q} {\rightarrow } 0 \end{aligned}$$

as \(\text {mesh}(\pi )\rightarrow 0\).

4.4 Martingales as integrators: 2-smooth UMD Banach spaces

As before we let H be a separable Hilbert space and turn to the case where X is a (2, D)-smooth Banach space with the UMD property. Discussions of UMD spaces can be found in [48, 83]. Rather than introducing this property here, we content ourselves with mentioning that examples of Banach spaces with this property include Hilbert spaces, \(L^p\)-spaces with \(1<p<\infty \) and most classical function spaces constructed from these. We will prove an extension of the maximal estimate of the preceding subsection to this setting by using some results from [107]. To avoid technicalities with non-predictable quadratic variations we only consider continuous local martingales with values in H. In that case the quadratic variation considered in [107] coincides with the one of Sect. 4.3 (see [68, Theorem 20.5]).

Let \(g:[0,T]\times \Omega \rightarrow {\mathscr {L}}(H,X)\) be a process such that g(h) is predictable for all \(h\in H\) and

$$\begin{aligned} \Vert g_t Q_{M,t}^{1/2}\Vert _{\gamma (L^2(0,T;H),d\langle M\rangle _t;X)}^2<\infty \ \ \hbox {almost surely}. \end{aligned}$$

By [102, Theorem 4.1] (see also [107, Corollary 7.4 and Remark 7.6]) these assumptions enable one to construct a stochastic integral \(\int _0^t g_s \,{\mathrm{d}}M_s\) which, for all \(0<p<\infty \), satisfies the two-sided estimate

$$\begin{aligned} {\mathbb {E}}\sup _{t\in [0,T]}\Big \Vert \int _0^t g_s \,{\mathrm{d}}M_s\Big \Vert ^p \eqsim _{p,X} {\mathbb {E}}\Big (\Vert g_t Q_{M,t}^{1/2}\Vert _{\gamma (L^2(0,T;H),d\langle M\rangle _t;X)}^2\Big )^{p/2} \end{aligned}$$
(4.8)

whenever the expression on the right-hand side is finite. If in addition X has type 2 (which holds if X is 2-smooth), then by [93, Theorem 6.1]

$$\begin{aligned} \Vert g_t Q_{M,t}^{1/2}\Vert _{\gamma (L^2(0,T;H),d\langle M\rangle _t;X)}^2 \leqslant \tau _{2,X}^2\int _0^T \Vert g_t Q_{M,t}^{1/2}\Vert _{\gamma (H,X)}^2 \,{\mathrm{d}}\langle M\rangle _t, \end{aligned}$$
(4.9)

where \(\tau _{2,X}\) is the type 2 constant of X. We will consider processes for which the right-hand side is finite almost surely.

Theorem 4.9

Let X be a (2, D)-smooth UMD Banach space. Let \((S(t,s))_{0\leqslant s\leqslant t\leqslant T}\) be a \(C_0\)-evolution family of contractions on X and let M be a continuous local martingale with values in H. Let \(g:[0,T]\times \Omega \rightarrow {\mathscr {L}}(H,X)\) be a process such that \(g(h):[0,T]\times \Omega \rightarrow X\) is predictable for all \(h\in H\) and

$$\begin{aligned} \int _0^T \Vert g_t Q_{M,t}^{1/2}\Vert _{\gamma (H,X)}^2 \,{\mathrm{d}}\langle M\rangle _t<\infty \ \ \hbox {almost surely}. \end{aligned}$$

Then the process \((\int _0^t S(t,s)g_s\,{\mathrm{d}}M_s)_{t\in [0,T]}\) has a continuous modification which satisfies, for all \(0<p<\infty \),

$$\begin{aligned} {\mathbb {E}}\sup _{t\in [0,T]}\Big \Vert \int _0^t S(t,s)g_s\,{\mathrm{d}}M_s\Big \Vert ^p\leqslant C_{p,X}^p {\mathbb {E}}\Big (\int _0^T \Vert g_t Q_{M,t}^{1/2}\Vert _{\gamma (H,X)}^2 \,{\mathrm{d}}\langle M\rangle _t\Big )^{p/2}, \end{aligned}$$

where \(C_{p,X}\) is a constant depending only on p and X.

Proof

We argue as in Theorem 4.7 and Remark 4.8. Since we may assume that g takes values in a finite dimensional subspace of X, as in Remark 4.8 it follows that \(\Vert dG^\star \Vert _p\rightarrow 0\) as \(\text {mesh}(\pi )\rightarrow 0\). It remains to estimate s(G). By a standard argument, (4.8) and (4.9) imply

$$\begin{aligned}{\mathbb {E}}_{j-1} \Vert dG_j\Vert ^2 \leqslant C_{X}^2 {\mathbb {E}}_{j-1} \Big (\int _{t_{j-1}}^{t_j} \Vert g_t Q_{M,t}^{1/2}\Vert _{\gamma (H,X)}^2\,{\mathrm{d}}\langle M\rangle _t\Big )=:C_{X}^2{\mathbb {E}}_{j-1} (\xi _j) ,\end{aligned}$$

where \(C_X\) is a constant only depending on X. Therefore, by [48, Proposition 3.2.8],

$$\begin{aligned}&\Vert s(G)\Vert _p^p \leqslant C_{X}^p {\mathbb {E}}\Big (\sum _{j=1}^m {\mathbb {E}}_{j-1} (\xi _j)\Big )^{p/2}\!\! \\&\quad \leqslant (p/2)^{p/2} C_{X}^p {\mathbb {E}}\Big (\sum _{j=1}^m \xi _j\Big )^{p/2} \\&\quad = (p/2)^{p/2} C_{X}^p {\mathbb {E}}\Big (\int _{0}^{T} \Vert g_t Q_{M,t}^{1/2}\Vert _{\gamma (H,X)}^2 \,{\mathrm{d}}\langle M\rangle _t \Big )^{p/2}. \end{aligned}$$

The proof can now be completed as before.

Observe that this method gives the result with \(C_{p,X} = \frac{40}{\sqrt{2}}p C_X\) for \(p\geqslant 2\), which is linear in p as \(p\rightarrow \infty \); this contrasts with the \(O(\sqrt{p})\) growth obtained in all other places in the paper. \(\square \)

The infinite dimensional version of the Dambis–Dubins–Schwarz theorem of [102, Theorem 4.9] suggests that the correct order of the constant in Theorem 4.9 is \(O(\sqrt{p})\).

We expect that a large portion of Theorem 4.9 extends to the setting of (not necessarily continuous) local martingales if one replaces the predictable quadratic variation \(\langle M\rangle \) by the process [M] as defined in [68, Theorem 20.5]. In most situations, however, it is preferable to work with a predictable quadratic variation. An alternative substitute for predictability has been recently developed in [25] in the Poisson case and in [26, 108] for general local martingales, but the norms are much more complicated to work with. It would be interesting to see if one can combine our techniques with the estimates in [25, 26] for \(X = L^q\) with \(2\le q< \infty \), or in [108] for more general Banach spaces X.

5 Applications to time discretisation

In this section we will apply our abstract results to prove stability of certain numerical approximations of stochastic evolution equations with additive noise of the form

$$\begin{aligned} {\left\{ \begin{array}{ll} \,{\mathrm{d}}u_t &{}= A(t)u_t\,{\mathrm{d}}t + g_t \,{\mathrm{d}}W_t, \qquad t\in [0,T], \\ u_0 &{}= 0. \end{array}\right. } \end{aligned}$$
(5.1)

This setting covers both parabolic and hyperbolic time-dependent SPDEs; the latter class includes the stochastic wave equation and the Schrödinger equation. To solve (5.1) numerically one typically uses discretisation in time and space [54, 64]. Here we will only consider time discretisation, leaving space-time discretisation and the extension to semi-linear equations with multiplicative noise for a future publication. In that respect the results presented here serve as a proof-of-principle only. We mainly focus on the splitting scheme and the implicit Euler scheme, although the method is robust and can be applied to other schemes as well.

In what follows, for \(n=1,2,\dots \) we set \(t_j^{(n)} := jT/n\) and consider the partition

$$\begin{aligned} \pi ^{(n)} := \{t_j^{(n)}: j=0,\ldots , n\} \end{aligned}$$

as a discretisation of the interval [0, T]. We fix a process \(g\in L_{{\mathscr {P}}}^0(\Omega ;L^2(0,T;\gamma (H,X)))\) and consider the continuous martingale

$$\begin{aligned} M_t := \int _0^t g_s \,{\mathrm{d}}W_s, \quad t\in [0,T]. \end{aligned}$$

For \(j=1,\dots , n\) we set

$$\begin{aligned} d_j^{(n)} M := M_{t_{j}^{(n)}} - M_{t_{j-1}^{(n)}} = \int _{t_{j-1}^{(n)}}^{t_{j}^{(n)}} g_s \,{\mathrm{d}}W_s. \end{aligned}$$
(5.2)

In the presence of a \(C_0\)-evolution family \((S(t,s))_{0\leqslant s\leqslant t\leqslant T}\) we set

$$\begin{aligned} u_t:= \int _0^t S(t,s) g_s\,{\mathrm{d}}W_s, \quad t\in [0,T]. \end{aligned}$$

This covers the special case of \(C_0\)-semigroups by letting \(S(t,s) =S(t-s)\).

5.1 The splitting method

Our first result gives stability of a time discretisation scheme, known as the splitting method (also called the exponential Euler method), for the stochastic convolution process involving a \(C_0\)-evolution family of contractions. This scheme has already been employed in the proof of Theorem 4.1. An extension to random evolution families is discussed in Remark 6.8.

Theorem 5.1

(Uniform convergence of the splitting method) Let \((S(t,s))_{0\leqslant s\leqslant t\leqslant T}\) be a \(C_0\)-evolution family of contractions on a (2, D)-smooth Banach space X. Let \(g\in L_{{\mathscr {P}}}^p(\Omega ;L^2(0,T;\gamma (H,X)))\) with \(0< p<\infty \). Define, for \(n\ge 1\),

$$\begin{aligned} {\left\{ \begin{array}{ll} u_0^{(n)} &{} :=0, \\ u_j^{(n)} &{} := S(t_{j}^{(n)}, t_{j-1}^{(n)}) ( u_{j-1}^{(n)} + d_j^{(n)} M), \quad j= 1,\dots ,n, \end{array}\right. } \end{aligned}$$

where \(d_j^{(n)} M\) is given by (5.2). Then for all \(n\ge 1\) we have

$$\begin{aligned} {\mathbb {E}}\sup _{j=0,\ldots ,n}\Vert u_{t_j^{(n)}} - u_j^{(n)}\Vert ^p\leqslant C_{p,D}^p {\mathbb {E}}\Vert s\mapsto (S(s,\sigma _n(s)) - I) g_s\Vert _{L^2(0,T;\gamma (H,X))}^p, \end{aligned}$$
(5.3)

where \(\sigma _n(s)= t_{j-1}^{(n)}\) for \(s\in [t_{j-1}^{(n)}, t_j^{(n)})\). In particular,

$$\begin{aligned} \lim _{n\rightarrow \infty }{\mathbb {E}}\sup _{j=0,\ldots ,n}\Vert u_{t_j^{(n)}} - u_j^{(n)}\Vert ^p = 0. \end{aligned}$$
(5.4)

For \(2\le p<\infty \) the estimate (5.3) holds with \(C_{p,D} = 10D\sqrt{p}\).

The process u has a continuous modification by Theorem 4.1. We will not need this modification in the proof, because the suprema in (5.3) and (5.4) are taken with respect to finite index sets. This remark applies to all results in this subsection and the next (in Theorem 5.13 the existence of the continuous modification follows from Proposition 4.5).

Proof

To simplify notation we fix \(n\geqslant 1\) and write \(t_j := t_j^{(n)}\), \(v_j:= u_j^{(n)}\), and \(d_j M:= d_j^{(n)}M\). By induction one checks that \(v_0=0\) and

$$\begin{aligned} v_{k} = \sum _{j=1}^k S(t_k, t_{j-1}) d_jM, \qquad k=1,\dots ,n. \end{aligned}$$

Therefore,

$$\begin{aligned} u_{t_k} - v_{k}&= \sum _{j=1}^k \int _{t_{j-1}}^{t_j} (S(t_k,s) - S(t_k, t_{j-1})) g_s \,{\mathrm{d}}W_s \\&= \sum _{j=1}^k \int _{t_{j-1}}^{t_j} S(t_k, s) (I - S(s,t_{j-1})) g_s \,{\mathrm{d}}W_s \\&= \sum _{j=1}^k \int _{t_{j-1}}^{t_j} S(t_k, s) (I - S(s,\sigma _n(s))) g_s \,{\mathrm{d}}W_s \\&= \int _{0}^{t_k} S(t_k,s) (I - S(s,\sigma _n(s))) g_s \,{\mathrm{d}}W_s \end{aligned}$$

and hence, by Theorem 4.1,

$$\begin{aligned} {\mathbb {E}}\sup _{j=0,\dots ,n}\Vert u_{t_j} - v_{j}\Vert ^p\leqslant & {} {\mathbb {E}}\sup _{t\in [0,T]} \Big \Vert \int _{0}^{t} S(t,s) (I - S(s,\sigma _n(s))) g_s \,{\mathrm{d}}W_s\Big \Vert ^p \\\leqslant & {} C_{p,D}^p{\mathbb {E}}\Vert s\mapsto (I - S(s,\sigma _n(s)))g_s\Vert ^p_{L^2(0,T;\gamma (H,X))}. \end{aligned}$$

The convergence assertion (5.4) follows by dominated convergence in combination with the convergence criterion [49, Theorem 9.1.14]. \(\square \)
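The closed-form identity \(v_{k} = \sum _{j=1}^k S(t_k, t_{j-1}) d_jM\) obtained by induction in the proof can be checked numerically. The following Python sketch does so for a hypothetical scalar toy model \(X = H = {{\mathbb {R}}}\) with \(S(t,s) = e^{-\lambda (t-s)}\) (a contraction semigroup); the function names and the choice \(\lambda = 1\) are purely illustrative and not part of the paper.

```python
import numpy as np

def splitting_scheme(T, n, increments, lam=1.0):
    """Splitting (exponential Euler) recursion u_j = S(t_j, t_{j-1})(u_{j-1} + d_j M)
    in the scalar toy model S(t, s) = exp(-lam*(t - s))."""
    dt = T / n
    u = np.zeros(n + 1)
    for j in range(1, n + 1):
        u[j] = np.exp(-lam * dt) * (u[j - 1] + increments[j - 1])
    return u

def closed_form(T, n, increments, lam=1.0):
    """Closed form v_k = sum_{j=1}^k S(t_k, t_{j-1}) d_j M obtained by induction."""
    dt = T / n
    t = dt * np.arange(n + 1)
    v = np.zeros(n + 1)
    for k in range(1, n + 1):
        v[k] = sum(np.exp(-lam * (t[k] - t[j - 1])) * increments[j - 1]
                   for j in range(1, k + 1))
    return v
```

Feeding both functions the same Brownian increments \(d_jM\sim N(0,T/n)\), the recursion and the closed form agree exactly, mirroring the first display of the proof.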

In the next corollary we obtain explicit convergence rates for processes g taking values in intermediate spaces. In order to make the statement easy to formulate we only consider the case of semigroup generators.

Corollary 5.2

(Uniform convergence of the splitting method with decay rate) Let \((S(t))_{t\ge 0}\) be a \(C_0\)-contraction semigroup on a (2, D)-smooth Banach space X. As in the preceding theorem, for \(n\ge 1\) let

$$\begin{aligned} {\left\{ \begin{array}{ll} u_0^{(n)} &{} :=0, \\ u_j^{(n)} &{} := S(t_{j}^{(n)}- t_{j-1}^{(n)}) (u_{j-1}^{(n)} + d_j^{(n)}M), \quad j= 1,\dots ,n, \end{array}\right. } \end{aligned}$$

where \(d_j^{(n)} M\) is given by (5.2). Let \(X_{\nu } := (X,{\mathsf {D}}(A))_{\nu ,\infty }\) for \(\nu \in (0,1)\) and \(X_{1} := {\mathsf {D}}(A)\), where A is the generator of the semigroup. If \(g\in L_{{\mathscr {P}}}^p(\Omega ;L^2(0,T;\gamma (H,X_{\nu })))\) with \(0< p<\infty \), then for all \(n\ge 1\) we have

$$\begin{aligned}{\mathbb {E}}\sup _{j=0,\ldots ,n}\Vert u_{t_j^{(n)}} - u_j^{(n)}\Vert ^p\leqslant \Bigl (2 C_{p,D} \bigl (\frac{T}{n}\bigr )^{\nu }\Bigr )^p \Vert g\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X_{\nu })))}^p.\end{aligned}$$

For \(2\le p<\infty \) the inequality holds with \(C_{p,D} = 10D\sqrt{p}\).

A version of the above result for \(C_0\)-semigroups which are not necessarily contractive and a general class of discretisation schemes will be proved in Theorem 5.13.

Proof

Since \(\Vert (I - S(t))x\Vert \leqslant 2 \Vert x\Vert \) and

$$\begin{aligned} \Vert (I - S(t))x\Vert&\leqslant \int _0^t \Vert S(s) Ax\Vert \,{\mathrm{d}}s \leqslant t \Vert A x\Vert , \end{aligned}$$

for \(0<\nu <1\) by interpolation we obtain

$$\begin{aligned}\Vert (I - S(t))x\Vert \leqslant 2 t^{\nu } \Vert x\Vert _{X_{\nu }}.\end{aligned}$$

For \(\nu =1\) we have

$$\begin{aligned} \Vert (I - S(t))x\Vert \leqslant t \Vert x\Vert _{{\mathsf {D}}(A)} = t \Vert x\Vert _{X_1}. \end{aligned}$$

The result now follows from Theorem 5.1 and the ideal property (see [49, Theorem 9.1.10]). \(\square \)

Remark 5.3

(Pathwise convergence) If we assume \(p \nu >1\) in Corollary 5.2, then for all \(\beta \in (0,\nu -\frac{1}{p})\) there exists a random variable \(\xi \in L^p(\Omega )\) such that, almost surely,

$$\begin{aligned}\sup _{j=0,\ldots ,n}\Vert u_{t_j^{(n)}} - u_j^{(n)}\Vert \leqslant n^{-\beta } \xi .\end{aligned}$$

Indeed, setting \(\xi := (\sum _{n\geqslant 1} n^{\beta p} \sup _{j=0,\ldots ,n} \Vert u_{t_j^{(n)}} - u_j^{(n)}\Vert ^p)^{1/p}\), by Corollary 5.2 we have

$$\begin{aligned} {\mathbb {E}}|\xi |^p \leqslant \bigl (2 C_{p,D} T^{\nu }\bigr )^p \Vert g\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X_{\nu })))}^p \sum _{n\geqslant 1} n^{\beta p} n^{-\nu p}, \end{aligned}$$

the sum on the right-hand side being convergent since \((\nu -\beta )p>1\).

5.2 General time discretisation methods

We now investigate whether analogues of Theorem 5.1 hold for general time discretisation methods. Before returning to convergence questions, we consider a stability result for abstract numerical schemes featuring random operators \(V_{j,n}\) satisfying an \({\mathscr {F}}_{t_{j-1}}\)-measurability condition. In particular, the operators are allowed to depend on u and g up to time \(t_{j-1}\). This makes this result applicable to nonlinear problems.

Proposition 5.4

(Stability) Let X be a (2, D)-smooth Banach space and assume that \(g\in L_{{\mathscr {P}}}^p(\Omega ;L^2(0,T;\gamma (H,X)))\) with \(2\le p<\infty \). For \(n=1,2,\dots \) and \(j=1,\dots , n\) assume that the random contraction \(V_{j,n}:\Omega \rightarrow {\mathscr {L}}(X)\) is such that \(V_{j,n}x\) is strongly \({\mathscr {F}}_{t_{j-1}^{(n)}}\)-measurable for all \(x\in X\), and define

$$\begin{aligned} {\left\{ \begin{array}{ll} u_0^{(n)} &{} :=0, \\ u_{j}^{(n)} &{}: = V_{j,n} (u_{j-1}^{(n)} + d_j^{(n)}M), \quad j=1, \ldots , n, \end{array}\right. } \end{aligned}$$

where \(d_j^{(n)}M\) is given by (5.2). Then

$$\begin{aligned}{\mathbb {E}}\sup _{j=0,\ldots ,n}\Vert u_j^{(n)}\Vert ^p\leqslant K_{p,D}^p \Vert g\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X)))}^p,\end{aligned}$$

where \(K_{p,D} = \frac{100 D p^{5/2}}{p-1} + \frac{10}{\sqrt{2}} D^2 p\).

Proof

We fix \(n\ge 1\) and write \(t_j:=t_j^{(n)}\), \(d_{j}M := d_j^{(n)}M\), and \(d_{j}\widetilde{M} := V_{j,n} d_j^{(n)}M\). Theorem 3.1, the contractivity of \(V_{j,n}\), and Doob’s maximal inequality imply that

$$\begin{aligned} \Big \Vert \sup _{j=0,\dots ,n}\Vert u_j^{(n)}\Vert \Big \Vert _p&\leqslant 5p \Vert d\widetilde{M}^{\star }\Vert _p + 10D\sqrt{p} \Vert s(\widetilde{M})\Vert _p \\&\leqslant 5p \Vert dM^{\star }\Vert _p + 10D\sqrt{p} \Vert s(M)\Vert _p \\&\leqslant 10p \Vert M^{\star }\Vert _p + 10D\sqrt{p} \Vert s(M)\Vert _p \\&\leqslant \frac{10p^2}{p-1}\Vert M_T\Vert _p + 10D\sqrt{p} \Vert s(M)\Vert _p. \end{aligned}$$

We will estimate the terms on the right-hand side separately. By Proposition 2.6,

$$\begin{aligned} \Vert M_T\Vert _p \leqslant 10D\sqrt{p} \Vert g\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X)))}. \end{aligned}$$

To estimate s(M), by (2.10) we have

$$\begin{aligned} {\mathbb {E}}_{j-1}\Vert d_jM\Vert ^2\leqslant D^2 {\mathbb {E}}_{j-1} \Vert g\Vert _{L^2(t_{j-1},t_{j};\gamma (H,X))}^2=:D^2 {\mathbb {E}}_{j-1}(\xi _j). \end{aligned}$$

By the dual of Doob’s maximal inequality (see [48, Proposition 3.2.8]) and using \(p/2\geqslant 1\), we obtain

$$\begin{aligned} \Vert s(M)\Vert _p^p&\leqslant D^p {\mathbb {E}}\Big (\sum _{j=1}^n {\mathbb {E}}_{j-1}(\xi _j) \Big )^{p/2} \\&\leqslant (p/2)^{p/2} D^p {\mathbb {E}}\Big (\sum _{j=1}^n \xi _j \Big )^{p/2} = (p/2)^{p/2} D^p {\mathbb {E}}\Vert g\Vert _{L^2(0,T;\gamma (H,X))}^p. \end{aligned}$$

The required estimate follows by combining the estimates. \(\square \)
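The recursion of Proposition 5.4 can be sketched in code. The following Python toy model is hypothetical: X is taken to be the scalars and the random contractions \(V_{j,n}\) are replaced by numbers \(V_j\) with \(|V_j|\leqslant 1\). It implements \(u_{j}^{(n)} = V_{j,n}(u_{j-1}^{(n)} + d_j^{(n)}M)\) and illustrates the crude pathwise bound \(\sup _j \Vert u_j^{(n)}\Vert \leqslant \sum _j \Vert d_j^{(n)}M\Vert \) that contractivity immediately gives.

```python
import numpy as np

def stable_scheme(increments, contractions):
    """Recursion u_j = V_j (u_{j-1} + d_j M) with |V_j| <= 1; a scalar toy
    model of the random contractions V_{j,n} in Proposition 5.4."""
    if np.any(np.abs(contractions) > 1.0):
        raise ValueError("each V_j must be a contraction")
    u = np.zeros(len(increments) + 1)
    for j, (dM, V) in enumerate(zip(increments, contractions), start=1):
        u[j] = V * (u[j - 1] + dM)
    return u
```

Since \(|u_j|\leqslant |u_{j-1}| + |d_jM|\), the trajectory stays below the cumulative increment sum; the point of the proposition is of course the much finer moment bound with the constant \(K_{p,D}\).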

Remark 5.5

For \(p=2\) the inequality holds with \(K_{2,D} = 40 D + 10\sqrt{2} D^2\). This is because in the case \(p=2\) we can use (2.10) instead of Proposition 2.6.

Remark 5.6

In the setting of monotone operators on Hilbert spaces, a related stability result for \(p=2\) for the implicit Euler method can be found in [38, Theorem 2.6].

Returning to the problem of convergence, the convergent numerical schemes which we will consider are given in the following definition.

Definition 5.7

Let X be a Banach space. An \({\mathscr {L}}(X)\)-valued scheme is a function \(R:[0,\infty )\rightarrow {\mathscr {L}}(X)\). If A generates a \(C_0\)-semigroup S on X and Y is a Banach space continuously and densely embedded in X, an \({\mathscr {L}}(X)\)-valued scheme R is said to approximate S to order \(\alpha >0\) on Y if for all \(T>0\) there exists a constant \(K\geqslant 0\) such that for all integers \(n\geqslant 1\) and \(t\in [0,T]\) we have

$$\begin{aligned} \Vert R(t/n)^n - S(t)\Vert _{{\mathscr {L}}(Y,X)}\leqslant K (t/n)^{\alpha }. \end{aligned}$$
(5.5)

A scheme R is said to be contractive if \(\Vert R(t)\Vert \le 1\) for all \(t\ge 0\).

If R approximates S to order \(\alpha \) on Y and there exists a constant \(C\ge 0\) such that

$$\begin{aligned} \Vert R(t/n)^n\Vert \leqslant C \ \ \hbox {and} \ \ \Vert S(t)\Vert \leqslant C \ \ \hbox {for all} \ \ n\geqslant 1, \ \ t\in [0,T], \end{aligned}$$

then by real interpolation it approximates S to order \(\theta \alpha \) on the real interpolation spaces \((X, Y)_{\theta ,\infty }\) for \(\theta \in (0,1)\) with estimate

$$\begin{aligned} \Vert R(t/n)^n - S(t)\Vert _{{\mathscr {L}}((X, Y)_{\theta ,\infty },X)}\leqslant (2C)^{1-\theta } K^{\theta } (t/n)^{\theta \alpha }, \ \ t\ge 0. \end{aligned}$$

An interesting special case arises when \(Y = {\mathsf {D}}(A^m)\). If an \({\mathscr {L}}(X)\)-valued scheme R approximates S to order \(\alpha \) on \({\mathsf {D}}(A^m)\), then R approximates S to order \(\theta \alpha \) on \((X,{\mathsf {D}}(A^m))_{\theta ,\infty }\).

Proposition 5.8

Let \(T>0\) and suppose that there exists a constant \(C\geqslant 0\) such that for all \(t\in (0,T]\) and integers \(n\geqslant 1\), \(\Vert R(t/n)^n\Vert \leqslant C\) and \(\Vert S(t)\Vert \leqslant C\). Suppose that the \({\mathscr {L}}(X)\)-valued scheme R approximates S to order \(\alpha \) on \({\mathsf {D}}(A^m)\) for some integer \(m\ge 1\), and let \(0<\theta <1\). Then R approximates S to order \(\theta \alpha \) on \((X,{\mathsf {D}}(A^m))_{\theta ,\infty }\).

Since the continuous embedding \({\mathsf {D}}((-A)^{\theta m})\hookrightarrow (X,{\mathsf {D}}(A^m))_{\theta ,\infty }\) holds, we obtain the following: If \(\Vert S(t)\Vert \leqslant M e^{\mu t}\) for all \(t\ge 0\), with \(M\ge 1\) and \(\mu \in {{\mathbb {R}}}\), then R approximates S to order \(\theta \alpha \) on the fractional domain \({\mathsf {D}}((\mu -A)^{\theta m})\).

We will now review some examples of numerical schemes satisfying the conditions of the above definition. Classical references include [6, 45] and, for analytic semigroups, [19]. A new and unified approach to approximation of semigroups which sharpens several classical estimates has been recently developed in [34, 35].

Part (1) of the next theorem follows from [6, Theorem 4]; see also [45]. More elaborate versions on interpolation spaces can be found in [58]. Part (2) follows from [61, Theorem 4.2] by interpolating the stability result [19, Theorem 5] using Proposition 5.8 (see [42, Theorem 9.2.3] for a direct approach, which also does not rely on \(0\in \varrho (A)\)).

Theorem 5.9

(Time discretisation) Let \(r:{{\mathbb {C}}}\rightarrow {{\mathbb {C}}}\) be a rational function such that \(|r(z)|\leqslant 1\) for all \(\mathfrak {R}z \le 0\), and assume that there exists an integer \(\ell \geqslant 1\) such that

$$\begin{aligned} |r(z) - e^{z}| = O(z^{\ell +1}) \ \ \hbox {as} \ \ z\rightarrow 0. \end{aligned}$$

Let A be the generator of a bounded \(C_0\)-semigroup \((S(t))_{t\geqslant 0}\) on a Banach space X and set

$$\begin{aligned} R(t) := r(tA), \qquad t\ge 0. \end{aligned}$$
  1. (1)

    R approximates S to order \(\eta (\ell ,k)\) on \({\mathsf {D}}(A^{k})\) for all integers \(k\in \{1, \ldots ,\ell +1\}\setminus \{\frac{\ell +1}{2}\}\), where

    $$\begin{aligned}\eta (\ell ,k) = \left\{ \begin{array}{ll} k-\frac{1}{2}, &{} \hbox {if }k<\frac{\ell +1}{2}; \\ \frac{k\ell }{\ell +1}, &{} \hbox {if }\frac{\ell +1}{2}<k\leqslant \ell +1. \end{array} \right. \end{aligned}$$

If the semigroup is analytic and bounded on a sector, then:

  1. (2)

    R approximates S to order \(\nu \) on \({\mathsf {D}}((-A)^{\nu })\) for all \(\nu \in (0,\ell ]\).
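The order \(\eta (\ell ,k)\) in part (1) is defined piecewise; the following small Python helper (illustrative only, the name is ours) encodes it together with its domain restriction \(k\ne \frac{\ell +1}{2}\).

```python
from fractions import Fraction

def eta(l, k):
    """Approximation order eta(l, k) of Theorem 5.9(1) for integers
    k in {1, ..., l+1} with k != (l+1)/2."""
    if not (1 <= k <= l + 1) or 2 * k == l + 1:
        raise ValueError("need k in {1,...,l+1} with k != (l+1)/2")
    if 2 * k < l + 1:
        return Fraction(2 * k - 1, 2)   # k - 1/2
    return Fraction(k * l, l + 1)       # k*l/(l+1)
```

For instance, for a scheme of classical order \(\ell =1\) one gets \(\eta (1,2)=1\) on \({\mathsf {D}}(A^{2})\).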

Example 5.10

(Time discretisation for \(C_0\)-semigroups) Let A be the generator of a bounded \(C_0\)-semigroup \((S(t))_{t\geqslant 0}\) on a Banach space X. For each of the functions r below we set

$$\begin{aligned} R(t) := r(tA), \qquad t\ge 0. \end{aligned}$$

Then R approximates S in each of the following cases:

  1. (1)

    Splitting: \(r(z) = e^{z}\), to any order on X.

  2. (2)

    Implicit Euler: \(r(z) = (1-z)^{-1}\), to order \(\alpha \) on \({\mathsf {D}}((-A)^{2\alpha })\) for all \(\alpha \in (0,1]\) (see [35, Theorem 1.3] or [58, Corollary 4.4]).

  3. (3)

    Crank–Nicolson: \(r(z) = (2+z)(2-z)^{-1}\), to order \(\nu \) on \({\mathsf {D}}((-A)^{k})\) for points \((k,\nu )\) on the graph of the piecewise linear function connecting the points \((\frac{1}{2}, 0)\), \( (1,\frac{1}{2})\), \((2,\frac{4}{3})\), and (3, 2) (see [58, Theorems 1.1 and 4.1]). If moreover R is stable (see Proposition 5.12 for sufficient conditions), then the order is \(\nu \) on \({\mathsf {D}}((-A)^{{3\nu }/{2}})\) for any \(\nu \in (0,2]\) (see [58, Corollary 4.4]).
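The classical orders \(\ell \) behind these examples can be checked numerically from the defining condition \(|r(z) - e^{z}| = O(z^{\ell +1})\) of Theorem 5.9. The Python sketch below (names illustrative) estimates the exponent from the error at two small arguments; one expects a value close to \(\ell +1\), that is, \(2\) for the implicit Euler scheme and \(3\) for the Crank–Nicolson scheme.

```python
import math

def local_order(r, z1=1e-2, z2=1e-3):
    """Estimate q in |r(z) - exp(z)| ~ C |z|^q from two small real arguments."""
    e1 = abs(r(z1) - math.exp(z1))
    e2 = abs(r(z2) - math.exp(z2))
    return math.log(e1 / e2) / math.log(z1 / z2)

def implicit_euler(z):      # r(z) = (1 - z)^{-1}, classical order l = 1
    return 1.0 / (1.0 - z)

def crank_nicolson(z):      # r(z) = (2 + z)(2 - z)^{-1}, classical order l = 2
    return (2.0 + z) / (2.0 - z)
```

The splitting scheme \(r(z)=e^{z}\) reproduces the exponential exactly, consistent with its approximating S "to any order" in item (1).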

Example 5.11

(Time discretisation for analytic \(C_0\)-semigroups) Let A be the generator of a bounded analytic \(C_0\)-semigroup \((S(t))_{t\geqslant 0}\) on X. For each of the functions r below we set

$$\begin{aligned} R(t) := r(tA), \qquad t\ge 0. \end{aligned}$$

Then R approximates S in each of the following cases:

  1. (1)

    splitting: \(r(z) = e^{z}\), to any order on X.

  2. (2)

    implicit Euler: \(r(z) = (1-z)^{-1}\), to order \(\nu \) on \({\mathsf {D}}((-A)^{\nu })\) for any \(\nu \in (0,1]\).

  3. (3)

    Crank–Nicolson: \(r(z) = (1+\frac{1}{2}z)(1-\frac{1}{2}z)^{-1}\), to order \(2\nu \) on \({\mathsf {D}}((-A)^{2\nu })\) for any \(\nu \in (0,1]\).

If A generates a contractive \(C_0\)-semigroup \((S(t))_{t\geqslant 0}\), the splitting and implicit Euler methods lead to contractive approximants \(S_n(t)\). In the following proposition we discuss another class of examples where this holds. It applies to all numerical schemes of the form \(R(t) = r(tA)\) considered in Theorem 5.9 and includes all schemes considered in [6, 45]. We use the notation

$$\begin{aligned} \Sigma _\sigma = \{z\in {{\mathbb {C}}}\setminus \{0\}: \ |\arg (z)|<\sigma \}, \end{aligned}$$

where the argument is taken from \((-\pi ,\pi ]\).

Proposition 5.12

Let A be the generator of a \(C_0\)-semigroup of contractions on a Hilbert space. Suppose that \(r:\Sigma _\sigma \rightarrow {{\mathbb {C}}}\) is holomorphic for some \(\frac{1}{2}\pi<\sigma <\pi \) and satisfies \(|r(z)|\leqslant 1\) for all \(\mathfrak {R}z\ge 0\). Then \(\Vert r(-tA)\Vert \leqslant 1\) for all \(t>0\), where \(r(-tA)\) is defined through the \(H^\infty \)-calculus of \(-A\).

The proof is immediate from [49, Theorem 10.2.24]. The proposition is false beyond the Hilbert space setting. Indeed, for the operator \(A = \mathrm{d}/\mathrm{d}x\) on \(X=L^p({{\mathbb {R}}})\) with \(p\ne 2\) or \(X=C_0({{\mathbb {R}}})\), in [5] it was shown that contractivity of R(t) fails for a general class of schemes (see also [19] for the Crank–Nicolson scheme).

In what follows we restrict ourselves to the semigroup setting, but expect the results to extend to evolution families under suitable additional conditions. In the next theorem we obtain convergence rates for a rather general class of discretisation schemes, which in the case of the splitting method turn out to equal those of Corollary 5.2 up to a logarithmic term. Modulo this term, the theorem extends Corollary 5.2 in two ways:

  • Contractivity of S is not needed;

  • The result holds for arbitrary approximation schemes.

The proof directly uses Seidler’s version of the Burkholder inequality of Proposition 2.6 in combination with Proposition 2.7 and works for \(C_0\)-semigroups and numerical schemes that are not necessarily contractive. The results of Sects. 3 and 4 are not used. One should carefully note, however, that inhomogeneities g taking values in \(\gamma (H,X_{\nu })\) are considered, where \(X_{\nu }\) is a suitable intermediate space between X and \({\mathsf {D}}(A^m)\). The case of inhomogeneities g taking values in \(\gamma (H,X)\) will be considered in Theorem 5.14 and does require contractivity.

Theorem 5.13

(Convergence rates without contractivity) Let A be the generator of a \(C_0\)-semigroup \(S=(S(t))_{t\geqslant 0}\) on a (2, D)-smooth Banach space X and let R be an \({\mathscr {L}}(X)\)-valued scheme approximating S to order \(\alpha \) on a Banach space Y continuously embedded in \( X_{\alpha }\) for some \(\alpha \in (0,1]\), where \(X_{\alpha } := (X,{\mathsf {D}}(A))_{\alpha ,\infty }\) if \(\alpha \in (0,1)\) and \(X_{1} := {\mathsf {D}}(A)\). Let \(g\in L_{{\mathscr {P}}}^p(\Omega ;L^2(0,T;\gamma (H,Y)))\) with \(0< p<\infty \), and let \(u_t:=\int _0^t S(t-s) g_s\,{\mathrm{d}}W_s\) for \(t\in [0,T]\). Define, for \(n\ge 1\),

$$\begin{aligned} {\left\{ \begin{array}{ll} u_0^{(n)} &{} :=0, \\ u_j^{(n)} &{}:= R(T/n)(u_{j-1}^{(n)} + d_j^{(n)}M), \quad j=1,\dots ,n, \end{array}\right. } \end{aligned}$$
(5.6)

where \(d_j^{(n)}M\) is given by (5.2). Then for all \(n\geqslant 3\),

$$\begin{aligned} {\mathbb {E}}\sup _{j=0,\ldots ,n}\Vert u_{t_j^{(n)}} - u_j^{(n)}\Vert ^p \le \Bigl (LC_{p,D}\frac{\sqrt{\log (n+1)}}{n^{\alpha }}\Bigr )^p \Vert g\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,Y)))}^p, \end{aligned}$$
(5.7)

where \(L:=(2 K_{\alpha ,Y}C_{S,T} +K)T^{\alpha }\), with \(K_{\alpha ,Y}\) the norm of the embedding \(Y\hookrightarrow X_{\alpha }\), \(C_{S,T} := \sup _{t\in [0,T]}\Vert S(t)\Vert \), and K the constant in (5.5).

If \(2\le p<\infty \), the estimate holds with \(C_{p,D} = 10D\sqrt{2ep}\).

Examples of numerical schemes satisfying the conditions of the theorem can be obtained from Examples 5.10 and 5.11. Note that the embedding condition \(Y\hookrightarrow X_{\alpha }\) is satisfied for the real interpolation spaces \((X, {\mathsf {D}}(A))_{\alpha ,r}\) with \(1\le r\le \infty \), the complex interpolation spaces \([X, {\mathsf {D}}(A)]_{\alpha }\) and the fractional domain spaces \({\mathsf {D}}((\mu -A)^\alpha )\) for suitable \(\mu \in \varrho (A)\) for all \(\alpha \in (0,1)\).

As in Remark 5.3, (5.7) implies almost sure pathwise convergence of order \(n^{-\beta }\), provided that \(\alpha p>1\) and \(\beta \in (0,\alpha -\frac{1}{p})\).
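To make the recursion (5.6) concrete, the following is a minimal finite-dimensional sketch with scalar noise and a time-constant g, using the implicit Euler approximation \(R(h) = (I-hA)^{-1}\), one of the schemes of Example 5.11. The function names and the diagonal test generator are illustrative assumptions, not part of the text.

```python
import numpy as np

def implicit_euler(A, h):
    """(IE) scheme: R(h) = (I - h A)^{-1}, which approximates S(h) = e^{hA}."""
    return np.linalg.inv(np.eye(A.shape[0]) - h * A)

def run_scheme(A, g, T, n, scheme, rng):
    """One path of the recursion (5.6): u_0 = 0, u_j = R(T/n)(u_{j-1} + d_j M),
    where d_j M = g * (W_{t_j} - W_{t_{j-1}}) for a g that is constant in time."""
    h = T / n
    Rh = scheme(A, h)
    u = np.zeros(A.shape[0])
    traj = [u.copy()]
    for _ in range(n):
        dW = rng.normal(scale=np.sqrt(h))  # scalar Brownian increment
        u = Rh @ (u + g * dW)
        traj.append(u.copy())
    return np.array(traj)  # shape (n+1, d): the approximations u_0^{(n)},...,u_n^{(n)}
```

For instance, `run_scheme(np.diag([-1.0, -2.0]), np.array([1.0, 1.0]), 1.0, 100, implicit_euler, np.random.default_rng(0))` produces one discretised path; estimating the expected supremum error in (5.7) would require averaging over many paths against a fine-grid reference solution.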

Proof

Let \(S_n:[0,T]\rightarrow {\mathscr {L}}(X)\) be given by

$$\begin{aligned}S_n(t) := R(T/n)^j, \qquad t\in [t^{(n)}_{j-1}, t^{(n)}_j), \ j=1,\ldots ,n.\end{aligned}$$

With this notation,

$$\begin{aligned} u_k^{(n)}&= \sum _{j=1}^k R(T/n)^{k-j+1} d_j^{(n)}M \\&= \sum _{j=1}^k \int _{t^{(n)}_{j-1}}^{t^{(n)}_j} S_n(t^{(n)}_k-s) g_s \,{\mathrm{d}}W_s = \int _0^{t^{(n)}_k} S_n(t^{(n)}_k-s) g_s \,{\mathrm{d}}W_s. \end{aligned}$$

Therefore,

$$\begin{aligned} u(t^{(n)}_k) - u_k^{(n)} = \int _0^{T} {\mathbf{1}}_{[0,t^{(n)}_k]}(s) (S(t^{(n)}_k-s) -S_n(t^{(n)}_k-s)) g_s \,{\mathrm{d}}W_s. \end{aligned}$$

By the bound (2.11) in Proposition 2.7, for \(n\ge 3\) we have

$$\begin{aligned}&\Big ({\mathbb {E}}\sup _{k=0,\ldots ,n}\Vert u(t^{(n)}_k) - u_k^{(n)}\Vert ^p\Big )^{1/p} \\&\leqslant C_{p,D}\sqrt{\log (n+1)} \Vert (s,k)\mapsto {\mathbf{1}}_{[0,t^{(n)}_k]}(s) (S(t^{(n)}_k-s) \\&\quad -S_n(t^{(n)}_k\!-s)) g_s\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,\ell ^\infty _n(X))))} \\&\leqslant C_{p,D} \sqrt{\log (n+1)}\sup _{s\in [0,T]} \Vert S(s) -S_n(s)\Vert _{{\mathscr {L}}(Y,X)} \Vert g\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,Y)))}, \end{aligned}$$

where we may take \(C_{p,D} = 10D\sqrt{2ep}\) if \(2\le p<\infty \).

By (4.5), for \(0\leqslant s\leqslant t\leqslant T\) we have

$$\begin{aligned}\Vert S(t)- S(s)\Vert _{{\mathscr {L}}(Y,X)}\leqslant K_{\alpha ,Y} \Vert S(t)- S(s)\Vert _{{\mathscr {L}}(X_{\alpha },X)} \leqslant 2 K_{\alpha ,Y} C_{S,T} |t-s|^\alpha .\end{aligned}$$

Hence from the assumption on the numerical scheme we conclude that for all \(s\in [t^{(n)}_{j-1}, t^{(n)}_j)\),

$$\begin{aligned} \Vert S(s) -S_n(s)\Vert _{{\mathscr {L}}(Y,X)}&= \Vert S(s)- S(t^{(n)}_{j})+ S(t^{(n)}_{j})- R(T/n)^j\Vert _{{\mathscr {L}}(Y,X)} \\&\leqslant \Vert S(s)- S(t^{(n)}_{j})\Vert _{{\mathscr {L}}(Y,X)} + \Vert S(t^{(n)}_{j})- R(T/n)^j\Vert _{{\mathscr {L}}(Y,X)} \\&\leqslant 2K_{\alpha ,Y}C_{S,T} (T/n)^{\alpha } + K (t_j^{(n)}/j)^{\alpha } \\&\leqslant (2K_{\alpha ,Y}C_{S,T} + K)T^{\alpha } n^{-\alpha }. \end{aligned}$$

Combining this bound with the preceding estimate yields (5.7). \(\square \)

For \(C_0\)-semigroups of contractions and contractive discretisation schemes, the next theorem provides uniform convergence in time for inhomogeneities g taking values in \(\gamma (H,X)\).

Theorem 5.14

(Convergence for contractive schemes) Let A be the generator of a \(C_0\)-contraction semigroup \(S=(S(t))_{t\geqslant 0}\) on a (2, D)-smooth Banach space X. Let R be an \({\mathscr {L}}(X)\)-valued contractive scheme approximating S to some order \(\alpha \in (0,1]\) on \({\mathsf {D}}(A)\). Let \(g\in L_{{\mathscr {P}}}^p(\Omega ;L^2(0,T;\gamma (H,X)))\) with \(2\leqslant p<\infty \) and let \(u_t:= \int _0^t S(t-s) g_s\,{\mathrm{d}}W_s\) for \(t\in [0,T]\). Defining \((u^{(n)}_j)_{j=0}^n\) as in the preceding theorem, we have

$$\begin{aligned} \lim _{n\rightarrow \infty } {\mathbb {E}}\sup _{j=0,\ldots ,n}\Vert u_{t_j^{(n)}} - u_j^{(n)}\Vert ^p =0.\end{aligned}$$

Proof

Let \(\ell _{n+1}^\infty (X):= \bigoplus _{j=0}^n X\) with norm \(\Vert (x_0,\dots ,x_n)\Vert := \max _{j=0,\ldots ,n}\Vert x_j\Vert \) and \(Z_p^{(n)} := L^p(\Omega ;\ell _{n+1}^\infty (X))\). Let \(J,J^{(n)}:L^p_{{\mathscr {P}}}(\Omega ;L^2(0,T;\gamma (H,X)))\rightarrow Z_p^{(n)} \) be the linear operators given by

$$\begin{aligned}(J g)_j := u_{t_j^{(n)}}, \ \ \text {and} \ \ (J^{(n)} g)_j := u_{j}^{(n)},\qquad j=0,\dots ,n.\end{aligned}$$

By Theorem 4.1 and Proposition 5.4, the operators J and \(J^{(n)}\) are (uniformly) bounded with \(\Vert J\Vert \leqslant C_{p,D}\) and \(\Vert J^{(n)}\Vert \leqslant K_{p,D}\) respectively, the latter constant being defined as in Proposition 2.7.

To prove convergence in \(Z_p^{(n)}\), fix \(\varepsilon >0\) and let \(f\in L^p(\Omega ;L^2(0,T;\gamma (H,{\mathsf {D}}(A))))\) be such that \(\Vert g-f\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X)))}<\varepsilon \). By the boundedness and linearity of J and \(J^{(n)}\),

$$\begin{aligned} \Vert J(g)&- J^{(n)}(g)\Vert _{Z_p^{(n)}} \\&\leqslant \Vert J(g) - J(f)\Vert _{Z_p^{(n)}} + \Vert J(f) - J^{(n)}(f)\Vert _{Z_p^{(n)}} + \Vert J^{(n)}(f) - J^{(n)}(g)\Vert _{Z_p^{(n)}} \\&\leqslant (C_{p,D} + K_{p,D}) \varepsilon + \Vert J(f) - J^{(n)}(f)\Vert _{Z_p^{(n)}}, \end{aligned}$$

and the last term tends to zero as \(n\rightarrow \infty \) by Theorem 5.13. Since \(\varepsilon >0\) was arbitrary the result follows. \(\square \)

5.3 Applications to SPDE

We will now apply the results to some simple examples of stochastic PDEs and compare them with the results available in the literature. It goes without saying that with additional work more sophisticated problems can be treated. While this will be taken up in forthcoming work, the objective here is to treat some model problems in order to see where our methods can be expected to improve the presently available rates.

We begin with the stochastic heat equation. The results of the next example can be extended to more general uniformly elliptic operators with space-dependent coefficients. As will follow from Sect. 6, if one is only interested in the splitting method the coefficients can even be taken progressively measurable in \((t,\omega )\).

Example 5.15

(Stochastic heat equation) Consider the inhomogeneous stochastic heat equation on \({{\mathbb {R}}}^d\):

$$\begin{aligned} {\left\{ \begin{array}{ll} \,{\mathrm{d}}u_t &{}= \Delta u_t + \sum _{k\geqslant 1} g_{t}^{k}\,{\mathrm{d}}W_t^k, \quad t\in [0,T], \\ u_0 &{} = 0. \end{array}\right. } \end{aligned}$$
(5.8)

We assume that \(g = (g^{k})_{k\geqslant 1}\) belongs to \(L^p_{{\mathscr {P}}}(\Omega ;L^2(0,T;H^{\lambda ,q}({{\mathbb {R}}}^d;\ell ^2)))\) with \(0<p<\infty \), and \(W=(W^k)_{k\geqslant 1}\) is a sequence of independent standard Brownian motions. We can view W as an \(\ell ^2\)-cylindrical Brownian motion in a natural way by putting, for \(h = (h_k)_{k\ge 1}\in \ell ^2\), \(W_t h:= W({\mathbf{1}}_{(0,t)}\otimes h) := \sum _{k\ge 1} h_k W_t^k\), noting that the sum on the right-hand side converges in \(L^2(\Omega )\). As is well known, the operator \(\Delta \) generates an analytic \(C_0\)-semigroup of contractions on the Bessel potential spaces \(H^{\lambda ,q}({{\mathbb {R}}}^d)\), with \({\mathsf {D}}(\Delta ) = H^{\lambda +2,q}({{\mathbb {R}}}^d)\), for all \(\lambda \in {{\mathbb {R}}}\) and \(1<q<\infty \).

Let us now assume that \(2\le q<\infty \). By Theorem 4.1, the mild solution u to the problem (5.8) has a continuous modification with values in \(H^{\lambda ,q}({{\mathbb {R}}}^d)\) which satisfies

$$\begin{aligned} {\mathbb {E}}\sup _{t\in [0,T]}\Vert u_t\Vert _{H^{\lambda ,q}({{\mathbb {R}}}^d)}^p \leqslant C_{p,q}^p {\mathbb {E}}\Vert g\Vert _{L^2(0,T;H^{\lambda ,q}({{\mathbb {R}}}^d;\ell ^2))}^p, \end{aligned}$$

where we may take \(C_{p,q} = 10\sqrt{p}(q-1)\) if \(2\le p<\infty \). Here we used that \(H^{\lambda ,q}({{\mathbb {R}}}^d)\) is \((2,\sqrt{q-1})\)-smooth by Proposition 2.2 and that

$$\begin{aligned}\Vert g_t\Vert _{\gamma (\ell ^2,H^{\lambda ,q}({{\mathbb {R}}}^d))}\leqslant \Vert g_t\Vert _{\gamma _q(\ell ^2,H^{\lambda ,q}({{\mathbb {R}}}^d))} = \Vert \gamma \Vert _q \Vert g_t\Vert _{H^{\lambda ,q}({{\mathbb {R}}}^d;\ell ^2)}\end{aligned}$$

by Hölder’s inequality and [49, Proposition 9.3.2], where \(\gamma \) is a standard Gaussian random variable (whose moments satisfy \(\Vert \gamma \Vert _q\leqslant \sqrt{q-1}\)).

We consider the approximation scheme (5.6) for the splitting (S), implicit Euler (IE), and Crank–Nicolson (CN) schemes discussed in Example 5.11. Each of them leads to a sequence of approximate solutions \((u_j^{(n)})_{j=0}^n\), \(n\ge 1\), for which we define the approximation errors

$$\begin{aligned}E_{n,\beta } := \Big ({\mathbb {E}}\sup _{j=0,\ldots ,n}\Vert u_{t_j^{(n)}} - u_j^{(n)}\Vert ^p_{H^{\lambda -2\beta ,q}({{\mathbb {R}}}^d)}\Big )^{1/p}.\end{aligned}$$

These numbers also depend on \(p,q,\lambda \) and d, but the rates in the estimates below will be independent of these parameters. By Theorem 5.14, \(E_{n,0}\rightarrow 0\) for (S) and (IE). For \(q=2\), (CN) is contractive by Proposition 5.12 and again we obtain \(E_{n,0}\rightarrow 0\). Moreover, we can give rates of convergence for each of these methods. These are given in Table 1 for the errors \(E_{n,\beta }\) with \(\beta \in (0,1]\) (up to constants depending on p and q). The assertions follow from Example 5.11, Corollary 5.2, and Theorem 5.13 applied with \(X = H^{\lambda -2\beta , q}({{\mathbb {R}}}^d)\), \({\mathsf {D}}(\Delta ) = H^{\lambda -2\beta +2, q}({{\mathbb {R}}}^d)\) and \(Y = H^{\lambda , q}({{\mathbb {R}}}^d) = [X,{\mathsf {D}}(\Delta )]_{\beta }\).

Table 1 Approximation errors for the stochastic heat equation

Up to a logarithmic term the convergence rates are the same for the three schemes, independently of \(p\in (0, \infty )\). Although (S) and (CN) have better orders of convergence, the convergence rate of the approximation errors \(E_{n,\beta }\) cannot exceed \(\beta \) due to limitations in Corollary 5.2 and Theorem 5.13.

We next consider a simple non-parabolic equation. Here, higher order schemes give better rates of convergence. Other non-parabolic examples, including the wave equation on \({{\mathbb {R}}}^d\) (for \(q=2\)), can be treated similarly.

Example 5.16

(Stochastic transport equation) Consider the following transport equation on \({{\mathbb {R}}}\):

$$\begin{aligned} {\left\{ \begin{array}{ll} \,{\mathrm{d}}u_t &{}= \partial _x u_t + \sum _{k\geqslant 1} g_{t}^{k}\,{\mathrm{d}}W_t^k, \quad t\in [0,T], \\ u_0 &{} = 0. \end{array}\right. } \end{aligned}$$
(5.9)

Here \(g\in L^p_{{\mathscr {P}}}(\Omega ; L^2 (0,T;H^{\lambda ,q}({{\mathbb {R}}};\ell ^2)))\) with \(0< p<\infty \). It is well known that \(\partial _x\) generates a \(C_0\)-contraction semigroup on \(H^{\lambda ,q}({{\mathbb {R}}})\) for all \(\lambda \in {{\mathbb {R}}}\) and \(1\le q<\infty \).

Let us now assume that \(2\le q<\infty \). As before, by Theorem 4.1, the mild solution u to the problem (5.9) has a continuous modification with values in \(H^{\lambda ,q}({{\mathbb {R}}})\) which satisfies

$$\begin{aligned} {\mathbb {E}}\sup _{t\in [0,T]}\Vert u_t\Vert _{H^{\lambda ,q}({{\mathbb {R}}})}^p\leqslant C_{p,q}^p {\mathbb {E}}\Vert g\Vert _{L^2(0,T;H^{\lambda ,q}({{\mathbb {R}}};\ell ^2))}^p, \end{aligned}$$

where we may take \(C_{p,q} = 10\sqrt{p}(q-1)\) if \(2\le p<\infty \). As before, for \(\beta \geqslant 0\) let

$$\begin{aligned}E_{n,\beta } := \Big ({\mathbb {E}}\sup _{j=0,\ldots ,n}\Vert u_{t_j^{(n)}} - u_j^{(n)}\Vert ^p_{H^{\lambda -\beta ,q}({{\mathbb {R}}})}\Big )^{1/p}.\end{aligned}$$

By Theorem 5.14 we have \(E_{n,0}\rightarrow 0\) for (S) and (IE), and if \(q=2\) the same holds for (CN) by Proposition 5.12.

Table 2 gives the estimates for the errors \(E_{n,\beta }\) for suitable intervals for \(\beta \) (up to constants depending on pq). The assertions follow from Example 5.10 (using Proposition 5.12 for (CN) if \(q=2\)), Corollary 5.2, and Theorem 5.13 applied with \(X = H^{\lambda -\beta , q}({{\mathbb {R}}})\), \({\mathsf {D}}(A^m) = H^{\lambda -\beta +m, q}({{\mathbb {R}}})\) and \(Y = H^{\lambda , q}({{\mathbb {R}}}) = [X,{\mathsf {D}}(A^m)]_{\beta /m}\) for \(m=1\) for (S), \(m=2\) for (IE), and \(m=3\) for (CN). Note that \(\phi (8/5) = 1\); since the convergence rate cannot exceed 1, there is no point in considering values \(\beta >\frac{8}{5}\).

Table 2 Approximation errors for the stochastic transport equation, where \(\phi \) is the piecewise linear function connecting the points \(\left( \frac{1}{2}, 0\right) \), \(\left( 1,\frac{1}{2}\right) \), and \(\left( 2,\frac{4}{3}\right) \)
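The function \(\phi \) of Table 2 is elementary enough to evaluate directly; as a sanity check on the value quoted in the text, here is a two-line sketch (the function name is illustrative):

```python
import numpy as np

# The piecewise linear function phi from Table 2, connecting the points
# (1/2, 0), (1, 1/2) and (2, 4/3); np.interp clamps outside [1/2, 2],
# which is irrelevant for the range of beta considered here.
def phi(beta):
    return float(np.interp(beta, [0.5, 1.0, 2.0], [0.0, 0.5, 4.0 / 3.0]))
```

In particular \(\phi (8/5) = 1\), consistent with the observation in the text that the rate saturates there.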

Our final example concerns the Schrödinger equation.

Example 5.17

(Stochastic Schrödinger equation) Consider the following Schrödinger equation on \({{\mathbb {R}}}^d\):

$$\begin{aligned} {\left\{ \begin{array}{ll} \,{\mathrm{d}}u_t &{}= i\Delta u_t + \sum _{k\geqslant 1} g_{t}^{k}\,{\mathrm{d}}W_t^k, \quad t\in [0,T], \\ u_0 &{} = 0. \end{array}\right. } \end{aligned}$$

We assume that \(g\in L^p_{{\mathscr {P}}}(\Omega ; L^2 (0,T;H^{\lambda }({{\mathbb {R}}}^d;\ell ^2)))\) for some \(0< p<\infty \), where \(H^{\lambda }({{\mathbb {R}}}^d) = H^{\lambda ,2}({{\mathbb {R}}}^d)\). It is well known that \(i\Delta \) generates a unitary \(C_0\)-group on \(H^{\lambda }({{\mathbb {R}}}^d)\) for all \(\lambda \in {{\mathbb {R}}}\). As before, by Theorem 4.1, the mild solution u of the problem above has a continuous modification with values in \(H^{\lambda }({{\mathbb {R}}}^d)\) which satisfies

$$\begin{aligned} {\mathbb {E}}\sup _{t\in [0,T]}\Vert u_t\Vert _{H^{\lambda }({{\mathbb {R}}}^d)}^p\leqslant C_p^p{\mathbb {E}}\Vert g\Vert _{L^2(0,T;H^{\lambda }({{\mathbb {R}}}^d;\ell ^2))}^p, \end{aligned}$$

where we may take \(C_p = 10 \sqrt{p}\) if \(2\le p<\infty \).

As before let

$$\begin{aligned}E_{n,\beta } := \Big ({\mathbb {E}}\sup _{j=0,\ldots ,n}\Vert u_{t_j^{(n)}} - u_j^{(n)}\Vert ^p_{H^{\lambda -2\beta }({{\mathbb {R}}}^d)}\Big )^{1/p}.\end{aligned}$$

By Theorem 5.14, \(E_{n,0}\rightarrow 0\) for (S), (IE), and (CN) (using Proposition 5.12 for the latter).

Table 3 gives the estimates for the errors \(E_{n,\beta }\) (up to constants depending on p) for suitable intervals for \(\beta \). The assertions follow from Example 5.10, Corollary 5.2, and Theorem 5.13 applied with \(X = H^{\lambda -2\beta }({{\mathbb {R}}}^d)\) and \(Y = H^{\lambda }({{\mathbb {R}}}^d) = [X,{\mathsf {D}}(A^m)]_{\beta /m}\) for \(m=1\) for (S), \(m=2\) for (IE), and \(m=3\) for (CN).

Table 3 Approximation errors for the stochastic Schrödinger equation

We are aware of only a few papers dealing with convergence uniformly in time in infinite dimensions. In [37] the splitting method is considered for (possibly degenerate) parabolic problems with gradient noise. The inhomogeneities have to be uniformly bounded in time. The same methods are considered in [18] for semi-linear stochastic parabolic problems. No contractivity of the semigroups needs to be assumed and convergence in Hölder norms is obtained under \(L^p\)-integrability conditions in time with \(p>2\). See Table 4 for a comparison of the convergence rates.

In [18] (in the setting of UMD spaces) and [38] (in the setting of monotone operators on Gelfand triples \(V\hookrightarrow X\hookrightarrow V^*\)), the implicit Euler scheme was considered with uniform convergence in time, but these results seem not to be comparable to ours due to the fact that an additional discretisation of the noise term is allowed. In the latter reference, convergence rates of order \(n^{-\nu }\) are obtained under the assumption that the solution u belongs to \(C^{\nu }([0,T];L^2(\Omega ;V)) \cap L^2(\Omega ;L^\infty (0,T;V))\). Results on uniform convergence in time (and sometimes even convergence in Hölder norms in time) for schemes involving space and time discretisation can be found in many papers, including [14,15,16, 36, 39, 53, 80, 109]. Results concerning uniform convergence in case of white noise and discretisation in time only can be found in [3, 4, 40, 41]. Some results are with explicit rates and some are not, but the schemes considered in these papers are different.

In the parabolic setting, results on convergence of the form

$$\begin{aligned} \sup _{j=0,\ldots ,n} {\mathbb {E}}\Vert u(t_j^{(n)}) - u_j^{(n)}\Vert ^p\rightarrow 0 \end{aligned}$$
(5.10)

(notice the reversed order of supremum and expectation) with explicit rates, which can even be faster than 1/n, can be found in [17, 54, 64] and references therein.

Table 4 Comparison of rates in the parabolic setting

For non-parabolic problems no systematic results seem to be available on uniform convergence in time. In [103] uniform convergence with explicit rates has been obtained for a nonlinear wave equation with the splitting scheme. The fact that the underlying semigroup is a group allows us to write

$$\begin{aligned}\int _0^t S(t-s) g_s \,{\mathrm{d}}W_s = S(t)\int _0^t S(-s) g_s \,{\mathrm{d}}W_s\end{aligned}$$

and uniform convergence can be obtained from standard maximal estimates for martingales. In [32] the authors obtain uniform convergence results in case the semigroup admits a dilation to a group. Our results do not rely on the above identity and therefore are applicable in the case of arbitrary contractive \(C_0\)-semigroups, and the convergence holds with the same rate. Even more is true: for arbitrary \(C_0\)-semigroups and general numerical schemes the same convergence rates can be obtained up to a logarithmic factor.
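For a group generated by a skew-symmetric matrix, the identity above can be checked directly: \(S(t-s) = S(t)S(-s)\), and the constant factor \(S(t)\) pulls out of the integral. A minimal numerical sketch with the two-dimensional rotation group and a smooth test path, comparing pathwise Riemann sums of both sides (all names and the test path are illustrative assumptions):

```python
import numpy as np

def rot(theta):
    # S(t) = e^{tA} for A = [[0, -1], [1, 0]] is rotation by the angle t
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Midpoint Riemann sums of both sides of
#   int_0^t S(t-s) g dW_s  =  S(t) int_0^t S(-s) g dW_s
# along the smooth test path W(t) = sin(t), so that dW_s = cos(s) ds.
t, m = 1.0, 4000
s = (np.arange(m) + 0.5) * t / m
g = np.array([1.0, 0.0])
lhs = sum(rot(t - si) @ g * np.cos(si) for si in s) * (t / m)
rhs = rot(t) @ (sum(rot(-si) @ g * np.cos(si) for si in s) * (t / m))
```

The two sides agree up to floating-point error, reflecting that for a group the convolution is the group applied to an ordinary (local) martingale.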

6 Maximal inequalities for random stochastic convolutions

In this section we consider the time-dependent problem

$$\begin{aligned} {\left\{ \begin{array}{ll} \,{\mathrm{d}}u_t &{}= A(t)u_t\,{\mathrm{d}}t + g_t \,{\mathrm{d}}W_t, \qquad t\in [0,T], \\ u_0 &{}= 0, \end{array}\right. } \end{aligned}$$
(6.1)

with random operators A(t). More precisely, we assume that \((A(t,\omega ))_{(t,\omega )\in [0,T]\times \Omega }\) is an adapted family of closed operators acting in X satisfying suitable conditions, to be made precise below, guaranteeing the generation of an adapted evolution family. We will assume throughout that W is an adapted H-cylindrical Brownian motion on \(\Omega \), and that \(g:[0,T]\times \Omega \rightarrow \gamma (H,X)\) is progressively measurable; recall that this is equivalent to the requirement that \(g(h): [0,T]\times \Omega \rightarrow X\) is progressively measurable for all \(h\in H\). Many of the results of this section are expected to extend to more general martingales.

6.1 The forward stochastic integral

In analogy with the non-random case one expects that (6.1) admits a mild solution given as before by the stochastic convolution process \(\int _0^t S(t,s)g_s\,{\mathrm{d}}W_s\). This stochastic integral, however, cannot be defined as an Itô stochastic integral because the random variables S(ts)x are only assumed to be \({\mathscr {F}}_t\)-measurable rather than \({\mathscr {F}}_s\)-measurable and consequently the integrand will not be progressively measurable in general.

To overcome this problem we use the forward stochastic integral, introduced and studied by Russo and Vallois [88] in the scalar-valued setting. Following [62, 85, 86] we define its vector-valued analogue as follows. Fix an orthonormal basis \((h_k)_{k\geqslant 1}\) of H. For processes \(\Phi \in L^0(\Omega ;L^2(0,T;\gamma (H,X)))\) and \(n=1,2,\dots \) define

$$\begin{aligned} I^-(\Phi ,n) := n\sum _{k=1}^n \int _0^T \Phi _s h_k \,(W_{s+1/n} - W_s) h_k\,{\mathrm{d}}s. \end{aligned}$$

The process \(\Phi \) is forward stochastically integrable if the sequence \((I^-(\Phi ,n))_{n\geqslant 1}\) converges in probability. If this is the case, the limit is independent of the choice of orthonormal basis and is called the forward stochastic integral of \(\Phi \). We write

$$\begin{aligned}\int _0^T \Phi _s\,{\mathrm{d}}W_s^- := I^{-}(\Phi ) := \lim _{n\rightarrow \infty }I^-(\Phi ,n).\end{aligned}$$

Notice that \(\Phi \) is not assumed to be progressively measurable. It is easy to see that if \(\Phi \) is a finite rank step process, then \(\Phi \) is forward integrable. If \(\Phi \) is progressively measurable and integrable in the Itô sense, then the forward stochastic integral exists and coincides with the Itô integral (see [86, Proposition 3.2]).
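In the scalar case \(H = X = {{\mathbb {R}}}\), the approximants \(I^-(\Phi ,n)\) are ordinary integrals of averaged increments and can be sketched as a quadrature rule. The sketch below is illustrative only; it evaluates the Riemann sums along a given path, here the deterministic test path \(W(t) = t\), for which \(I^-(\Phi ,n) = T\) exactly when \(\Phi \equiv 1\). All names are assumptions for the example.

```python
import numpy as np

def forward_riemann_sum(phi, W, T, n, quad_points=4000):
    """I^-(Phi, n) = n * int_0^T Phi(s) (W(s + 1/n) - W(s)) ds in the scalar
    case H = X = R, evaluated with a midpoint quadrature rule. The path W is
    assumed to be defined (extended) slightly beyond T."""
    s = (np.arange(quad_points) + 0.5) * T / quad_points  # midpoints of [0, T]
    integrand = phi(s) * (W(s + 1.0 / n) - W(s))
    return n * integrand.sum() * T / quad_points
```

For a Brownian path, one would pass an interpolated simulated trajectory as `W` and study the limit \(n\rightarrow \infty \) in probability; the deterministic path here only serves to make the definition concrete.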

In order to apply the forward integral to our problem we make the following hypothesis:

Hypothesis 6.1

The family \((S(t,s,\omega ))_{0\le s\le t\le T,\,\omega \in \Omega }\) is an adapted \(C_0\)-evolution family of contractions on X, i.e.,

  1. (i)

    \((S(t,s,\omega ))_{0\le s\le t\le T}\) is a \(C_0\)-evolution family of contractions for every \(\omega \in \Omega \);

  2. (ii)

    \(S(t,s,\cdot )x\) is strongly \({\mathscr {F}}_t\)-measurable for all \(0\leqslant s\leqslant t\leqslant T\) and \(x\in X\).

Furthermore we assume:

  1. (iii)

    Y is a Banach space, continuously embedded in X, and for almost all \(\omega \in \Omega \) we have \(S(t,\cdot ,\omega )y \in W^{1,1}(0,t;X)\) for all \(t\in (0,T]\) and \(y\in Y\) and

    $$\begin{aligned} \Vert (S(t,\cdot ,\omega )y)\Vert _{W^{1,1}(0,t;X)}\leqslant C(\omega )\Vert y\Vert _Y\end{aligned}$$

    for some function \(C:\Omega \rightarrow [0,\infty )\) independent of \(y\in Y\) and \(t\in (0,T]\).

We have the following sufficient condition for forward integrability (see [86, Corollary 5.3], which extends to the current setting).

Proposition 6.2

Suppose that Hypothesis 6.1 holds, with X a 2-smooth Banach space, and let \(g:[0,T]\times \Omega \rightarrow \gamma (H,Y)\) be a finite rank adapted step process. Then the process \((S(t,s)g_s)_{s\in [0,t]}\) is forward integrable on [0, t] and almost surely we have

$$\begin{aligned} \int _0^t S(t,s)g_s\,{\mathrm{d}}W_s^- = S(t,0)\int _0^t g_s \,{\mathrm{d}}W_s + \int _0^t \partial _s S(t,s) \int _s^t g_r \,{\mathrm{d}}W_r \,{\mathrm{d}}s. \end{aligned}$$
(6.2)

Moreover, the process \((\int _0^t S(t,s)g_s\,{\mathrm{d}}W_s^-)_{t\in [0,T]}\) has a continuous modification.

The right-hand side of (6.2) is well defined by the hypothesis and the assumption that g takes values in Y. By the almost sure pathwise continuity of \(\int _0^\cdot g_s \,{\mathrm{d}}W_s\), the forward integral in (6.2) admits a continuous modification.

Remark 6.3

In the setting where S is generated by an adapted family \((A(t))_{t\in [0,T]}\) satisfying suitable parabolicity assumptions, the right-hand side of (6.2) is called the pathwise mild solution of (6.1). Pathwise mild solutions were introduced and extensively studied in [85]. In the parabolic case, \(\partial _s S(t,s)\) typically extends to a bounded operator on X and \(\Vert \partial _s S(t,s)\Vert \leqslant C(t-s)^{-1}\), where C depends on \(\omega \in \Omega \). Since \(\int _0^\cdot g_r \,{\mathrm{d}}W_r\) is almost surely Hölder continuous under \(L^p(0,T)\)-integrability assumptions on g with \(p>2\), the right-hand side of (6.2) exists pathwise as a Bochner integral.

It is quite difficult to prove estimates for the forward integral directly. A major advantage of using the right-hand side of (6.2) is that one can obtain estimates using only Itô and Bochner integrals.
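To see why the right-hand side of (6.2) is convenient, one can check it against the convolution in the scalar autonomous case \(S(t,s) = e^{a(t-s)}\), where \(\partial _s S(t,s) = -a\,e^{a(t-s)}\): integration by parts shows that both sides agree pathwise. A minimal numerical sketch along the smooth test path \(W(t) = \sin t\) with scalar noise and constant g (all of which are illustrative assumptions):

```python
import numpy as np

a, g, t, m = -1.0, 1.0, 1.0, 20000
s = (np.arange(m) + 0.5) * t / m          # midpoint grid on [0, t]
ds = t / m
dW = np.cos(s) * ds                       # dW_s along the smooth path W(t) = sin(t)
W = np.sin                                # the path itself

# Left-hand side of (6.2): the convolution int_0^t e^{a(t-s)} g dW_s
lhs = np.sum(np.exp(a * (t - s)) * g * dW)

# Right-hand side of (6.2):
#   S(t,0) int_0^t g dW_r + int_0^t d/ds[S(t,s)] (int_s^t g dW_r) ds
tail = g * (W(t) - W(s))                  # int_s^t g dW_r along the path
rhs = np.exp(a * t) * g * W(t) + np.sum(-a * np.exp(a * (t - s)) * tail * ds)
```

Note that only ordinary (Bochner-type) integrals of the already-integrated process \(\int _s^t g_r\,{\mathrm{d}}W_r\) appear on the right, which is exactly the structure exploited in the estimates below.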

6.2 The maximal inequality

We will now extend the maximal estimate of Theorem 4.1 to random evolution families, replacing the Itô stochastic integral of that theorem by the forward stochastic integral. The precise sense in which the forward integral constitutes a solution of the problem (6.1) will be addressed subsequently in Theorem 6.6. Even without the supremum on the left-hand side, the estimate in Theorem 6.4 is new.

Theorem 6.4

Suppose that Hypothesis 6.1 holds, with X a 2-smooth Banach space, and let \(g:[0,T]\times \Omega \rightarrow \gamma (H,Y)\) be a finite rank adapted step process. Then for all \(0<p<\infty \) we have

$$\begin{aligned} {\mathbb {E}}\sup _{t\in [0,T]}\Big \Vert \int _0^t S(t,s) g_s \,{\mathrm{d}}W_s^-\Big \Vert ^p \leqslant C_{p,D}^p \Vert g\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X)))}^p, \end{aligned}$$

where the constant \(C_{p,D}\) only depends on p and D. For \(2\le p<\infty \) the inequality holds with \(C_{p,D} = 10D\sqrt{p}\).

Proof

The proof is similar to that of Theorem 4.1, but with some extra technicalities which justify a detailed presentation.

Step 1 Let \(g:[0,T]\times \Omega \rightarrow \gamma (H,X)\) be an adapted finite rank step process, say

$$\begin{aligned} g = \sum _{j=1}^{k} {\mathbf{1}}_{(s_{j-1},s_{j}]} \sum _{i=1}^\ell h_i\otimes \xi _{ij} \end{aligned}$$

as in (2.9). For the moment there is no need to insist that g be Y-valued; this will only be needed in the last step of the proof.

Fix \(0<\delta <T\) and set \(S^{\delta }(t,s) := S((t-\delta )^+,(s-\delta )^+)\) for \(0\leqslant s\leqslant t\leqslant T\). Fix a partition \(\pi := \{r_0,\dots ,r_N\}\), where \(0= r_0<r_1<\ldots <r_N=T\), and let \((K(t,s,\omega ))_{0\leqslant s\leqslant t\leqslant T,\,\omega \in \Omega }\) be a family of contractions on X with the following properties:

  1. (i)

    \(K(t,\cdot ,\omega )\) is constant on \([r_{j-1},r_j)\) for all \(t\in [0,T]\), \(\omega \in \Omega \), and \(j=1, \ldots , N\);

  2. (ii)

    \(K(\cdot , s,\omega )\) is strongly continuous for all \(s\in [0,T]\) and \(\omega \in \Omega \);

  3. (iii)

    \(S^\delta (t,r,\omega ) K(r,s,\omega ) = K(t,s,\omega )\) for all \(0\leqslant s\leqslant r\leqslant t\leqslant T\) and \(\omega \in \Omega \);

  4. (iv)

    \(K(t,s,\cdot )x\) is strongly \({\mathscr {F}}_{(t-\delta )^+}\)-measurable for all \(0\le s\leqslant t\leqslant T\).

By refining \(\pi \) we may assume that \(|r_j-r_{j-1}|\leqslant \delta \) for \(j=1,\ldots ,N\) and that \(s_j\in \pi \) for all \(j=0,\ldots ,k\).

Define the process \((v_t)_{t\in [0,T]}\) by

$$\begin{aligned} v_t := \int _0^t K(t,s) g_s \,{\mathrm{d}}W_s^{-}, \end{aligned}$$
(6.3)

this forward integral being well defined since the integrand is a finite rank step process. For \(t\in [0,r_1]\) the above integral coincides with the Itô integral since \(K(t,s,\cdot )\) is strongly \({\mathscr {F}}_{0}\)-measurable. By (iii), for \(r_{j-1} \le s\le t< r_j\) we have

$$\begin{aligned} v_t = S^{\delta }(t,s) v_{s} + \int _{s}^{t} K(t,r) g_r \,{\mathrm{d}}W_r, \end{aligned}$$
(6.4)

where the stochastic integral is again an Itô integral since the random variable \(K(t,r,\cdot )\) does not depend on \(r\in [s,t]\subseteq [r_{j-1},r_j)\) by (i) and is strongly \({\mathscr {F}}_{r_{j-1}}\)-measurable by (iv) and the inclusion \({\mathscr {F}}_{(t-\delta )^+}\subseteq {\mathscr {F}}_{r_{j-1}}\) (using that \((t-\delta )^+\le r_{j-1}\)). Properties (i) and (ii) imply that v has a modification with continuous paths. Working with such a modification, we will first prove that for all \(2\le p<\infty \) one has

$$\begin{aligned} \Big \Vert \sup _{t\in [0,T]}\Vert v_t\Vert \Big \Vert _p \leqslant 10D\sqrt{p}\Vert g\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X)))}. \end{aligned}$$

By a limiting argument it suffices to consider exponents \(2<p<\infty \).

Let \(\pi ' = \{t_0,t_1, \ldots , t_m\}\subseteq [0,T]\) be another partition. It suffices to prove

$$\begin{aligned} \Big \Vert \sup _{t\in \pi '} \Vert v_t \Vert \Big \Vert _p \le a_\pi + 10D\sqrt{p} \Vert g\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X)))} \end{aligned}$$
(6.5)

with \(a_\pi = o(\hbox {mesh}(\pi ))\) as mesh\((\pi )\rightarrow 0\). Refining \(\pi '\) if necessary, we may assume that \(\pi \subseteq \pi '\) and that mesh\((\pi ')<\delta \).

For fixed \(j=1,\ldots ,m\) we have, by (6.4),

$$\begin{aligned} f_{j} := v_{t_j}&= S^{\delta }(t_j, t_{j-1}) v_{t_{j-1}} + \int _{t_{j-1}}^{t_j} K(t_j,s) g_s \,{\mathrm{d}}W_s \\&=: V_{j} f_{j-1} + dG_j, \end{aligned}$$

where we set \(V_{j} := S^{\delta }(t_j, t_{j-1})\) and \(dG_j:= \int _{t_{j-1}}^{t_j} K(t_j,s) g_s \,{\mathrm{d}}W_s\). We further set \(f_0:=0\) and \(G_0:=0\). As in the proof of Theorem 4.1 the sequence \((dG_j)_{j=1}^m\) is conditionally symmetric and an application of Theorem 3.1 gives

$$\begin{aligned} \Vert f^\star \Vert _p \le 5p \Vert dG^\star \Vert _p + 10D\sqrt{p} \Vert s(G)\Vert _p. \end{aligned}$$

Proceeding as in Step 1b of the proof of Theorem 4.1 we obtain (6.5).

Step 2 Fix \(n\in {{\mathbb {N}}}\) and set \(\sigma _n(s) := j 2^{-n}T\) for \(s\in [j2^{-n}T, (j+1)2^{-n}T)\). Set \(S_n^{\delta }(t,s) := S((t-\delta )^+,\sigma _n((s-\delta )^+))\) and define \(v^{(n)}_t\) as in (6.3) with \(K(t,s) = S_n^\delta (t,s)\). The assumptions (i)–(iv) in Step 1 apply to \(K(t,s) = S_n^\delta (t,s)\), \(N = 2^n\), and \(r_j = j2^{-n}T\). By what has been shown in Step 1, each process \(v^{(n)}\) has a continuous modification. Moreover, for \(n\geqslant m\) the process

$$\begin{aligned} v^{(n)}_t - v^{(m)}_t = S^{\delta }(t,s) (v_{s}^{(n)}- v_{s}^{(m)}) + \int _{s}^{t} K(t,r) (I- S(\sigma _n(r),\sigma _m(r))) g_r\,{\mathrm{d}}W_r\end{aligned}$$

is strongly progressively measurable. Moreover,

$$\begin{aligned}&\Big \Vert \sup _{t\in [0,T]} \Vert v^{(n)}_t-v^{(m)}_t\Vert \Big \Vert _p \\&\quad \leqslant 10D\sqrt{p} \big \Vert (I- S(\sigma _n((\cdot -\delta )^+),\sigma _m((\cdot -\delta )^+)))g\big \Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X)))}. \end{aligned}$$

Since the right-hand side tends to 0 by dominated convergence, \((v^{(n)})_{n\geqslant 1}\) is a Cauchy sequence with respect to the norm of \(L^p(\Omega ;C([0,T];X))\) and hence converges to some \(\widetilde{v}^{\delta } \in L^p(\Omega ;C([0,T];X))\). By Step 1,

$$\begin{aligned} \Big \Vert \sup _{t\in [0,T]} \Vert \widetilde{v}_t^{\delta }\Vert \Big \Vert _{p} = \lim _{n\rightarrow \infty } \Big \Vert \sup _{t\in [0,T]} \Vert v^{(n)}_t\Vert \Big \Vert _{p}\leqslant 10D\sqrt{p} \Vert g\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X)))}. \end{aligned}$$
(6.6)

We will show next that \( \widetilde{v}_t^{\delta }=\int _0^t S^{\delta }(t,s) g_s \,{\mathrm{d}}W_s^{-}\) almost surely for each \(t\in [0,T]\). To this end let \(\pi '' = \{t_0, \ldots , t_M\}\), where \(0=t_0<\ldots <t_M=T\) and mesh\((\pi '')<\delta \). We define an X-valued process \((v_t^{\delta })_{t\in [0,T]}\) by setting \(v^{\delta }_0 := 0\) and, recursively,

$$\begin{aligned}v_t^{\delta } := S^{\delta }(t,t_{j-1}) v^{\delta }_{t_{j-1}} + \int _{t_{j-1}}^{t} S^{\delta }(t,s) g_s \,{\mathrm{d}}W_s, \qquad t\in (t_{j-1},t_{j}].\end{aligned}$$

The stochastic integral is well defined since for all \(t_{j-1}\leqslant s\leqslant t\leqslant t_j\) the random variable \(S^{\delta }(t,s) = S((t-\delta )^+, (s-\delta )^+)\) is strongly \({\mathscr {F}}_{t_{j-1}}\)-measurable. Using the elementary properties of forward integrals we can rewrite this definition as the forward integral

$$\begin{aligned} v^{\delta }(t) = \int _0^t S^{\delta }(t,s) g_s \,{\mathrm{d}}W_s^{-}, \qquad t\in [0,T]. \end{aligned}$$
(6.7)

We claim that for each \(t\in [0,T]\) we have \(v^{\delta }(t) = \widetilde{v}^{\delta }(t)\) almost surely. Indeed, by (2.10),

$$\begin{aligned} \Big \Vert \int _{t_{j-1}}^{t} S^{\delta }_n(t,s) g_s&\,{\mathrm{d}}W_s - \int _{t_{j-1}}^{t} S^{\delta }(t,s) g_s \,{\mathrm{d}}W_s\Big \Vert _{L^2(\Omega ;X)} \\&\leqslant D\Vert (S^{\delta }_n(t,s)- S^{\delta }(t,s))g_s\Vert _{L^2(\Omega ;L^2(0,t;\gamma (H,X)))}\rightarrow 0 \end{aligned}$$

as \(n\rightarrow \infty \) by dominated convergence. Therefore, the terms in the recursion defining \(v^{\delta }\) converge to the correct limit and the claim is proved.

Step 3 We will next show that

$$\begin{aligned} \lim _{\delta \downarrow 0} \int _0^t S^{\delta }(t,s) g_s \,{\mathrm{d}}W_s^- = \int _0^t S(t,s) g_s \,{\mathrm{d}}W_s^- \end{aligned}$$

in \(L^0(\Omega ;C([0,T];X))\). This will be done by providing an alternative formula for \(\int _0^t S^{\delta }(t,s) g_s \,{\mathrm{d}}W_s^{-}\) in which we can let \(\delta \downarrow 0\). Here it will be important that g takes values in Y.

Fix \(t\in (0,T]\). Since \(\Vert \partial _s (S(t,s)y)\Vert _X\leqslant C\Vert y\Vert _Y\) with a constant C independent of \(0<s<t\leqslant T\), it follows from Proposition 6.2 that the forward stochastic convolution integral \(u_t := \int _0^t S(t,s) g_s \,{\mathrm{d}}W_s^{-} \) exists and is almost surely equal to

$$\begin{aligned} S(t,0) \int _0^t g_{r} \,{\mathrm{d}}W_r + \int _0^t \partial _s S(t,s) \int _s^t g_r \,{\mathrm{d}}W_r \,{\mathrm{d}}s.\end{aligned}$$

Similarly,

$$\begin{aligned} v^{\delta }(t)&= S((t-\delta )^+,0) \int _0^t g_{r} \,{\mathrm{d}}W_r + \int _0^t \partial _s S((t-\delta )^+,(s-\delta )^+) \int _s^t g_r \,{\mathrm{d}}W_r \,{\mathrm{d}}s \\&= S((t-\delta )^+,0) \int _0^t g_{r} \,{\mathrm{d}}W_r + \int _{0}^{(t-\delta )^+} \partial _s S((t-\delta )^+,s) \int _{s+\delta }^t g_r \,{\mathrm{d}}W_r \,{\mathrm{d}}s. \end{aligned}$$

Letting \(\delta \downarrow 0\), by the piecewise strong continuity of \(t\mapsto \partial _s S(t,s)\) on Y and dominated convergence we obtain that \(v^{\delta }(t)\rightarrow u(t)\) almost surely.

By dominated convergence one also obtains that u has a continuous modification. To prove the maximal estimate for this modification it suffices to show that for any finite set \(\pi \subseteq [0,T]\),

$$\begin{aligned} \Big \Vert \sup _{t\in \pi } \Vert u_t \Vert \Big \Vert _p \leqslant 10D\sqrt{p} \Vert g\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X)))}. \end{aligned}$$

Using (6.6) and the identity \(v^{\delta }(t) = \widetilde{v}^{\delta }(t)\) for \(t\in \pi \), this follows from Fatou’s lemma:

$$\begin{aligned} \Big \Vert \sup _{t\in \pi } \Vert u_t \Vert \Big \Vert _p \leqslant \liminf _{\delta \downarrow 0}\Big \Vert \sup _{t\in \pi } \Vert v_t^{\delta } \Vert \Big \Vert _p \leqslant 10D\sqrt{p} \Vert g\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X)))}. \end{aligned}$$

Step 4 The case \(0<p<2\) follows again by using Corollary 3.6 instead of Theorem 3.1, or by an extrapolation argument involving Lenglart’s inequality. \(\square \)

If the embedding \(Y\hookrightarrow X\) is dense we can use the maximal inequality of the theorem to see that for all \(0<p<\infty \) the mapping

$$\begin{aligned} g\mapsto \int _0^t S(t,s) g_s \,{\mathrm{d}}W_s^- \end{aligned}$$

has a unique extension to a continuous linear operator

$$\begin{aligned} J_p: L_{{\mathscr {P}}}^p(\Omega ;L^2(0,T;\gamma (H,X)))\rightarrow L^p(\Omega ;C([0,T];X)). \end{aligned}$$

Moreover, by a standard localisation argument, these operators have a unique common extension to a continuous linear operator

$$\begin{aligned} J: L_{{\mathscr {P}}}^0(\Omega ;L^2(0,T;\gamma (H,X)))\rightarrow L^0(\Omega ;C([0,T];X)). \end{aligned}$$

It is not guaranteed, however, that for general \(g\in L_{{\mathscr {P}}}^0(\Omega ;L^2(0,T;\gamma (H,X)))\) the process Jg is given by a forward stochastic convolution again, nor is this clear if we replace \(L^0\) and J by \(L^p\) and \(J_p\). The same problem occurs if we use the right-hand side in the identity in Proposition 6.2.

Since \(J_p\) satisfies the same estimate as in Theorem 6.4, we immediately obtain an extension of the exponential tail estimate of Corollary 4.2 to the current setting. As in Remark 4.3, a similar result was obtained in [95, Remark 5.8] under more restrictive conditions on the random evolution family, but with a better bound on the variance \(\sigma ^2\).

The next theorem addresses the question in what sense \(J_pg\) and Jg “solve” the problem (6.1). Some additional assumptions are needed to establish the precise relation between the random evolution family S and the random operator A.

Hypothesis 6.5

Hypothesis 6.1 is satisfied. Furthermore, the random operator family \(A:[0,T]\times \Omega \rightarrow {\mathscr {L}}(Y,X)\) has the property that Ay is strongly progressively measurable for all \(y\in Y\). In addition, the following conditions hold:

  1. (i)

    For almost all \(\omega \in \Omega \) we have \(S(t,\cdot ,\omega )y \in W^{1,1}(0,t;X)\) for all \(t\in [0,T]\) and \(y\in Y\), and for almost all \(s\in [0,t]\) we have \(\partial _s S(t,s)y=-S(t,s)A(s) y\) and

    $$\begin{aligned}\Vert S(t,s)A(s)y\Vert _{X}\leqslant C\Vert y\Vert _Y,\end{aligned}$$

    where \(C:\Omega \rightarrow [0,\infty )\) is independent of \(y\in Y\) and \(0\leqslant s<t\leqslant T\).

  2. (ii)

    For almost all \(\omega \in \Omega \) we have \(S(\cdot ,s,\omega )y \in W^{1,1}(s,T;X)\) for all \(s\in [0,T]\) and \(y\in Y\), and for almost all \(t\in [s,T]\) we have \(\partial _t S(t,s)y=A(t)S(t,s) y\) and

    $$\begin{aligned}\Vert A(t)S(t,s)y\Vert _{X}\leqslant C\Vert y\Vert _Y,\end{aligned}$$

    where \(C:\Omega \rightarrow [0,\infty )\) is independent of \(y\in Y\) and \(0\leqslant s<t\leqslant T\).

  3. (iii)

    There exists a dense subspace \(F\subseteq X^*\) such that \(F\subseteq {\mathsf {D}}(A(t,\omega )^*)\) for all \((t,\omega )\in [0,T]\times \Omega \), and for almost all \(\omega \in \Omega \) the mapping \(t\mapsto \langle x, A(t,\omega )^*x^*\rangle \) belongs to \(L^\infty (0,T)\) for all \(x\in X\) and \(x^*\in F\).

In the proof below we will combine (iii) with the observation that if \(f:(0,T)\rightarrow X\) is integrable and \(g:(0,T)\rightarrow X^*\) has the property that \(\langle x,g\rangle \in L^\infty (0,T)\) for all \(x\in X\), then the function \(t\mapsto \langle f(t),g(t)\rangle \) is integrable and

$$\begin{aligned} \int _0^T |\langle f(t),g(t)\rangle |\,{\mathrm{d}}t\le \Vert f\Vert _1 \sup _{\Vert x\Vert \le 1} \Vert \langle x,g\rangle \Vert _\infty , \end{aligned}$$

the supremum on the right-hand side being finite by a closed graph argument. Indeed, this estimate is clear for simple functions f and the general case follows by approximation.
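
To spell out the simple-function case: if \(f=\sum_{i=1}^n {\mathbf 1}_{A_i}x_i\) with disjoint measurable sets \(A_i\subseteq (0,T)\) of Lebesgue measure \(|A_i|\) and vectors \(x_i\in X\), then

```latex
\begin{aligned}
\int_0^T |\langle f(t),g(t)\rangle|\,{\mathrm{d}}t
 &= \sum_{i=1}^n \int_{A_i} |\langle x_i,g(t)\rangle|\,{\mathrm{d}}t
 \leqslant \sum_{i=1}^n |A_i|\,\Vert x_i\Vert \sup_{\Vert x\Vert\leqslant 1}\Vert \langle x,g\rangle\Vert_\infty \\
 &= \Vert f\Vert_1 \sup_{\Vert x\Vert\leqslant 1}\Vert\langle x,g\rangle\Vert_\infty,
\end{aligned}
```

and for general integrable f one approximates by simple functions in \(L^1(0,T;X)\).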

Under the above hypothesis a process \(u\in L_{{\mathscr {P}}}^0(\Omega ;L^1(0,T;X))\) is called a weak solution of (6.1) if for all \(x^*\in F\), a.s. for all \(t\in [0,T]\),

$$\begin{aligned}\langle u_t,x^*\rangle = \int _0^t \langle u_s, A(s)^* x^*\rangle \,{\mathrm{d}}s +\int _0^t g_s^* x^* \,{\mathrm{d}}W_{s}.\end{aligned}$$

In many situations weak solutions are known to be unique. However, we will not address this issue here.

Theorem 6.6

Suppose that Hypothesis 6.5 holds, with X a 2-smooth Banach space, and assume in addition that the embedding \(Y\hookrightarrow X\) is dense. Then for every \(g\in L_{{\mathscr {P}}}^0(\Omega ;L^2(0,T;\gamma (H,X)))\) the process Jg is a weak solution to (6.1).

Proof

We proceed in two steps.

Step 1 First let \(g:[0,T]\times \Omega \rightarrow {\mathscr {L}}(H,Y)\) be an adapted finite rank step process and write \(v^g_t := \int _0^t g_s \,{\mathrm{d}}W_s\) and \(u^g_t := \int _0^t S(t,s) g_s \,{\mathrm{d}}W_s^-\). From Proposition 6.2, Theorem 6.4 and Hypothesis 6.5(i) it is immediate that

$$\begin{aligned} {\mathbb {E}}\sup _{t\in [0,T]}\Vert u^g_t\Vert ^p \leqslant C_{p,D}^p \Vert g\Vert _{L^p(\Omega ;L^2(0,T;\gamma (H,X)))}^p, \end{aligned}$$
(6.8)

and

$$\begin{aligned}u_t^g = S(t,0) v_t^g - \int _0^t S(t,s)A(s) (v_t^g - v_s^g) \,{\mathrm{d}}s.\end{aligned}$$
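
In detail, the latter identity is obtained by inserting \(\partial_s S(t,s) = -S(t,s)A(s)\) (Hypothesis 6.5(i)) and \(\int_s^t g_r\,{\mathrm{d}}W_r = v_t^g - v_s^g\) into the representation of Proposition 6.2:

```latex
\begin{aligned}
u_t^g = S(t,0)v_t^g + \int_0^t \partial_s S(t,s)\big(v_t^g - v_s^g\big)\,{\mathrm{d}}s
      = S(t,0)v_t^g - \int_0^t S(t,s)A(s)\big(v_t^g - v_s^g\big)\,{\mathrm{d}}s.
\end{aligned}
```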

We check next that \(u^g\) is a weak solution. For this we use a variation of the argument in [85, Theorem 4.9]. For all \(x\in Y\),

$$\begin{aligned} \int _0^t S(t,s) A(s) x \,{\mathrm{d}}s= -x + S(t,0)x \ \ \ \text {and} \ \ \ \int _r^t A(s) S(s,r) x \,{\mathrm{d}}s= S(t,r)x - x. \end{aligned}$$
(6.9)
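
Both identities in (6.9) follow from Hypothesis 6.5(i) and (ii) together with the fundamental theorem of calculus, applied to \(s\mapsto S(t,s)x\) and \(s\mapsto S(s,r)x\) respectively:

```latex
\begin{aligned}
\int_0^t S(t,s)A(s)x\,{\mathrm{d}}s
  &= -\int_0^t \partial_s \big(S(t,s)x\big)\,{\mathrm{d}}s
   = -\big(S(t,t)x - S(t,0)x\big) = -x + S(t,0)x, \\
\int_r^t A(s)S(s,r)x\,{\mathrm{d}}s
  &= \int_r^t \partial_s \big(S(s,r)x\big)\,{\mathrm{d}}s
   = S(t,r)x - S(r,r)x = S(t,r)x - x,
\end{aligned}
```

using \(\partial_s S(t,s)x = -S(t,s)A(s)x\) and \(\partial_t S(t,s)x = A(t)S(t,s)x\) for \(x\in Y\).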

Therefore, applying the first part of (6.9) with \(x = v_t^g\), we obtain

$$\begin{aligned} u_t^g = v_t^g + \int _0^t S(t,s)A(s) v_s^g \,{\mathrm{d}}s. \end{aligned}$$
(6.10)
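
Explicitly, expanding the representation of \(u_t^g\) and using that \(v_t^g\) takes values in Y,

```latex
\begin{aligned}
u_t^g &= S(t,0)v_t^g - \int_0^t S(t,s)A(s)v_t^g\,{\mathrm{d}}s
        + \int_0^t S(t,s)A(s)v_s^g\,{\mathrm{d}}s \\
      &= S(t,0)v_t^g - \big(-v_t^g + S(t,0)v_t^g\big)
        + \int_0^t S(t,s)A(s)v_s^g\,{\mathrm{d}}s
       = v_t^g + \int_0^t S(t,s)A(s)v_s^g\,{\mathrm{d}}s.
\end{aligned}
```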

To conclude that \(u^g\) is a weak solution it remains to check that

$$\begin{aligned}\Big \langle \int _0^t S(t,s)A(s) v_s^g \,{\mathrm{d}}s, x^*\Big \rangle = \int _0^t \langle u^g_s, A(s)^*x^*\rangle \,{\mathrm{d}}s.\end{aligned}$$

Note that the integral on the right-hand side is well defined as a Lebesgue integral almost surely. To prove the claim we note that by (6.10), Fubini’s theorem and the second part of (6.9) (or rather, its weak version \(\int _r^t \langle S(s,r) x, A(s)^* x^*\rangle \,{\mathrm{d}}s = \langle S(t,r) x, x^*\rangle - \langle x, x^*\rangle \), the point being that in the argument below the vector \(x=A(r) v^g_r\) need not belong to Y),

$$\begin{aligned}&\int _0^t \langle u^g_s, A(s)^*x^*\rangle \,{\mathrm{d}}s \\&\quad = \int _0^t \langle v_s^g,A(s)^* x^*\rangle \,{\mathrm{d}}s + \int _0^t \int _0^s \langle S(s,r)A(r) v_r^g,A(s)^* x^*\rangle \,{\mathrm{d}}r \,{\mathrm{d}}s \\&\quad = \int _0^t \langle v_s^g,A(s)^* x^*\rangle \,{\mathrm{d}}s + \int _0^t \int _r^t \langle S(s,r)A(r) v_r^g,A(s)^* x^*\rangle \,{\mathrm{d}}s\,{\mathrm{d}}r \\&\quad = \int _0^t \langle v_s^g,A(s)^* x^*\rangle \,{\mathrm{d}}s + \int _0^t \langle S(t,r)A(r) v_r^g,x^*\rangle \,{\mathrm{d}}r - \int _0^t \langle A(r) v_r^g,x^*\rangle \,{\mathrm{d}}r \\&\quad = \int _0^t \langle S(t,r)A(r) v_r^g,x^*\rangle \,{\mathrm{d}}r, \end{aligned}$$

which gives the required identity.

Step 2 Let \(g\in L^p_{{\mathscr {P}}}(\Omega ;L^2(0,T;\gamma (H,X)))\) with \(0<p<\infty \) and choose a sequence of Y-valued adapted finite rank step processes \((g^{(n)})_{n\geqslant 1}\) such that \(g^{(n)}\rightarrow g\) in \(L^p(\Omega ;L^2(0,T;\gamma (H,X)))\). Then from (6.8) applied to \(g^{(n)}-g^{(m)}\) we obtain that \((u^{g^{(n)}})_{n\geqslant 1}\) is a Cauchy sequence and therefore converges to a limit, which we denote by \(u^g\), in \(L^p(\Omega ;C([0,T];X))\). By Step 1, \(u^{g^{(n)}}\) is a weak solution and thus

$$\begin{aligned}\langle u_t^{g^{(n)}},x^*\rangle = \int _0^t \langle u_s^{g^{(n)}}, A(s)^* x^*\rangle \,{\mathrm{d}}s +\int _0^t (g_s^{(n)})^* x^* \,{\mathrm{d}}W_{s}.\end{aligned}$$

Letting \(n\rightarrow \infty \) in this identity we conclude that \(u^g\) is a weak solution. The maximal inequality is obtained by applying (6.8) with \(g^{(n)}\) and letting \(n\rightarrow \infty \). \(\square \)

Remark 6.7

In [62, Proposition 5.3], restrictive conditions in terms of Malliavin differentiability of S are given under which the forward stochastic integral \(u_t = \int _0^t S(t,s) g_s \,{\mathrm{d}}W_s^-\) exists, has a continuous modification, and is a weak solution. Inspection of the proof shows that if one sets \(u_t^{(n)} := I^{-}({\mathbf{1}}_{[0,t]} S(t,\cdot ) g^{(n)})\), one needs that \(\sup _{t\in [0,T]}\Vert u_t - u_t^{(n)}\Vert _{L^1(\Omega ;X)} \rightarrow 0\). Although this is likely to hold in many situations, such considerations can be avoided by using the right-hand side of (6.2).

Remark 6.8

Theorem 5.1 extends mutatis mutandis to random evolution families. The only required change is to use the forward integral in the proof and to apply Theorem 6.4 instead of Theorem 4.1. To obtain explicit decay rates under the assumption that g has spatial smoothness, i.e., g takes values in a Banach space Y continuously embedded in X, one requires estimates for \(\Vert S(s,\sigma _n(s)) - I\Vert _{{\mathscr {L}}(Y,X)}\). In some applications (e.g. [78, Sect. 5.2]) such estimates are available.