1 Introduction, Statement of Problem and Overview of Results

Data assimilation is a computational technique that aims at blending a dynamical model of a physical process with observational data of this process while at the same time preserving the integrity of the model and making optimal use of the observational information. It constitutes a fundamental technique for modeling real-world phenomena. Numerical weather prediction and ocean state estimation are examples of scientific disciplines that rely on data assimilation (Kalnay 2003; Wunsch 1997; Forget et al. 2015). Data assimilation determines a best estimate of the unknown current state of the atmosphere or the ocean by steering the model trajectory toward time distributed observations of the past. The estimate of the current state is used as initial condition for a prediction. The rationale behind is, the better the estimate of the present state, the better the prediction of the future evolution.

The data assimilation algorithm that we consider here is known as “4D-var” or “adjoint method” of optimal control and belongs to the class of variational algorithms. Variational data assimilation algorithms strive for minimizing the distance between observations and model solution by solving an optimization problem for an appropriately chosen cost functional. The initial condition acts as a control variable of the optimization problem that fits the model trajectory over a finite time horizon to the observations. In contrast, feedback control algorithms such as “continuous data assimilation ” or “nudging” drive the model over an infinite time horizon toward observations by means of a feedback term (Kalnay 2003). Recently, these algorithms have been improved and rigorous results have been proven for a range of fluid dynamical equations, including the Navier–Stokes or the planetary geostrophic equations (see, e.g., Azouani and Titi 2014; Foias et al. 2016; Desamsetti et al. 2019; Farhat et al. 2016; Korn 2009) and the primitive equation (Pei 2019). A complementary class of assimilation algorithms is given by stochastic data assimilation such as the Kalman filter and its variants (Evensen 2003; Reich and Cotter 2015; Houtekamer and Zhang 2016), for mathematical aspects we refer to Schillings and Stuart (2017, 2018). Variational as well as stochastic algorithms are of great relevance in atmosphere-ocean science. Research in this area focuses on improvement and extension of computational algorithms in order to improve numerical weather predictions or ocean state estimation. The complexity of data assimilation algorithms and their applications have reached an impressive level (see, e.g., Bonavita et al. 2018; Isaksen et al. 2010; Gauthier et al. 2007). It is therefore important to understand fundamental properties of these algorithms. Results that are similar in spirit to our work can be found in Agoshkov and Ipatova (2007), where a variational algorithm was analyzed that uses observations of sea surface temperature and of sea surface elevation to determining the control variables of heat flux and water flux that occur in the surface boundary conditions of the oceanic primitive equations. The authors used a cost functional based on the \(L^2\)-norm and showed the existence of minimizers. In Shutyaev (1997), the more general setting of a data assimilation problem for quasi-linear evolution equations in Hilbert spaces was studied. To illustrate the general theory the existence of minimizers and the convergence of an iterative adjoint algorithm was established for a two-dimensional barotropic fluid. For fluid dynamical equations such as the 2D Navier–Stokes or the 3D Navier–Stokes-\(\alpha \) equations similar results on optimal control using initial conditions, boundary conditions or external forcing as control variable can be found in Fursikov (1999), Abergel and Temam (1990) and Korn (2009). In Korn (2019), an idealized coupled atmosphere-ocean model was introduced and based on the regularity of the coupled equations the existence of optimal initial conditions and the convergence of a steepest descent method was proven.

In the 4D-var algorithm, the crucial modeling decision is how the distance to the observational data and to the background state is measured, i.e., how the cost functional is specified. The classical choice is the \({L^{2}}\)-norm. In this paper, we suggest to use for the primitive equations the \({\mathcal {H}^{1}}\)-norm. This norm in the cost functional reflects the \({\mathcal {H}^{1}}\)-regularity of strong solutions of the primitive equations. The optimization determines initial conditions for strong solutions of the primitive equations. With such a choice, we take advantage of a priori knowledge about the dynamics of the primitive equations.

The use of alternative metrics beyond \(L^2([0,T],{\mathcal {L}^{2}}(\Omega ))\) is not new in optimal control of fluids [for a review see Medjo et al. (2008)]. In Bewley et al. (2001), for example, enstrophy-based metrics and Sobolev-type cost functionals were discussed in the context of turbulence control via boundary forcing for a channel flow. The purpose here is to obtain additional regularizing effect due to derivatives in the cost functional. In contrast, in our work here, the goal is not to introduce a regularization of the data assimilation problem but to demonstrate that optimal initial conditions for strong solutions of the primitive equations assimilation problem necessitates the use of \({\mathcal {H}^{1}}\)-norms. We refer to this property as strong solvability of the data assimilation problem. The new and fundamental but decisive element in the suggested formulation of the data assimilation algorithm is to use in the cost functional metrics that are tailored to the regularity of the primitive equations. This paper describes for the primitive equations the consequences of such a choice.

1.1 The Primitive Equations

The primitive equations are a central set of model equations for atmosphere and ocean dynamics. These equations approximate the large-scale dynamics of ocean and atmosphere and describe ocean or atmosphere as thin layer of a rotating, incompressible fluid, together with the Boussinesq approximation and the hydrostatic approximation. The primitive equations are derived as an approximation of the rotating Navier–Stokes equations Li and Titi (1992). The state vector of the primitive equations consists of a horizontal velocity field \(\mathbf{v}=(u,v)\) and a tracer such as temperature \(\theta \). More specifically, denote by \(\Omega :=M\times (-h,0)\) a cylindrical domain, where \(h>0\) denotes the depth of the ocean and \(M\subseteq \mathbb {R}^2\) is bounded, with a smooth boundary \(\partial M\). The primitive equations are given by

$$\begin{aligned}&\partial _t \mathbf{v}+(\mathbf{v}\cdot \nabla ) \mathbf{v}+w\partial _z \mathbf{v}+\nabla p +f\vec {k}\times \mathbf{v}-\frac{1}{Re_1}\triangle \,\mathbf{v}- \frac{1}{Re_2}\partial ^2_{zz} \mathbf{v}=F_{\mathbf{v}}, \end{aligned}$$
(1a)
$$\begin{aligned}&\partial _z p +g\rho =0, \end{aligned}$$
(1b)
$$\begin{aligned}&\nabla \cdot \mathbf{v}+\partial _zw=0, \end{aligned}$$
(1c)
$$\begin{aligned}&\partial _t \theta + (\mathbf{v}\cdot \nabla ) \theta +w\partial _z \theta -\frac{1}{Rt_1}\triangle \theta -\frac{1}{Rt_2}\partial ^2_{zz}\theta =F_\theta , \end{aligned}$$
(1d)
$$\begin{aligned}&\rho =\rho (\theta ):=\rho _0-\alpha (\theta -\theta _0), \end{aligned}$$
(1e)

where w denotes the vertical velocity, f the Coriolis parameter, \(\rho \) denotes an equation of state with a mean density \(\rho _0>0\), a mean temperature \(\theta _0>0\) and a coefficient \(\alpha >0\) of thermal expansion, \(F_{\mathbf{v}},F_\theta \) are forcing terms. The numbers \(Re_1,Re_2>0\) denote the horizontal and vertical Reynolds number that represent the viscosity coefficients. The numbers \(Rt_1,Rt_2>0\) are the horizontal and vertical mixing coefficients for temperature. The operators \(\nabla , \nabla \cdot \) and \(\triangle \) denote the horizontal gradient, divergence and Laplacian, respectively. The partial derivative with respect to the vertical direction is denoted by \(\partial _z\).

The boundary \(\partial \Omega \) consists of a lateral boundary \(\Gamma _s\), a bottom boundary \(\Gamma _b\) and a surface boundary \(\Gamma _u\), i.e., \(\partial \Omega =\Gamma _s\cup \Gamma _b\cup \Gamma _u\). The corresponding boundary conditions are

$$\begin{aligned}&on\ \Gamma _b:=\{(x,y,z)\in \bar{\Omega }:z=-h \}: \mathbf{v}=0, w=0, \end{aligned}$$
(2)
$$\begin{aligned}&\quad \qquad \qquad [\nabla _3\theta ]\cdot n_3=0,\ \end{aligned}$$
(3)
$$\begin{aligned}&on\ \Gamma _s:=\{(x,y,z)\in \bar{\Omega }:(x,y)\in \partial M \}: \ \mathbf{v}=0, w=0, \end{aligned}$$
(4)
$$\begin{aligned}&\quad \qquad \qquad [\nabla _3\theta ]\cdot \vec {n}=0,\ \end{aligned}$$
(5)
$$\begin{aligned}&on\ \Gamma _u:=\{(x,y,z)\in \bar{\Omega }:z=0 \}:\ \partial _z\mathbf{v}=h\tau , w=0, \end{aligned}$$
(6)
$$\begin{aligned}&\quad \qquad \qquad [\nabla _3\theta ]\cdot n_3=-k_\theta (\theta -\theta ^*),\ \end{aligned}$$
(7)

where \(\tau \) is a given 2D wind stress field, \(\vec {n}\) the normal vector at \(\Gamma _s\), \(n_3\) the vertical Cartesian unit vector, and \(\theta ^*\) is a given temperature field at the sea surface. The velocity \(\mathbf{v}\) and the potential temperature \(\theta \) can be modified at \(\Gamma _s\) by adding a \(\tau \)- and \(\theta ^*\)-dependent contribution such that we can assume without loss of generality homogeneous boundary conditions at the ocean surface (see Lions et al. 1992, Section 2.4., Cao and Titi 2007 p. 248, Cao and Titi 2003, p. 202)

$$\begin{aligned} \partial _z\mathbf{v}=0 \ \text { and }\ \partial _z\theta =0,\quad \text {on }\Gamma _u. \end{aligned}$$
(8)

The mathematical analysis of primitive equations dates back to Lions et al. (1992), where the existence of global in time weak solutions was proven for square integrable initial conditions. Weak solutions satisfy the regularity properties \((\mathbf{v},\theta )\in L^\infty ([0,T], {\mathcal {L}^{2}}(\Omega ))\cap L^2([0,T], {\mathcal {H}^{1}}(\Omega ))\)Footnote 1. The uniqueness of weak solutions remains still an open question (for weak solutions see also remark 1 in Section 2). The local existence of strong solutions was shown by Guillén-González et al. (2001). In a seminal paper, Cao and Titi (2007) have proven the global existence and the uniqueness of strong solutions for arbitrary initial data in the Sobolev space \({\mathcal {H}^{1}}\). Strong solutions possess more regularity than weak solutions, namely \((\mathbf{v},\theta )\in L^\infty ([0,T], {\mathcal {H}^{1}}(\Omega ))\cap L^2([0,T], {\mathcal {H}^{2}}(\Omega ))\). The notion of “strong solution” is introduced in below Definition 2.1. The well-posedness result Cao and Titi (2007) uses Neumann boundary conditions for velocity on top and bottom of the domain, and a slip boundary condition at lateral sides. A different proof was given by Kobelkov (2007). This breakthrough has been followed by well-posedness results in \(H^1\) for Neumann boundary conditions at the surface and Dirichlet boundary conditions at the bottom and at the lateral boundary by Kukavica and Ziane (2007). The existence of an attractor of the primitive equations was shown in Chueshov (2014) and Ju (2007). The well-posedness of the primitive equations for \(L^p\)-spaces is proven in Hieber and Kashiwabara (2016). In this work, we use Dirichlet boundary conditions at lateral sides and bottom of the domain and Neumann boundary conditions at the surface, because these are the most common boundary conditions in ocean general circulation models but our results translate to the other boundary conditions mentioned above.

The \({\mathcal {H}^{1}}\)-regularity forms the mathematical basis for the specification of the data assimilation problem. We state now the central theorem for later reference. The space \(\mathcal {V}\) occurring in the Theorem consists of pairs \((\mathbf{v},\theta )\) of velocity and temperature in \({\mathcal {H}^{1}}(\Omega )\) that satisfy the boundary conditions (2)–(7) and is defined in Sect. 2 [see (16)].

Theorem 1.1

(Global Well-Posedness (Cao and Titi 2007; Kukavica and Ziane 2007)) Let a time interval [0, T ], with \( {T} > 0\), be given. Let the forcing satisfy \(F_{\mathbf{v}}, F_\theta \in L^2([0, {T}], {\mathcal {L}^{2}}(\Omega ))\). If the initial conditions for velocity and temperature satisfy \((\mathbf{v}_0, \theta _0) \in \mathcal {V}\), then there exists a unique strong solution \((\mathbf{v}, \theta )\in C([0, {T} ], {\mathcal {H}^{1}}(\Omega ))\cap L^2[0, {T} ], {\mathcal {H}^{2}}(\Omega ))\) of the system (1) on the interval [0, T ] that depends continuously on the initial data.

The regularity of solutions of the primitive equations, described in Theorem 1.1, still leaves room for a broad spectrum of spatiotemporal scales that comprises local and sudden events with large gradients such as fronts. For an analysis of the admissible scales in terms of Nusselt, Rayleigh and Reynolds numbers, we refer to Gibbon and Holm (2011, 2013).

1.2 The Data Assimilation Problem

We now proceed by describing the data assimilation problem for the primitive equations. Let \(\mathcal {X}=L^\infty ([0, T ], {\mathcal {H}^{1}}(\Omega ))\) be the “state space” of strong solutions of the primitive equations with initial conditions \(\mathcal {X}_0={\mathcal {H}^{1}}(\Omega )\). Let time distributed observations \(\psi _{\mathrm{obs}}\) are given. In general, the observations reside in an “observational space” \(\mathcal {Y}\) and an observation operator H maps \(\psi _{\mathrm{obs}}\in \mathcal {Y}\) in the model space \(\mathcal {X}\). In this work, we assume that the observations are already mapped to the model space, i.e., \(\psi _{\mathrm{obs}}\in \mathcal {X}\).

The data assimilation problem consists in determining, for given observations \(\psi _{\mathrm{obs}}\in \mathcal {X}\) an initial condition \(\bar{\psi }_0\in \mathcal {X}_0\) such that

$$\begin{aligned} \begin{aligned}&\mathcal {J}(\bar{\psi }_0,\psi _\mathrm{obs}) = \min _{\psi _0\in \mathcal {X}_0}\mathcal {J}(\psi _0,\psi _\mathrm{obs})\\ \text { and }\quad&\\&\text {the trajectory } \psi (\bar{\psi }_0)\in \mathcal {X}\ \textit{ satisfies the primitive equations}\ (1)\\&\textit{ with initial condition } \bar{\psi }_0. \end{aligned} \end{aligned}$$
(9)

The cost functional \(\mathcal {J}\) in (9) is defined as a sum of a background and an observational term

$$\begin{aligned} \begin{aligned} \mathcal {J}(\psi _0,\psi _\mathrm{obs}):= \mathcal {J}_b(\psi _0) +\mathcal {J}_\mathrm{obs}(\psi _0,\psi _\mathrm{obs}). \end{aligned} \end{aligned}$$
(10)

The observational part \(\mathcal {J}_\mathrm{obs}\) of the cost functional measure the distance between the model state and the observations and is defined by

$$\begin{aligned} \begin{aligned} \mathcal {J}_\mathrm{obs}(\psi _0,\psi _\mathrm{obs})&:=\int _T \big <\mathcal {R}(\mathcal {M}[\psi _0]-\psi _\mathrm{obs}),\mathcal {M}[\psi _0]-\psi _\mathrm{obs}\big >_{\mathcal {X}} \mathrm{d}t. \end{aligned} \end{aligned}$$
(11)

Here, \(\mathcal {M}\) denotes the model operator that advances an initial condition \(\psi _0\) to the model state \(\psi (t)\) at time t, i.e., \(\psi (t)=\mathcal {M}[\psi _0](t)\). The observational error covariance operator \(\mathcal {R}\) provides a statistical weighting of the model-observation misfit, according to the quality of the observations. The scalar product in (11) acts on the spatial variable. The background term \(\mathcal {J}_b\) of the cost functional in (10) is defined by

$$\begin{aligned} \begin{aligned} \mathcal {J}_b(\psi _0):=&\mathcal {J}_b(\psi _0,\psi _\mathrm{back}):= \big <\mathcal {B}(\psi _0-\psi _\mathrm{back}),\psi _0-\psi _\mathrm{back}\big >_{\mathcal {X}_0}, \end{aligned} \end{aligned}$$
(12)

where \(\psi _\mathrm{back}\in \mathcal {X}_0\) is a given background state such as the outcome of a previous forecast and \(\mathcal {B}\) denotes the model error covariance operator. The background term \(\mathcal {J}_b\) incorporates prior information about the system but it can also be interpreted as a Tikhonov regularization of \(\mathcal {J}\). The model error covariance operator \(\mathcal {B}\) provides a weighting according to the estimated model error.

As a consequence of the nonlinearity of the primitive equations, encoded in the model operator \(\mathcal {M}\) in (11), we can for the data assimilation problem (9) not expect a global minimum but at most multitude of local minima. In principle, the uniqueness of the optimization can be enforced by weighting the background and observational parts of the cost functional appropriately but this may come at the expense of discarding the observational information [see Theorem 7 in Korn (2009) for a result in this respect].

1.3 Overview of Results

The relevance of the primitive equations for atmosphere/ocean science and climate research together with the fact that these equations constitute one of the rare examples of a 3D fluid dynamical equation with a global well-posedness result poses the question how this knowledge can be incorporated into a 4D-var data assimilation algorithm. We advocate a formulation of the variational cost functional (10) that reflects the regularity of the underlying dynamical equations. This philosophy has already been demonstrated in the context of an idealized coupled atmosphere-ocean model (Korn 2019).

The classical formulation of 4D-Var based on the \({\mathcal {L}^{2}}\)-norm in space and with initial conditions in \({\mathcal {L}^{2}}(\Omega )\) is associated with the notion of weak solutions. These solutions describe the dynamics in the \(L^2\)-space. For weak solutions the uniqueness as well as the continuous dependency on the initial conditions is unknown but for the adjoint approach of variational data assimilation not only continuity but differentiability with respect to the initial condition is required. The situation changes completely if we employ the notion of strong solutions of the primitive equations, for which uniqueness and continuous dependency of the solution on the initial condition are established.

In view of the data assimilation problem strong solutions require optimal initial conditions with respect to the observations in the space \({\mathcal {H}^{1}}(\Omega )\). These initial conditions generates dynamics in \(L^\infty ([0,{T}], {\mathcal {H}^{1}})\cap L^2([0,{T}], {\mathcal {H}^{2}}(\Omega ))\) and consequently we measure the distance to observations in this space, i.e., we use in the background term \(\mathcal {J}_b\) of the cost functional (10) the \({\mathcal {H}^{1}}\)-norm and in the observational term \(\mathcal {J}_\mathrm{obs}\) the \(L^2([0,{T}], {\mathcal {H}^{1}}(\Omega ))\)-norm. This imposes stronger constraints through the higher derivatives than the \(L^2\)-norm. The \(L^2([0,{T}], {\mathcal {H}^{2}}(\Omega ))\)-norm can not be used in the observational term of the cost functional, because the degree of the spatial derivative in this term is linked via the Gateaux-derivative of the cost functional to a differential operator that is part of the forcing of the adjoint equations. The second-order derivative in the spatial \({H^{2}}\)-norm would result in an adjoint forcing that is no longer square-integrable. For details, we refer to Theorem 4.4 and remark 2 in Sect. 4.2.

In Lemma 3.2, we prove that strong solutions are differentiable with respect to the initial conditions. For strong solutions of the primitive equations, we prove in Theorem 4.1 that local minimizers of the cost functional exists. Theorem 4.4 gives a first-order necessary adjoint condition for these minimizers that provides the basis for their computation by means of an adjoint based optimizations. For strong solutions, we prove the convergence of a steepest-descent algorithm in Theorem 5.3, provided one uses a suitable starting point for the iterative process. Pure Newton methods that allow global convergence have never been used for realistic applications in numerical weather prediction or ocean state estimation due to their prohibitive computational costs. Variants such as quasi-Newton methods are not considered in this paper.

Our formulation of the variational data assimilation for the primitive equations does only modify the optimization framework. Affected are the forcing of the adjoint equation and the gradient that results from an integration of the adjoint equation, while the dynamics of the primitive equations remain untouched. The overall algorithmic structure of an existing 4D-Var code is also left intact. The choice of the \({\mathcal {H}^{1}}\)-norm in the background term results for a computational gradient-based algorithm in a filter of the gradient through an inverse Helmholz operator. The \(L^2([0,{T}], {\mathcal {H}^{1}}(\Omega ))\)-norm in the observational term implies a modification of the forcing of the adjoint equation with a Helmholtz operator applied to the model-data-misfit. Both of these changes are easy to integrate into an existing 4D-Var algorithm (see Fig.1).

The impact of regularity-tailored cost functionals on the quality as well as on the performance of the 4D-Var algorithm has to be assessed in numerical simulations. In particular the potential of the method for controlling turbulent flows has to be investigated. This comprises the comparison of the quality of the \({\mathcal {H}^{1}}\)- with the \({\mathcal {L}^{2}}\)-4D-Var data assimilation and the potential to resolve issues of (\({\mathcal {L}^{2}}\)-) 4D-Var such as strong adjoint sensitivities or large number of local minima (see, e.g., Hoteit et al. 2005, (Köhl and Willebrand 2002; Lorenc 2003). The experimental investigation of these topics is beyond the scope of this paper.

2 Functional Analytic Setup

Let us recall the following basic notation

$$\begin{aligned} \begin{aligned}&\bullet \vec {v}:=(\mathbf{v},w)\ \text {velocity field with horizontal velocity } \mathbf{v}:=(u,v) \text { and vertical velocity } w\\&\bullet \theta \ \text {temperature}\\&\bullet \ \nabla :=(\partial _x,\partial _y),\ \nabla _3:=(\partial _x,\partial _y, \partial _z)\\&\bullet \ \nabla \cdot \mathbf{v}: =\partial _x u+\partial _y v, \ \nabla _3\cdot \mathbf{v}=\partial _x u+\partial _y v+ \partial _z w \end{aligned} \end{aligned}$$
(13)

We define the following fundamental function spaces for velocity and temperature

$$\begin{aligned} \begin{aligned} \tilde{{V_{\mathbf{v}}}}:&=\left\{ \mathbf{v}\in (\mathbf{C}^\infty (\bar{\Omega }))^2 :\ \mathbf{v}\text { satisfies boundary condition } (2),(4), (6)\right. \\&\left. \text { and }\nabla \cdot \mathbf{v}+\partial _z w=0 \right\} ,\\ \tilde{{V_{\theta }}}:&=\left\{ \theta \in C^\infty (\bar{\Omega }) :\theta \text { satisfies boundary condition } (3),(5), (7) \right\} . \end{aligned} \end{aligned}$$
(14)

Denote by \({V_{\mathbf{v}}}\), \({V_{\theta }}\) the closure of \(\tilde{{V_{\mathbf{v}}}}\) in the Sobolev space \(\mathbf{H}^{1}(\Omega )\) and of \(\tilde{{V_{\theta }}}\) in \({H^{1}}(\Omega )\). The norm of \({V_{\mathbf{v}}}\) and \({V_{\theta }}\) is

$$\begin{aligned} \begin{aligned}&||\mathbf{v}||_{\mathbf{H}^{1}}^2:=\int _\Omega |\mathbf{v}(x,y,z)|^2\mathrm{d}x\mathrm{d}y\mathrm{d}z+\int _\Omega |\nabla _3\mathbf{v}(x,y,z)|^2\mathrm{d}x\mathrm{d}y\mathrm{d}z,\\ \text {and}\quad&||\theta ||_{{H^{1}}}^2:=\int _\Omega |\theta (x,y,z)|^2\mathrm{d}x\mathrm{d}y\mathrm{d}z+\int _\Omega |\nabla _3\theta (x,y,z)|^2\mathrm{d}x\mathrm{d}y\mathrm{d}z. \end{aligned} \end{aligned}$$
(15)

The product space by

$$\begin{aligned} \begin{aligned} \mathcal {V}:={V_{\mathbf{v}}}\times {V_{\theta }}. \end{aligned} \end{aligned}$$
(16)

is equipped with the sum of the norms \(||\cdot ||_{{\mathcal {H}^{1}}}:=||\cdot ||_{\mathbf{H}^{1}}+||\cdot ||_{{H^{1}}}\). We denote the \(L^2\)-norm in the velocity and temperature space by \(||\cdot ||_{\mathbf{L}^{2}}\) and \(||\cdot ||_{{L^{2}}}\), and in the velocity-temperature product space by \(||\cdot ||_{{\mathcal {L}^{2}}}:=||\cdot ||_{\mathbf{L}^{2}}+||\cdot ||_{{L^{2}}}\). The \(H^2\)-norm in the velocity-temperature product space is denoted by \(||\cdot ||_{{\mathcal {H}^{2}}}:=||\cdot ||_{\mathbf{H}^{2}}+||\cdot ||_{{H^{2}}}\).

In (1), the vertical velocity w is determined by the constraint (1c) and can by virtue of the boundary conditions be expressed as

$$\begin{aligned} w(x,y,z,t)=-\int _{-z}^0\nabla \cdot \mathbf{v}(x,y,\xi ,t)\, \mathrm{d}\xi =-\nabla \cdot \int _{-z}^0 \mathbf{v}(x,y,\xi ,t)\, \mathrm{d}\xi . \end{aligned}$$
(17)

The pressure term can with the hydrostatic balance be reformulated as

$$\begin{aligned} p(x,y,z,t)=\int _{-h}^z \theta (x,y,\xi ,t)\, \mathrm{d}\xi +p_s(x,y,t), \end{aligned}$$
(18)

where \(p_s\) is the surface pressure and has to be determined.

Definition 2.1

(Strong Solution) (i) Let initial conditions \((\mathbf{v}_0,\theta _0)\in \mathcal {V}\) be given, and let [0, T], \({T}>0\), be a time interval. The vector \((\mathbf{v},\theta )\) is called a strong solution of (1) on [0, T] with boundary conditions (2)–(8) if

$$\begin{aligned} \begin{aligned}&\mathbf{v}\in C([0,{T}],{V_{\mathbf{v}}})\cap L^2([0,{T}], \mathbf{H}^{2}(\Omega )),\\&\theta \in C([0,{T}],{V_{\theta }})\cap L^2([0,{T}],{H^{2}}(\Omega )),\\&\partial _t \mathbf{v}\in L^2([0,{T}], \mathbf{L}^{2}(\Omega )),\\&\partial _t \theta \in L^2([0,{T}], L^2(\Omega )), \end{aligned} \end{aligned}$$

and if it satisfies in the sense of distributions for all \(\Phi \in (C^\infty (\bar{\Omega }))^2\) and \(\phi \in C^\infty (\bar{\Omega })\)

$$\begin{aligned} \int _\Omega&\partial _t \mathbf{v}\cdot \Phi \, \mathrm{d}x +\int _\Omega [(\mathbf{v}\cdot \nabla ) \mathbf{v}]\cdot \Phi \, \mathrm{d}x+\int _\Omega f\vec {k}\times \mathbf{v}\Phi \, \mathrm{d}x\mathrm{d}y\mathrm{d}z \nonumber \\&-\int _\Omega \left[ \left( \nabla \cdot \int _{-h}^z \mathbf{v}(x,y,\xi ,t)\, \mathrm{d}\xi \right) \partial _z \mathbf{v}\right] \cdot \Phi \, \mathrm{d}x\mathrm{d}y\mathrm{d}z \nonumber \\&-\int _\Omega \left[ \nabla \int _{-h}^z g\rho (x,y,\xi ,t)\,\mathrm{d}\xi \right] \cdot \Phi \, \mathrm{d}x\mathrm{d}y\mathrm{d}z +\int _{\Gamma _u}(\nabla p_s)\cdot \Phi \,\mathrm{d}x\mathrm{d}y\mathrm{d}z\nonumber \\&+\int _\Omega \frac{1}{Re_1}\nabla \mathbf{v}\cdot \nabla \Phi + \frac{1}{Re_2}\partial _z \mathbf{v}\cdot \partial _z \Phi \, \mathrm{d}x\mathrm{d}y\mathrm{d}z =F_{\mathbf{v}}, \end{aligned}$$
(19a)
$$\begin{aligned} \int _\Omega&\partial _t \theta \phi \,\mathrm{d}x\mathrm{d}y\mathrm{d}z + \int _\Omega (\mathbf{v}\cdot \nabla ) \theta \phi \,\mathrm{d}x\mathrm{d}y\mathrm{d}z\nonumber \\&-\int _\Omega \left[ \nabla \cdot \int _{-h}^0 \mathbf{v}(x,y,\xi ,t)\, \mathrm{d}\xi \right] \partial _z \theta \, \phi \,\mathrm{d}x\mathrm{d}y\mathrm{d}z \nonumber \\&+\frac{1}{Rt_1}\int _\Omega \nabla _3 \theta \cdot \nabla _3\phi \,\mathrm{d}x\mathrm{d}y\mathrm{d}z +\frac{1}{Rt_2}\int _\Omega \partial _z\theta \partial _z\phi \,\mathrm{d}x\mathrm{d}y\mathrm{d}z=F_{\theta },\nonumber \\ \text {with }&\ \rho :=\rho (\theta ):=\rho _0-\alpha (\theta -\theta _0), \end{aligned}$$
(19b)

where w denotes the vertical velocity, f the Coriolis parameter and \(\rho _0,\theta _0>0\) are a mean density and a mean temperature, respectively, and \(\alpha >0\) the thermal expansion coefficient, \(F_{\mathbf{v}}, F_{\theta }\) denote the external forcing. The numbers \(Re_1,Re_2>0\) denote the horizontal and vertical Reynolds number that represent the viscosity coefficients. The numbers \(Rt_1,Rt_2>0\) are the horizontal and vertical mixing coefficients for temperature.

(ii) We define the state vector \(\psi \) of the primitive equation as pair of velocity and temperature variables

$$\begin{aligned} \psi :=(\mathbf{v},\theta ), \end{aligned}$$
(20)

where \((\mathbf{v},\theta )\) denote a strong solution of the primitive equation. Furthermore, we define the nonlinear terms as follows:

$$\begin{aligned} \mathcal {B}(\psi ,\psi ):=&(\mathcal {B}_v (\mathbf{v},\mathbf{v}),\mathcal {B}_\theta (\theta ,\theta )) \end{aligned}$$
(21)
$$\begin{aligned} \text {with }\quad&\mathcal {B}_v(\mathbf{v},\mathbf{v}):= (\mathbf{v}\cdot \nabla ) \mathbf{v}-\left( \nabla \cdot \int _{-h}^z \mathbf{v}(x,y,\xi ,t)\, \mathrm{d}\xi \right) \partial _z \mathbf{v}, \end{aligned}$$
(22)
$$\begin{aligned}&\mathcal {B}_\theta (\theta ,\theta ):=(\mathbf{v}\cdot \nabla ) \theta - \left( \nabla \cdot \int _{-h}^z \mathbf{v}(x,y,\xi ,t)\, \mathrm{d}\xi \right) \partial _z \theta . \end{aligned}$$
(23)

The dissipation operators are denoted by

$$\begin{aligned} D\psi :=&(D_v \mathbf{v},D_\theta \theta ), \end{aligned}$$
(24)
$$\begin{aligned} \text {with }\quad&D_v\mathbf{v}:=-\frac{1}{Re_1}\triangle \,\mathbf{v}- \frac{1}{Re_2}\partial ^2_{zz} \mathbf{v}, \end{aligned}$$
(25)
$$\begin{aligned}&D_\theta :=-\frac{1}{Rt_1}\triangle \,\theta - \frac{1}{Rt_2}\partial ^2_{zz} \theta . \end{aligned}$$
(26)

The linear terms are given by

$$\begin{aligned}&L\psi :=(L_v \mathbf{v},0) \end{aligned}$$
(27)
$$\begin{aligned} \text {with }&L_v\mathbf{v}:=f\vec {k}\times \mathbf{v}+\int _{-h}^z\theta (x,y,\xi ,t)\,\mathrm{d}\xi )+\nabla p_s. \end{aligned}$$
(28)

With this notation, we write the primitive equations (1) in the form

$$\begin{aligned} \partial _t\psi +\mathcal {B}(\psi ,\psi )+L\psi +D\psi =0, \end{aligned}$$
(29)

and interpret this equation in the distributional sense.

Remark 1

(Weak Solutions) Weak solutions of the primitive equations are defined with less regularity requirements. They satisfy (19) in the sense of distributions but with \((\mathbf{v},\theta )\in L^\infty ([0,{T}],{\mathcal {L}^{2}}(\Omega ))\cap L^2([0,{T}],{\mathcal {H}^{1}}(\Omega ))\). The global existence of weak solutions with initial data in \((\mathbf{v}_0,\Theta _0)\in {\mathcal {L}^{2}}(\Omega )\) has been shown in Lions et al. (1992). The uniqueness is still open. Uniqueness of weak solutions with initial conditions in the space of continuous functions was proven in Kukavica et al. (2014).

We recall a set of inequalities that are used in the sequel. The Ladyshenzkaya inequalities in \(\mathbb {R}^2\) are valid for \(\phi \in H^1(M)\) and for a non-dimensional constant \(C_0\)

$$\begin{aligned} \begin{aligned}&||\phi ||_{L^4(M)}\le C_0||\phi ||_{L^2(M)}^{1/2}||\phi ||_{H^1(M)}^{1/2}. \end{aligned} \end{aligned}$$
(30)

In \(\mathbb {R}^3\), it holds for \(u\in H^1(\Omega )\)

$$\begin{aligned} \begin{aligned}&||u||_{L^3(\Omega )}\le C_0||u||_{L^2(\Omega )}^{1/2}||u||_{H^1(\Omega )}^{1/2}, \\&||u||_{L^6(\Omega )}\le C_0||u||_{H^1(\Omega )}. \end{aligned} \end{aligned}$$
(31)

Lemma 2.2

(Cao and Titi 2003) Let \(u=(u_1,u_2)\in H^2(\Omega )\) and \(f,g\in H^1(\Omega )\). Then

$$\begin{aligned} \begin{aligned}&\left| \int _\Omega \left( \nabla \cdot \int _{-h}^z u(x,y,\xi ,t)\, \mathrm{d}\xi \right) fg\, \mathrm{d}x\mathrm{d}y\mathrm{d}z\right| \\&\le C||f||_{L^2}||u||_{H^1}^{1/2}|| u||_{H^2}^{1/2} ||g||_{L^2}^{1/2}||g||_{H^1}^{1/2}. \end{aligned}\end{aligned}$$

Lemma 2.3

(Gronwall Inequality) Let f be an absolutely continuous function that satisfies If \(\frac{\mathrm{d}f}{\mathrm{d}t}\le gf+h\) for some real integrable functions g(t) and h(t). Then,

$$\begin{aligned} f(t)\le f(0)\exp \left( \int _0^tg(s)\, \mathrm{d}s\right) +\int _0^t h(s)\exp \left( \int _s^t g(y)\, \mathrm{d}y\right) \, \mathrm{d}s. \end{aligned}$$

3 Linearized and Adjoint Primitive Equations, and Differentiability

In this section, we prove regularity results for the linearized and adjoint primitive equations and establish the differentiability of the model solution with respect to the initial condition.

3.1 Linearized Equations and Differentiability

We linearize the primitive model equations around a strong solution \(\psi =(\mathbf{v},\theta )\) of (1). The resulting equations are also referred to as “tangent linear model.” The linearized equations in terms of velocity and temperature variables \((\mathbf{V}, {\Theta })\) are given by

$$\begin{aligned}&\frac{\partial \mathbf{V}}{\partial t} +f\vec {k}\times \mathbf{V}+(\mathbf{V}\cdot \nabla ) \mathbf{v}-\nabla \cdot \int _{-h}^z \mathbf{v}(x,y,\xi ,t)\, \mathrm{d}\xi \, \partial _z \mathbf{V}\nonumber \\&\quad +(\mathbf{v}\cdot \nabla )\mathbf{V}-\nabla \cdot \int _{-h}^z \mathbf{V}(x,y,\xi ,t)\, \mathrm{d}\xi \, \partial _z \mathbf{v}-\nabla \left( \int _{-h}^z \Pi (x,y,\xi ,t)\, \mathrm{d}\xi + P_S\right) \nonumber \\&\quad -\frac{1}{Re_1}\triangle \,\mathbf{V}- \frac{1}{Re_2}\partial ^2_{zz} \mathbf{V}=F_{\mathbf{V}}, \end{aligned}$$
(32a)
$$\begin{aligned}&\nabla \cdot \mathbf{V}+\partial _z {W}=0, \end{aligned}$$
(32b)
$$\begin{aligned}&\frac{\partial {\Theta }}{\partial t}+ \nabla \cdot (\mathbf{V}\theta ) +\partial _z(\mathbf{V}\theta )+\nabla \cdot (\mathbf{v}{\Theta })+\partial _z(\mathbf{v}{\Theta }) -\frac{1}{Rt_1}\triangle {\Theta }- \frac{1}{Rt_2}\partial ^2_{zz} {\Theta }=F_{\Theta }, \end{aligned}$$
(32c)
$$\begin{aligned}&\text {with surface pressure } P_S \text { and with the equation of state given by}\nonumber \\&\Pi :=\rho _0-\alpha ({\Theta }-\theta _0). \end{aligned}$$
(32d)

The boundary conditions for (32) are analogous to the boundary condition (2)–(8) for the primitive equations (1) and read as follows:

$$\begin{aligned}&on\ \Gamma _b:=\{(x,y,z)\in \bar{\Omega }:z=-h \}:\ \mathbf{V}=0, W=0, \end{aligned}$$
(33)
$$\begin{aligned}&\quad \qquad \qquad [\nabla _3{\Theta }]\cdot n_3=0,\ \end{aligned}$$
(34)
$$\begin{aligned}&on\ \Gamma _s:=\{(x,y,z)\in \bar{\Omega }:(x,y)\in \partial M \}: \mathbf{V}=0, W=0, \end{aligned}$$
(35)
$$\begin{aligned}&\quad \qquad \qquad [\nabla _3{\Theta }]\cdot \vec {n}=0,\ \end{aligned}$$
(36)
$$\begin{aligned}&on\ \Gamma _u:=\{(x,y,z)\in \bar{\Omega }:z=0 \}:\ \partial _z\mathbf{V}=0, W=0, \end{aligned}$$
(37)
$$\begin{aligned}&\quad \qquad \qquad [\nabla _3{\Theta }]\cdot n_3=0. \end{aligned}$$
(38)

We write the linearized primitive equations for \(\Psi =({V},{\Theta })\) in the following form

$$\begin{aligned} \begin{aligned}&\partial _t \Psi +\mathcal {B}'[\psi ](\Psi ) +L\Psi +D\Psi =F, \end{aligned} \end{aligned}$$
(39)

where D and L are defined in (24), (27), and where \(\mathcal {B}'[\psi ](\Psi ):=(\mathcal {B}^{'}_v[v]({V}),\mathcal {B}^{'}_\theta [\theta ]({\Theta })\), with

$$\begin{aligned} \begin{aligned} \mathcal {B}_v^{'}[v](\mathbf{V})&:= (\mathbf{v}\cdot \nabla \mathbf{V}) + (\mathbf{V}\cdot \nabla \mathbf{v}) -\left( \nabla \cdot \int _{-h}^z \mathbf{v}(x,y,\xi ,t)\, \mathrm{d}\xi \right) \partial _z \mathbf{V}\\&\qquad -\left( \nabla \cdot \int _{-h}^z \mathbf{V}(x,y,\xi ,t)\, \mathrm{d}\xi \right) \partial _z \mathbf{v},\\ \mathcal {B}_\theta ^{'}[v]({\Theta })&:= (\mathbf{v}\cdot \nabla {\Theta }) + (\mathbf{V}\cdot \nabla \theta ) -\left( \nabla \cdot \int _{-h}^z \mathbf{v}(x,y,\xi ,t)\, \mathrm{d}\xi \right) \partial _z {\Theta }\\&\qquad -\left( \nabla \cdot \int _{-h}^z \mathbf{V}(x,y,\xi ,t)\, \mathrm{d}\xi \right) \partial _z \theta . \end{aligned} \end{aligned}$$
(40)

Theorem 3.1

(Regularity of linearized equations) Let \(\psi :=(\mathbf{v},\theta )\) be a strong solution of (1) on the time interval [0, T], \(T>0\). Suppose \(F:=(F_{\mathbf{V}}, F_{\Theta })\in L^2([0,{T}], {\mathcal {L}^{2}}(\Omega ))\) is given. If the initial conditions for the linearized primitive equations satisfy \(\Psi _0:=(\mathbf{V}_0,{\Theta }_0)\in \mathcal {V}\), then there exists a solution \(\Psi :=(\mathbf{V},{\Theta })\) of the linearized primitive equations (32) with the following properties,

$$\begin{aligned} \begin{aligned}&\mathbf{V}\in C([0,{T}], {V_{\mathbf{v}}})\cap L^2([0,{T}], \mathbf{H}^{2}(\Omega )),\\&{\Theta }\in C([0,{T}], {V_{\theta }})\cap {L^{2}}([0,{T}], {H^{2}}(\Omega )),\\&\partial _t\mathbf{V}\in L^2([0,{T}], \mathbf{L}^{2}(\Omega )),\\&\partial _t{\Theta }\in L^2([0,{T}], {L^{2}}(\Omega )). \end{aligned} \end{aligned}$$
(41)

The solution \(\Psi \) satisfies for \(t\in [0,T]\)

$$\begin{aligned} ||\Psi (t)||_{{\mathcal {H}^{1}}}^2= & {} ||\mathbf{V}(t)||_{\mathbf{H}^{1}}^2+||{\Theta }(t)||_{H^1}^2 \nonumber \\\le & {} [||\mathbf{V}_0||_{\mathbf{H}^{1}}^2+||{\Theta }_0||_{H^1}^2]\ \exp \left\{ \int _0^t K_1(s)+K_2(s)\, \mathrm{d}s\right\} \\+ & {} \,\,c\int _0^t (||F_{\mathbf{V}}(s)||_{\mathbf{L}^{2}}^2 + ||F_{\Theta }(s)||_{L^2}^2) \exp \left\{ \int _0^s K_1(y)+K_2(y)\, \mathrm{d}y\right\} \, \mathrm{d}s, \nonumber \end{aligned}$$
(42)

and

$$\begin{aligned}&\int _0^t||\mathbf{V}(s)||_{\mathbf{H}^{2}}^2+||{\Theta }(s)||_{H^2}^2\mathrm{d}s \nonumber \\&\quad \le [||\mathbf{V}_0||_{\mathbf{H}^{1}}^2+||{\Theta }_0||_{H^1}^2]+\int _0^t( K_1(s)+K_2(s))(||\mathbf{V}(s)||_{\mathbf{H}^{1}}^2+||{\Theta }(s)||_{H^1}^2 )\, \mathrm{d}s\} \nonumber \\&\quad +\,\int _0^t (||F_{\mathbf{V}}(s)||_{\mathbf{L}^{2}}^2 + ||F_{{\Theta }}(s)||_{L^2}^2)\, \mathrm{d}s, \end{aligned}$$
(43)

where

$$\begin{aligned} \begin{aligned} K_1(s)&:= ||\nabla \mathbf{v}||_{\mathbf{L}^{2}}||\nabla \mathbf{v}||_{\mathbf{H}^{1}}+|| \mathbf{v}||_{\mathbf{H}^{1}}^4 +||\nabla \, \partial _z \mathbf{v}||_{\mathbf{L}^{2}}^2||\partial _z \mathbf{v}||_{\mathbf{L}^{2}}^2+1\\&\qquad +||\nabla \, \mathbf{v}||_{\mathbf{L}^{2}}^2||\Delta \, \mathbf{v}||_{\mathbf{L}^{2}}^2 +||\nabla \mathbf{v}||_{\mathbf{H}^{1}}^2 +||\partial _z\nabla \mathbf{v}||_{\mathbf{L}^{2}}^4,\\ K_2(s)&:=||\theta ||_{H^1}^4+||\nabla \, \partial _z \theta ||_{L^2}^2||\partial _z \theta ||_{L^2}^2. \end{aligned} \end{aligned}$$
(44)

Proof

The proof is given in Appendix 1. \(\square \)

The regularity result of the linearized primitive equations in Theorem 3.1 is instrumental for the differentiability result of the next Lemma.

Lemma 3.2

Let \(\psi _0=(\mathbf{v}_0,\theta _0)\in \mathcal {V}\) and a time interval [0, T], \({T}>0\) be given. The mapping \(\psi _0\mapsto \psi (t;\psi _0)\) from \(\mathcal {V}\) into \(L^2([0,{T}],{\mathcal {H}^{1}}(\Omega ))\) that assigns to an initial condition \(\psi _0\) the solution \(\psi (t;\psi _0):=\mathcal {M}[\psi _0](t)\) of the primitive equations (1) at time \(t\in (0,{T}]\) has a Gateaux derivative \(\frac{D\psi }{D\psi _0}\mathbf{a}\) in every direction \(\mathbf{a}=(a_v,a_\theta )\in \mathcal {V}\). Furthermore, \(\frac{D\psi }{D\psi _0}\mathbf{a}\) solves the linearized equations (32) with initial condition \(\psi (t_0)=\mathbf{a}\) and forcing \(F=(F_\mathbf{V},F_{\Theta })=0\).

Proof

Let \(\mathbf{a}:=(a_v,a_\theta )\in \mathcal {V}\). Denote by \(\psi _0, \psi _0+\tau \mathbf{a}\in {\mathcal {H}^{1}}(\Omega )\), \(\tau >0\), two initial conditions and by \(\psi \) and \(\psi _{\tau \mathbf{a}}\) the corresponding strong solutions of the primitive equations (1). They satisfy the equations

$$\begin{aligned} \begin{aligned}&\frac{\partial \psi }{\partial t} +\mathcal {B}(\psi ,\psi ) +L\psi +D\psi =0, \end{aligned}\end{aligned}$$

and

$$\begin{aligned} \begin{aligned}&\frac{\partial \psi _{\tau \mathbf{a}}}{\partial t} +\mathcal {B}(\psi _{\tau \mathbf{a}},\psi _{\tau \mathbf{a}}) +L\psi _{\tau \mathbf{a}} +D\psi _{\tau \mathbf{a}} =0. \end{aligned}\end{aligned}$$

Let \(\Psi \) be the solution of the linearized equations (32), which are linearized around \(\psi \),

$$\begin{aligned} \begin{aligned}&\frac{\partial \Psi }{\partial t} +\mathcal {B}'[\psi ](\Psi ) +L\Psi +D\Psi =0, \end{aligned}\end{aligned}$$

with zero forcing, initial condition \(\Psi (t_0)=\mathbf{a}\) and boundary conditions (33)–(38). In order to prove the lemma, we have show that

$$\begin{aligned} \begin{aligned}&y(\tau ):=\psi _{\tau \mathbf{a}}-\psi -\tau \Psi \\ \text {satisfies }\&\lim _{\tau \rightarrow 0}\frac{||y(\tau )||_{L^2([0,{T}],{\mathcal {H}^{1}})}}{|\tau |} =0. \end{aligned} \end{aligned}$$
(45)

From the definition of y follows that it solves the equation

$$\begin{aligned} \frac{\mathrm{d}y}{\mathrm{d}t} +\mathcal {B}(\psi _{\tau h},\psi _{\tau \mathbf{a}}) -\mathcal {B}(\psi ,\psi ) -\mathcal {B}'[\psi ](\tau \Psi ) +L y +D y =0, \end{aligned}$$
(46)

with initial condition \(y_0=0\). We introduce \(k=(k_u,k_\theta )\) as follows:

$$\begin{aligned} k:=\mathcal {B}(\psi ,\psi )-\mathcal {B}(\psi _{\tau \mathbf{a}},\psi _{\tau \mathbf{a}}) + \mathcal {B}'[\psi ](\psi _{\tau \mathbf{a}}-\psi ). \end{aligned}$$

The components of \(k=(k_v,k_\theta )\) read as follows:

$$\begin{aligned} \begin{aligned} k_v:=&(\mathbf{v}\cdot \nabla )\mathbf{v}-(\mathbf{v}_{\tau \mathbf{a}}\cdot \nabla )\mathbf{v}_{\tau \mathbf{a}} +(\mathbf{v}\cdot \nabla )(\mathbf{v}_{\tau \mathbf{a}}-\mathbf{v}) +((\mathbf{v}_{\tau \mathbf{a}}-\mathbf{v})\cdot \nabla )\mathbf{v}\\ =&\left( (\mathbf{v}_{\tau \mathbf{a}}-\mathbf{v})\cdot \nabla \right) (\mathbf{v}_{\tau \mathbf{a}}-\mathbf{v}),\\ k_\theta :=&(\mathbf{v}_{\tau \mathbf{a}}-\mathbf{v})\cdot \nabla (\theta _{\tau \mathbf{a}}-\theta ). \end{aligned} \end{aligned}$$

Then, Eq. (46) for y becomes

$$\begin{aligned} \frac{\mathrm{d}y}{\mathrm{d}t}+\mathcal {B}'[\psi ](y) +L y +D y = k. \end{aligned}$$
(47)

Note that (47) is a linearized primitive equation with initial condition \(y_0=0\) and an external forcing k that depends on the model solution itself. We prove now the following three inequalities: there exist constants \(K,C>0\) such that

$$\begin{aligned} \begin{aligned} (i)&\ \int _0^T||y(t)||^2_{{\mathcal {H}^{1}}}\mathrm{d}t\le K\int _0^T ||k||_{{\mathcal {L}^{2}}}^2\mathrm{d}t,\\ (ii)&\ ||k_v||_{\mathbf{L}^{2}}\le C||\mathbf{v}-\mathbf{v}_{\tau a_v}||_{\mathbf{H}^{2}}||\mathbf{v}-\mathbf{v}_{\tau a_v}||_{\mathbf{H}^{1}}\\ (iii)&\ ||k_\theta ||_{L^2}\le C||\mathbf{v}-\mathbf{v}_{\tau a_\theta }||_{\mathbf{H}^{2}}||\theta _{\tau a_{\theta }}-\theta ||_{{H^{1}}}. \end{aligned} \end{aligned}$$
(48)

For (i) follows from (175) in the proof of Theorem 3.1 (see Appendix 1) after integration with respect to time over a time interval [0, t], with \(t\in [0,{T}]\), with forcing \(F_{V}=k_v,F_\Theta =k_\theta \), that

$$\begin{aligned} \begin{aligned} \int _0^t||y(\tau )||_{{\mathcal {H}^{1}}}^2\, d\tau&\le K(t) \int _0^t ||k_v(\tau )||_{\mathbf{L}^{2}}^2+||k_\theta ||_{{L^{2}}}^2d\tau . \end{aligned} \end{aligned}$$

where K(t) is a bounded function on [0, T]. This proves assertion (i). For assertion (ii), we derive with the Agmon inequality

$$\begin{aligned} \begin{aligned} ||k_v||_{\mathbf{L}^{2}}&=||\left( (\mathbf{v}_{\tau a_v}-\mathbf{v})\cdot \nabla \right) (\mathbf{v}_{\tau a_v}-\mathbf{v})||_{\mathbf{L}^{2}}\\&\le C||\mathbf{v}-\mathbf{v}_{\tau a_v}||_{\mathbf{H}^{2}}||\mathbf{v}-\mathbf{v}_{\tau a_v}||_{\mathbf{H}^{1}}. \end{aligned} \end{aligned}$$
(49)

Analogously, we obtain

$$\begin{aligned} \begin{aligned} ||k_\theta ||_{{L^{2}}}&\le C||\mathbf{v}-\mathbf{v}_{\tau a_v}||_{\mathbf{H}^{2}}||\theta _{\tau a_{\theta }}-\theta ||_{{H^{1}}}. \end{aligned} \end{aligned}$$
(50)

This proves (ii) and (iii). By combining (i)–(iii), we conclude that

$$\begin{aligned} \int _0^T||y(t)||_{{\mathcal {H}^{1}}}^2\mathrm{d}t \le C\int _0^T( ||\mathbf{v}-\mathbf{v}_{\tau a_v}||_{\mathbf{H}^{1}}^2+||\theta _{\tau a_\theta }-\theta ||_{{H^{1}}}^2 ) ||\mathbf{v}_{\tau a_v}-\mathbf{v}||_{\mathbf{H}^{2}}^2\mathrm{d}t.\nonumber \\ \end{aligned}$$
(51)

Denote the difference between the two strong solutions by \(\hat{\mathbf{v}}:=\mathbf{v}_{\tau a_v}-\mathbf{v}\) and \(\hat{\theta }:=\theta _{\tau a_\theta }-\theta \). According to Theorem 1.1 we have \((\hat{\mathbf{v}},\hat{\theta })\in C([0,{T}], {\mathcal {H}^{1}}(\Omega ))\cap L^2([0,{T}], {\mathcal {H}^{2}}(\Omega ))\). Furthermore \((\hat{\mathbf{v}},\hat{\theta })\) solves the equations

$$\begin{aligned} \begin{aligned}&\frac{\partial \hat{\mathbf{v}}}{\partial t} +(\hat{\mathbf{v}}\cdot \nabla )\mathbf{v}_{\tau a_v} +(\mathbf{v}\cdot \nabla ) \hat{\mathbf{v}} + \hat{w}\partial _z \mathbf{v}_{\tau a_v} +w\partial _z \hat{\mathbf{v}} +\nabla \hat{p} +f\vec {k}\times \hat{\mathbf{v}}\\&\quad -\frac{1}{Re_1}\triangle \, \hat{\mathbf{v}} - \frac{1}{Re_2}\partial ^2_{zz} \hat{\mathbf{v}}=0,\\&\partial _z \hat{p} -g\hat{\Pi }=0, \\&\nabla \cdot \mathbf{v}+\partial _zw=0,\\&\frac{\partial \hat{\theta }}{\partial t} + \hat{\mathbf{v}} \cdot \nabla \theta _{\tau a_\theta } + \mathbf{v}\cdot \nabla \hat{\theta } + \hat{w}\partial _z \theta _{\tau a_\theta } +w\partial _z \hat{\theta } -\frac{1}{Rt_1}\triangle \hat{\theta } - \frac{1}{Rt_2}\partial ^2_{zz}\hat{\theta },\\&\hat{\Pi }=-\alpha \hat{{\Theta }}, \end{aligned} \end{aligned}$$

with initial condition \(\hat{\mathbf{v}}(t_0)=\tau a_v\), \(\hat{\theta }(t_0)=\tau a_\theta \). This equation has a similar structure as the linearized equation (32) with a zero forcing term and homogeneous boundary conditions. Using the regularity properties of the two strong solutions \((\mathbf{v}_{\tau a_v},\theta _{\tau a_\theta }), (\mathbf{v},\theta )\), we can repeat all steps that lead to (175) (cf. Appendix 1), and this inequality implies

$$\begin{aligned} \begin{aligned} ||\hat{\mathbf{v}}(t)||_{\mathbf{H}^{1}}^2+ ||\hat{\theta }(t)||_{{H^{1}}}^2&\le C(t)\left( ||\hat{\mathbf{v}}_0||_{\mathbf{H}^{1}}^2+||\hat{\theta }_0||_{{H^{1}}}^2\right) \\&=C(t)\tau ^2\left( ||a_v||_{\mathbf{H}^{1}}^2+||a_\theta ||_{{H^{1}}}^2\right) , \end{aligned} \end{aligned}$$
(52)

with C(t) bounded on [0, T]. Inserting this in (51) yields

$$\begin{aligned} \begin{aligned} \int _0^T||y(t)||_{{\mathcal {H}^{1}}}^2\mathrm{d}t&\le C(t)\tau ^2\left( ||a_v||_{\mathbf{H}^{1}}^2+||a_\theta ||_{{H^{1}}}^2\right) \int _0^T ||\mathbf{v}_{\tau a_v}-\mathbf{v}||_{\mathbf{H}^{2}}^2\mathrm{d}t. \end{aligned} \end{aligned}$$
(53)

This proves (45), because \(\mathbf{v}_{\tau a_v}-\mathbf{v}\in L^2([0,{T}], \mathbf{H}^{2}(\Omega )^2)\). With the definition of the Gateaux derivative it follows that the solution w of the linearized equations satisfies \(w(\mathbf{a})=(\mathcal {D} \Psi / \mathcal {D}\psi _0)\mathbf{a}.\) \(\square \)

3.2 Adjoint Primitive Equations and Duality Relation

In this section, we study properties of the adjoint primitive equations. Adjoint equations allow to formulate a first-order necessary conditions for minima of the data assimilation cost functional and this condition provides the basis for the actual computation of minima within an optimization algorithm.

The adjoint equations are derived by choosing taking the \(L^2([0,{T}], L^2(\Omega ))\)-scalar product of the linearized primitive equations with a so-called “adjoint state variable\(\widetilde{\Psi }:=(\widetilde{\mathbf{{V}}}, \widetilde{{\Theta }}):=(\widetilde{{U}}, \widetilde{{V}}, \widetilde{{\Theta }})\) . Then, all differential operators are moved from the linear variables to the adjoint variables by means of integration-by-parts. More details are given in Appendix 1 for more details. For an extensive treatment of adjoint equations, we refer to Marchuk et al. (1996).

The resulting adjoint primitive equations are given by

$$\begin{aligned} \begin{aligned}&-\partial _t\widetilde{{U}}+u\partial _x \widetilde{{U}}+\widetilde{{U}}\partial _x u+ v\partial _y\widetilde{{U}}+\widetilde{{U}}\partial _x v +w\partial _z \widetilde{{U}}+\widetilde{{W}}\partial _zu +f\widetilde{{U}}\\&+\widetilde{{U}}\partial _x\theta +\theta \partial _x\widetilde{{\Theta }}+D_v(\widetilde{{U}})=\tilde{F}_{\widetilde{{U}}},\\&-\partial _t\widetilde{{V}}+\widetilde{{V}}\partial _y u+v\partial _y \widetilde{{V}}+ \widetilde{{V}}\partial _y v+u\partial _x\widetilde{{V}}+w\partial _z \widetilde{{V}}+\widetilde{{W}}\partial _zv -f\widetilde{{V}}\\&+ \widetilde{{V}}\partial _y\theta +\theta \partial _y\widetilde{{\Theta }}+D_v(\widetilde{{V}})=\tilde{F}_{\widetilde{{V}}}, \\&\nabla \cdot \widetilde{\mathbf{{V}}}+\partial _z \tilde{W}=0,\\&-\partial _t\widetilde{{\Theta }}+ \mathbf{v}\cdot \nabla _3\widetilde{{\Theta }}+\int _{-h}^z\nabla \cdot \widetilde{\mathbf{{V}}}(x,y,z')\, \mathrm{d}z'+D_\theta (\widetilde{{\Theta }})=\tilde{F}_{\widetilde{{\Theta }}}, \end{aligned} \end{aligned}$$
(54)

where \(D_v(f):=-\frac{1}{Re_1}\triangle \,f - \frac{1}{Re_2}\partial ^2_{zz} f\) for \(f\in \{\widetilde{{U}},\widetilde{{V}}\}\), and \(D_\theta (\widetilde{{\Theta }}):=-\frac{1}{Rt_1}\triangle \,\widetilde{{\Theta }}- \frac{1}{Rt_2}\partial ^2_{zz} \widetilde{{\Theta }}\). Furthermore does \(\tilde{F}:=(\tilde{F}_{\widetilde{{U}}},\tilde{F}_{\widetilde{{V}}},\tilde{F}_{\widetilde{{\Theta }}})\) denote a forcing term.

For \(\widetilde{\Psi }:=(\widetilde{{U}}, \widetilde{{V}}, \widetilde{{\Theta }})\), we write the adjoint primitive equations in the form

$$\begin{aligned} \begin{aligned}&-\partial _t \widetilde{\Psi }+\mathcal {B}^{'*}[\psi ](\widetilde{\Psi }) +L\widetilde{\Psi }+D\widetilde{\Psi }=\tilde{F}, \end{aligned} \end{aligned}$$
(55)

where \(\mathcal {B}^{'*}[\psi ](\widetilde{\Psi })=\left( \mathcal {B}^{'*}[u](\widetilde{{U}}), \mathcal {B}^{'*}[v](\widetilde{{V}}), \mathcal {B}^{'*}[\theta ](\widetilde{{\Theta }})\right) \) is given by

$$\begin{aligned} \begin{aligned}&\mathcal {B}^{'*}[\mathbf{v}](\widetilde{{U}}):=u\partial _x \widetilde{{U}}+\widetilde{{U}}\partial _x u+ v\partial _y\widetilde{{U}}+\widetilde{{U}}\partial _x v +w\partial _z \widetilde{{U}}+\widetilde{{W}}\partial _zu,\\&\mathcal {B}^{'*}[\mathbf{v}](\widetilde{{V}}):=\widetilde{{V}}\partial _y u+v\partial _y \widetilde{{V}}+ \widetilde{{V}}\partial _y v+u\partial _x\widetilde{{V}}+w\partial _z \widetilde{{V}}+\widetilde{{W}}\partial _zv,\\&\mathcal {B}^{'*}[\theta ](\widetilde{{\Theta }}):=\mathbf{v}\cdot \nabla _3\widetilde{{\Theta }}. \end{aligned} \end{aligned}$$
(56)

The operator \(L\widetilde{\Psi }:=(L_{\widetilde{{U}}}\widetilde{{U}},L_{\widetilde{{V}}}\widetilde{{V}}, L_{\widetilde{{\Theta }}}\widetilde{{\Theta }})\) is defined as

$$\begin{aligned}&L_{\widetilde{{U}}}\widetilde{{U}}:=f\widetilde{{U}}+\widetilde{{U}}\partial _x\theta +\widetilde{{\Theta }}\partial _x\theta ,\nonumber \\&L_{\widetilde{{V}}}\widetilde{{V}}:=-f\widetilde{{V}}+ \widetilde{{V}}\partial _y\theta +\widetilde{{\Theta }}\partial _y\theta ,\nonumber \\&L_{\widetilde{{\Theta }}}\widetilde{{\Theta }}:=\int _{-h}^z\nabla \cdot \widetilde{{U}}(x,y,z')\, \mathrm{d}z'. \end{aligned}$$
(57)

The initial conditionFootnote 2 of the adjoint equation is specified at \(t=T\) and given by \((\widetilde{{U}}(t=T),\widetilde{{V}}(t=T), \widetilde{{\Theta }}(t=T))=(\widetilde{{U}}_T,\widetilde{{V}}_T, \widetilde{{\Theta }}_T)\), the boundary conditions for (54) coincide with the corresponding boundary conditions (33)–(38) of the linearized equations and read as follows:

$$\begin{aligned}&on\ \Gamma _b:=\{(x,y,z)\in \bar{\Omega }:z=-h \}:\ \widetilde{\mathbf{{V}}}=0, \tilde{W}=0, \end{aligned}$$
(58)
$$\begin{aligned}&\quad \qquad \qquad [\nabla _3\widetilde{{\Theta }}]\cdot n_3=0,\ \end{aligned}$$
(59)
$$\begin{aligned}&on\ \Gamma _s:=\{(x,y,z)\in \bar{\Omega }:(x,y)\in \partial M \}: \widetilde{\mathbf{{V}}}=0, \end{aligned}$$
(60)
$$\begin{aligned}&\quad \qquad \qquad \tilde{W}=0, [\nabla _3\widetilde{{\Theta }}]\cdot \vec {n}=0,\ \end{aligned}$$
(61)
$$\begin{aligned}&on\ \Gamma _u:=\{(x,y,z)\in \bar{\Omega }:z=0 \}:\ \partial _z\widetilde{\mathbf{{V}}}=0, \tilde{W}=0, \end{aligned}$$
(62)
$$\begin{aligned}&\quad \qquad \qquad [\nabla _3\widetilde{{\Theta }}]\cdot n_3=0. \end{aligned}$$
(63)

The following result about the adjoint primitive equations follows immediately from the previous result on the linearized equations and the definition of the adjoint equations.

Theorem 3.3

(Regularity of Adjoint Equations) Let \(\psi :=(v,\theta )\) be a strong solution of (1) on the time interval [0, T], \(T>0\). Suppose \(F:=(F_{\mathbf{V}}, F_{\Theta })\in L^2([0,{T}], {\mathcal {L}^{2}}(\Omega ))\) is given. If the initial conditions for the adjoint linearized primitive equations, specified at \(t=T\), satisfy \(\widetilde{\Psi }_T:=(\widetilde{\mathbf{{V}}}_T,\widetilde{{\Theta }}_T)\in \mathcal {V}\), then there exists a solution \(\widetilde{\Psi }:=(\widetilde{\mathbf{{V}}},\widetilde{{\Theta }})\) of the adjoint linearized primitive equations (54) with the following properties:

$$\begin{aligned} \begin{aligned}&\widetilde{\mathbf{{V}}}\in C([0,{T}], {V_{\mathbf{v}}})\cap L^2([0,{T}], \mathbf{H}^{2}(\Omega )),\\&\widetilde{{\Theta }}\in C([0,{T}], {V_{\theta }})\cap L^2([0,{T}], {H^{2}}(\Omega )),\\&\partial _t\widetilde{\mathbf{{V}}}\in L^2([0,{T}], \mathbf{L}^{2}(\Omega )),\\&\partial _t\widetilde{{\Theta }}\in L^2([0,{T}], {L^{2}}(\Omega )). \end{aligned} \end{aligned}$$
(64)

The solution \(\widetilde{\Psi }\) satisfies for \(t\in [0,T]\)

$$\begin{aligned} \begin{aligned}&||\widetilde{\Psi }(t)||_{{\mathcal {H}^{1}}}=||\widetilde{\mathbf{{V}}}(t)||_{\mathbf{H}^{1}}^2+||\widetilde{{\Theta }}(t)||_{{H^{1}}}^2\\&\le [||\widetilde{\mathbf{{V}}}_T||_{\mathbf{H}^{1}}^2+||\widetilde{{\Theta }}_T||_{{H^{1}}}^2] \exp \{\int _t^T K_1(s)+K_2(s)\, \mathrm{d}s\}\\&+\,c\int _t^T(||\tilde{F}_{\widetilde{\mathbf{{V}}}}(s)||_{\mathbf{L}^{2}}^2 + ||\tilde{F}_{\widetilde{{\Theta }}}(s)||_{{L^{2}}}^2) \exp \{\int _0^s K_1(y)+K_2(y)\, \mathrm{d}y\}\, \mathrm{d}s, \end{aligned} \end{aligned}$$
(65)

and

$$\begin{aligned}&\int _t^T||\widetilde{\mathbf{{V}}}(s)||_{\mathbf{H}^{2}}^2+||\widetilde{{\Theta }}(s)||_{{H^{2}}}^2\mathrm{d}s\nonumber \\&\quad \le [||\widetilde{\mathbf{{V}}}_0||_{\mathbf{H}^{1}}^2+||\widetilde{{\Theta }}_0||_{{H^{1}}}^2]+\int _0^t( K_1(s)+K_2(s))(||\widetilde{\mathbf{{V}}}(s)||_{\mathbf{H}^{1}}^2+||\widetilde{{\Theta }}(s)||_{{H^{1}}}^2 )\, \mathrm{d}s\}\nonumber \\&\quad +\int _t^T (||\tilde{F}_{\widetilde{\mathbf{{V}}}}(s)||_{\mathbf{L}^{2}}^2 + ||\tilde{F}_{\widetilde{{\Theta }}}(s)||_{{L^{2}}}^2)\, \mathrm{d}s, \end{aligned}$$
(66)

where

$$\begin{aligned} K_1(s):= & {} ||\nabla \mathbf{v}||_{\mathbf{L}^{2}}||\nabla \mathbf{v}||_{\mathbf{H}^{1}}+|| \mathbf{v}||_{\mathbf{H}^{1}}^4 +||\nabla \, \partial _z \mathbf{v}||_{\mathbf{L}^{2}}^2||\partial _z \mathbf{v}||_{\mathbf{L}^{2}}^2+1\nonumber \\&+||\nabla \, \mathbf{v}||_{\mathbf{L}^{2}}^2||\Delta \, \mathbf{v}||_{\mathbf{L}^{2}}^2 +||\nabla \mathbf{v}||_{\mathbf{H}^{1}}^2 +||\partial _z\nabla \mathbf{v}||_{\mathbf{L}^{2}}^4,\nonumber \\ K_2(s):= & {} ||\theta ||_{{H^{1}}}^4+||\nabla \, \partial _z \theta ||_{{L^{2}}}^2||\partial _z \theta ||_{{L^{2}}}^2. \end{aligned}$$
(67)

Proof

The proof follows essentially from the corresponding result on the linearized equations, i.e., Theorem 3.1, and the duality relation between linearized and adjoint equations. \(\square \)

The next lemma follows from the definition of linearized and adjoint equations and integration by parts.

Lemma 3.4

(Adjoint Relation) Let \(F,\tilde{F}\in L^2([0,T], {\mathcal {L}^{2}}(\Omega ))\) be the forcing of the linearized equations and the adjoint equations, respectively. By \(\Psi \) and \(\tilde{\Psi }\), we denote the variables of the linear and adjoint equations. Then,

$$\begin{aligned} \big<F,\tilde{\Psi }\big>_{L^2([0,T]{\mathcal {L}^{2}})}=\big <\tilde{F},\Psi \big >_{L^2([0,T]{\mathcal {L}^{2}})} - \int _\Omega \Psi \cdot \tilde{\Psi }|_0^T\, \mathrm{d}x. \end{aligned}$$

4 Solvability and Computation of the Data Assimilation Problem

4.1 Existence of Local Minima of the Data Assimilation Problem

Based on the regularity results of the previous section, we formulate now the specific data assimilation cost functional for the primitive equations. The model dynamics that we are considering consist of trajectories in \(C([0,{T}],{\mathcal {H}^{1}}(\Omega ))\) and are emerging from initial conditions in \({\mathcal {H}^{1}}(\Omega )\). Let observations \(\psi _\mathrm{obs}\in C([0,{T}],{\mathcal {H}^{1}}(\Omega ))\) be given that reside in the same space as the dynamics of the ocean primitive equations. Suppose furthermore that a background guess \(\psi _\mathrm{back}\in {\mathcal {H}^{1}}(\Omega )\) at initial time is given. We define the cost functional by

$$\begin{aligned} \begin{aligned} \mathcal {J}(\psi _0)&:= \mathcal {J}_b(\psi _0)+\mathcal {J}_\mathrm{obs}(\psi _0). \end{aligned} \end{aligned}$$
(68)

The background term is given by

$$\begin{aligned} \begin{aligned} \mathcal {J}_b(\psi _0)&:= \big <\mathcal {B}(\psi _0-\psi _\mathrm{back}),\psi _0-\psi _\mathrm{back}\big >_{{\mathcal {H}^{1}}}\\&= \sum _{|\alpha |\le 1}\int _\Omega \mathcal {D}^\alpha \mathcal {B}(\psi _0-\psi _\mathrm{back})\cdot \mathcal {D}^\alpha (\psi _0-\psi _\mathrm{back})\, \mathrm{d}x\mathrm{d}y\mathrm{d}t, \end{aligned} \end{aligned}$$
(69)

where \(\mathcal {D}^\alpha \) denotes a derivative with multi-index \(\alpha \). The observational term is defined as

$$\begin{aligned} \begin{aligned}&\mathcal {J}_\mathrm{obs}(\psi _0) :=\int _0^{{T}} \big <\mathcal {R}(\mathcal {M}[\psi _0](t)-\psi _\mathrm{obs}(t)),\mathcal {M}[\psi _0](t)-\psi _\mathrm{obs}(t) \big >_{{\mathcal {H}^{1}}}\,\mathrm{d}t\\&= \sum _{|\alpha |\le 1}\int _0^{{T}}\int _\Omega \mathcal {D}^\alpha \mathcal {R}\left( \mathcal {M}[\psi _0](t)-\psi _\mathrm{obs}(t)\right) \cdot \mathcal {D}^\alpha \left( \mathcal {M}[\psi _0](t)-\psi _\mathrm{obs}(t)\right) \,\mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t. \end{aligned} \end{aligned}$$
(70)

The model error covariance operator \(\mathcal {B}\) provides a weighting according to the estimated background error and is assumed to be linear, bounded and positive definite. We do not address here the important problem of modeling error statistics but assume these error covariance operators as given. Furthermore, we assume that the error covariance operators preserve the functional space of model solutions and observations, i.e., we make the

$$\begin{aligned} \begin{aligned} \mathbf{General} Assumption on \mathcal {B}, \mathcal {R}:\&F\in \mathcal {X}\text { implies }\ \mathcal {B}F\in \mathcal {X} \text { and } \mathcal {R}F \in \mathcal {X}. \end{aligned} \end{aligned}$$
(71)

We note that the notion of strong solution, given in Definition 2.1, contains the \(L^2([0,T], {\mathcal {H}^{2}}(\Omega )\)-norm for the regularity of solutions to the primitive equations, while the observational part \(\mathcal {J}_\mathrm{obs}\) of our cost functional in (70) imposes only the weaker \(L^2([0,T], {\mathcal {H}^{1}}(\Omega )\)-norm. The reason for this is the adjoint characterization of optimal initial conditions in Theorem 4.4. Details are explained in Remark 2 after Theorem 4.4 is stated. The proof of Theorem 4.1 does also work if the stronger \(L^2([0,T], {\mathcal {H}^{2}}(\Omega )\)-norm are used in \(\mathcal {J}_\mathrm{obs}\). In the proof, we use the regularity that is given by the well-posedness Theorem 1.1 and not the regularity that is implied by the boundedness of the observational part of the cost functional. We state this as a separate results in Corollary 4.2.

Theorem 4.1

(Optimal Initial Conditions) Let observations \(\psi _\mathrm{obs}\in C([0,{T}], \mathcal {V})\) and a background guess \(\psi _\mathrm{back}\in {\mathcal {H}^{1}}(\Omega )\) be given. Then, there exist minimizers \(\bar{\psi }_0=(\mathbf{v}_0,\theta _0)\in \mathcal {V}\) of the data assimilation problem (9) with cost functional given by (68). We refer to these minimizers also as optimal initial conditions for the primitive equations (1).

Proof

Let \((\psi _{0,n})_n=(\mathbf{v}_{0,n},\theta _{0,n})_n\subseteq \mathcal {V}\subseteq {\mathcal {H}^{1}}(\Omega )\) be a minimizing sequence of initial conditions for the data assimilation problem. We denote by \(\psi _{n}:=\mathcal {M}[\psi _{0,n}]\in C([0,{T}], \mathcal {V})\cap L^2([0,{T}], {\mathcal {H}^{2}}(\Omega ))\) the corresponding solutions of the primitive equations (1). The model-observation difference satisfies \((\mathcal {M}[\psi _0]-\psi _\mathrm{obs})\in C([0,{T}], {\mathcal {H}^{1}}(\Omega ))\). Since the model error covariance operator \(\mathcal {R}\) preserves the space (cf. (71)), it follows that \(\mathcal {R}(\mathcal {M}[\psi _0]-\psi _\mathrm{obs})\in C([0,{T}], {\mathcal {H}^{1}}(\Omega ))\cap L^2([0,{T}], {\mathcal {H}^{2}}(\Omega ))\), and from (70), we infer that the cost functional is well-defined. (68) follows that the sequence of initial conditions \((\psi _{0,n})_n\) is bounded in \({\mathcal {H}^{1}}(\Omega )\), i.e., there exists a \(c>0\) such that uniformly for all \(n\in \mathbb {N}\)

$$\begin{aligned} \begin{aligned} ||\psi _{0,n}||_{{\mathcal {H}^{1}}}\le c. \end{aligned} \end{aligned}$$
(72)

From Theorem 1.1, we see that the sequence of associated solutions \((\psi _{n})_n=(v_{n},\theta _{n})_n\) is bounded in \(C([0,{T}],\mathcal {V})\cap L^2([0,{T}], {\mathcal {H}^{2}}(\Omega ))\), in particular there exists a \(C>0\) such that for all \(n\in \mathbb {N}\)

$$\begin{aligned} \begin{aligned} \sup _{t\in [0,T]}||\psi _n(t)||_{{H^{1}}}\le C\quad \text {and}\quad \int _0^{{T}}||\psi _n(t)||_{{\mathcal {H}^{2}}}^2\mathrm{d}t\le C. \end{aligned} \end{aligned}$$
(73)

Since \({H^{2}}(\Omega )\) is compactly embedded in \({H^{1}}(\Omega )\), we conclude that a subsequence, still denoted \((\psi _{n})_n=(\mathbf{v}_{0,n},\theta _{0,n})_n\), exists and a limit \(\bar{\psi }=(\bar{v},{\bar{{\Theta }}})\), such that \((\psi _{n})_n\) converges to \(\bar{\psi }\) weakly in \(L^2([0,{T}],{\mathcal {H}^{2}}(\Omega ))\) and strongly in \(L^2([0,{T}], {\mathcal {H}^{1}}(\Omega ))\).

We show now that the limit \(\bar{\psi }(t)\) is a strong solution of the primitive equations, i.e., \(\bar{\psi }=\mathcal {M}[\bar{\psi }_0]\). Consider the equation (19) in Definition 2.1 of strong solutions of (1) for the elements of the sequence \(\psi _n=(\mathbf{v}_n,\theta _n)_n\) with initial conditions \((\psi _{0,n})_n=(\mathbf{v}_{0,n},\theta _{0,n})_n\). We integrate over the time interval [0, T], use that \((\partial _t \mathbf{v}_n,\partial _t\theta _n)\in L^2([0,T], {\mathcal {L}^{2}}(\Omega ))\), and obtain for all \(\Phi \in \mathbf{C}^\infty (\Omega )^2\) and \(\phi \in C^\infty (\Omega )\)

$$\begin{aligned}&\int _\Omega (\mathbf{v}_n(T)-\mathbf{v}_n(0))\cdot \Phi \, \mathrm{d}x +\int _0^{{T}}\int _\Omega [ (\mathbf{v}_n\cdot \nabla ) \mathbf{v}_n]\cdot \Phi \, \mathrm{d}x \mathrm{d}t\nonumber \\&\quad +\int _0^{{T}}\int _\Omega f\vec {k}\times \mathbf{v}_n\cdot \Phi \, \mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t \nonumber \\&\quad -\int _0^{{T}}\int _\Omega \left( \int _{-h}^z \mathbf{v}_n(x,y,\xi ,t)\, \mathrm{d}\xi \right) \, \cdot \partial _z \mathbf{v}_n\, (\nabla \cdot \Phi )\, \mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t\nonumber \\&\quad +\int _0^{{T}}\int _{\Gamma _u}(\nabla p_{s,n})\cdot \Phi \,\mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t \nonumber \\&\quad -\int _0^{{T}}\int _\Omega \left( \nabla \int _{-h}^z \rho (\theta _n)(x,y,\xi ,t)\,\mathrm{d}\xi \right) \cdot \Phi \, \mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t \nonumber \\&\quad + \frac{1}{Re_1}\int _0^{{T}}\int _\Omega \nabla \mathbf{v}_n\cdot \nabla \Phi \, \mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t + \frac{1}{Re_2}\int _0^{{T}}\int _\Omega \partial _\mathbf{v}v_n\cdot \partial _z \Phi \, \mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t =0, \nonumber \\&\nabla \cdot \mathbf{v}_n+\partial _zw_n=0,\nonumber \\&\int _0^{{T}}\int _\Omega \partial _t \theta _n\phi \,\mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t + \int _0^{{T}}\int _\Omega (\mathbf{v}_n\cdot \nabla ) \theta _n \phi \,\mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t\nonumber \\&\quad -\int _0^{{T}}\int _\Omega \left( \nabla \cdot \int _{-h}^0 \mathbf{v}_n(x,y,\xi ,t)\, \mathrm{d}\xi \right) (\partial _z \theta _n)\phi \,\mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t \nonumber \\&\quad +\frac{1}{Rt_1}\int _0^{{T}}\int _\Omega \nabla _3 \theta _n\cdot \nabla _3\phi \,\mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t +\frac{1}{Rt_2}\int _0^{{T}}\int _\Omega \partial _z\theta _n\partial _z\phi \,\mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t\nonumber \\&\quad +k_\theta \int _0^{{T}}\int _{\Gamma _u}\theta _n\phi \, \mathrm{d}x\mathrm{d}y\mathrm{d}t=0, \end{aligned}$$
(74a)

where \(\rho (\theta _n)=\rho _0-\alpha (\theta _n-\theta _0)\) (cf. (1e)). Since \((\psi _{n})_n\) converges strongly in \(L^2([0,{T}],{\mathcal {H}^{1}}(\Omega ))\) to \(\bar{\psi }\in L^2([0,T],{\mathcal {H}^{1}}(\Omega ))\) there exists a subset \([0,T^{'}]\subset [0,{T}]\), such that \([0,{T}]\setminus [0, T^{'}]\) has Lebesgue measure zero and \((v_n(t),\theta _n(t))_n\) converges to \((\bar{v}(t),{\bar{\theta }}(t))\) in \({\mathcal {H}^{1}}\) for all \(t\in [0, T^{'}]\). This implies for almost every \(t_0,t_1\in [0,T]\) as \(n\rightarrow \infty \)

$$\begin{aligned} \begin{aligned}&\int _\Omega \left( \mathbf{v}_n(t_1)-\mathbf{v}_n(t_0) -\bar{\mathbf{v}}(t_1)+\bar{\mathbf{v}}(t_0)\right) \cdot \Phi \, \mathrm{d}x\mathrm{d}y\mathrm{d}z\rightarrow 0. \end{aligned} \end{aligned}$$
(75)

With the convergence of \((\psi _{n})_n\) to \(\bar{\psi }\) in \(L^2([0,{T}],{\mathcal {H}^{1}}(\Omega ))\) the following convergence properties follows

$$\begin{aligned}&\int _0^{{T}}\int _\Omega f\vec {k}\times (\mathbf{v}_n-\bar{\mathbf{v}})\cdot \Phi \, \mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t\rightarrow 0,\\&\int _0^{{T}}\int _\Omega (\nabla \int _{-h}^z \left( \rho (\theta _n)(x,y,\xi ,t)-\rho ({\bar{\theta }})(x,y,\xi ,t)\right) \,\mathrm{d}\xi )\cdot \Phi \, \mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t \rightarrow 0, \\&\frac{1}{Re_1}\int _0^{{T}}\int _\Omega \nabla (\mathbf{v}_n-\bar{\mathbf{v}})\cdot \nabla \Phi \, \mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t\rightarrow 0,\\&\frac{1}{Re_2}\int _0^{{T}}\int _\Omega \partial _z (\mathbf{v}_n-\bar{\mathbf{v}})\cdot \partial _z \Phi \, \mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t\rightarrow 0. \end{aligned}$$

For the nonlinear term in the velocity equation (74a), we obtain after integration by parts and with the inequalities of Cauchy–Schwartz, Hölder and Ladyshenzkaya (31),

$$\begin{aligned}&\left| \int _0^{{T}}\int _\Omega [ (\mathbf{v}_n\cdot \nabla ) \mathbf{v}_n +w_n\partial _z \mathbf{v}_n- (\bar{\mathbf{v}}\cdot \nabla ) \bar{\mathbf{v}}-\bar{w}\partial _z\bar{\mathbf{v}}]\cdot \Phi \, \mathrm{d}x\mathrm{d}y\mathrm{d}z \mathrm{d}t\right| \nonumber \\&\quad = \left| \int _0^{{T}}\int _\Omega [ \left( (\mathbf{v}_n-\bar{\mathbf{v}})\cdot \nabla \right) \mathbf{v}_n +(w_n-\bar{w})\partial _z \mathbf{v}_n]\cdot \Phi \, \mathrm{d}x\mathrm{d}y\mathrm{d}z \mathrm{d}t\right. \nonumber \\&\left. \qquad + \int _0^{{T}}\int _\Omega [ \bar{\mathbf{v}}\cdot \nabla )\Phi ]\cdot (\mathbf{v}_n-\bar{\mathbf{v}})+ [\bar{w}\partial _z\Phi ]\cdot (\mathbf{v}_n-\bar{\mathbf{v}})\, \mathrm{d}x\mathrm{d}y\mathrm{d}z \mathrm{d}t\right| \nonumber \\&\quad \le c\int _0^{{T}}||\mathbf{v}_n-\bar{\mathbf{v}}||_{\mathbf{L}^6} ||\nabla \mathbf{v}_n||_{L^2} ||\Phi ||_{\mathbf{L}^3}\mathrm{d}t +c\int _0^{{T}}||w_n-\bar{w}||_{L^2} ||\partial _z \mathbf{v}_n||_{\mathbf{L}^6} ||\Phi ||_{\mathbf{L}^3}\mathrm{d}t\nonumber \\&\qquad +c\int _0^{{T}}||\bar{\mathbf{v}}||_{\mathbf{L}^3}\ ||\nabla \Phi ||_{\mathbf{L}^2}\ ||\mathbf{v}_n-\bar{\mathbf{v}}||_{\mathbf{L}^6}\mathrm{d}t +c\int _0^{{T}} ||\bar{w}||_{L^3}\ ||\partial _z\Phi ||_{\mathbf{L}^2}\ ||\mathbf{v}_n-\bar{\mathbf{v}}||_{\mathbf{L}^6}\mathrm{d}t\nonumber \\&\quad \le c\int _0^{{T}}||\mathbf{v}_n-\bar{\mathbf{v}}||_{\mathbf{H}^{1}} ||\nabla \mathbf{v}_n||_{\mathbf{L}^2} ||\Phi ||_{\mathbf{H}^{1}}\mathrm{d}t +c\int _0^{{T}}||\mathbf{v}_n-\bar{\mathbf{v}}||_{\mathbf{H}^{1}} ||\partial _z \mathbf{v}_n||_{\mathbf{H}^{1}} ||\Phi ||_{\mathbf{H}^{1}}\mathrm{d}t\nonumber \\&\qquad +c\int _0^{{T}}||\bar{\mathbf{v}}||_{\mathbf{H}^{1}}\ ||\nabla \Phi ||_{\mathbf{L}^2} ||\mathbf{v}_n-\bar{\mathbf{v}}||_{\mathbf{H}^{1}}\mathrm{d}t +c\int _0^{{T}} ||\bar{\mathbf{v}}||_{\mathbf{H}^{2}}\ ||\partial _z\Phi ||_{\mathbf{L}^2}\ ||\mathbf{v}_n-\bar{\mathbf{v}}||_{\mathbf{H}^{1}}\mathrm{d}t\nonumber \\&\quad \le c||\mathbf{v}_n-\bar{\mathbf{v}}||_{L^2([0,{T}],\mathbf{H}^{1})} ||\mathbf{v}_n||_{L^2([0,{T}],\mathbf{H}^{1})} ||\Phi ||_{\mathbf{H}^{1}}\nonumber \\&\qquad +c||\mathbf{v}_n-\bar{\mathbf{v}}||_{L^2([0,{T}],\mathbf{H}^{1})} || \mathbf{v}_n||_{L^2([0,{T}],\mathbf{H}^{1})} ||\Phi ||_{\mathbf{H}^{1}}\nonumber \\&\qquad +c||\bar{\mathbf{v}}||_{L^2([0,{T}],\mathbf{H}^{1})}\ ||\nabla \Phi ||_{\mathbf{L}^2} ||\mathbf{v}_n-\bar{\mathbf{v}}||_{L^2([0,{T}],\mathbf{H}^{1})}\nonumber \\&\qquad +c||\bar{\mathbf{v}}||_{L^2([0,{T}],\mathbf{H}^{2})}\ ||\partial _z\Phi ||_{\mathbf{L}^2}\ ||\mathbf{v}_n-\bar{\mathbf{v}}||_{L^2([0,{T}],\mathbf{H}^{1})}. \end{aligned}$$
(76)

The first three terms on the right-hand side of (76) vanish due to the convergence of \((\mathbf{v}_n)_n\) to \(\bar{\mathbf{v}}\) in \(L^2([0,{T}], {\mathcal {H}^{1}}(\Omega ))\) and the boundedness of the remaining terms in the respective integral. For the last term in (76), we note that due to (73), it holds that \(||\bar{\mathbf{v}}||_{L^2([0,{T}],\mathbf{H}^{2})}\le \lim \inf _{n\rightarrow \infty }||\mathbf{v}_n||_{L^2([0,{T}],\mathbf{H}^{2})}< C\). Since \(\bar{\mathbf{v}} \in L^2([0,{T}],\mathbf{H}^{2})\) the convergence of \((\mathbf{v}_n)_n\) in \(L^2([0,T],{H^{1}}(\Omega )\) shows that the last term converges to zero. The convergence of the nonlinear terms in the temperature equations follows analogously. The pointwise convergence of \((\mathbf{v}_n)_n\) to \(\bar{\mathbf{v}}\) in \(\mathbf{H}^{1}\) for almost every \(t\in [0,T]\) implies that \(\bar{\mathbf{v}}\) satisfies the divergence-free constraint for almost every \(t\in [0,T]\). This proves that the limit \(\bar{\psi }\) is a solution of the primitive equations.

It remains to show that that \(\bar{\psi }\in C([0,T], \mathcal {V})\). We show first that \(\partial _t\bar{\psi }\in L^2([0,T],{\mathcal {L}^{2}}(\Omega )\). In order to establish this assertion, we take the \(L^2\)-scalar product of the primitive equations (1a)–(1e) for \(\psi =\bar{\psi }\) with \(\partial _t\bar{\psi }=\partial _t(\bar{\mathbf{v}},\bar{\theta })\). For the velocity equation, we obtain

$$\begin{aligned}&\int _0^T\int _\Omega |\partial _t \bar{\mathbf{v}}|^2\, \mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t\nonumber \\&\le \left| \int _0^T\int _\Omega [(\bar{\mathbf{v}}\cdot \nabla ) \bar{\mathbf{v}}]\cdot \partial _t \bar{\mathbf{v}} \, \mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t\right| +\left| \int _0^T\int _\Omega \left[ \left( \int _0^z\nabla \cdot \bar{\mathbf{v}}\,\mathrm{d}\xi \right) \partial _z \bar{\mathbf{v}}\right] \cdot \partial _t \bar{\mathbf{v}}\, \mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t\right| \nonumber \\&+\left| \int _0^T\int _\Omega \nabla \left( \int _{-h}^z g\bar{\rho }(x,y,\xi ,t)\, \mathrm{d}\xi + \bar{p}_S\right) \cdot \partial _t \bar{\mathbf{v}} \, \mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t\right| \nonumber \\&+\left| \int _0^T\int _\Omega \frac{1}{Re_1}\triangle \,\bar{\mathbf{v}}\cdot \partial _t \bar{\mathbf{v}} + \frac{1}{Re_2}\partial _{zz}^2 \bar{\mathbf{v}}\cdot \partial _t \bar{\mathbf{v}}\, \mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t\right| , \end{aligned}$$
(77)

where \(\bar{\rho }\) is the density computed from \(\bar{\theta }\) and where we omitted the external forcing for simplicity. We consider the nonlinear terms on the right-hand side. With the inequalities of Hölder, Ladyzhenskaya and Young follows

$$\begin{aligned} \begin{aligned}&\left| \int _0^T\int _\Omega [(\bar{\mathbf{v}}\cdot \nabla ) \bar{\mathbf{v}}]\cdot \partial _t \bar{\mathbf{v}} \, \mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t\right| \le \int _0^T||\bar{\mathbf{v}}||_{\mathbf{L}^3} ||\nabla \bar{\mathbf{v}}||_{\mathbf{L}^6} ||\partial _t \bar{\mathbf{v}}||_{\mathbf{L}^{2}}\mathrm{d}t\\&\quad \le \frac{c}{2\epsilon _1}\int _0^T||\bar{\mathbf{v}}||_{\mathbf{H}^{1}}^2 ||\nabla \bar{\mathbf{v}}_{\mathbf{H}^{1}}||^2\mathrm{d}t + \frac{\epsilon _1}{2} \int _{0}^T||\partial _t \bar{\mathbf{v}}||_{\mathbf{L}^{2}}^2, \end{aligned} \end{aligned}$$
(78)

where \(\epsilon _1>0\) is arbitrary. For the vertical advection follows with Lemma 2.2 and the Young inequality

$$\begin{aligned} \begin{aligned}&\left| \int _0^T\int _\Omega \left[ \left( \int _0^z\nabla \cdot \bar{\mathbf{v}}\,\mathrm{d}\xi \right) \partial _z \bar{\mathbf{v}}\right] \cdot \partial _t \bar{\mathbf{v}}\, \mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t\right| \\&\quad \le \int _0^T||\bar{\mathbf{v}}||_{\mathbf{H}^{1}}^{1/2}||\bar{\mathbf{v}}||_{\mathbf{H}^{2}}^{1/2} ||\partial _z \bar{\mathbf{v}}||_{\mathbf{H}^{1}}^{1/2} ||\partial _z \bar{\mathbf{v}}||_{\mathbf{L}^{2}}^{1/2} ||\partial _t \bar{\mathbf{v}}||_{\mathbf{L}^{2}}\mathrm{d}t\\&\quad \le \frac{c}{2\epsilon _2}\int _0^T||\bar{\mathbf{v}}||_{\mathbf{H}^{1}}||\bar{\mathbf{v}}||_{\mathbf{H}^{2}}||\partial _z \bar{\mathbf{v}}||_{\mathbf{H}^{1}} ||\partial _z \bar{\mathbf{v}}||_{\mathbf{L}^{2}} +\frac{\epsilon _2}{2} |\int _0^T|\partial _t \bar{\mathbf{v}}||_{\mathbf{L}^{2}}^2\mathrm{d}t, \end{aligned} \end{aligned}$$
(79)

with \(\epsilon _2>0\) . The remaining terms in (77) can be bounded with the inequality of Cauchy–Schwarz and Young. In each of these upper bounds, a term occurs that involves the \(L^2\)-norm of the time derivative \(\partial _t \bar{\mathbf{v}}\) multiplied by an \(\epsilon _i\) as in (78) and (79). If we choose the \(\epsilon _i\) appropriately, we can compensate the time derivative on the right-hand side and the time derivative on the left-hand side. In summary, we obtain the inequality

$$\begin{aligned} \begin{aligned}&\int _0^T|||\partial _t \bar{\mathbf{v}}||^2_{\mathbf{L}^{2}}\mathrm{d}t \le c\int _0^T||\bar{\theta }||_{{H^{1}}}^2 +||\bar{p}_s||_{{H^{1}}}^2\mathrm{d}t +c\int _0^T||\bar{\mathbf{v}}||_{\mathbf{H}^{2}}^2\mathrm{d}t\\&+c\int _0^T||\bar{\mathbf{v}}||_{\mathbf{H}^{1}}^2 ||\nabla \bar{\mathbf{v}}_{\mathbf{H}^{1}}||^2\mathrm{d}t +c\int _0^T||\bar{\mathbf{v}}||_{\mathbf{H}^{1}}||\bar{\mathbf{v}}||_{\mathbf{H}^{2}}||\partial _z \bar{\mathbf{v}}||_{\mathbf{H}^{1}} ||\partial _z \bar{\mathbf{v}}||_{\mathbf{L}^{2}}\mathrm{d}t. \end{aligned} \end{aligned}$$
(80)

Using the regularity \(\bar{\psi }=(\bar{\mathbf{v}},\bar{\theta })\in L^2([0,T], {\mathcal {H}^{2}}(\Omega ))\), we observe that the right-hand side of (80) is finite and this shows that \(\partial _t \bar{\mathbf{v}}\in L^2([0,T], \mathbf{L}^{2}(\Omega ))\). From the same arguments follows for the temperature equation that \(\partial _t \bar{\theta }\in L^2([0,T], {L^{2}}(\Omega ))\). Since \(\bar{\psi }\in L^2([0,T],{\mathcal {H}^{2}}(\Omega )\), \(\partial _t\bar{\psi }\in L^2([0,T],{\mathcal {L}^{2}}(\Omega )\) and \({\mathcal {H}^{2}}\subseteq {\mathcal {H}^{1}}\subseteq {\mathcal {L}^{2}}\) are compactly embedded it follows with the Lions–Aubins compactness lemma that \(\bar{\psi }\in C([0,T],{\mathcal {H}^{1}}(\Omega ))\). This implies that (75) as well as the divergence-free constraint are satisfied for all \(t\in [0,T]\). This proves that the limit \(\bar{\psi }\) is a solution of the primitive equations in the sense of Definition 2.1.

We show now that the initial condition \(\bar{\psi }_0\) minimizes the cost functional \(\mathcal {J}\) in (68). Lower semi-continuity of the scalar product implies for the limit \(\bar{\psi }_0\)

$$\begin{aligned} \big<\mathcal {B}(\bar{\psi }_0-\psi _\mathrm{back}),\bar{\psi }_0-\psi _\mathrm{back} \big>_{{\mathcal {H}^{1}}}&\le \lim \inf _n\big <\mathcal {B}(\psi _{0,n}-\psi _\mathrm{back}),\psi _{0,n}-\psi _\mathrm{back} \big >_{{\mathcal {H}^{1}}},\nonumber \\ \end{aligned}$$
(81)

and for the associated sequence of model solutions

$$\begin{aligned} \begin{aligned}&\int _0^{{T}}\big<\mathcal {R}(\mathcal {M}[\bar{\psi _0}]-\psi _\mathrm{obs}),\mathcal {M}[\bar{\psi _0}]-\psi _\mathrm{obs} \big>_{{\mathcal {H}^{1}}}\,\mathrm{d}t\\&\quad \le \lim \inf _n\int _0^{{T}}\big <\mathcal {R}(\mathcal {M}[\psi _{0,n}]-\psi _\mathrm{obs},\mathcal {M}[\psi _{0,n}]-\psi _\mathrm{obs}\big >_{{\mathcal {H}^{1}}}\, \mathrm{d}t. \end{aligned} \end{aligned}$$
(82)

As a consequence of (81) and (82), we have

$$\begin{aligned} \begin{aligned} \mathcal {J}(\bar{\psi }_0) \le \lim \inf _n\mathcal {J}(\psi _{0,n}). \end{aligned} \end{aligned}$$
(83)

The initial condition \(\bar{\psi }_0\) of \(\bar{\psi }\) is a minimizer of \(\mathcal {J}\), defined in (68). \(\square \)

From the proof of Theorem 4.1 follows immediately the next corollary that replaces the \({\mathcal {H}^{1}}\)-scalar product in the observational part of the cost functional by the \({H^{2}}\)-scalar product. This choice reflects the regularity of the solutions of the primitive equations. With regard to this aspect, see also Remark 2 below.

Corollary 4.2

Let observations \(\psi _\mathrm{obs}\in L^2([0,{T}], {\mathcal {H}^{2}}(\Omega ))\) and a background guess \(\psi _\mathrm{back}\in {\mathcal {H}^{1}}(\Omega )\) be given. Let the observational term \(\mathcal {J}_\mathrm{obs}\) in (70) of the cost functional (68) be defined by the \({\mathcal {H}^{2}}\)-scalar product

$$\begin{aligned} \begin{aligned}&\mathcal {J}_\mathrm{obs}(\psi _0) :=\int _0^{{T}} \big <\mathcal {R}(\mathcal {M}[\psi _0](t)-\psi _\mathrm{obs}(t)),\mathcal {M}[\psi _0](t)-\psi _\mathrm{obs}(t) \big >_{{\mathcal {H}^{2}}}\,\mathrm{d}t. \end{aligned} \end{aligned}$$
(84)

Then, there exist minimizers \(\bar{\psi }_0=(\mathbf{v}_0,\theta _0)\in {\mathcal {H}^{1}}(\Omega )\) of the data assimilation problem (9) with cost functional given by(68) and background term defined in (69) and observational term in (84).

4.2 Adjoint Characterization of Local Minima

The existence of local minima of the data assimilation problem is guaranteed by Theorem 4.1. We address now the problem of how these optimal initial conditions can be computed. For this purpose, we prove an necessary condition in terms of the adjoint equation. As preparation we need the following Lemma on the regularity of Helmholtz equations for temperature and velocity.

Lemma 4.3

(i) Let \( f\in {L^{2}}(\Omega )\) be given. The equation

$$\begin{aligned} \begin{aligned}&(Id-\triangle _3)\theta =f,\qquad \qquad (\triangle _3:=\partial ^2_{xx}+\partial ^2_{yy}+\partial ^2_{zz})\\ \end{aligned} \end{aligned}$$
(85)

with homogeneous Neumann boundary condition \(\nabla _3\theta \cdot \vec {n}=0\) at \(\partial \Omega \) has a unique solution \(\theta \in {H^{2}}(\Omega )\) that satisfies

$$\begin{aligned} \begin{aligned} ||\theta ||_{{H^{2}}}\le c||f||_{{L^{2}}}. \end{aligned} \end{aligned}$$
(86)

(ii) Let \(F\in \mathbf{L}^{2}(\Omega )^2\) be given. The equation

$$\begin{aligned} \begin{aligned}&(Id-\Delta _3)\mathbf{v}=F,\qquad \qquad (\Delta _3:=\partial ^2_{xx}+\partial ^2_{yy}+\partial ^2_{zz}), \end{aligned} \end{aligned}$$
(87)

with mixed homogeneous boundary conditions

$$\begin{aligned} {\mathbf{v}}=0,\ \text {on}\ \Gamma _s\cup \Gamma _b\quad \text { and }\quad \ \partial _z{\mathbf{v}}=0\ \text {on}\ \Gamma _u, \end{aligned}$$
(88)

has a unique solution \(\mathbf{v}\in \mathbf{H}^{2}{\Omega }\) that satisfies

$$\begin{aligned} \begin{aligned} ||\mathbf{v}||_{\mathbf{H}^{2}}\le c||F||_{\mathbf{L}^{2}}. \end{aligned} \end{aligned}$$
(89)

Proof

For equation (85) with homogeneous Neumann boundary conditions we refer to Theorem 9.26 in Brezis (2010) from which the assertion follows. For equation (87) and boundary condition (88), we note that the associated bilinearform

$$\begin{aligned} b(\mathbf{v}_1,\mathbf{v}_2):=\int _\Omega \mathbf{v}_1\cdot \mathbf{v}_2\mathrm{d}x\mathrm{d}y\mathrm{d}z+\int _\Omega \nabla \mathbf{v}_1\cdot \nabla \mathbf{v}_2\mathrm{d}x\mathrm{d}y\mathrm{d}z, \quad \mathbf{v}_1,\mathbf{v}_2\in {V_{\mathbf{v}}}\end{aligned}$$

satisfies \(b(\mathbf{v}, \mathbf{v})\ge ||\mathbf{v}||_{\mathbf{H}^{1}}^2\) for \(\mathbf{v}\in {V_{\mathbf{v}}}\). The assertion follows from a classical result on elliptic equations (see Necas 2012, Theorem 3.1. on p. 30). \(\square \)

Theorem 4.4

(Adjoint Characterization of Optimal Initial Conditions) Let observations \(\psi _\mathrm{obs}\in C([0,{T}], {\mathcal {H}^{1}}(\Omega ))\cap L^2([0,{T}], {\mathcal {H}^{2}}(\Omega ))\) and a background state \(\psi _\mathrm{back}\in {\mathcal {H}^{1}}(\Omega )\) be given. Denote by \(\bar{\psi }_0=(\bar{\mathbf{v}}_0, {\bar{{\Theta }}}_0)\in {\mathcal {H}^{1}}(\Omega )\) an optimal initial condition of the data assimilation problem (9). Then \(\bar{\psi }_0\) satisfies

$$\begin{aligned} \bar{\psi }_0= \psi _\mathrm{back} - \mathcal {B}^{-1}\mathcal {S}^{-1}\tilde{\Psi }_0, \end{aligned}$$
(90)

where \(\tilde{\Psi }\) is the solution of the adjoint equation (54) with initial condition \(\tilde{\Psi }(T)=0\), specified at time \(t=T\) and with forcing given by

$$\begin{aligned} \tilde{F}:= \mathcal {S}\mathcal {R}(\mathcal {M}[\bar{\psi }_0]-\psi _\mathrm{obs}). \end{aligned}$$
(91)

The operator \(\mathcal {S}:=(\mathcal {S}_\mathbf{v}, \mathcal {S}_\theta )\), with \(\mathcal {S}_\mathbf{v}:=(Id-\Delta _3)\) defined in (87) and boundary conditions (88), and \(\mathcal {S}_\theta :=(Id-\triangle _3)\) defined in (85), with homogeneous Neumann boundary conditions is applied to each component of adjoint state vector \(\tilde{\Psi }_0=(\widetilde{\mathbf{{V}}}, \widetilde{{\Theta }})\).

Remark 2

(Regularity in observational term) Strong solutions of the primitive equations posses regularity in \(L^2([0,T], {\mathcal {H}^{2}}(\Omega ))\) (see Theorem 1.1) but the observational term of the cost functional (70) imposes only the weaker \(L^2([0,T], {\mathcal {H}^{1}}(\Omega ))\)-norm. The reason for this discrepancy is that the adjoint equation (54) that is used in Theorem 4.4 requires a forcing \(\tilde{F}\) in \(L^2([0,T], {\mathcal {L}^{2}}(\Omega ))\). The adjoint forcing in (91) applies through the operator \( \mathcal {S}\) second-order derivatives to the model-observation difference. Taking into account the \(L^2([0,T], {\mathcal {H}^{2}}(\Omega ))\)-regularity of the model solution and the imposed regularity of the observations \(\psi _\mathrm{obs}\) the resulting forcing is indeed in \(L^2([0,T], {\mathcal {L}^{2}}(\Omega ))\).

The degree of derivatives that appear in \( \mathcal {S}\) are linked to the degree of derivatives in the observational part of the cost functional. This becomes evident when the Gateaux-derivative of the cost functional is calculated (see the proof of Theorem 4.4, in particular the integration-by-parts in the second equation of (92)). As this calculation shows, raising the spatial regularity in \(\mathcal {J}_\mathrm{obs}\) from \({\mathcal {H}^{1}}\) to \({\mathcal {H}^{2}}\) implies that in the operator \( \mathcal {S}\) the derivatives are raised from two to four, such that the highest-order term is then given by the biharmonic operator \(\triangle ^2\). For such a fourth-order operator, we can no longer guarantee that the adjoint forcing in (91) satisfies \(\tilde{F}\in L^2([0,T], {\mathcal {L}^{2}}(\Omega ))\). The final calculation of the optimal initial state \(\bar{\psi }_0\) in (90) restores regularity through the smoothing effect of the inverse \(\mathcal {S}^{-1}\) but this requires a square integrable forcing in the adjoint equation.

Proof

For a minimizer \(\bar{\psi }_0:=(\bar{\mathbf{v}}_0, \bar{\theta }_0)\in \mathcal {V}\), which exists according to Theorem 4.1, the Gateaux derivative vanishes such that \(\mathcal {J}'(\bar{\psi }_0;\mathbf{a})=0\) for all perturbations \(\mathbf{a}=(a_v,a_\theta )\in \mathcal {V}\). We calculate the Gateaux derivative of \(\mathcal {J}\) at an arbitrary state \(\psi \) in direction \(\mathbf{a}\) and with integration by parts follows

$$\begin{aligned} \mathcal {J}'(\psi _0;\mathbf{a})= & {} \big<\mathcal {B}(\psi _0-\psi _\mathrm{back}),\mathbf{a}\big>_{{\mathcal {H}^{1}}} + \int _0^{{T}}\int _\Omega \mathcal {R}\left( \mathcal {M}[\psi _0]-\psi _\mathrm{obs}\right) \cdot \left( \frac{D\mathcal {M}[\psi _0]}{D\psi _0}\mathbf{a}\right) \, \mathrm{d}x\mathrm{d}t\nonumber \\&\quad + \int _0^{{T}}\int _\Omega \nabla _3\mathcal {R}\left( \mathcal {M}[\psi _0]-\psi _\mathrm{obs}\right) \cdot \nabla _3\left( \frac{D\mathcal {M}[\psi _0]}{D\psi _0}\mathbf{a}\right) \, \mathrm{d}x\mathrm{d}t\nonumber \\= & {} \big<\mathcal {B}(\psi _0-\psi _\mathrm{back}),\mathbf{a}\big>_{{\mathcal {H}^{1}}} + \int _0^{{T}}\int _\Omega \mathcal {R}\left( \mathcal {M}[\psi _0]-\psi _\mathrm{obs}\right) \cdot \Psi \, \mathrm{d}x\mathrm{d}t\nonumber \\&\quad -\int _0^{{T}}\int _\Omega \triangle _3\mathcal {R}\left( \mathcal {M}[\psi _0]-\psi _\mathrm{obs}\right) \cdot \Psi \, \mathrm{d}x\mathrm{d}t\nonumber \\= & {} \big <\mathcal {B}(\psi _0-\psi _\mathrm{back}),\mathbf{a}\big >_{{\mathcal {H}^{1}}} + \int _0^{{T}}\int _\Omega \mathcal {S}\mathcal {R}\left( \mathcal {M}[\psi _0]-\psi _\mathrm{obs} \right) \cdot \Psi \, \mathrm{d}x\mathrm{d}t, \end{aligned}$$
(92)

where we have used that \(\Psi :=\frac{D\mathcal {M}[\psi _0]}{D\psi _0}\mathbf{a}\), according to Lemma 3.2, satisfies the linearized equation with forcing \(F=0\) and that the boundary integral vanishes as a consequence of the boundary conditions.

Now, we integrate an adjoint equation whose forcing is given by the model-observation difference in the second term on the right-hand side of (92),

$$\begin{aligned} \tilde{F}:=\mathcal {S}\mathcal {R}\left( \mathcal {M}[\psi _0]-\psi _\mathrm{obs}\right) . \end{aligned}$$
(93)

The initial condition of the adjoint equation is \(\tilde{\Psi }(T)=0\). From (92) follows with the adjoint relation in Lemma 3.4 that

$$\begin{aligned} \mathcal {J}'(\psi _0;\mathbf{a})= & {} \big<\mathcal {B}(\psi _0-\psi _\mathrm{back}),\mathbf{a}\big>_{{\mathcal {H}^{1}}} +\int _0^{{T}}\int _\Omega \tilde{F}\cdot {\Psi }\ \mathrm{d}x\mathrm{d}t\nonumber \\= & {} \big<\mathcal {B}(\psi _0-\psi _\mathrm{back}),\mathbf{a}\big>_{{\mathcal {H}^{1}}} +\int _0^{{T}}\int _\Omega {F}\cdot \tilde{\Psi }\ \mathrm{d}x\mathrm{d}t -\int _\Omega \tilde{\Psi }\Psi |_{0}^{{T}}\, \mathrm{d}x\nonumber \\= & {} \big<\mathcal {B}(\psi _0-\psi _\mathrm{back}),\mathbf{a}\big>_{{\mathcal {H}^{1}}} +\int _\Omega \tilde{\Psi }(0)\cdot \Psi (0)\, \mathrm{d}x\nonumber \\= & {} \sum _{|\alpha |\le 1} \big <\mathcal {D}^\alpha \mathcal {B}(\psi _0-\psi _\mathrm{back}),\mathcal {D}^\alpha \mathbf{a}\big >_{{\mathcal {L}^{2}}} +\int _\Omega \tilde{\Psi }(0)\cdot \mathbf{a}\, \mathrm{d}x\nonumber \\= & {} \int _\Omega \left( \mathcal {S} \mathcal {B}(\psi _0-\psi _\mathrm{back}) +\tilde{\Psi }(0)\right) \cdot \mathbf{a}\, \mathrm{d}x. \end{aligned}$$
(94)

For a minimum we have \(\mathcal {J}'(\bar{\psi _0};\mathbf{a})=0\) for all \(\mathbf{a}\), and with (94) we obtain

$$\begin{aligned} \begin{aligned} \mathcal {S}\mathcal {B}(\bar{\psi _0}-\psi _\mathrm{back}) + \tilde{\Psi }_0=0. \end{aligned} \end{aligned}$$
(95)

According to Lemma 4.3 this equations can be solved for \(\bar{\psi _0}=(\bar{\mathbf{v}}_0,\bar{\theta }_0)\) by inverting the operator \(\mathcal {S}=(\mathcal {S}_{\mathbf{v}},\mathcal {S}_{\theta })\). Since \(\tilde{\Psi }_0\in {\mathcal {L}^{2}}(\Omega )\), we obtain \(\mathcal {B}^{-1}\mathcal {S}^{-1}\tilde{\Psi }_0\in {\mathcal {H}^{2}}(\Omega )\). This implies with \(\psi _\mathrm{back}\in {\mathcal {H}^{1}}(\Omega )\) that \(\bar{\psi _0}\in {\mathcal {H}^{1}}(\Omega )\) is given by

$$\begin{aligned} \begin{aligned} \bar{\psi _0}= \psi _\mathrm{back} -\mathcal {B}^{-1}\mathcal {S}^{-1}\tilde{\Psi }_0. \end{aligned} \end{aligned}$$

\(\square \)

We consider now the case of observations that are only square integrable. Consequentially, the \(H^1\)-norm in the observational part of the cost functional (70) has to be replaced by the \(L^2\)-norm. This results in a different adjoint forcing term (see also remark 2 in the next section). The background term \(\mathcal {J}_b\) in (69) remains unchanged, because the \(H^1\)-norm is indispensable in obtaining strong solvability of the data assimilation problem.

Corollary 4.5

(Less Regular Observations) Let observations \(\psi _\mathrm{obs}\in L^2([0,{T}], {\mathcal {L}^{2}}(\Omega ))\) be given. Define the observational term of the cost functional by

$$\begin{aligned} \begin{aligned} \mathcal {J}_\mathrm{obs}^*(\psi _0)&:= \int _0^{{T}}\int _\Omega |\mathcal {M}[\psi _0](t)-\psi _\mathrm{obs}(t)|^2\,\mathrm{d}x\mathrm{d}y\mathrm{d}z\mathrm{d}t. \end{aligned} \end{aligned}$$
(96)

Then there exist optimal initial conditions \(\bar{\psi }_0=(\mathbf{v}_0,\theta _0)\in {\mathcal {H}^{1}}(\Omega )\) for the data assimilation problem (9) using the cost functional (68) with \(\mathcal {J}_\mathrm{obs}^*\) replacing \(\mathcal {J}_\mathrm{obs}\). The minimizer \(\bar{\psi }_0\) satisfies

$$\begin{aligned} \bar{\psi }_0= \psi _\mathrm{back} - \mathcal {S}\mathcal {B}^{-1}\tilde{\Psi }_0, \end{aligned}$$
(97)

where \(\tilde{\Psi }_0\) is the result of integrating the adjoint equation (54) with an initial value \(\tilde{\Psi }(T)=0\) and with the forcing \(\tilde{F}:= \mathcal {R}(\bar{\psi }-\psi _\mathrm{obs})\).

5 Convergence of Gradient-Based Descent Algorithm

The goal of this section is to prove the local convergence of an iterative gradient-based method for determining the optimal initial condition of the data assimilation problem, i.e., the descent method converged provided one uses a starting point that is sufficiently close to the optimal initial condition. Globally convergent algorithms such as the Newton method come with a prohibitively high computational costs for the large-scale problems of numerical weather prediction and ocean state estimation and are not considered here. To prove convergence, we use a general condition that is valid in Hilbert spaces and that relies on the Hessian of the cost functional. The Hessian is calculated through the relation between the Hessian and second-order adjoint equations. This lemma provides the basis of our convergence result.

Lemma 5.1

(Abergel and Temam 1990) Let J be a real-valued function on a Hilbert space X with norm \(|\cdot |\). We make the following assumptions:

  1. (i)

    J is of class \(C^2\) and has a local minimum at a point \(\bar{x}\in X\),

  2. (ii)

    there exists a ball \(B(\bar{x})\subseteq X\) around \(\bar{x}\), and two real numbers m, M, such that the following inequalities hold:

    $$\begin{aligned} m |x||y| \le J''(u;x,y)\le M |x||y|,\qquad \text {for all }u\in B, \text {and }x,y\in X, \end{aligned}$$

    where \(J''[u;x, y]\) is the bilinear form associated with the second derivative of J. Then, the gradient algorithm with initial value \(x_0\in B\) converges to \(\bar{x}\). The gradient algorithm is shown in Figure 1 where the cost functional J corresponds to \(\mathcal {J}\) in Fig. 1, the local minimum \(\bar{x}\) to \(\bar{\psi }\) and the initial value \(x_0\) to \(\psi _0^0\).

The adjoint gradient-based algorithm for which we want to establish convergence is illustrated in Fig. 1.

Fig. 1
figure 1

Gradient algorithm for calculation of minimizers of cost functional. The difference between the \({\mathcal {H}^{1}}\)-norms and the \({\mathcal {L}^{2}}\)-norms in the cost functional appear in steps 2 and 3. For the \({\mathcal {L}^{2}}\) version, the operator \(\mathcal {S}\) in the adjont forcing \(\tilde{F}\) is the identity and step 3 does not exist

The second derivative of the cost functional \(\mathcal {J}\) is related to the Hessian \(H_{\mathcal {J}}[\psi ]\) via

$$\begin{aligned} \mathcal {J}''(\psi _0; \mathcal {W},\mathcal {Z}) = \big <\mathcal {Z},H_{\mathcal {J}}[\psi ]\mathcal {W}\big >,\qquad \text {for }\mathcal {W},\mathcal {Z}\in H^1(\Omega ). \end{aligned}$$
(100)

The calculation of the Hessian \(H_\mathcal {J}\) of the cost functional uses the second-order adjoint equations. The second-order adjoint equations are given by (for details see Appendix 1)

$$\begin{aligned} \begin{aligned}&-\partial _t\hat{U}+u\partial _x \hat{U}+\hat{U}\partial _x u+ v\partial _y\hat{U}+w\partial _z \hat{U}+\hat{V}\partial _y u -f\hat{V}+\partial _x \\&\quad \int _{-h}^z g\hat{\Theta }(x,y,\xi ,t)\,\mathrm{d}\xi +D_v(\hat{U})=\bar{F}_{\hat{U}}+\mathcal {G}_{\hat{U}},\\&-\partial _t\hat{V}+\hat{U}\partial _x v +v\partial _y \hat{V}+ \hat{V}\partial _y v+u\partial _x\hat{V}+w\partial _z \hat{V}+f\hat{U}+\partial _y \\&\quad \int _{-h}^z g\hat{\Theta }(x,y,\xi ,t)\,\mathrm{d}\xi + D_v({V})=\bar{F}_{\hat{V}}+\mathcal {G}_{\hat{V}},\\&\partial _x\hat{U}+\partial _y \hat{V}+ \partial _z \bar{W}=0,\\&-\partial _t\hat{\Theta }+u\partial _x\hat{\Theta }+v\partial _y\hat{\Theta }+w \partial _z\hat{\Theta }+\hat{U}\partial _x\theta +\hat{V}\partial _y\theta +D_\theta (\hat{\Theta })=\bar{F}_{\hat{\Theta }}+\mathcal {G}_{\hat{\Theta }}, \end{aligned} \end{aligned}$$
(101)

where \(D_v(f):=-\frac{1}{Re_1}\triangle \,f - \frac{1}{Re_2}\partial ^2_{zz} f\) and \(D_\theta (\theta ):=-\frac{1}{Rt_1}\triangle \,\theta - \frac{1}{Rt_2}\partial ^2_{zz} \theta \) and with forcing \(\bar{F}:=(\bar{F}_{\hat{U}},\bar{F}_{\hat{V}}, \bar{F}_{\hat{\Theta }})\) and with \(\mathcal {G}:=(\mathcal {G}_{\hat{U}},\mathcal {G}_{\hat{V}}, \mathcal {G}_{\hat{\Theta }})\) given by

$$\begin{aligned} \begin{aligned}&\mathcal {G}_{\hat{U}}:={U}\partial _x \widetilde{{U}}+\widetilde{{U}}\partial _x {U}+ {V}\partial _y\widetilde{{U}}+\widetilde{{U}}\partial _x {V}+\widetilde{{U}}\partial _x{\Theta }+\widetilde{{\Theta }}\partial _x{\Theta },\\&\mathcal {G}_{\hat{V}}:=\widetilde{{V}}\partial _y {U}+{V}\partial _y \widetilde{{V}}+ \widetilde{{V}}\partial _y +{U}\partial _x\widetilde{{V}}+ \widetilde{{V}}\partial _y{\Theta }+\widetilde{{\Theta }}\partial _y{\Theta },\\&\mathcal {G}_{\hat{\Theta }}:=\partial _x({U}\widetilde{{\Theta }})+\partial _y({V}\widetilde{{\Theta }})+\partial _z(W\widetilde{{\Theta }})+\int _{-h}^z\nabla \cdot \widetilde{{U}}(x,y,z')\, \mathrm{d}z'. \end{aligned} \end{aligned}$$
(102)
Fig. 2
figure 2

Algorithm for calculation Hesse matrix via second-order adjoint

The following Theorem provides information about the regularity of the second-order adjoint equations that we need to prove our convergence result.

Theorem 5.2

(Regularity of Second-Order Adjoint Equations) Suppose

  1. 1.

    the initial condition \(\psi _0\in \mathcal {V}\) of the primitive equations (1) is given

  2. 2.

    the initial condition \(\Psi _0\in \mathcal {V}\) of the linearized equations (32) is given

  3. 3.

    the initial condition \(\tilde{\Psi }(T)\in \mathcal {V}\) of the adjoint equations (54), specified at the end point of the time interval is given

  4. 4.

    the forcing satisfies \(\hat{F}=(\hat{F}_{\hat{U}},\hat{F}_{\hat{V}}, \hat{F}_{\hat{\Theta }})\in L^2([0,T],\mathbf{L}^{2}(\Omega )\times {L^{2}}(\Omega ))\).

If the initial conditions of the second-order adjoint equations satisfy \(\hat{\Psi }_0\in \mathcal {V}\), then the system (101) has a unique solution on [0, T] with the properties

$$\begin{aligned} \hat{\Psi }(t)\in C([0,{T}],{\mathcal {H}^{1}}(\Omega ))\cap L^2([0,{T}], {\mathcal {H}^{2}}(\Omega )), \end{aligned}$$

and the state vector \(\hat{\Psi }\) satisfies for \(t\in [0,T]\)

$$\begin{aligned} \begin{aligned} ||\hat{\Psi }(t)||_{{\mathcal {H}^{1}}}^2&\le ||\hat{\Psi }_0||_{{\mathcal {H}^{1}}}^2\exp \left\{ \int _0^t K_1(y)+K_2(y)\, \mathrm{d}y\right\} \\&\quad +\int _{0}^t \left[ ||\hat{F}||_{{\mathcal {L}^{2}}}^2+||\mathcal {G}||_{{\mathcal {L}^{2}}}^2\right] \exp \left\{ \int _0^s K_1(y)+K_2(y)\, \mathrm{d}y\right\} \, \mathrm{d}s, \end{aligned} \end{aligned}$$
(104)

with \(K_1,K_2\) defined in (44) and \(\mathcal {G}\) defined in (102).

Proof

(Sketch of proof) The second-order adjoint equations (101) resemble formally the first-order adjoint equations (54). If one identifies the first-order adjoint variable \(\tilde{\Psi }\) with the second-order adjoint variable \(\bar{\Psi }\), then the difference between the two equations are the additional \(\mathcal {G}\)-terms defined in (102). These terms consist of products of linear variable \(\Psi :=(\mathbf{V},{\Theta })\) and derivatives of the second order adjoint \(\bar{\Psi }\). Using the regularity of linear equations in Theorem 3.1, we can estimate \(\mathcal {G}_{\hat{U}},\mathcal {G}_{\hat{V}},\mathcal {G}_{\hat{\Theta }}\) in the same manner as the products of the state vector \(\psi \) and derivatives of the second-order adjoint \(\bar{\Psi }\) on the left-hand side of (101). With the Agmon inequality, we obtain the following estimate

$$\begin{aligned} \begin{aligned}&\int _0^{T}||\mathcal {G}_{\hat{U}}(s)||_{{\mathcal {L}^{2}}}^2\mathrm{d}s\le \max _{s\in [0,{T}]}||\nabla \widetilde{\mathbf{{V}}}(s)||_{\mathbf{L}^{2}}^2\int _0^{T}||\mathbf{V}(s)||_{\mathbf{H}^{2}}^2\mathrm{d}s\\&\qquad +\max _{s\in [0,{T}]}||\nabla \mathbf{V}(s)||_{\mathbf{L}^{2}}^2 \int _0^{T}||\widetilde{\mathbf{{V}}}(s)||_{\mathbf{H}^{2}}^2\mathrm{d}s\\&\qquad +\max _{s\in [0,{T}]}||\nabla {\Theta }(s)||_{L^2}^2\int _0^{T}||\widetilde{\mathbf{{V}}}(s)||_{\mathbf{H}^{2}}^2+||\widetilde{{\Theta }}(s)||_{\mathbf{H}^{2}}^2\mathrm{d}s. \end{aligned} \end{aligned}$$
(105)

Analogous estimates for \(\mathcal {G}_{\hat{V}}, \mathcal {G}_{\hat{\Theta }}\) imply that \(\mathcal {G}=(\mathcal {G}_{\hat{U}},\mathcal {G}_{\hat{V}}, \mathcal {G}_{\hat{\Theta }})\) satisfies

$$\begin{aligned} \begin{aligned} \int _0^{T}||\mathcal {G}(s)||_{{\mathcal {L}^{2}}}^2\mathrm{d}s&\le ||\Psi (s)||_{C([0,{T}],{\mathcal {H}^{1}})}^2||\widetilde{\Psi }(s)||_{L^2([0,{T}],{\mathcal {H}^{2}})}^2\\&+ ||\widetilde{\Psi }(s)||_{C([0,{T}], {\mathcal {H}^{1}})}^2||\Psi (s)||_{L^2([0,{T}],{\mathcal {H}^{2}})}^2. \end{aligned} \end{aligned}$$
(106)

This shows that \((\mathcal {G}_{\hat{U}}, \mathcal {G}_{\hat{V}}, \mathcal {G}_{\hat{\Theta }})\in L^2([0,{T}], {\mathcal {L}^{2}})\). If we now define \(\tilde{F}:=\mathcal {G}+\bar{F}\), we can cast the second-order adjoint equations in the form of the first-order adjoint equations and apply the estimates in the proof of Theorem 3.3 to prove that a unique solution \(\bar{\Psi }\in C([0,{T}], {\mathcal {H}^{1}}(\Omega ))\cap L^2([0,{T}], {\mathcal {H}^{2}}(\Omega ))\) to (101) exists. \(\square \)

The equations (100) and (103) are used to verify the boundedness of the second derivative of the cost functional. From Theorem 5.2, we infer that the right-hand side of (103) is well-defined in \({\mathcal {H}^{1}}(\Omega )\), i.e., for \(\mathcal {W}\in {\mathcal {H}^{1}}(\Omega )\), we have

$$\begin{aligned} H_{\mathcal {J}}[\psi ]\mathcal {W} = \mathcal {S} \mathcal {B}\mathcal {W} -\bar{\Psi }_0. \end{aligned}$$
(107)

We show now that the algorithm described in Figure 1 converges to an optimal initial condition of the data assimilation problem.

Theorem 5.3

(Convergence) Suppose that the assumptions of Theorem 4.1 are satisfied. Let \(\bar{\psi }_0\in \mathcal {V}\) be an optimal initial condition for the data assimilation problem (9) with cost functional specified by (68). Let \({\psi }_0^{0}\in \mathcal {V}\) be an initial value for the descent algorithm 5 that lies within a ball \(B(\bar{\psi }_0)\subseteq {\mathcal {H}^{1}}(\Omega )\) around \(\bar{\psi }_0\). Define the sequence \((\psi _0^n)_n\) by (98). Then, \((\psi _0^n)_n\) converges to \(\bar{\psi }_0\) in \( {\mathcal {H}^{1}}(\Omega )\).

Proof

The proof of the Theorem is based on Lemma 5.1. We have to establish a lower and upper bound on the Hessian of the cost functional. This is accomplished via bounds on the second-order state variable. In order to ease notation we suppress the index “n” of the sequence, the state variable \(\psi \) in this proof corresponds to \(\psi _n\). The occurring equations, their initial conditions and forcing terms are summarized in Fig. 2.

We infer from Theorem 5.2 with \(\hat{\Psi }_0=0\)

$$\begin{aligned} \begin{aligned} ||\hat{\Psi }(t)||_{{\mathcal {H}^{1}}}^2&\le \int _{t}^T \left[ ||\hat{F}||_{{\mathcal {L}^{2}}}^2+||\mathcal {G}||_{{\mathcal {L}^{2}}}^2\right] \exp \left\{ \int _0^s K_1(y)+K_2(y)\, \mathrm{d}y\right\} \, \mathrm{d}s,\\ \end{aligned} \end{aligned}$$
(108)

with \(K_1,K_2\) defined in (44) and with forcing given by \(\hat{F}:=\sum _{|\alpha |\le 1}\triangle ^{\alpha }\mathcal {R}{\Psi }\).

We estimate now the right-hand side of (108) and begin with the forcing. With assumption (71) on the covariance operators, we obtain the estimate

$$\begin{aligned} \int _t^{T}||\hat{F}(s)||_{{\mathcal {L}^{2}}}^2\mathrm{d}s\le & {} \int _t^{T}\sum _{|\alpha |\le 1}||\triangle ^{\alpha }\mathcal {R}{\Psi }(s)||_{{\mathcal {L}^{2}}}^2\mathrm{d}s \le c\int _t^{T}||\mathcal {R}{\Psi }(t)||_{{\mathcal {H}^{2}}}^2\mathrm{d}s\nonumber \\\le & {} c||\mathcal {R}||\int _t^{T}||\Psi (t)||_{{\mathcal {H}^{2}}}^2\mathrm{d}s, \end{aligned}$$
(109)

where \(||\mathcal {R}||\) denotes the operator norm of the linear operator \(\mathcal {R}\) and \(\Psi \in L^\infty ([0,{T}],{\mathcal {H}^{1}}(\Omega ))\cap L^2([0,{T}],{\mathcal {H}^{2}}(\Omega ))\) is the solution of the linear primitive equations with initial condition \(\Psi _0=\mathcal {W}\) and vanishing forcing. According to (42), this solution satisfies for \(t\in [0,T]\) the inequalities

$$\begin{aligned}&||\Psi (t)||_{{\mathcal {H}^{1}}}^2 \le ||\mathcal {W}||_{{\mathcal {H}^{1}}}^2 \exp \{\int _0^t K_1(s)+K_2(s)\, \mathrm{d}s\} = L_0(t) ||\mathcal {W}||_{{\mathcal {H}^{1}}}^2,\nonumber \\ \text {with }&L_0(t):=\exp \{\int _0^t K_1(s)+K_2(s)\, \mathrm{d}s\}, \end{aligned}$$
(110)

and

$$\begin{aligned}&\int _t^T||\Psi (s)||_{{\mathcal {H}^{2}}}^2\mathrm{d}s\nonumber \\&\quad \le ||\mathcal {W}||_{{\mathcal {H}^{1}}}^2 +||\mathcal {W}||_{{\mathcal {H}^{1}}}^2 \int _t^T( K_1(s)+K_2(s))\exp \{\int _0^s K_1(y)+K_2(y)\, \mathrm{d}y\}\, \mathrm{d}s\nonumber \\&\quad \le ||\mathcal {W}||_{{\mathcal {H}^{1}}}^2\left( 1+ \int _t^T( K_1(s)+K_2(s))\exp \{\int _0^s K_1(y)+K_2(y)\, \mathrm{d}y\}\, \mathrm{d}s\right) \nonumber \\&\quad =L_1(t)||\mathcal {W}||_{{\mathcal {H}^{1}}}^2,\nonumber \\ \text {with }&L_1(t):=\left( 1+ \int _t^T( K_1(s)+K_2(s))\exp \{\int _0^s K_1(y)+K_2(y)\, \mathrm{d}y\}\, \mathrm{d}s\right) . \end{aligned}$$
(111)

The estimates (110) and (111) imply for (109)

$$\begin{aligned} \begin{aligned}&\int _t^{T}||\hat{F}(s)||_{{\mathcal {L}^{2}}}^2\mathrm{d}s \le M_1(t)||\mathcal {W}||_{{\mathcal {H}^{1}}}^2,\\ \text {with }&M_1(t):=c(L_0(t)+L_1(t))||\mathcal {R}||. \end{aligned} \end{aligned}$$
(112)

We continue now with the estimation of the \(\mathcal {G}\)-terms in (108). It follows analogous to (106)

$$\begin{aligned} \begin{aligned} \int _{t}^T||\mathcal {G}(s)||_{{\mathcal {L}^{2}}}^2\mathrm{d}s&\le ||\Psi ||_{C([{t},T],{\mathcal {H}^{1}})}^2||\widetilde{\Psi }||_{L^2([{t},T],{\mathcal {H}^{2}})}^2\\&+ ||\widetilde{\Psi }||_{C([{t},T],{\mathcal {H}^{1}})}^2||\Psi ||_{L^2([{t},T],{\mathcal {H}^{2}})}^2. \end{aligned} \end{aligned}$$
(113)

The adjoint state \(\widetilde{\Psi }\) in (113) which result from integration of the adjoint equations with zero terminal condition and forcing \(\tilde{F}:=\sum _{|\alpha |\le 1}\triangle ^{\alpha }\mathcal {R}({\psi }-\psi _\mathrm{obs})\) satisfies according to Theorem 3.3 for \(t\in [0,T]\) the \(H^1\)-estimate

$$\begin{aligned} \begin{aligned} ||\widetilde{\Psi }(t)||_{{\mathcal {H}^{1}}}^2&\le c\int _t^T ||\tilde{F}(s)||_{{\mathcal {L}^{2}}}^2\exp \{\int _0^s K_1(y)+K_2(y)\, \mathrm{d}y\}\, \mathrm{d}s\\&\le c||L_0||_{L^\infty }\int _t^T ||\tilde{F}(s)||_{{\mathcal {L}^{2}}}^2\, \mathrm{d}s. \end{aligned} \end{aligned}$$
(114)

For the \(H^2\)-bound on the adjoint state in (113) yields Theorem 3.3, combined with (114)

$$\begin{aligned} \begin{aligned}&\int _t^T||\widetilde{\Psi }(s)||_{{\mathcal {H}^{2}}}^2\mathrm{d}s \le \int _t^T( K_1(s)+K_2(s))||\widetilde{\Psi }(s)||_{{\mathcal {H}^{1}}}^2\, \mathrm{d}s +\int _t^T ||\tilde{F}(s)||_{{\mathcal {L}^{2}}}^2\, \mathrm{d}s\\&\quad \le ||L_0||_{L^\infty }\int _t^T( K_1(s)+K_2(s))\int _0^s ||\tilde{F}(y)||_{{\mathcal {L}^{2}}}^2\, \mathrm{d}y\, \mathrm{d}s +\int _t^T ||\tilde{F}(s)||_{{\mathcal {L}^{2}}}^2, \mathrm{d}s\\&\quad \le ||L_0||_{L^\infty }\int _t^T( K_1(s)+K_2(s))\, \mathrm{d}s\int _0^t ||\tilde{F}(y)||_{{\mathcal {L}^{2}}}^2\, \mathrm{d}y\\&\qquad +\int _t^T ||\tilde{F}(s)||_{{\mathcal {L}^{2}}}^2, \mathrm{d}s\\&\quad \le L_2(t)\int _t^T ||\tilde{F}(y)||_{L^2}^2\, \mathrm{d}y,\\ \text {with }&L_2(t):=||L_0||_{L^\infty }\int _t^T( K_1(s)+K_2(s))\, \mathrm{d}s+1. \end{aligned} \end{aligned}$$
(115)

For the adjoint forcing in (114) and (115) follows with the triangle inequality

$$\begin{aligned} \begin{aligned}&\int _t^T||\tilde{F}(s)||_{{\mathcal {L}^{2}}}^2\mathrm{d}s \le \sum _{|\alpha |\le 1}\int _0^t||\triangle ^{\alpha }\mathcal {R}\left( {\psi }(s)-\psi _\mathrm{obs}(s)\right) ||_{{\mathcal {L}^{2}}}^2\mathrm{d}s\\&\quad \le ||\mathcal {R}||\int _0^t||{\psi }(s)-\psi _\mathrm{obs}(s)||_{{\mathcal {H}^{2}}}^2\mathrm{d}s\\&\quad \le ||\mathcal {R}||\int _t^T||{\psi }(s)-\bar{\psi }(s)||_{{\mathcal {H}^{2}}}^2+||\bar{\psi }(s)-\psi _\mathrm{obs}(s)||_{{\mathcal {H}^{2}}}^2\mathrm{d}s =: L_4(t), \end{aligned} \end{aligned}$$
(116)

where \(L_4(t)>0\) is bounded on T, since \({\psi }^*,{\psi }^n,\psi _\mathrm{obs}\in L^2([0,T],{\mathcal {H}^{2}}(\Omega ))\). The function \(L_4\) is bounded uniformly in n, because \((\psi ^n)_n\subseteq B({\psi }^*_0)\).

From (114)-(116) follows for (113)

$$\begin{aligned} \begin{aligned}&\int _{t}^T||\mathcal {G}(s)||_{{\mathcal {L}^{2}}}^2\mathrm{d}s\\&\quad \le ||\Psi ||_{C([{t},T]H^1)}^2||\widetilde{\Psi }||_{L^2([{t},T],{\mathcal {H}^{2}})}^2 + ||\widetilde{\Psi }||_{C([{t},T]H^1)}^2||\Psi ||_{L^2([{t},T],{\mathcal {H}^{2}})}^2\\&\quad \le \left( L_0(t)L_2(t) + c||L_0||_{L^\infty }L_1(t)\right) ||\mathcal {W}||_{{\mathcal {H}^{1}}}^2\int _t^T ||\tilde{F}(s)||_{{\mathcal {L}^{2}}}^2\, \mathrm{d}s\\&\quad \le \left( L_0(t)L_2(t) + c||L_0||_{L^\infty }L_1(t)\right) L_4(t)||\mathcal {W}||_{H^1}=M_2(t)||\mathcal {W}||_{{\mathcal {H}^{1}}}^2,\\ \text {with }&M_2(t):=\left( L_0(t)L_2(t) + c||L_0||_{L^\infty }L_1(t)\right) L_4(t) \end{aligned} \end{aligned}$$
(117)

With (109) and (117), we derive for the upper bound on the second-order state in (108) for \(t\in [0, T]\)

$$\begin{aligned} \begin{aligned} ||\hat{\Psi }(t)||_{{\mathcal {H}^{1}}}^2&\le \int _t^T \left[ ||\hat{F}||_{{\mathcal {L}^{2}}}^2+||\mathcal {G}||_{{\mathcal {L}^{2}}}^2\right] \exp \{\int _0^s K_1(y)+K_2(y)\, \mathrm{d}y\}\, \mathrm{d}s\\&\le L_0(t)(M_1(t)+M_2(t))||\mathcal {W}||_{{\mathcal {H}^{1}}}=M_0(t))||\mathcal {W}||_{{\mathcal {H}^{1}}},\\ \text {with }&M_0(t):=L_0(t)(M_1(t)+M_2(t)). \end{aligned} \end{aligned}$$
(118)

We are now in the position to derive upper and lower bounds on the Hessian. From (118) follows

$$\begin{aligned} \begin{aligned} |\big<\mathcal {Z},H_{\mathcal {J}}[\psi ]\mathcal {W}\big>_{{\mathcal {H}^{1}}}|&= |\big <\mathcal {Z},\mathcal {S} \mathcal {B}\mathcal {W} -\hat{\Psi }_0\big >_{{\mathcal {H}^{1}}}|\\&\le ||\mathcal {Z}||_{{\mathcal {H}^{1}}}\, ||\mathcal {S} \mathcal {B}\mathcal {W} -\hat{\Psi }_0||_{{\mathcal {H}^{1}}}\\&\le ||\mathcal {Z}||_{{\mathcal {H}^{1}}}( ||\mathcal {S} \mathcal {B}\mathcal {W}||_{{\mathcal {H}^{1}}}+||\hat{\Psi }_0||_{{\mathcal {H}^{1}}})\\&\le ||\mathcal {Z}||_{{\mathcal {H}^{1}}}( ||\mathcal {S} \mathcal {B}||_{{\mathcal {H}^{1}}} + \sqrt{M_0}||)\mathcal {W}||_{{\mathcal {H}^{1}}}. \end{aligned} \end{aligned}$$
(119)

Equations (119) and (100) establish the bounds on the second derivative of the cost functional and the application of Lemma 5.1 proves the assertion of the Theorem. \(\square \)