1 Introduction

This work is concerned with an optimal control problem for the scalar wave equation where the control enters as the spatially varying coefficient in the principal part. Informally, we consider the problem

$$\begin{aligned} \left\{ \begin{aligned}&\min _{u,y}\frac{1}{2}\Vert By-y_d\Vert _{{\mathcal {O}}}^2+{\mathcal {R}}(u)\\&\text {s.t.}\quad y_{tt} - {\mathrm {div}}(u\nabla y) = f,\quad y(0)=y_0, \partial _t y(0) = y_1,\\&\qquad {\underline{u}} \le u \le {\overline{u}} \quad \text {almost everywhere (a.e.)}, \end{aligned} \right. \end{aligned}$$
(1)

where \(y_d\) is a given (desired or observed) state, B is a bounded linear observation operator mapping to the observation space \({\mathcal {O}}\), \(\mathcal R\) is a regularization term, \(0<{\underline{u}} < {\overline{u}}\) are constants, and f, \(y_0\), and \(y_1\) (as well as boundary conditions) are given suitably. A precise statement is deferred to Sect. 2. Such problems occur, e.g., in acoustic tomography for medical imaging [1] and non-destructive testing [2] as well as in seismic inversion [3]. In the latter, the goal is the determination of a “velocity model” (as described by the coefficient u) of the underground in a region of interest from recordings (“seismograms”, modeled by \(y_d\)) of reflected pressure waves generated by sources on or near the surface (entering the equation via f, \(y_0\), \(y_1\), or inhomogeneous boundary conditions). If the region contains multiple different materials like rock, oil, and gas, the velocity model changes rapidly or may even have jumps at material interfaces.

In the stationary case, the question of existence of solutions to problem (1) under only pointwise constraints and regularization has received a tremendous amount of attention; it was answered in the negative in [4], and this and subsequent investigations led to the concept of H-convergence and, more generally, to homogenization theory; see, e.g., [5,6,7,8,9]. The use of regularization terms or constraints involving higher-order differential operators would certainly guarantee existence but contradicts the goal of allowing piecewise continuous controls u. Such considerations suggest the introduction of total variation regularization in addition to pointwise constraints. In this case, existence can be established. However, this leads to difficulties in deriving necessary optimality conditions since the sum rule of convex analysis can only be applied in the \(L^\infty (\Omega )\) topology, which would lead to (generalized) derivatives that do not admit a pointwise representation. This difficulty can be circumvented by replacing the pointwise constraints by a (differentiable approximation of a) cutoff function applied to the coefficient in the equation and by using improved regularity results for the optimal state that allow extending the Fréchet derivative of the tracking term from \(L^\infty (\Omega )\) to \(L^s(\Omega )\) for \(s<\infty \) sufficiently large. Together, this allows obtaining derivatives and subgradients in \(L^r(\Omega )\) for some \(r>1\), which can be characterized in a suitable pointwise manner. This was carried out in [10], which considered for \({\mathcal {R}}\) a combination of total variation and multi-bang regularization; the latter is a convex pointwise penalty that promotes controls which take values from a prescribed discrete set (e.g., corresponding to different materials such as rock, oil, and gas), see also [11,12,13].

In the current work, we extend this approach to optimal control and identification of discontinuous coefficients in scalar wave equations by deriving, under additional (natural) assumptions on the data, adapted higher regularity results for the wave equation based on elliptic maximal regularity theory [14]; see Assumption 3.1 and Proposition 3.6 below. We also address a suitable discretization of the problem using a stabilized finite element method [15] and its solution by a nonlinear primal–dual proximal splitting method [16,17,18].

Let us briefly comment on related literature. As there is a vast body of work on control and inverse problems for the wave equation, we focus here specifically on the identification of discontinuous (and, in particular, piecewise constant) coefficients. This problem has attracted strong interest over the last few decades, mainly due to its relevance in seismic inversion. Classical works are mainly concerned with the one-dimensional setting – as a model for seismic inversion in stratified or layered media – which allows making use of integral transforms to derive explicit “layer-stripping” formulas; see, e.g., [19,20,21,22]. Regarding the numerous works on wave speed identification in the multidimensional wave equation for seismic inversion, we only mention exemplarily [23,24,25]; see also further literature cited there. The use of total variation penalties for recovering a piecewise constant wave speed in multiple dimensions has been proposed in, e.g., [26,27,28,29,30], although the earlier works employed a smooth approximation of the total variation to allow the numerical solution by standard approaches for nonlinear PDE-constrained optimization. Finally, joint multi-bang and total variation regularization of linear inverse problems and its numerical solution by a primal–dual proximal splitting method were considered in [31]. We also mention that multi-bang control is related to (but different from) switching controls, where at each instant in time, one and only one from a given set of time-dependent controls should be active; see, e.g., [32].

This work is organized as follows. In the next Sect. 2, we give a formal statement of the optimal control problem (1) and recall the relevant definitions and properties of the functional. We then derive in Sect. 3 the results on regularity, stability, and a priori estimates for solutions of the state equation that will be needed in the rest of the paper. In particular, in Proposition 3.6 we show a Groeger-type maximal regularity result for the wave equation under additional assumptions on the data. Section 4 is devoted to existence and first-order necessary optimality conditions for optimal controls, where we use the mentioned maximal regularity result to show that the latter can be interpreted in a pointwise fashion. We then discuss the numerical computation of solutions using a stabilized finite element discretization (see Sect. 5.1) together with a nonlinear primal–dual proximal splitting method (see Sect. 5.2). This approach is illustrated in Sect. 6 for two examples: a transmission setup motivated by acoustic tomography and a reflection setup modeling seismic tomography.

2 Problem Statement

Let \(\Omega \subset {\mathbb {R}}^d\), \(d \in \{2,3\}\), be a bounded domain with \(C^{2,1}\) regular boundary \(\partial \Omega \) and outer normal \(\nu \). For brevity, we introduce the notation \(H:=L^2(\Omega )\) and \(V:=H^1(\Omega )\) and set \(I:=(0,T)\). Then we consider for \(f\in L^2(I,V)\), \(y_0\in V\), and \(y_1\in H\) the weak solution \(y\in C(\overline{I},V)\cap C^1({\overline{I}},H)\) to

$$\begin{aligned} \left\{ \begin{aligned} \partial _{tt} y - {\mathrm {div}}(u\nabla y)&= f&\text { in } Q := (0,T) \times \Omega , \\ {\partial _\nu y}&= 0&\text { on } \Sigma := (0,T) \times \partial \Omega , \\ y(0)&= y_0,\quad {\partial _t} y(0) = y_1,&\text { on }\Omega . \end{aligned}\right. \end{aligned}$$
(2)

This choice of Neumann boundary conditions corresponds, e.g., for acoustic waves to the situation of reflection at a sound-hard obstacle and for elastic waves to the absence of external forces at the boundary (which is a natural setting for seismic imaging via interior sources). We will discuss existence and regularity of solutions to (2) in the following Sect. 3.

The salient point is of course the coefficient u in the principal part, which we want to control on an open subset \(\omega _c\subseteq \Omega \), which is assumed to have a \(C^{2,1}\) regular boundary. For constants \({\underline{u}}, \overline{u}\) with \(0< {\underline{u}}< \overline{u} < \infty \) we define the set of admissible coefficients

$$\begin{aligned} {\hat{U}} = \left\{ u \in L^\infty (\Omega ):{\underline{u}} \le u(x) \le \overline{u}\quad \text {for a.e.}~x\in \Omega \right\} \end{aligned}$$
(3)

and pick a reference coefficient \({\hat{u}}\in {\hat{U}}\). To map a control u defined on \(\omega _c\) to a coefficient defined on \(\Omega \), we introduce the affine bounded extension operator

$$\begin{aligned} {\hat{E}}: L^2(\omega _c)\rightarrow L^2(\Omega ),\qquad [{\hat{E}}u](x) := {\left\{ \begin{array}{ll} {\hat{u}}(x) + u(x) &{} \text {for }x\in \omega _c,\\ {\hat{u}}(x) &{} \text {for }x\in \Omega \setminus \omega _c. \end{array}\right. } \end{aligned}$$
(4)

The set of controls that can be extended to admissible coefficients is then given by

$$\begin{aligned} U = \left\{ u \in L^\infty (\omega _c):u_{\min } \le u(x) \le u_{\max } \quad \text {for a.e.}~x\in \omega _c\right\} , \end{aligned}$$
(5)

where \(u_{\min }<u_{\max }\) are such that \({\underline{u}} \le \inf _{x\in \omega _c} {\hat{u}}(x) + u_{\min } \le \sup _{x\in \omega _c} {\hat{u}}(x) + u_{\max } \le \overline{u}\). In particular, for \({\hat{u}} \equiv {\underline{u}}\), we have \(u_{\min }=0\) and \(u_{\max } = \overline{u}-{\underline{u}}\).
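The construction (4)–(5) can be sketched on a discretized domain. The following is a minimal illustration (not from the paper; the grid size, the bounds, and the mask for \(\omega _c\) are assumptions made for the example), with the reference coefficient chosen as \({\hat{u}}\equiv {\underline{u}}\) so that \(u_{\min }=0\) and \(u_{\max }={\overline{u}}-{\underline{u}}\):

```python
import numpy as np

# Hypothetical discretization: coefficient values on a uniform grid over Omega,
# with a boolean mask marking the control subdomain omega_c.
u_lower, u_upper = 1.0, 4.0            # assumed bounds 0 < u_lower < u_upper
u_hat = np.full((8, 8), u_lower)       # reference coefficient u_hat == u_lower
omega_c = np.zeros((8, 8), dtype=bool)
omega_c[2:6, 2:6] = True               # interior control region

def extend(u_ctrl):
    """Affine extension E_hat: add the control to u_hat on omega_c only."""
    u_full = u_hat.copy()
    u_full[omega_c] += u_ctrl
    return u_full

# For u_hat == u_lower, admissible controls satisfy 0 <= u <= u_upper - u_lower,
# so the extended coefficient stays within [u_lower, u_upper]:
u_ctrl = np.full(omega_c.sum(), u_upper - u_lower)   # extreme admissible control
u_full = extend(u_ctrl)
```

Outside \(\omega _c\), the extended coefficient always equals the reference coefficient, so only the values inside the control region are optimized.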

Moreover, we introduce the observation space \({\mathcal {O}}\) which is assumed to be a separable Hilbert space as well as a linear and bounded observation operator \(B\in {\mathbb {L}}(L^2(Q),{\mathcal {O}})\) with adjoint \(B^*\in {\mathbb {L}}({\mathcal {O}}, L^2(Q))\).

We then consider the optimal control problem

$$\begin{aligned} \min _{u\in BV(\omega _c)\cap U}\frac{1}{2}\Vert By(\hat{E}u)-y_d\Vert _{{\mathcal {O}}}^2+\alpha G(u)+\beta \mathrm {TV}(u), \end{aligned}$$
(6)

where \(y({\hat{E}}u)\) is the weak solution to (2) with coefficient \({\hat{E}}u\), G is the multi-bang penalty from [11, 12], \(\mathrm {TV}\) denotes the total variation, and \(\alpha \) and \(\beta \) are positive constants. In the remainder of this section, we recall the definitions and properties of the total variation and the multi-bang penalty relevant to the current work.

Total variation We recall, e.g., from [33,34,35] that the space \(BV(\omega _c)\) is given by those functions \(v\in L^1(\omega _c)\) for which the distributional derivative Dv is a Radon measure, i.e.,

$$\begin{aligned} BV(\omega _c) = \left\{ v\in L^1(\omega _c): \Vert Dv \Vert _{{\mathcal {M}}(\omega _c)} < \infty \right\} . \end{aligned}$$

The total variation of a function \(v\in BV(\omega _c)\) is then given by

$$\begin{aligned} \mathrm {TV}(v) := \Vert Dv \Vert _{{\mathcal {M}}(\omega _c)} = \int _{\omega _c} \mathrm {d}|Dv|_2, \end{aligned}$$

i.e., the total variation (in the sense of measure theory) of the vector measure \(Dv\in {\mathcal {M}}(\omega _c;{\mathbb {R}}^d)=C_0(\omega _c;{\mathbb {R}}^d)^*\). Here, \(|\cdot |_2\) denotes the Euclidean norm on \({\mathbb {R}}^d\); we thus consider in this work the isotropic total variation. For \(v\in L^1(\omega _c)\setminus BV(\omega _c)\), we set \(\mathrm {TV}(v)=\infty \). It follows that \(BV(\omega _c)\) embeds into \(L^r(\omega _c)\) continuously for every \(r\in [1,\frac{d}{d-1}]\) and compactly if \(r < \frac{d}{d-1}\); see, e.g., [33, Cor. 3.49 together with Prop. 3.21]. In addition, the total variation is lower semi-continuous with respect to strong convergence in \(L^1(\omega _c)\), i.e., if \(\{u_n\}_{n\in {\mathbb {N}}}\subset BV(\omega _c)\) and \(u_n\rightarrow u\) in \(L^1(\omega _c)\), we have that

$$\begin{aligned} \mathrm {TV}(u)\le \liminf _{n\rightarrow \infty } \mathrm {TV}(u_n), \end{aligned}$$
(7)

see, e.g., [35, Thm. 5.2.1]. Note that this does not imply that \(\mathrm {TV}(u)<\infty \) and hence that \(u\in BV(\omega _c)\) unless \(\{\mathrm {TV}(u_n)\}_{n\in {\mathbb {N}}}\) has a bounded subsequence. From (7), we also deduce that the convex extended real-valued functional \(\mathrm {TV}:L^p(\omega _c)\rightarrow {\mathbb {R}}\cup \{\infty \}\) is weakly lower semi-continuous for any \(p\in [1,\infty ]\).
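To make the isotropic total variation concrete: for a piecewise constant function, \(\mathrm {TV}\) measures the jump height weighted by the length of the jump set. The following finite-difference sketch (a discrete analogue for illustration only, not part of the analysis; forward differences on a uniform grid with replicate boundary are an assumed choice) computes a discrete isotropic TV:

```python
import numpy as np

def tv_isotropic(v, h=1.0):
    """Discrete isotropic TV: cell area h^2 times the sum of Euclidean norms
    of forward-difference gradients (replicate boundary => zero difference)."""
    dx = np.diff(v, axis=0, append=v[-1:, :]) / h
    dy = np.diff(v, axis=1, append=v[:, -1:]) / h
    return float(h**2 * np.sum(np.sqrt(dx**2 + dy**2)))

# Piecewise constant "two materials" image on the unit square: jump of height 1
# across a straight vertical interface of length 1, so the TV equals 1.
n = 64
h = 1.0 / n
v = np.zeros((n, n))
v[:, n // 2:] = 1.0
tv = tv_isotropic(v, h)
```

Note that the Euclidean norm of the gradient makes this the isotropic variant considered in the text; summing \(|dx|+|dy|\) instead would give the anisotropic one.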

Multi-bang penalty Let \(u_{\min } \le u_1<\dots < u_m\le u_{\max }\) be a given set of desired coefficient values. The multi-bang penalty G is then defined similarly to [12], where we have to replace the box constraints \(u(x)\in [u_1,u_m]\) by a linear growth to ensure that G is finite on \(L^r(\omega _c)\), \(r<\infty \). For simplicity, we assume in the following that \(u_1 = u_{\min } = 0\) and \(u_m = u_{\max } = {\overline{u}} - {\underline{u}}\) (i.e., \({\hat{u}} = {\underline{u}}\)) and define

$$\begin{aligned} G:L^1(\omega _c)\rightarrow {\mathbb {R}},\qquad G(u) = \int _{\omega _c} g(u(x))\,\mathrm {d}x, \end{aligned}$$

where \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) is given by

$$\begin{aligned} g(t) = {\left\{ \begin{array}{ll} -u_m t &{} t \le u_1,\\ \frac{1}{2} \left( (u_{i}+u_{i+1})t - u_iu_{i+1}\right) &{} t \in [u_i,u_{i+1}],\quad 1\le i <m,\\ u_m t - \frac{1}{2} u_m^2 &{} t \ge u_m. \end{array}\right. } \end{aligned}$$
(8)

This definition can be motivated via the convex envelope of \(\delta _{[u_1,u_m]}(t) + \frac{1}{2} |t|^2\) (where \(\delta \) denotes the indicator function in the sense of convex analysis), see [12]; note however that here (as in [10]) g is defined to be finite for every \(t\in {\mathbb {R}}\), while the convex envelope is only finite for \(t\in [u_1,u_m]\). We also remark that for \(m=2\), this reduces in the current setting to a multiple of the well-known sparsity penalty (i.e., \(G(u) = \frac{u_2}{2}\Vert u \Vert _{L^1(\omega _c)}\) for any \(u\in U\)).

It can be verified easily that g is continuous, convex, and linearly bounded from above and below, i.e.,

$$\begin{aligned} \frac{1}{2} u_2 |t| \le g(t) \le u_m |t| \qquad \text {for all }t\in {\mathbb {R}}. \end{aligned}$$

Since g is finite (and hence proper), convex, and continuous, the corresponding integral operator \(G:L^r(\omega _c)\rightarrow {\mathbb {R}}\) is finite, convex, and continuous (and hence a fortiori weakly lower semi-continuous) for any \(r\in [1,\infty ]\), see, e.g., [36, Prop. 2.53]. Also, the properties of g imply that

  1. (G1) \(G(v) > G(0)=0\) for all \(v\in L^1(\omega _c)\setminus \{0\}\),

  2. (G2) \(\frac{1}{2} u_2 \Vert v \Vert _{L^1(\omega _c)}\le G(v) \le u_m \Vert v \Vert _{L^1(\omega _c)}\) for all \(v\in L^1(\omega _c)\).
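As an illustration, the scalar penalty g from (8) can be transcribed directly; the following sketch (the values \(u_i\) are chosen arbitrarily for the example, not taken from the paper) also makes the linear bounds above easy to check numerically:

```python
import numpy as np

def multibang_g(t, levels):
    """Pointwise multi-bang penalty g from (8): levels = (u_1, ..., u_m) sorted
    increasingly with u_1 = 0, piecewise linear on [u_i, u_{i+1}] and extended
    with slopes -u_m and u_m outside [u_1, u_m]."""
    u = np.asarray(levels, dtype=float)
    t = np.asarray(t, dtype=float)
    um = u[-1]
    out = np.where(t <= u[0], -um * t, um * t - 0.5 * um**2)   # outer branches
    for ui, uj in zip(u[:-1], u[1:]):                          # middle pieces
        mid = (t >= ui) & (t <= uj)
        out = np.where(mid, 0.5 * ((ui + uj) * t - ui * uj), out)
    return out

levels = (0.0, 1.0, 2.0)     # m = 3 desired coefficient values (example only)
ts = np.linspace(-1.0, 3.0, 401)
g = multibang_g(ts, levels)
# g vanishes exactly at the preferred value u_1 = 0 and satisfies the
# pointwise bounds 0.5 * u_2 * |t| <= g(t) <= u_m * |t| from the text.
```

Convexity is visible in the non-decreasing slopes \(\frac{1}{2}(u_i+u_{i+1})\) of the middle pieces, which match the outer slopes \(\pm u_m\) at \(u_1\) and \(u_m\).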

3 The State Equation

We first consider the state equation for a fixed coefficient \(u\in {\hat{U}}\) (i.e., defined and uniformly bounded on the full domain \(\Omega \) and satisfying \({\underline{u}}\le u \le \overline{u}\) almost everywhere). For given \(u\in {\hat{U}}\), \(f\in L^2(I,H)\), \(y_0\in V\), and \(y_1\in H\), we call \(y=y(u)\) a (weak) solution to (2) if \(y\in W:= L^2(I,V) \cap W^{1,2}(I,H)\) and

$$\begin{aligned} \left\{ \begin{aligned} \int _0^T - (\partial _t y(t), \partial _t v(t))_H + (u \nabla y(t), \nabla v(t))_{{\mathbb {L}}^2(\Omega )} \,\mathrm {d}t&= \int _0^T (f(t), v(t))_H \,\mathrm {d}t + (y_1,v(0))_H,\\ y(0)&= y_0, \end{aligned} \right. \end{aligned}$$
(9)

for all \(v\in W\) with \(v(T) = 0\). We then have the following existence and natural regularity result.

Lemma 3.1

For every \(u \in {\hat{U}}\) and \((f, y_0, y_1) \in L^2(I,H) \times V \times H\), there exists a unique (weak) solution \(y = y(u) \in Z := C({\overline{I}},V) \cap C^1({\overline{I}},H)\) to (2) satisfying

$$\begin{aligned} \Vert y \Vert _{C({\overline{I}},V)} + \Vert \partial _{t} y \Vert _{C({\overline{I}},H)} + \Vert \partial _{tt} y \Vert _{L^2({I}, V^*)} \le C_1(\Vert f \Vert _{L^2(I,H)} + \Vert y_0 \Vert _V + \Vert y_1 \Vert _H) \end{aligned}$$
(10)

for a constant \(C_1\) independent of \((f,y_0, y_1) \in L^2(I,H) \times V \times H\) and \(u \in {\hat{U}}\).

Proof

Except for the estimate on \(\Vert \partial _{tt} y \Vert _{L^2({I}, V^*)}\), the claim follows from [37, Theorem 3.8.2, page 275], where we observe that due to our assumption on \(u\in {\hat{U}}\), the energy is coercive with respect to the seminorm in V; see also [23, Theorem 2.4.5]. The constant \(C_1\) depends on \({\underline{u}}\) and \({\overline{u}}\), but is otherwise independent of \(u\in {\hat{U}}\).

To verify the missing estimate, we use from (10) that

$$\begin{aligned} \Vert y \Vert _{L^2({I}, V)}\le C_1(\Vert f \Vert _{L^2(I,H)} + \Vert y_0 \Vert _V + \Vert y_1 \Vert _H). \end{aligned}$$

Since \(u \in {\hat{U}}\), we deduce that

$$\begin{aligned} \Vert {\mathrm {div}}(u\nabla y) \Vert _{L^2({I}, V^*)}\le C_1(\Vert f \Vert _{L^2(I,H)} + \Vert y_0 \Vert _V + \Vert y_1 \Vert _H). \end{aligned}$$

We further deduce from the state equation that

$$\begin{aligned} \Vert \partial _{tt} y \Vert _{L^2(I,V^*)} \le {\tilde{C}}_1(\Vert f \Vert _{L^2(I,H)} + \Vert y_0 \Vert _V + \Vert y_1 \Vert _H), \end{aligned}$$

with \({\tilde{C}}_1 = C_1+1\). \(\square \)

By the change of variables \(t\mapsto T-t\), we can also apply Lemma 3.1 to the dual problem

$$\begin{aligned} \left\{ \begin{aligned} \int _0^T - (\partial _t \varphi (t), \partial _t v(t))_H + (u \nabla \varphi (t), \nabla v(t))_{{\mathbb {L}}^2(\Omega )} \,\mathrm {d}t&= \int _0^T (g(t), v(t))_H \,\mathrm {d}t + (\varphi _1,v(T))_H,\\ \varphi (T)&= \varphi _0 \text { in } V, \end{aligned} \right. \end{aligned}$$
(11)

for any \(g\in L^2(I,H)\), \(\varphi _0\in V\), \(\varphi _1 \in H\), and any \(v\in W\) with \(v(0)=0\).

Corollary 3.2

For every \(u\in {\hat{U}}\) and \(g\in L^2(I,H)\), \(\varphi _0\in V\), and \(\varphi _1 \in H\), there exists a unique solution \(\varphi \in Z\) to (11) satisfying

$$\begin{aligned} \Vert \varphi \Vert _{C({\overline{I}},V)} + \Vert \partial _t \varphi \Vert _{C({\overline{I}},H)}+\Vert \partial _{tt} \varphi \Vert _{L^2({I}, V^*)} \le C_1 (\Vert g \Vert _{L^2(I,H)} + \Vert \varphi _0 \Vert _V + \Vert \varphi _1 \Vert _{H}). \end{aligned}$$

Using this result, we can apply an Aubin–Nitsche trick or duality argument to show Lipschitz continuity of \(u\mapsto y(u) \in L^2(I,H)\), which we will need to show differentiability of the tracking term later.

Lemma 3.3

There exists a constant \(L>0\) such that the mapping \(u\mapsto y(u)\) satisfies

$$\begin{aligned} \Vert y(u_1)-y(u_2) \Vert _{L^2(I,H)}\le L \Vert u_1-u_2 \Vert _{L^\infty (\Omega )} \quad \text {for all }u_1,u_2\in {\hat{U}}. \end{aligned}$$

Proof

Let \(u_1,u_2\in {\hat{U}}\) be arbitrary and set \(\delta u := u_1-u_2\) and \(\delta y:= y(u_1)-y(u_2)\). Subtracting the weak equations for \(y(u_1)\) and \(y(u_2)\), we have that \(\delta y \in Z\) satisfies \(\delta y(0) = 0\) and

$$\begin{aligned} \int _0^T - (\partial _t \delta y, \partial _t v)_H + (u_1\nabla \delta y,\nabla v)_{{\mathbb {L}}^2(\Omega )} dt = -\int _0^T (\delta u\nabla y(u_2), \nabla v)_{{\mathbb {L}}^2(\Omega )} dt\nonumber \\ \end{aligned}$$
(12)

for all \(v\in W\) with \(v(T) = 0\).

Let now \(g\in L^2(I,H)\) be arbitrary and consider the corresponding solution \(\varphi _g\in W\) of (11) for \(u=u_1\), \(\varphi _0=0\), and \(\varphi _1=0\). Noting that \(v=\varphi _g\) is a valid test function for (12) and \(v=\delta y\) is a valid test function for (11), we obtain that

$$\begin{aligned} \begin{aligned} (\delta y, g)_{L^2(I,H)}&= -(\partial _t \delta y,\partial _t \varphi _g)_{L^2(I,H)} + (u_1\nabla \delta y,\nabla \varphi _g)_{L^2(I,{\mathbb {L}}^2(\Omega ))} \\&= -(\delta u \nabla y(u_2),\nabla \varphi _g)_{L^2(I,{\mathbb {L}}^2(\Omega ))}. \end{aligned} \end{aligned}$$

Using that \(\delta u\in L^\infty (\Omega )\) together with Lemma 3.1 and Corollary 3.2, this implies that

$$\begin{aligned} \begin{aligned} (\delta y, g)_{L^2(I,H)}&\le \Vert \delta u \Vert _{L^\infty (\Omega )} \Vert y(u_2) \Vert _{L^2(I,V)}\Vert \varphi _g \Vert _{L^2(I,V)}\\&\le \Vert \delta u \Vert _{L^\infty (\Omega )} C_1(\Vert f \Vert _{L^2(I,H)}+\Vert y_0 \Vert _V+\Vert y_1 \Vert _H)\Vert g \Vert _{L^2(I,H)} \end{aligned} \end{aligned}$$

for all \(g\in L^2(I,H)\). Since \(L^2(I,H)\) is a Hilbert space, taking the supremum over all \(g\in L^2(I,H)\) with \(\Vert g \Vert _{L^2(I,H)}\le 1\) yields the claim. \(\square \)

In stronger norms, we only have the following weak continuity result, which will be used repeatedly.

Lemma 3.4

Let \(\{u_n\}_{n\in {\mathbb {N}}}\subset {\hat{U}}\) be a sequence with \(u_n\rightarrow u\) in \(L^r(\Omega )\) for some \(r\in [1,\infty )\). Then \(u\in {\hat{U}}\) and \(y(u_n)\rightharpoonup y(u)\) in \(L^2(I,V)\cap W^{1,2}(I,H)\cap W^{2,2}(I,V^*)\). Furthermore, \(y(u_n)(t)\rightharpoonup y(u)(t)\) in H for all \(t\in [0,T]\).

Proof

The first assertion follows from the fact that \({\hat{U}}\) is closed in \(L^r(\Omega )\). From \(u_n\in {\hat{U}}\) and Lemma 3.1, the corresponding sequence \(\{y_n\}_{n\in {\mathbb {N}}} := \{y(u_n)\}_{n\in {\mathbb {N}}}\) of solutions to (9) is well-defined and bounded in \(L^2(I,V)\cap W^{1,2}(I,H)\cap W^{2,2}(I,V^*)\). By passing to successive subsequences (which we do not distinguish), we thus obtain that

$$\begin{aligned} y_n \rightharpoonup {\bar{y}} \quad \text { weakly in }\quad L^2(I,V) \cap W^{1,2}(I,H)\cap W^{2,2}(I,V^*). \end{aligned}$$

From (9), we in particular have that

$$\begin{aligned}&\int _{0}^{T} \, -(\partial _{t} y_n(t),\partial _{t} \varphi (t))_H + (u_n \nabla y_n(t), \nabla \varphi (t))_{{\mathbb {L}}^2(\Omega )}\,\mathrm {d}t\\&\quad =\int _0^T(f(t), \varphi (t))_H \,\mathrm {d}t+ (y_1,\varphi (0))_H \end{aligned}$$

for arbitrary \(\varphi \in W \cap L^2(I,H^3(\Omega ))\) with \(\varphi (T) =0\). Since \(u_n\rightarrow u\) strongly in \(L^r(\Omega )\) and \(u_n,u\in {\hat{U}}\subset L^\infty (\Omega )\), we have for \(r\in [1,2]\)

$$\begin{aligned} \Vert u_n-u\Vert _{H}\le \Vert u_n-u\Vert _{L^\infty (\Omega )}^{(2-r)/2}\Vert u_n-u\Vert _{L^r(\Omega )}^{r/2}\le C\Vert u_n-u\Vert _{L^r(\Omega )}^{r/2} \end{aligned}$$

and thus for all \(r\in [1,\infty )\)

$$\begin{aligned}&\left| \int _0^T(u_n\nabla y_n-u\nabla {\bar{y}}, \nabla \varphi )_{{\mathbb {L}}^2(\Omega )}\,\mathrm {d}t\right| \\&\quad \le \left| \int _0^T((u_n-u)\nabla \varphi ,\nabla y_n)_{{\mathbb {L}}^2(\Omega )}\,\mathrm {d}t\right| +\left| \int _0^T(\nabla (y_n-{{\bar{y}}}),u\nabla \varphi )_{{\mathbb {L}}^2(\Omega )}\,\mathrm {d}t\right| \\&\quad \le C\Vert y_n\Vert _{L^2(I,V)}\Vert u_n-u\Vert _H\Vert \nabla \varphi \Vert _{L^2(I,C({\overline{\Omega }}))} +\left| \int _0^T(\nabla (y_n-{{\bar{y}}}),u\nabla \varphi )_{{\mathbb {L}}^2(\Omega )}\,\mathrm {d}t\right| \rightarrow 0. \end{aligned}$$

Then we can pass to the limit in the weak formulation to obtain

$$\begin{aligned} \int _{0}^{T}- \, (\partial _{t} {\bar{y}}(t), \partial _{t} \varphi (t))_H + (u \nabla {\bar{y}}(t), \nabla \varphi (t))_{{\mathbb {L}}^2(\Omega )} - (f(t), \varphi (t))_H \,\mathrm {d}t- (y_1,\varphi (0))_H = 0 \end{aligned}$$

for all such \(\varphi \). Since \(u\in {\hat{U}}\) and since the set of functions with \(\varphi \in W \cap L^2(I,H^3(\Omega ))\) and \(\varphi (T) =0\) is dense in \(\{v\in W: v(T)=0\}\), the last equation also holds for all \(v\in W\) with \(v(T) =0\). The density can be shown by adapting the density argument of \(C^\infty ({\overline{\Omega }})\) in V; see, e.g., [38, Cor. 9.8].

It remains to show that the limit \({\bar{y}}\) satisfies the initial condition \({\bar{y}}(0)=y_0\). First, for each \(v\in H\) we have \((y_n(\cdot ),v)_H \rightharpoonup ({{\bar{y}}}(\cdot ),v)_H\) in \(W^{1,2}(I)\). Hence

$$\begin{aligned} (y_n(\cdot ),v)_H \rightarrow ({{\bar{y}}}(\cdot ),v)_H \quad \text { in } C({\overline{I}}). \end{aligned}$$

due to the compact embedding of \(W^{1,2}(I)\) into \(C({\overline{I}})\). In particular, since \(y_n(0)=y_0\), this implies that \((y_0,v)_H =({{\bar{y}}}(0),v)_H\) for all \(v\in H\). Since \(v\in H\) was arbitrary, it follows that \({{\bar{y}}}(0) =y_0\). Hence \({\bar{y}}=y(u)\), and since the solution of (9) is unique, a subsequence–subsequence argument shows that the full sequence converges weakly to y(u). By a similar argument, \(y_n(t) \rightharpoonup {{\bar{y}}}(t)\) in H for all \(t\in {\overline{I}}\). \(\square \)

Stronger continuity of \(u\mapsto y(u)\) can be shown with respect to the \(L^\infty \) topology for the controls.

Lemma 3.5

Assume that \(f\in L^2(I,V)\) and let \(\{u_n\}_{n\in {\mathbb {N}}}\subset {\hat{U}}\) be a sequence with \(u_n\rightarrow u\) in \(L^\infty (\Omega )\). Then \(y(u_n)\rightarrow y(u)\) in \(L^2(I,V)\cap W^{1,2}(I,H)\).

Proof

First, the embedding \(L^\infty (\Omega )\subset L^r(\Omega )\), \(r\in [1,\infty )\), for bounded \(\Omega \) together with Lemma 3.4 shows that \(y_n:= y(u_n)\rightharpoonup y(u)\) in \(L^2(I,V)\cap W^{1,2}(I,H)\cap W^{2,2}(I,V^*)\).

We now introduce for \(u\in {\hat{U}}\) and \(y:=y(u)\) the energy

$$\begin{aligned} {\mathcal {E}}_u(t) := \frac{1}{2} \left( u\nabla y(t),\nabla y(t) \right) _{{\mathbb {L}}^2(\Omega )} + \frac{1}{2}\Vert \partial _t y(t) \Vert _{H}^2 + \frac{1}{2}\Vert y(t) \Vert _H^2 \quad \text {for a.e.}~ t\in I. \end{aligned}$$

By the Lions–Magenes Lemma ([37, Lem. 8.3], cf. also [23, (2.24), p. 24]), we have that

$$\begin{aligned} \frac{d}{dt} {\mathcal {E}}_u(t) = \left( f(t),\partial _t y(t) \right) _{H} + \left( y(t),\partial _t y(t) \right) _H\quad \text {in }L^1(I), \end{aligned}$$

and thus by the fundamental theorem of calculus we find that

$$\begin{aligned} {\mathcal {E}}_u(t) = {\mathcal {E}}_u(0) + \int _0^t \left( f(s),\partial _t y(s) \right) _{H} + \left( y(s),\partial _t y(s) \right) _H\,\mathrm {d}s. \end{aligned}$$
(13)
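The identity (13) can be illustrated numerically: for \(f=0\), it states that \(t\mapsto \frac{1}{2}(u\nabla y(t),\nabla y(t))_{{\mathbb {L}}^2(\Omega )} + \frac{1}{2}\Vert \partial _t y(t)\Vert _H^2\) is conserved, since the remaining terms combine to \(\frac{1}{2}\Vert y(t)\Vert _H^2\) on both sides. The following sketch checks this conservation for a one-dimensional analogue with \(u\equiv 1\), \(y_0=\cos (\pi x)\), and \(y_1=0\); the leapfrog scheme and the mirrored ghost points for the Neumann condition are discretization choices assumed for this example only (not the discretization of Sect. 5):

```python
import numpy as np

# 1-D analogue of (2) with u = 1, f = 0, Neumann BC: y_tt = y_xx on (0, 1).
# Initial data y_0 = cos(pi x), y_1 = 0; exact solution y = cos(pi x) cos(pi t),
# whose classical energy 1/2||y_x||^2 + 1/2||y_t||^2 equals pi^2/4 for all t.
N, T = 200, 1.0
x = np.linspace(0.0, 1.0, N + 1)
h = x[1] - x[0]
dt = 0.5 * h                          # CFL-stable time step
steps = int(round(T / dt))

def lap(v):
    """Second-difference Laplacian with homogeneous Neumann BC (mirror ghosts)."""
    w = np.empty_like(v)
    w[1:-1] = v[2:] - 2.0 * v[1:-1] + v[:-2]
    w[0] = 2.0 * (v[1] - v[0])
    w[-1] = 2.0 * (v[-2] - v[-1])
    return w / h**2

y_prev = np.cos(np.pi * x)                      # y(0) = y_0
y = y_prev + 0.5 * dt**2 * lap(y_prev)          # second-order start (y_1 = 0)

def energy(ynew, yold):
    """Discrete classical energy 1/2||y_x||^2 + 1/2||y_t||^2 at the half step."""
    yt = (ynew - yold) / dt
    yx = np.diff(0.5 * (ynew + yold)) / h
    return 0.5 * h * np.sum(yx**2) + 0.5 * h * np.sum(yt**2)

E0 = energy(y, y_prev)
for _ in range(steps):                          # leapfrog time stepping
    y_prev, y = y, 2.0 * y - y_prev + dt**2 * lap(y)
E1 = energy(y, y_prev)
# E0 and E1 should both be close to the exact value pi^2/4.
```

Up to discretization error, the energy at the final time agrees with the initial one, which is the discrete counterpart of (13) with vanishing source term.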

We now define for \(v\in V\)

$$\begin{aligned} \Vert v \Vert _{V_u}^2:=\left( u\nabla v ,\nabla v \right) _{{\mathbb {L}}^2(\Omega )} + \Vert v \Vert _H^2, \end{aligned}$$

which is an equivalent norm on V for any \(u\in {\hat{U}}\). Subtracting (13) for \({\mathcal {E}}_u\) and \({\mathcal {E}}_{u_n}\) and adding and subtracting suitable intermediate terms then yields for almost every \(t\in I\) that

$$\begin{aligned}&\frac{1}{2} \Vert y(t) \Vert _{V_u}^2 + \frac{1}{2}\Vert \partial _t y(t) \Vert _H^2 -\left( \frac{1}{2} \Vert y_n(t) \Vert _{V_u}^2 + \frac{1}{2}\Vert \partial _t y_n(t) \Vert _H^2\right) \nonumber \\&\quad = \frac{1}{2}\left( (u_n-u)\nabla y_n(t),\nabla y_n(t) \right) _{{\mathbb {L}}^2(\Omega )} + \frac{1}{2}\left( (u-u_n)\nabla y_0,\nabla y_0 \right) _{{\mathbb {L}}^2(\Omega )} \nonumber \\&\qquad + \int _0^t \!\!\left( f(s),\partial _t y(s)-\partial _t y_n(s) \right) _H + \left( y(s)-y_n(s),\partial _t y(s) \right) _H \nonumber \\&\qquad + \left( y_n(s),\partial _t y(s)-\partial _t y_n(s) \right) _H\!\! \,\mathrm {d}s, \end{aligned}$$
(14)

and hence that

$$\begin{aligned}&\left| \frac{1}{2} \Vert y(t) \Vert _{V_u}^2 + \frac{1}{2}\Vert \partial _t y(t) \Vert _H^2 -\left( \frac{1}{2} \Vert y_n(t) \Vert _{V_u}^2 + \frac{1}{2}\Vert \partial _t y_n(t) \Vert _H^2\right) \right| \\&\quad \le \Vert u_n-u \Vert _{L^\infty (\Omega )} \Vert y_n \Vert _{C({\overline{I}},V)}^2\\&\qquad +(\Vert f\Vert _{L^2(I,V)}+\Vert y_n\Vert _{L^2(I,V)})\Vert \partial _t y-\partial _ty_n\Vert _{L^2(I,V^*)}+\Vert \partial _t y\Vert _{L^2(I,H)}\Vert y-y_n\Vert _{L^2(I,H)}. \end{aligned}$$

We know from Lemma 3.4 that \(y_n\rightharpoonup y\) in \(L^2(I,V)\cap W^{1,2}(I,H)\cap W^{2,2}(I,V^*)\). Thus the Aubin–Lions Lemma and the compactness of the embeddings \(V\hookrightarrow H\) and \(H\hookrightarrow V^*\) imply that \(y_n \rightarrow y\) in \(L^2(I,H)\cap W^{1,2}(I,V^*)\). Thus we have

$$\begin{aligned} \int _0^T\Vert y_n(t) \Vert _{V_u}^2+\Vert \partial _t y_n(t) \Vert _{H}^2\,\mathrm {d}t\rightarrow \int _0^T\Vert y(t) \Vert _{V_u}^2+\Vert \partial _t y(t) \Vert _{H}^2\,\mathrm {d}t. \end{aligned}$$
(15)

Since \(\Vert \cdot \Vert _{V_u}\) is an equivalent norm on V, we have that the normed vector space \(V_u := (V,\Vert \cdot \Vert _{V_u})\) is equivalent to V and hence that \(L^2(I,V_u)\cap W^{1,2}(I,H)\simeq L^2(I,V)\cap W^{1,2}(I,H)\). This implies that \(y_n\rightharpoonup y\) also in \(L^2(I,V_u)\cap W^{1,2}(I,H)\), and together with (15) the Radon–Riesz property of Hilbert spaces implies that \(y_n\rightarrow y\) strongly in \(L^2(I,V_u)\cap W^{1,2}(I,H)\). Appealing again to the equivalence of V and \(V_u\) then yields the claim. \(\square \)

Under additional assumptions, we can show an improved regularity result.

Assumption 3.1

The data satisfy \(f\in L^2(I,V)\) and \((y_0,y_1)\in H^2(\Omega )\times H^1(\Omega )\) with \(\partial _\nu y_0 =0\). Furthermore,

  1. (i) \({\hat{u}}\) is constant on \(\Omega \setminus \omega _c\) and

  2. (ii) \(y_0\) is constant on \(\omega _c\) and \(\overline{\omega _c} \subset \Omega \).

Proposition 3.6

Let Assumption 3.1 hold. Then there exist \(q > 2\) and a constant \({\hat{C}}\) independent of u such that for all \(u \in {\hat{U}}\), \(y(u) \in {L^\infty (I,W^{1,q}(\Omega ))}\) and

$$\begin{aligned} \Vert y(u) \Vert _{L^\infty (I,W^{1,q}(\Omega ))} \le {\hat{C}}\left( \Vert y_1 \Vert _{V} + \Vert y_0 \Vert _{H^2(\Omega )}+ \Vert f \Vert _{L^2(I,V)}\right) . \end{aligned}$$
(16)

Proof

We proceed in two steps.

Step 1 First we assume that additionally

$$\begin{aligned} f \in W^{2,2}(I,V),\qquad y_0 \in H^3(\Omega ) \quad \text {with}\quad \partial _\nu y_0 = 0, \qquad \text {and }\quad y_1 \in H^2(\Omega ). \end{aligned}$$
(17)

Let \(u\in {\hat{U}}\) and approximate u by \(\{u_n\} \subset C^\infty ({\overline{\Omega }}) \cap {\hat{U}}\) with \(u_n\rightarrow u\) in \(L^r(\Omega )\) for some \(r\in [1,\infty )\) and \(u_n={\hat{u}}\) in \(\Omega \setminus \omega _c\). Such a sequence can be found by first approximating u by \({\tilde{u}}_n \in C^\infty ({\mathbb {R}}^d)\) with \({\tilde{u}}_n = {\hat{u}}\) in \(\Omega \setminus \omega _c\); this sequence in turn is constructed by first introducing intermediate approximating functions \({\hat{u}}_n\) with the property that \(\lim _{n\rightarrow \infty } {\hat{u}}_n =u\) in \(L^r(\Omega )\) and \({\hat{u}}_n= {\hat{u}}\) in \(\Omega \setminus {\hat{\omega }}_{c,n}\), where the closure of \({\hat{\omega }}_{c,n}\) is contained in \(\omega _c\) and \(0<\mathrm {dist}(\partial \omega _c, {\hat{\omega }}_{c,n}) \le n^{-1}\). Then we use convolution by mollifiers of the functions \({\hat{u}}_n\) to obtain functions \({\tilde{u}}_n\in C^\infty ({\mathbb {R}}^d)\) that satisfy \(\lim _{n\rightarrow \infty } {\tilde{u}}_n =u\) in \(L^r(\Omega )\), see e.g. [39, page 132], and \({\tilde{u}}_n={\hat{u}}\) in \(\Omega \setminus \omega _c\). Next we choose functions \({\underline{\varphi }}^n\) and \({\overline{\varphi }} _n\) in \(C^\infty ({\mathbb {R}})\) with common Lipschitz constant L such that \({\underline{u}}\le {\underline{\varphi }}^n \), \({\overline{\varphi }}_n\le {\overline{u}}\), and \({\underline{\varphi }}^n(s)\rightarrow \max ({\underline{u}},s)\), \({\overline{\varphi }}_n(s) \rightarrow \min ({\overline{u}},s)\) for all \(s\in {\mathbb {R}}\), see [39, page 125]. We then set \(u_n= {\overline{\varphi }}_n({\underline{\varphi }}^n ({\tilde{u}}_n))\), and estimate

$$\begin{aligned} \begin{aligned} \Vert u-u_n\Vert _{L^r(\Omega )}&\le \Vert u-{\overline{\varphi }}_n({\underline{\varphi }}^n (u))\Vert _{L^r(\Omega )} + \Vert {\overline{\varphi }}_n({\underline{\varphi }}^n (u)) - {\overline{\varphi }}_n({\underline{\varphi }}^n (u_n)) \Vert _{L^r(\Omega )}\\&\le \Vert u-{\overline{\varphi }}_n({\underline{\varphi }}^n(u))\Vert _{L^r(\Omega )} + L^2 \Vert u-{\tilde{u}}_n\Vert _{L^r(\Omega )} \rightarrow 0, \end{aligned} \end{aligned}$$

using Lebesgue’s bounded convergence theorem.

We replace u in (2) by \(u_n\). Due to the regularity assumptions (17) and the assumption that \(y_0\) is constant on \(\omega _c\), we have

$$\begin{aligned} y_0 \in H^3(\Omega ), \quad y_1 \in H^2(\Omega ), \quad f(0) + {\mathrm {div}}(u_n \nabla y_0) \in V, \quad f'(0) + {\mathrm {div}}(u_n \nabla y_1) \in H. \end{aligned}$$

Together with \({\partial _\nu y_0}= 0\) and \(f \in W^{2,2}(I,V) \subset W^{2,2}(I,H)\), these properties allow applying Theorem 30.3 (with \(k=3\)) and Theorem 30.4 in [40], which guarantee that \(y_n = y(u_n) \in W^{1,2}(I,H^2(\Omega ))\cap W^{2,2}(I,V)\) and \({\partial _\nu y_n}(t) = 0\) on \(\partial \Omega \) for \(t \in \overline{I}\). Then we multiply (2) with \(-{\partial _t} {\mathrm {div}}(u_n\nabla y_n(t))\) and integrate over \(\Omega \). Integrating by parts on the right-hand side and using that \(\partial _\nu \partial _ty_n=0\) on \(\partial \Omega \), we obtain

$$\begin{aligned} (u_n \nabla \partial _{tt} y_n(t), \nabla \partial _t y_n(t))_{{\mathbb {L}}^2(\Omega )} +\frac{1}{2} \partial _t \Vert {\mathrm {div}}(u_n\nabla y_n(t)) \Vert ^2_H = (\nabla f(t), u_n \nabla \partial _t y_n(t))_{{\mathbb {L}}^2(\Omega )} \end{aligned}$$

and thus

$$\begin{aligned}&{\partial _t} \Vert \sqrt{u_n} \nabla \partial _t y_n(t) \Vert ^2_{{\mathbb {L}}^2(\Omega )} + \partial _t \Vert {\mathrm {div}}(u_n \nabla y_n(t)) \Vert ^2_H \\&\quad \le \Vert \sqrt{u_n}\nabla f(t) \Vert ^2_{{\mathbb {L}}^2(\Omega )} + \Vert \sqrt{u_n} \nabla \partial _t y_n(t) \Vert ^2_{{\mathbb {L}}^2(\Omega )}. \end{aligned}$$

Integrating this expression on (0, t), we find for \(t \in (0,T]\) that

$$\begin{aligned}&\Vert \sqrt{u_n} \nabla \partial _t y_n(t) \Vert ^2_{{\mathbb {L}}^2(\Omega )} + \Vert {\mathrm {div}}(u_n \nabla y_n(t)) \Vert ^2_H \\&\quad \le \Vert \sqrt{u_n} \nabla y_1 \Vert ^2_{{\mathbb {L}}^2(\Omega )} + \Vert {\mathrm {div}}(u_n \nabla y_0) \Vert ^2_H \\&\qquad + \int _{0}^{t} \Vert \sqrt{u_n}\nabla f(s) \Vert ^2_{{\mathbb {L}}^2(\Omega )} \,\mathrm {d}s+ \int _{0}^{t} \Vert \sqrt{u_n} \nabla \partial _t y_n(s) \Vert ^2_{{\mathbb {L}}^2(\Omega )} \,\mathrm {d}s. \end{aligned}$$

Gronwall’s inequality then implies that for each \(t \in (0,T]\),

$$\begin{aligned} \Vert {\mathrm {div}}(u_n \nabla y_n(t)) \Vert ^2_H \le \left( \Vert \sqrt{u_n} \nabla y_1 \Vert ^2_{{\mathbb {L}}^2(\Omega )}+ \Vert {\mathrm {div}}(u_n \nabla y_0) \Vert ^2_H + \overline{u}\Vert f \Vert ^2_{L^2(I,V)}\right) e^{T}. \end{aligned}$$
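In more detail, the Gronwall step reads as follows: setting \(\varphi _n(t) := \Vert \sqrt{u_n} \nabla \partial _t y_n(t) \Vert ^2_{{\mathbb {L}}^2(\Omega )} + \Vert {\mathrm {div}}(u_n \nabla y_n(t)) \Vert ^2_H\) and \(C_n := \Vert \sqrt{u_n} \nabla y_1 \Vert ^2_{{\mathbb {L}}^2(\Omega )}+ \Vert {\mathrm {div}}(u_n \nabla y_0) \Vert ^2_H + \overline{u}\Vert f \Vert ^2_{L^2(I,V)}\), the integrated estimate takes the form

$$\begin{aligned} \varphi _n(t) \le C_n + \int _0^t \varphi _n(s) \,\mathrm {d}s, \end{aligned}$$

where we used \(\Vert \sqrt{u_n}\nabla f(s) \Vert ^2_{{\mathbb {L}}^2(\Omega )}\le \overline{u}\Vert f(s) \Vert ^2_{V}\). Gronwall's inequality thus yields \(\varphi _n(t)\le C_n e^t \le C_n e^T\), and dropping the first term of \(\varphi _n\) gives the stated bound.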

Since \(y_1 \in H^2(\Omega )\), it follows that \(\{\sqrt{u_n} \nabla y_1\}_{n\in {\mathbb {N}}}\) is bounded in \({\mathbb {L}}^2(\Omega )\). Moreover, we have \({\mathrm {div}}(u_n \nabla y_0)|_{\Omega \setminus \omega _c} = {\mathrm {div}}({\hat{u}} \nabla y_0)|_{\Omega \setminus \omega _c} = {\hat{u}} \Delta y_0|_{\Omega \setminus \omega _c}\) and \({\mathrm {div}}(u_n \nabla y_0)|_{\omega _c} =0\). Hence \(\{{\mathrm {div}}(u_n \nabla y_0)\}_{n\in {\mathbb {N}}}\) is bounded in H. We can thus conclude that \(\{{\mathrm {div}}(u_n \nabla y_n)\}_{n\in {\mathbb {N}}}\) is bounded in \(L^\infty (I, H)\).

Our next aim is to obtain \(L^\infty (I,W^{1,q}(\Omega ))\) regularity and boundedness for \(y_n\) for some \(q > 2\). For this purpose, we define for some \(\lambda > 0\)

$$\begin{aligned} g_n := -{\mathrm {div}}(u_n \nabla y_n) + \lambda y_n \end{aligned}$$
(18)

and note that \(\{g_n\}_{n\in {\mathbb {N}}}\) is bounded in \(L^\infty (I,H)\). Furthermore, Sobolev’s embedding theorem implies that \(W^{1,s'}(\Omega ) \hookrightarrow L^2(\Omega )\) for every \(s' \ge 1\) in case \(d = 2\), and for every \(s' \ge \frac{6}{5}\) in case \(d = 3\). Following the notation of [14], we denote by \(W^{-1,s}(\Omega )\) the dual space of \(W^{1,s'}(\Omega )\) with s the conjugate of \(s'\). Then we have \(H \hookrightarrow W^{-1,s}(\Omega )\), where \(s \in [1, \infty )\) for \(d = 2\) and \(s \in [1,6]\) for \(d = 3\). It follows that \(\{g_n\}_{n\in {\mathbb {N}}}\) is bounded in \(L^\infty (I,W^{-1,s}(\Omega ))\). Considering now (18) (together with homogeneous Neumann boundary conditions) for a.e. \(t\in I\) as an equation for \(y_n(t)\), this implies that there exists some \(q > 2\) such that

$$\begin{aligned} \Vert y_n(t)\Vert _{W^{1,q}(\Omega )}\le C\Vert g_n(t)\Vert _H\le {\hat{C}}\left( \Vert y_1 \Vert _{V} + \Vert y_0 \Vert _{H^2(\Omega )} + \Vert f \Vert ^2_{L^2(I,V)}\right) , \end{aligned}$$
(19)

where the constant C depends only on \({\underline{u}}\), \({\overline{u}}\) and q, but not on t, see [14, Thm. 1]. Hence \(\{y_n\}_{n\in {\mathbb {N}}}\) is bounded in \(L^\infty (I,W^{1,q}(\Omega ))\). Since \(L^1(I)\) and \(W^{1,q}(\Omega )\) are separable with the latter being reflexive, \(L^\infty (I,W^{1,q}(\Omega ))\) is the dual of a separable space; see, e.g., [41, Thm. 8.18.3]. Hence there exists a subsequence with \(y_n \rightharpoonup ^* {\bar{y}} \in L^\infty (I,W^{1,q}(\Omega ))\).

Finally, from Lemma 3.4 we also have that \(y_n\rightharpoonup y(u)\) in \(L^2(I,V)\cap W^{1,2}(I,H)\) and hence, by uniqueness of y(u), that \(y(u)={{\bar{y}}} \in L^\infty (I,W^{1,q}(\Omega ))\). Using \(\hbox {weak}^*\) lower semi-continuity of norms (cf., e.g., [38, p. 63]), we can now pass to the limit in (19) to obtain (16) for those \((y_0,y_1,f)\) which satisfy the additional regularity assumption (17).

Step 2 We relax the requirements on the problem data and choose an arbitrary \((y_0,y_1,f)\in X:= H^2(\Omega )\times H^1(\Omega )\times L^2(I;V)\), with \(\partial _\nu y_0 =0\). Then there exists \((y^n_0,y^n_1,f^n)\in H^3(\Omega )\times H^2(\Omega )\times W^{2,2}(I;V)\), with \(\partial _\nu y^n_0 =0\) such that \(\lim _{n\rightarrow \infty }(y^n_0,y^n_1,f^n)= (y_0,y_1,f) \) in X. As this is standard for the second and third component, we only address the first one. Let \(\omega _c^c := \Omega \setminus \overline{\omega _c}\). By assumption, \(\partial \Omega \cap \partial \omega _c= \emptyset \); in addition, \(\Omega \) and \(\omega _c\) are \(C^{2,1}\) domains, and thus \(\omega _c^c\) is a \(C^{2,1}\) domain as well. It is in this step that the \(C^{2,1}\) regularity of the domains is used. Since \(y_0 \in H^2(\Omega )\), we have \(y_0|_{\partial \Omega } \in H^{3/2}(\partial \Omega )\). Moreover, \(y_0|_{\partial \omega _c}=:y_c\) is constant and \(\partial _\nu y_0|_{\partial \omega ^c_c} = \partial _\nu y_0|_{\partial \omega _c \cup \partial \Omega } =0\). Let \({\tilde{v}}_n \in H^{5/2}(\partial \Omega )\) be such that \({\tilde{v}}_n \rightarrow y_0|_{\partial \Omega }\) in \(H^{3/2}(\partial \Omega )\). Accordingly let \(v_n \in H^3(\omega ^c_c)\) with \(v_n|_{\partial \omega _c}=y_c\), \(\partial _\nu v_n |_{\partial \omega ^c_c}=0\), \(\partial _{x_i x_j}v_n|_{\partial \omega _c}=0\), \(v_n|_{\partial \Omega } = {\tilde{v}}_n\), and \(v_n\rightarrow v\) in \(H^2(\omega ^c_c)\), where \(v\in H^2(\omega ^c_c)\) satisfies \(v|_{\partial \omega _c}=y_c\), \(\partial _\nu v|_{\partial \omega ^c_c}=0\), and \(v|_{\partial \Omega } = y_0|_{\partial \Omega }\). 
Denoting by \(v_{n,\mathrm {ext}}\) and \(v_{\mathrm {ext}}\) the extensions of \(v_n\) and v by the constant \(y_c\) on \(\omega _c\), we have \(v_{n,\mathrm {ext}} \rightarrow v_{\mathrm {ext}}\) in \(H^2(\Omega )\), \(\partial _\nu v_{n,\mathrm {ext}}|_{\partial \Omega }=0\), \(v_{n,\mathrm {ext}}|_{\omega _c}= v_{\mathrm {ext}}|_{\omega _c}=y_c\), and \(v_{n,\mathrm {ext}} \in H^3(\Omega ) \), where we use that \(\partial _{x_i x_j}v_n|_{\partial \omega _c}=0\). Next, observe that \(y_0-v \in H^2_0(\omega ^c_c)\). This implies the existence of functions \(w_n\in C^\infty (\omega _c^c)\) with compact support in \(\omega ^c_c\) such that \(w_n \rightarrow y_0-v\) in \(H^2(\omega _c^c)\); see, e.g., [42, pp. 17 and 31]. Denoting the extension by zero to \(\omega _c\) of \(w_n\) by \(w_{n,\mathrm {ext}}\), we have \(w_{n,\mathrm {ext}}\in H^3(\Omega )\), \(w_{n,\mathrm {ext}} \rightarrow y_0-v_{\mathrm {ext}}\) in \(H^2(\Omega )\), and \(\partial _\nu w_{n,\mathrm {ext}}|_{\partial \Omega }=0\). Finally, the sequence \(y^n_0=v_{n,\mathrm {ext}}+w_{n,\mathrm {ext}}\) defines the desired approximation of \(y_0\) such that \(y^n_0|_{\omega _c}=y_c\), \(\partial _\nu y^n_0|_{\partial \Omega }=0\), and \(y^n_0 \rightarrow y_0\) in \(H^2(\Omega )\).

We next define \(y^n\) as the solution to (2) with data \((y_0^n,y_1^n,f^n)\). From Lemma 3.1 and (16) for the smooth data \((y_0^n,y_1^n,f^n)\) we deduce that \(y^n \rightharpoonup y\) in \(L^2(I,V)\cap W^{1,2}(I,H)\) and \(y^n \rightharpoonup ^* y\) in \(L^\infty (I,W^{1,q}(\Omega ))\), where y is the solution of (2) with data \((y_0,y_1,f)\), and

$$\begin{aligned} \Vert y^n \Vert _{L^\infty (I,W^{1,q}(\Omega ))} \le {\hat{C}}\left( \Vert y^n_1 \Vert _{V} + \Vert y^n_0 \Vert _{H^2(\Omega )}+ \Vert f^n \Vert ^2_{L^2(I,V)}\right) \end{aligned}$$

for all n. Passing to the limit as \(n\rightarrow \infty \), we obtain (16) for \((y_0,y_1,f)\in X\) with \(\partial _\nu y_0=0\). \(\square \)

Remark 3.7

If Assumption 3.1 holds, the requirement in Lemma 3.5 on the convergence of \(u_n\) can be relaxed to \(u_n\rightarrow u\) in \(L^{\frac{q}{q-2}}(\Omega )\), where q is the exponent from Proposition 3.6. In this case, the first term on the right-hand side in (14) can be estimated by Hölder’s inequality as

$$\begin{aligned} \int _{\Omega } |u_n-u| |\nabla y_n(t)|^2 \,\mathrm {d}x\le \Vert u_n-u \Vert _{L^{\frac{q}{q-2}}(\Omega )}\Vert y_n \Vert _{L^\infty (I,W^{1,q} (\Omega ))}^2\rightarrow 0, \end{aligned}$$

where we used Proposition 3.6. As before, it follows that \(y(u_n) \rightarrow y(u)\) in W.

Remark 3.8

In Assumption 3.1 (ii), the requirement \(\overline{\omega _c} \subset \Omega \) was only used in Step 2 of the proof of Proposition 3.6. It is not necessary if instead \(y_0\in H^3(\Omega )\) is assumed.

4 Existence and Optimality Conditions

Deriving useful optimality conditions requires replacing the pointwise control constraints with a differentiable approximation of a cutoff function. We thus introduce the superposition operator

$$\begin{aligned} \Phi _\varepsilon : L^1(\omega _c) \rightarrow U, \qquad [\Phi _\varepsilon (u)](x) := \varphi _\varepsilon (u(x)), \end{aligned}$$

where \(\varphi _\varepsilon :{\mathbb {R}}\rightarrow [u_{\min },u_{\max }]\) is such that \(\Phi _\varepsilon \) is Lipschitz continuous from \(L^r(\omega _c)\) to \(L^r(\omega _c)\) for every \(r\in [1,\infty ]\) and \(\varepsilon \ge 0\) and Fréchet differentiable from \(L^\infty (\omega _c)\) to \(L^\infty (\omega _c)\) for \(\varepsilon >0\) (and thus ensuring Fréchet differentiability of the tracking term; see Lemma 4.2 below). The construction of such a \(\varphi _\varepsilon \) and the characterization of the Fréchet derivative of \(\Phi _\varepsilon \) via pointwise a.e. multiplication can be carried out in the same way as in [10, § 2.3].
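One concrete realization of such a \(\varphi _\varepsilon \) is a quadratic smoothing of the clamp onto \([u_{\min },u_{\max }]\). The formula below is an illustrative choice (it is \(C^1\) rather than \(C^\infty \) and is not the construction of [10]), but it exhibits the required Lipschitz continuity and, for \(\varepsilon >0\), differentiability.

```python
import numpy as np

# A C^1 cutoff phi_eps: R -> [u_min, u_max] built from quadratically
# smoothed max/min; the concrete formula is an illustrative assumption.
def smooth_max(a, s, eps):
    # equals max(a, s) outside (a - eps, a + eps); quadratic blend inside
    return np.where(s >= a + eps, s,
           np.where(s <= a - eps, a, a + (s - a + eps) ** 2 / (4 * eps)))

def smooth_min(b, s, eps):
    # equals min(b, s) outside (b - eps, b + eps); quadratic blend inside
    return np.where(s <= b - eps, s,
           np.where(s >= b + eps, b, b - (b + eps - s) ** 2 / (4 * eps)))

def phi_eps(s, u_min, u_max, eps):
    return smooth_min(u_max, smooth_max(u_min, s, eps), eps)

s = np.linspace(-1.0, 5.0, 10001)
v = phi_eps(s, 1.0, 3.0, 0.1)
print(v.min(), v.max())   # range stays within [1, 3]
```

The map is the identity on \([u_{\min }+\varepsilon ,u_{\max }-\varepsilon ]\) and has slope between 0 and 1, so it is 1-Lipschitz; the induced superposition operator is then Lipschitz on every \(L^r(\omega _c)\).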

We then consider for \(\varepsilon \ge 0\) the reduced, unconstrained optimization problem

$$\begin{aligned} \min _{u\in BV(\omega _c)} J_\varepsilon (u) \end{aligned}$$
(20)

for

$$\begin{aligned} J_\varepsilon (u) := \frac{1}{2} \Vert By({\hat{E}}\Phi _\varepsilon (u))-y_d\Vert _{{\mathcal {O}}}^2+ \alpha G(u) + \beta TV(u) \end{aligned}$$

for some \(y_d \in {\mathcal {O}}\) and \(\alpha ,\beta >0\), where \(u \mapsto y(u)\) denotes the solution mapping of (9) introduced in the previous section and \({\hat{E}}\) is the affine extension operator from \(\omega _c\) to \(\Omega \) defined in (4). We point out that the role of \(\varepsilon \) is not that of a smoothing parameter for the optimization problem, which remains nonsmooth for \(\varepsilon >0\) due to the presence of G and \(\mathrm {TV}\); it merely influences the behavior of the cutoff function near the upper and lower values of the pointwise bounds for the coefficient.

Existence of optimal controls now follows analogously to [10, Prop. 3.1].

Proposition 4.1

For every \(\varepsilon \ge 0\), there exists a global minimizer \({{\bar{u}}}\in BV(\omega _c)\cap U\) of \(J_\varepsilon \).

Proof

Since \(J_\varepsilon \) is bounded from below, there exists a minimizing sequence \(\{u_n\}_{n\in {\mathbb {N}}}\subset BV(\omega _c)\). Furthermore, we may assume without loss of generality that there exists a \(C>0\) such that

$$\begin{aligned} C\left( \Vert u_n \Vert _{L^1(\omega _c)} + \mathrm {TV}(u_n)\right) \le J_\varepsilon (u_n)\le J_\varepsilon (0) \quad \text {for all }n\in {\mathbb {N}}, \end{aligned}$$

and hence that \(\{u_n\}_{n\in {\mathbb {N}}}\) is bounded in \(BV(\omega _c)\). By the compact embedding of \(BV(\omega _c)\) into \(L^1(\omega _c)\) for any \(d\in {\mathbb {N}}\), we can thus extract a subsequence, denoted by the same symbol, converging strongly in \(L^1(\omega _c)\) to some \({{\bar{u}}}\in L^1(\omega _c)\). Due to the continuity of \(\Phi _\varepsilon \) as well as of \({\hat{E}}\), we have \({\hat{E}}\Phi _\varepsilon (u_n)\rightarrow {\hat{E}}\Phi _\varepsilon ({{\bar{u}}})\in U\) in \(L^1(\Omega )\).

Lower semi-continuity of G and \(\mathrm {TV}\) with respect to the strong convergence in \(L^1(\omega _c)\) and the weak convergence \(y({\hat{E}}\Phi _\varepsilon (u_n))\rightharpoonup y({\hat{E}}\Phi _\varepsilon ({{\bar{u}}}))\) in \(L^2(Q)\) from Lemma 3.4 yield that

$$\begin{aligned} J_\varepsilon ({{\bar{u}}}) \le \liminf _{n\rightarrow \infty } J_\varepsilon (u_n) \le J_\varepsilon (u) \qquad \text {for all }u\in BV(\omega _c) \end{aligned}$$

and thus that \({{\bar{u}}}\in BV(\omega _c)\) is the desired minimizer.

The fact that \({{\bar{u}}}\in U\) then follows by a contraposition argument based on Stampacchia’s Lemma for BV functions and the pointwise definition of G, see [10, Prop. 3.2]. \(\square \)

The convergence of minimizers of (20) as \(\varepsilon \rightarrow 0\) can be shown along the same lines as indicated at the end of [10, § 3].

We now derive first-order optimality conditions for the solution of (20). To this end, we first show Fréchet differentiability of the tracking term

$$\begin{aligned} F_\varepsilon :L^\infty (\omega _c)\rightarrow {\mathbb {R}},\qquad F_\varepsilon (u) = \frac{1}{2}\Vert By({\hat{E}}\Phi _\varepsilon (u))-y_d \Vert _{{\mathcal {O}}}^2 \end{aligned}$$
(21)

as in [10, Lem. 4.1] by using for given \(u\in L^\infty (\omega _c)\) and \(y\in W\) the definition of the adjoint equation

$$\begin{aligned} \left\{ \begin{aligned} \int _0^T - (\partial _t p, \partial _t \varphi )_H + (u \nabla p, \nabla \varphi )_{{\mathbb {L}}^2(\Omega )} \,\mathrm {d}t&= \int _0^T (B^*(By-y_d), \varphi )_H \,\mathrm {d}t,\\ p(T)&= 0 \text { in } V, \end{aligned} \right. \end{aligned}$$
(22)

for any \(\varphi \in W\) with \(\varphi (0)=0\), which admits a unique solution \(p\in W\) by Lemma 3.1. In the following, we use the regularity of solutions to identify the derivative in \(L^\infty (\omega _c)^*\) with its representation in \(L^1(\omega _c)\), considered as a subset of \(L^\infty (\omega _c)^*\). Since the extension operator \({\hat{E}}\) is affine, we also introduce the corresponding linear extension operator \({\hat{E}}_0 = {\hat{E}}':L^2(\omega _c)\rightarrow L^2(\Omega )\).

Lemma 4.2

For every \(\varepsilon >0\), the mapping \(F_\varepsilon \) defined in (21) is Fréchet differentiable in every \(u\in L^\infty (\omega _c)\), and the Fréchet derivative is given by

$$\begin{aligned} F_\varepsilon '(u) = {\hat{E}}_0^*\left( \int _0^T\nabla y\cdot \nabla p\,\mathrm {d}t\right) \,\Phi _\varepsilon '(u) \in L^1(\omega _c), \end{aligned}$$
(23)

where \(y=y({\hat{E}}\Phi _\varepsilon (u))\) is the solution of (9), p is the solution of (22), and \({\hat{E}}_0^*:L^2(\Omega )\rightarrow L^2(\omega _c)\) is the restriction operator.

If Assumption 3.1 holds, \(F_\varepsilon '(u) \in L^{\frac{2q}{2+q}}(\omega _c)\) for the \(q>2\) given in Proposition 3.6.

Proof

We first show directional differentiability. Let \(w, h\in L^\infty (\omega _c)\) and \(\rho >0\) be arbitrary. We define \({\tilde{y}}(w):=y({\hat{E}}\Phi _\varepsilon (w))\). We now insert the productive zero \(B{\tilde{y}}(w)-B{\tilde{y}}(w)\) in \(F_\varepsilon (w+\rho h)\) and expand the square to obtain

$$\begin{aligned} F_\varepsilon (w+\rho h)-F_\varepsilon (w)= & {} \frac{1}{2}\Vert B({\tilde{y}}(w+\rho h)-{\tilde{y}}(w))+(B{\tilde{y}}(w)-y_d) \Vert _{{\mathcal {O}}}^2 \nonumber \\&- \frac{1}{2}\Vert B{\tilde{y}}(w)-y_d \Vert _{{\mathcal {O}}}^2\nonumber \\= & {} \frac{1}{2}\Vert B({\tilde{y}}(w+\rho h)-{\tilde{y}}(w)) \Vert _{{\mathcal {O}}}^2\nonumber \\&+ (B({\tilde{y}}(w+\rho h)-{\tilde{y}}(w)),B{\tilde{y}}(w)-y_d)_{{\mathcal {O}}}. \end{aligned}$$
(24)

For the first term, we can use Lemma 3.3, the boundedness of B and the Lipschitz continuity of \(\Phi _\varepsilon \) to estimate

$$\begin{aligned} \frac{1}{2}\Vert B({\tilde{y}}(w+\rho h)-{\tilde{y}}(w)) \Vert _{{\mathcal {O}}}^2 \le C\rho ^2\Vert h \Vert _{L^\infty (\omega _c)}^2. \end{aligned}$$
(25)

For the second term in (24), we introduce the adjoint state p(w) and use the fact that \(\delta y:={\tilde{y}}(w+\rho h)-{\tilde{y}}(w)\in W\) with \(\delta y(0)=0\). Testing (22) with \(\varphi = \delta y\), and using (9) for \(y={\tilde{y}}(w)\) and \(y={\tilde{y}}(w+\rho h)\), each time with \(v=p(w)\), and inserting a productive zero, we find

$$\begin{aligned}&\left( \delta y,B^*(B{\tilde{y}}(w)-y_d) \right) _{L^2(Q)}\\&\quad = -\left( \partial _t p(w),\partial _t \delta y \right) _{L^2(Q)} + \left( {\hat{E}}\Phi _\varepsilon (w) \nabla p(w),\nabla \delta y \right) _{L^2(I,{\mathbb {L}}^2(\Omega ))}\\&\quad = \left( ({\hat{E}}\Phi _\varepsilon (w+\rho h)-{\hat{E}}\Phi _\varepsilon (w))\nabla {\tilde{y}}(w+\rho h),\nabla p(w) \right) _{L^2(I,{\mathbb {L}}^2(\Omega ))}. \end{aligned}$$

By Lemma 3.3 we have that \({\tilde{y}}(w+\rho h)\rightharpoonup {\tilde{y}}(w)\) in \(L^2(I,V)\) as \(\rho \rightarrow 0^+\). Moreover, since \(\varepsilon >0\) we have that \({\hat{E}}\Phi _\varepsilon \) is Fréchet differentiable in \(L^\infty (\omega _c)\) with \({\hat{E}}_0\Phi _\varepsilon '(w),{\hat{E}}_0\Phi _\varepsilon '(w+\rho h)\in L^\infty (\Omega )\). Hence, dividing (24) by \(\rho >0\) and passing to the limit implies in combination with (25) that

$$\begin{aligned} \begin{aligned} F_\varepsilon '(w;h)&:= \lim _{\rho \rightarrow 0^+}\frac{1}{\rho }(F_\varepsilon (w+\rho h) - F_\varepsilon (w)) \\&= \langle h,\Phi _\varepsilon '(w){\hat{E}}_0^* \int _0^T\nabla {\tilde{y}}(w)\cdot \nabla p(w)\,\mathrm {d}t \rangle _{L^\infty (\omega _c),L^1(\omega _c)}. \end{aligned} \end{aligned}$$

Since the mapping \(h\mapsto F_\varepsilon '(w;h)\) is linear and bounded, \({\hat{E}}_0^*(\int _0^T\nabla {\tilde{y}}(w)\cdot \nabla p(w)\,\mathrm {d}t)\,\Phi _\varepsilon '(w)\) is the Gâteaux derivative of \(F_\varepsilon \) at w. Thus, \(F_\varepsilon \) is Gâteaux differentiable in \(L^\infty (\omega _c)\).

It remains to show that this is also a Fréchet derivative. From the above, we have that

$$\begin{aligned}&\left| F_\varepsilon (w+h)-F_\varepsilon (w)-F_\varepsilon '(w)h\right| \\&\quad \le C\Vert h \Vert _{L^\infty (\omega _c)}^2\\&\qquad + \left| \left( ({\hat{E}}\Phi _\varepsilon (w+h)-{\hat{E}}\Phi _\varepsilon (w))\nabla {\tilde{y}}(w+h)- {\hat{E}}_0\Phi _\varepsilon '(w)h\nabla {\tilde{y}}(w),\nabla p(w) \right) _{L^2(I,{\mathbb {L}}^2(\Omega ))}\right| \\&\quad \le C\Vert h \Vert _{L^\infty (\omega _c)}^2\\&\qquad +\left( \Vert {\hat{E}}\Phi _\varepsilon (w+h)-{\hat{E}}\Phi _\varepsilon (w)- {\hat{E}}_0\Phi _\varepsilon '(w)h \Vert _{L^\infty (\omega _c)}\Vert \nabla {\tilde{y}}(w+h) \Vert _{L^2(I,{\mathbb {L}}^2(\omega _c))}\right. \\&\qquad + \left. \Vert {\hat{E}}_0\Phi _\varepsilon '(w)\Vert _{L^\infty (\omega _c)} \Vert \nabla {\tilde{y}}(w+h)-\nabla {\tilde{y}}(w)\Vert _{L^2(I,{\mathbb {L}}^2(\Omega ))}\right) \Vert \nabla p(w)\Vert _{L^2(I,{\mathbb {L}}^2(\Omega ))}, \end{aligned}$$

and hence that

$$\begin{aligned} \frac{|F_\varepsilon (w+h)-F_\varepsilon (w)-F_\varepsilon '(w)h|}{\Vert h \Vert _{L^\infty (\omega _c)}} \rightarrow 0\quad \text {for}\quad \Vert h\Vert _{L^\infty (\omega _c)}\rightarrow 0 \end{aligned}$$

since \({\tilde{y}}(w+h) \rightarrow {\tilde{y}}(w)\) in \(L^2(I,V)\) by Lemma 3.5.

The regularity follows from \({\tilde{y}}(w),p(w)\in L^2(I,V)\) together with the properties of the norm in Bochner spaces, cf. [43, Cor. V.1]. If Assumption 3.1 holds, Proposition 3.6 yields \({\tilde{y}}(w)\in L^\infty (I,W^{1,q}(\Omega ))\) for some \(q>2\) and hence \({\hat{E}}_0^*\left( \int _0^T\nabla {\tilde{y}}(w)\cdot \nabla p(w)\,\mathrm {d}t\right) \Phi _\varepsilon '(w)\in L^{\frac{2q}{2+q}}(\omega _c)\). \(\square \)
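The adjoint-state formula for the derivative has a finite-dimensional analogue that can be verified against finite differences. In the sketch below, the state equation is replaced by a parameter-dependent linear system \(A(u)y=f\) with \(A(u)=A_0+\sum _i u_iA_i\) (an illustrative stand-in for the discretized wave equation, not the operators of this paper); the adjoint-based gradient of \(F(u)=\frac{1}{2}\Vert By-y_d\Vert ^2\) is then \(\partial F/\partial u_i=-p^\top A_iy\) with adjoint state \(A(u)^\top p=B^\top (By-y_d)\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 3
# A(u) = A0 + sum_i u_i * A_i: an invertible stand-in for the PDE operator
A0 = np.eye(n) * 4.0
As = [np.diag(rng.uniform(0.5, 1.0, n)) for _ in range(m)]
B = rng.standard_normal((4, n))
f = rng.standard_normal(n)
y_d = rng.standard_normal(4)

def F(u):
    A = A0 + sum(ui * Ai for ui, Ai in zip(u, As))
    y = np.linalg.solve(A, f)
    return 0.5 * np.linalg.norm(B @ y - y_d) ** 2

def grad_F(u):
    A = A0 + sum(ui * Ai for ui, Ai in zip(u, As))
    y = np.linalg.solve(A, f)
    p = np.linalg.solve(A.T, B.T @ (B @ y - y_d))   # adjoint state
    return np.array([-p @ (Ai @ y) for Ai in As])   # dF/du_i = -p^T A_i y

u = np.array([1.0, 2.0, 1.5])
g = grad_F(u)
h = 1e-6
g_fd = np.array([(F(u + h * e) - F(u - h * e)) / (2 * h) for e in np.eye(m)])
print(np.max(np.abs(g - g_fd)))   # agrees with central differences
```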

We can now proceed exactly as in [10] to obtain first-order necessary optimality conditions.

Theorem 4.3

([10, Thm. 4.2]) If \(\varepsilon >0\) and Assumption 3.1 holds, every local minimizer \({{\bar{u}}}\in BV(\omega _c)\) to (20) satisfies

$$\begin{aligned} -F_\varepsilon '({{\bar{u}}})\in \alpha \,\partial G({{\bar{u}}}) + \beta \,\partial \mathrm {TV}({{\bar{u}}})\subset L^\frac{2q}{2+q}(\omega _c), \end{aligned}$$
(26)

where G and \(\mathrm {TV}\) are considered as extended real-valued convex functionals on \(L^\frac{2q}{q-2}(\omega _c)\).

Introducing explicit subgradients for the two subdifferentials, we obtain primal–dual optimality conditions.

Corollary 4.4

([10, Cor. 4.3]) For any local minimizer \({{\bar{u}}}\in BV(\omega _c)\) to (20), there exist \({{\bar{q}}} \in L^\frac{2q}{2+q}(\omega _c)\) and \({\bar{\xi }} \in L^{\frac{2q}{2+q}}(\omega _c)\) satisfying

$$\begin{aligned} \left\{ \begin{aligned} 0&= F_\varepsilon '({{\bar{u}}}) + \alpha {{\bar{q}}} + \beta {\bar{\xi }},\\ {{\bar{q}}}&\in \partial G({{\bar{u}}}),\\ {\bar{\xi }}&\in \partial \mathrm {TV}({{\bar{u}}}). \end{aligned} \right. \end{aligned}$$
(27)

These conditions can be further interpreted pointwise. First, using the characterization of Lemma 4.2, we can identify the first term in the first equation with the \(L^{1+\delta }(\omega _c)\) function given by

$$\begin{aligned} \left[ \int _0^T\nabla y({{\bar{u}}}) \cdot \nabla p({{\bar{u}}})\,\mathrm {d}t\right] (x)\Phi _\varepsilon '({{\bar{u}}}(x)) \quad \text {for a.e. }x\in \omega _c. \end{aligned}$$

Second, using the characterization of \(\partial G\) from [10, § 2], we have that

$$\begin{aligned} {{\bar{q}}}(x) \in {\left\{ \begin{array}{ll} \{-u_m\} &{} {{\bar{u}}}(x)< u_1,\\ {[}-u_m, \frac{1}{2}(u_1+u_2)] &{} {{\bar{u}}}(x) = u_1,\\ \{\frac{1}{2}(u_i+u_{i+1})\} &{} {{\bar{u}}}(x) \in (u_i,u_{i+1}), \quad \ 1\le i<m,\\ \left[ \tfrac{1}{2}(u_{i-1}+u_{i}),\tfrac{1}{2}(u_{i}+u_{i+1})\right] &{} {{\bar{u}}}(x) = u_{i},\qquad \qquad 1< i<m,\\ {[} \frac{1}{2}(u_{m-1}+u_m),u_m] &{} {{\bar{u}}}(x) = u_m,\\ \{u_m\} &{} {{\bar{u}}}(x) > u_m. \end{array}\right. } \end{aligned}$$
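The case distinction for \({{\bar{q}}}\) can be transcribed directly into code. The helper below returns the interval of admissible subgradients at a point; the representation of intervals as pairs and the separate handling of the interior levels are our choices.

```python
def multibang_subdiff(ubar, levels):
    """Subgradient interval [lo, hi] of the multi-bang penalty at ubar,
    transcribed from the pointwise case distinction above; `levels`
    holds the admissible values u_1 <= ... <= u_m."""
    u, m = levels, len(levels)
    if ubar < u[0]:
        return (-u[m - 1], -u[m - 1])
    if ubar == u[0]:
        return (-u[m - 1], 0.5 * (u[0] + u[1]))
    if ubar == u[m - 1]:
        return (0.5 * (u[m - 2] + u[m - 1]), u[m - 1])
    if ubar > u[m - 1]:
        return (u[m - 1], u[m - 1])
    for i in range(1, m - 1):          # interior levels u_2, ..., u_{m-1}
        if ubar == u[i]:
            return (0.5 * (u[i - 1] + u[i]), 0.5 * (u[i] + u[i + 1]))
    for i in range(m - 1):             # open intervals (u_i, u_{i+1})
        if u[i] < ubar < u[i + 1]:
            c = 0.5 * (u[i] + u[i + 1])
            return (c, c)

levels = [1.0, 2.0, 3.0]
print(multibang_subdiff(1.5, levels))  # (1.5, 1.5): singleton between levels
print(multibang_subdiff(2.0, levels))  # (1.5, 2.5): interval at a level
```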

The interpretation of the final term is more delicate. Informally, \({\bar{\xi }}(x)\) corresponds to the mean curvature of the level set of \({{\bar{u}}}\) through x (if \({{\bar{u}}}\) is smooth at x) or to the signed normal to its jump set (if \({{\bar{u}}}\) has a jump discontinuity across a measurable set of positive \((d-1)\)-dimensional Hausdorff measure). This can be made more precise using the notion of the full trace from [44], see also [45].

5 Numerical Solution

In this section, we address the numerical solution of (20) using a stabilized space-time finite element discretization for second-order hyperbolic equations [15] and a nonlinear primal–dual proximal splitting algorithm [16,17,18]. Since we now consider a finite-dimensional optimization problem, we can include the constraint \(u \in U\) directly via the multi-bang penalty instead of enforcing it inside the state equation. In this and the following section, we will therefore omit \(\Phi _\varepsilon \) from the discretized tracking term (and, with it, \(\varepsilon \) in general) and define the multi-bang penalty with \(\mathrm {dom}\,g = [u_1,u_m]\) as in [12]; see (34) below.

5.1 Discretization

We consider a mesh \({\mathcal {T}}_h\) consisting of finitely many triangles or tetrahedra K with mesh size h. Then we introduce the space \(D_h\subset H^1(\Omega )\cap C({\overline{\Omega }})\) of linear finite elements based on the triangulation \({\mathcal {T}}_h\). A basis of this space is given by the standard hat functions \(\varphi _i\) associated with the nodes \(x_i\), \(i=1,\ldots , N_h\), of the triangulation \({\mathcal {T}}_h\). Next we discretize the time interval I uniformly by \(0=t_0< \cdots < t_{N_\tau }=T\) with step size \(\tau \). Similarly, we define the space \(D_\tau \subset H^1(I)\cap C({\overline{I}})\) of piecewise linear and continuous functions with respect to this grid. Furthermore we consider the hat functions \(e_i\), \(i=0,\ldots ,N_\tau \), with \(e_i(t_l)=\delta _{il}\), which form a basis of \(D_\tau \). We assume that \(\omega _c\) is represented exactly by the triangulation \({\mathcal {T}}_h\) and introduce the space

$$\begin{aligned} D_h^c=\{u_h\in D_h \mid {\text {supp}}u_h\subseteq \overline{\omega _c}\}={\text {span}}\{\varphi _i|_{\overline{\omega _c}}\mid x_i\in \overline{\omega _c}\}. \end{aligned}$$

Moreover we introduce the space \(C_h^c\) of piecewise constant functions on the triangles in \(\omega _c\). In the following we also identify \(D_h^c\) with \({\mathbb {R}}^{N_c}\), where \(N_c=\dim D_h^c\), and \(C_h^c\) with \({\mathbb {R}}^{M_c}\), where \(M_c=\dim C_h^c\). Finally we set \(\vartheta :=(\tau ,h)\) with \(\tau ,h>0\) and introduce the stabilization parameter \(\sigma \ge 0\).

Definition 5.1

We call \(y_{\vartheta }\in D_{\vartheta }:=D_h \otimes D_\tau \) a discrete solution of (9) if \(y_{\vartheta }\) satisfies

$$\begin{aligned}&\int _0^T-(\partial _t y_{\vartheta },\partial _t v)_{H}-(\sigma -\tfrac{1}{6})\tau ^2({\hat{E}}u_h\nabla \partial _t y_{\vartheta },\nabla \partial _tv)_{{\mathbb {L}}^2(\Omega )}+ ({\hat{E}}u_h\nabla y_{\vartheta },\nabla v)_{{\mathbb {L}}^2(\Omega )}\,\mathrm {d}t\nonumber \\&\quad = (y_1,v(0))_{H} + \int _0^T(f,v)_{H}\,\mathrm {d}t\end{aligned}$$
(28)

for all \(v \in D_{\vartheta }\) with \(v(T)=0\) and initial condition \(y_{\vartheta }(0)=S_0 y_0\) defined via

$$\begin{aligned} (S_0 y_0,\varphi )_{H}=(y_0,\varphi )_{H}\quad \forall \varphi \in D_h. \end{aligned}$$

This is a space-time finite element discretization with piecewise linear elements in space and in time. The additional \(\sigma \)-term in (28) serves as a stabilization term, which vanishes for \(\vartheta \rightarrow 0\) and is connected to the error term in the trapezoidal rule for the time integral of the third bilinear form in (28). The stability depends significantly on the value of \(\sigma \) with the method being more stable for larger \(\sigma \); e.g., for \(\sigma \ge 1/4\), the method is unconditionally stable and convergent while for \(0\le \sigma <1/4\), a CFL-like condition has to be satisfied to ensure stability as well as convergence, see [15, Thms. 2.1, 3.1] for homogeneous Dirichlet boundary conditions. At the same time, (28) can be formulated as the following time-stepping scheme: Set \(y_h^0=S_0y_0\) and

$$\begin{aligned}&\left( \frac{y_h^1-y_h^0}{\tau },\varphi \right) _{L^2(\Omega )}+\tau ({\hat{E}}u_h\nabla (\sigma y_h^1+(\tfrac{1}{2}-\sigma )y_h^0),\nabla \varphi )_{{\mathbb {L}}^2(\Omega )}\nonumber \\&\qquad =(y_1,\varphi )_{L^2(\Omega )}+\left( \int _0^{t_1}fe_0\,\mathrm {d}t,\varphi \right) _{L^2(\Omega )}, \end{aligned}$$
(29a)
$$\begin{aligned}&\left( \frac{y_h^{i+1}-2y_h^i+y_h^{i-1}}{\tau },\varphi \right) _{L^2(\Omega )}+\tau \left( {\hat{E}}u_h\nabla (\sigma y_h^{i+1}+(1-2\sigma )y_h^i+\sigma y_h^{i-1}),\nabla \varphi \right) _{{\mathbb {L}}^2(\Omega )}\nonumber \\&\qquad =\left( \int _{t_{i-1}}^{t_{i+1}}fe_i\,\mathrm {d}t,\varphi \right) _{L^2(\Omega )},\quad 1\le i\le N_\tau -1, \end{aligned}$$
(29b)

for all \(\varphi \in D_h\). For \(\sigma =1/4\), this method is equivalent to the implicit, unconditionally stable, and convergent Crank–Nicolson scheme, while for \(\sigma =0\), the method is explicit if the spatial mass matrix is lumped. The main benefit in our context is that this is an adjoint-consistent discretization and therefore can be used to obtain a conforming discretization of (6) in a straight-forward manner.

Next we introduce the discrete control-to-observation operator \(S_\vartheta :U\cap D_h^c\rightarrow {\mathcal {O}}\) defined by \(u\mapsto By_\vartheta \) where \(y_\vartheta \) is the solution of (28) for the coefficient \(u\in U\cap D_h^c\). Let \(\delta >0\). Then the implicit function theorem implies that \(S_\vartheta \) is Fréchet differentiable on the open subset

$$\begin{aligned} \left\{ u \in L^\infty (\omega _c):-\inf _{x\in \omega _c} {\hat{u}}(x)+u_{\min }< u(x) < u_{\max }+\delta \quad \text {for a.e.}~x\in \omega _c\right\} \cap D_h^c. \end{aligned}$$

This set contains \(U\cap D_h^c\). The implicit function theorem is applicable since the following linearized discrete state equation is well-posed for \(\delta y_\vartheta \in D_\vartheta \) for every \(\delta u_h\in D_h^c\):

$$\begin{aligned}&\int _0^T-(\partial _t \delta y_{\vartheta },\partial _tv)_{H}-(\sigma -\tfrac{1}{6})\tau ^2({\hat{E}}u_h\nabla \partial _t \delta y_{\vartheta },\nabla \partial _tv)_{{\mathbb {L}}^2(\Omega )}+ ({\hat{E}}u_h\nabla \delta y_{\vartheta },\nabla v)_{{\mathbb {L}}^2(\Omega )}\,\mathrm {d}t\nonumber \\&\qquad =\int _0^T(\sigma -\tfrac{1}{6})\tau ^2({\hat{E}}\delta u_h\nabla \partial _t y_{\vartheta },\nabla \partial _tv)_{{\mathbb {L}}^2(\Omega )}- ({\hat{E}}\delta u_h\nabla y_{\vartheta },\nabla v)_{{\mathbb {L}}^2(\Omega )}\,\mathrm {d}t\end{aligned}$$
(30)

for all \(v \in D_{\vartheta }\) with \(v(T)=0\) and initial condition \(\delta y_{\vartheta }(0)=0\), where \(y_\vartheta \) is the solution of (28) for \(u_h\). Thus the derivative of \(S_\vartheta \) at \(u_h\) is given by \(S_\vartheta '(u_h):D_h^c\rightarrow {\mathcal {O}},~\delta u_h\mapsto B\delta y_\vartheta \), where \(\delta y_\vartheta \) solves (30) for \(\delta u_h\). Its adjoint (with respect to the \(L^2(Q)\) and \({\mathcal {O}}\) inner products) is given by

$$\begin{aligned} S_\vartheta '(u_h)^*:{\mathcal {O}}\rightarrow C_h^c\, , \qquad o\mapsto \left. \left( \int _0^T(\tfrac{1}{6}-\sigma )\tau ^2\nabla \partial _t y_{\vartheta }\cdot \nabla \partial _tp_\vartheta +\nabla y_{\vartheta }\cdot \nabla p_\vartheta \,\mathrm {d}t\right) \right| _{\omega _c} \end{aligned}$$

where \(y_\vartheta \) solves (28) for the coefficient \(u_h\) and \(p_{\vartheta }\) solves the discrete adjoint equation

$$\begin{aligned}&\int _0^T-(\partial _t v,\partial _tp_\vartheta )_{H}-(\sigma -\tfrac{1}{6})\tau ^2({\hat{E}}u_h\nabla \partial _t v,\nabla \partial _tp_\vartheta )_{{\mathbb {L}}^2(\Omega )}+ ({\hat{E}}u_h\nabla v,\nabla p_\vartheta )_{{\mathbb {L}}^2(\Omega )}\,\mathrm {d}t\nonumber \\&\quad = \int _0^T(B^*o,v)_{H}\,\mathrm {d}t\end{aligned}$$
(31)

for all \(v \in D_{\vartheta }\) with \(v(0)=0\) and initial condition \(p_{\vartheta }(T)=0\), which can be formulated as a time-stepping scheme similar to (29).

We now introduce the variables \(y_h\) and \(p_h\) defined by

$$\begin{aligned} y_\vartheta (t,x)=\sum _{i=0}^{N_\tau }y_h^i(x)e_i(t),\quad p_\vartheta (t,x)=\sum _{i=1}^{N_\tau }p_h^i(x)e_i(t). \end{aligned}$$

Thus \((S_\vartheta '(u_h))^*o\) with \(o\in {\mathcal {O}}\) has the representation

$$\begin{aligned} (S_\vartheta '(u_h))^*o= \nabla y_h^\top K\nabla p_h, \end{aligned}$$
(32)

where K is given by

$$\begin{aligned} K:=(\tfrac{1}{6}-\sigma )\tau ^2A_\tau +M_\tau =\tau \begin{pmatrix} \tfrac{1}{2}-\sigma &{} \sigma &{}&{}&{}\\ \sigma &{} 1-2\sigma &{} \sigma &{}&{}\\ &{}\ddots &{}\ddots &{}\ddots &{}\\ &{}&{}\sigma &{}1-2\sigma &{}\sigma \\ &{}&{}&{}\sigma &{}\tfrac{1}{2}-\sigma \end{pmatrix} \end{aligned}$$

with the temporal mass matrix \(M_\tau \) and stiffness matrix \(A_\tau \). For \(\sigma =0\), K is a diagonal matrix.
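The stencil for K can be checked numerically: assembling the temporal mass and stiffness matrices for piecewise linear elements on a uniform grid and forming \(K=(\tfrac{1}{6}-\sigma )\tau ^2A_\tau +M_\tau \) reproduces the displayed tridiagonal matrix. The grid size and \(\sigma \) below are arbitrary test values.

```python
import numpy as np

# Verify K = (1/6 - sigma) tau^2 A_tau + M_tau against the tridiagonal
# stencil displayed above, for P1 elements in time on a uniform grid.
tau, n, sigma = 0.1, 6, 0.25

def tridiag(lo, d, hi):
    return np.diag(lo, -1) + np.diag(d) + np.diag(hi, 1)

M_tau = (tau / 6) * tridiag(np.ones(n - 1),
                            np.r_[2.0, 4.0 * np.ones(n - 2), 2.0],
                            np.ones(n - 1))
A_tau = (1 / tau) * tridiag(-np.ones(n - 1),
                            np.r_[1.0, 2.0 * np.ones(n - 2), 1.0],
                            -np.ones(n - 1))
K = (1.0 / 6.0 - sigma) * tau ** 2 * A_tau + M_tau

stencil = tau * tridiag(sigma * np.ones(n - 1),
                        np.r_[0.5 - sigma, (1 - 2 * sigma) * np.ones(n - 2),
                              0.5 - sigma],
                        sigma * np.ones(n - 1))
print(np.max(np.abs(K - stencil)))   # 0 up to rounding
```

Note that the off-diagonal entries \(\tau \sigma \) vanish for \(\sigma =0\), consistent with K being diagonal in that case.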

We now address the discretization of the control costs in the optimization problem (6). Since a function \(u_h\in D_h^c\) is an element of \(H^1(\omega _c)\) and thus its weak gradient is piecewise constant on the triangulation of \(\omega _c\), we have

$$\begin{aligned} TV(u_h)=\Vert \nabla u_h\Vert _{L^1(\omega _c)}=\sum _{K\in {\mathcal {T}}_h\cap \omega _c}|K|\,|(\nabla u_h)|_K|_2. \end{aligned}$$

We furthermore approximate the integral in the definition of G by the trapezoidal rule to obtain the discrete multi-bang penalty \(G_h:D_h^c \rightarrow {\overline{{\mathbb {R}}}}\). Using these definitions, we obtain the fully discrete optimization problem

$$\begin{aligned} \min _{u_h\in D_h^c\cap U}\frac{1}{2} \Vert S_\vartheta (u_h)-y_d\Vert _{{\mathcal {O}}}^2+ \alpha G_h(u_h) + \beta TV(u_h). \end{aligned}$$
(33)

Note that although (33) is discrete, it is still formulated in function spaces. To apply a minimization algorithm, we now reformulate it in terms of the coefficient vectors for the finite-dimensional functions. First, \(D_h^c\cap U\) can be identified with the set

$$\begin{aligned} U_h=\{\mathbf{u }\in {\mathbb {R}}^{N_c}|~0\le \mathbf{u }_i\le u_{\max }~\forall i=1,\ldots ,N_c\} \end{aligned}$$

through \(u_h=\sum _{x_i\in \overline{\omega _c}}\mathbf{u }_ie_i\). Next we introduce the finite-dimensional subspace \({\mathcal {O}}_h=B(D_\vartheta )\) of \({\mathcal {O}}\) and the discrete control-to-observation operator \(\mathbf{S }_\vartheta :{\mathbb {R}}^{N_c}\rightarrow {\mathbb {R}}^{N_o}\) with \(N_o=\dim ({\mathcal {O}}_h)\), induced by \(S_\vartheta \) on \(U_h\). Moreover, we define the matrix \(M_{{\mathcal {O}}}\in {\mathbb {R}}^{N_o\times N_o}\) representing the inner product \((o_1,o_2)\mapsto (o_1,o_2)_{{\mathcal {O}}}\) for \(o_1,o_2\in {\mathcal {O}}_h\). The inner product and the norm of \({\mathcal {O}}\) restricted to \({\mathcal {O}}_h\) can thus be identified with \((\mathbf{o }_1,\mathbf{o }_2)_{{\mathcal {O}}_h}=\mathbf{o }_1^\top M_{{\mathcal {O}}}\mathbf{o }_2\) and \(\Vert \mathbf{o }\Vert _{{\mathcal {O}}_h}=((\mathbf{o },\mathbf{o })_{{\mathcal {O}}_h})^{1/2}\) for \(\mathbf{o },~\mathbf{o }_1,~\mathbf{o }_2\in {\mathbb {R}}^{N_o}\). We denote the orthogonal projection onto \({\mathcal {O}}_h\) by \(\pi _{{\mathcal {O}}}\). The operator B restricted to \(D_\vartheta \) can be identified with a matrix \(B_h\in {\mathbb {R}}^{N_o\times N_{\vartheta }}\) for \(N_{\vartheta }=\dim (D_\vartheta )\), and hence \(B^*\) with \(B_h^\top \). With these identifications, the adjoint operator \(S_\vartheta '(u_h)^*\) for \(u_h\in U\cap D_h^c\) acting on \({\mathcal {O}}_h\) can similarly be identified with \(\mathbf{S }_\vartheta '(\mathbf{u })^*:{\mathbb {R}}^{N_o}\rightarrow {\mathbb {R}}^{N_c}\) for the \(\mathbf{u }\in U_h\) corresponding to \(u_h\). Moreover, we define the matrix \(A_h\in {\mathbb {R}}^{dM_c\times N_c}\) representing the bilinear form \((\nabla u_h,\xi _h)_{L^2(\Omega )^d}\) for \(u_h\in D_h^c\) and \(\xi _h\in (C_h^c)^d\). Thus we have

$$\begin{aligned} TV(u_h)=\sum _{K\in {\mathcal {T}}_h\cap \omega _c}|(\nabla u_h)|_K|_2=\sum _{K\in {\mathcal {T}}_h\cap \omega _c}|(A_h\mathbf{u })|_K|_2 =: \Vert |A_h \mathbf{u }|_2 \Vert _1=:\mathbf{TV} _h (\mathbf{u }). \end{aligned}$$
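The evaluation of \(\mathbf{TV}_h\) then amounts to summing the elementwise Euclidean norms of the stacked gradient vector \(A_h\mathbf{u}\). A minimal sketch, assuming d = 2 and that \(A_h\) stores the d gradient components of each element as consecutive entries (our own storage convention):

```python
import numpy as np

def tv_h(A_h, u, d=2):
    """Discrete TV: TV_h(u) = || |A_h u|_2 ||_1, i.e., the sum over all
    elements of the Euclidean norm of the d gradient components."""
    grads = (A_h @ u).reshape(-1, d)   # one row of d components per element
    return float(np.sum(np.linalg.norm(grads, axis=1)))
```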

Finally, the trapezoidal rule in the definition of \(G_h\) can be expressed in the form of a mass lumping scheme, i.e.,

$$\begin{aligned} G_h(u_h)=\sum _{x_i\in \overline{\omega _c}}d_ig(\mathbf{u }_i)=: \mathbf{G }_h(\mathbf{u }) \end{aligned}$$

where

$$\begin{aligned} g(t) = {\left\{ \begin{array}{ll} \infty &{} t< u_1,\\ \frac{1}{2} \left( (u_{i}+u_{i+1})t - u_iu_{i+1}\right) &{} t \in [u_i,u_{i+1}],\quad 1\le i <m,\\ \infty &{} t > u_m, \end{array}\right. } \end{aligned}$$
(34)

is the scalar multi-bang penalty including the box constraints from [12] and \(d_i=\int _{\omega _c}\varphi _i\,\mathrm {d}x\) are the diagonal entries of the lumped mass matrix, see [46,47,48,49].
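The scalar penalty (34) is straightforward to evaluate; the following sketch (function name our own) assumes a sorted tuple of desired values \(u_1<\dots <u_m\):

```python
import math

def multibang_g(t, u):
    """Scalar multi-bang penalty (34): +inf outside [u[0], u[-1]], and the
    piecewise affine value 0.5*((u_i + u_{i+1})*t - u_i*u_{i+1}) on each
    cell [u_i, u_{i+1}]."""
    if t < u[0] or t > u[-1]:
        return math.inf
    for i in range(len(u) - 1):
        if u[i] <= t <= u[i + 1]:
            return 0.5 * ((u[i] + u[i + 1]) * t - u[i] * u[i + 1])
    return math.inf  # unreachable for t in [u[0], u[-1]]
```

Note that the two affine pieces adjacent to an interior value \(u_i\) agree at \(t=u_i\), so g is continuous on \([u_1,u_m]\).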

Using these notations, we can write (33) equivalently in the form

$$\begin{aligned} \min _{\mathbf{u }\in U_h}\frac{1}{2} \Vert \mathbf{S }_\vartheta (\mathbf{u })-\pi _{{\mathcal {O}}}y_d\Vert _{{\mathcal {O}}_h}^2+ \alpha \mathbf{G }_h(\mathbf{u }) + \beta \mathbf{TV} _h(\mathbf{u }). \end{aligned}$$
(35)

5.2 Primal–Dual Proximal Splitting

To solve (35), we extend the approach in [31] by applying the nonlinear primal–dual proximal splitting method from [16,17,18] together with a lifting trick. For this purpose, we write (35) (omitting the bold notation and the subscripts denoting vectors and discretizations from now on and assuming that \(y_d\in {\mathcal {O}}_h\)) as

$$\begin{aligned} \min _{u\in U} \frac{1}{2} \Vert S(u) - y_d \Vert _{{\mathcal {O}}}^2 + \beta \Vert |A_hu|_2 \Vert _1 + \alpha G(u). \end{aligned}$$

Setting

$$\begin{aligned}&{\mathcal {F}}: {\mathbb {R}}^{N_o} \times {\mathbb {R}}^{2M_c} \rightarrow {\mathbb {R}},&(y,\psi )\mapsto \frac{1}{2}\Vert y-y_d \Vert _{{\mathcal {O}}}^2 + \beta \Vert |\psi |_2 \Vert _1,\\&K:U\rightarrow ({\mathbb {R}}^{N_o} \times {\mathbb {R}}^{2M_c}),&u\mapsto (S(u),A_h u), \end{aligned}$$

we can apply the nonlinear primal–dual proximal splitting algorithm

$$\begin{aligned} \left\{ \begin{aligned} u^{k+1}&= \mathrm {prox}_{\gamma _G \alpha G_h}\left( u^k-\gamma _G K'(u^k)^* \xi ^k\right) \\ {\hat{u}}^{k+1}&= 2 u^{k+1} - u^k\\ \xi ^{k+1}&= \mathrm {prox}_{\gamma _F {\mathcal {F}}^*}\left( \xi ^k + \gamma _F K({\hat{u}}^{k+1})\right) \end{aligned} \right. \end{aligned}$$
(36)

for step sizes \(\gamma _F,\gamma _G>0\) satisfying \(\gamma _F\gamma _G\Vert K'(u)^* \Vert <1\). Convergence can be guaranteed under a second-order-type condition on K and possibly further restrictions on the step sizes, whose (very technical) verification is outside the scope of this work. Instead, we restrict the discussion here to deriving the explicit form of (36) in the present setting.

First, we endow \(({\mathbb {R}}^{N_o}\times {\mathbb {R}}^{2M_c})\) with the sum of the inner product induced by \(M_{\mathcal {O}}\) (for \({\mathbb {R}}^{N_o}\)) and the Euclidean inner product (for \({\mathbb {R}}^{2M_c}\)). With respect to this inner product, we obtain the adjoint Fréchet derivative

$$\begin{aligned} K'(u)^*(r,\psi ) = S'(u)^*r + A_h^T\psi , \end{aligned}$$

where \(S'(u)^*\) is the fully discrete operator corresponding to (32) with right-hand side \(r\in {\mathcal {O}}\) for the adjoint equation.

The proximal point mapping for the (scaled) multi-bang penalty can be obtained by straightforward calculation based on a case differentiation in the definition of the subdifferential, see [31, Prop. 3.6]; for the sake of completeness, we give the short derivation here in full. By the definition of the proximal mapping, \(w=\mathrm {prox}_{\gamma g}(v) = ({{\,\mathrm{\mathrm {Id}}\,}}+ \gamma \partial g)^{-1}(v)\) holds for any \(v\in {\mathbb {R}}\) if and only if \(v \in \{w\} + \gamma \partial g(w)\). Recalling from [13, § 2] that

$$\begin{aligned} \partial g(v) = {\left\{ \begin{array}{ll} (-\infty ,\tfrac{1}{2}(u_1+u_2)] &{} v = u_1,\\ \{\tfrac{1}{2}(u_{i}+u_{i+1})\} &{} v \in (u_{i},u_{i+1}), \quad 1 \le i< m,\\ {[}\tfrac{1}{2}(u_{i-1}+u_i), \tfrac{1}{2}(u_i+u_{i+1})] &{} v = u_i, \quad 1<i<m,\\ {[}\tfrac{1}{2}(u_{m-1}+u_m),\infty ) &{} v = u_m,\\ \emptyset &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(37)

we now distinguish the following cases for w:

  1. (i)

    \(w=u_1\): In this case,

    $$\begin{aligned} v \in \{w\} + \gamma \left( -\infty ,\tfrac{1}{2}(u_1+u_2)\right] = \left( -\infty ,(1+\tfrac{\gamma }{2}) u_1 + \tfrac{\gamma }{2} u_2\right] . \end{aligned}$$
  2. (ii)

    \(w\in (u_i,u_{i+1})\) for \(1\le i< m\): In this case,

    $$\begin{aligned} v \in \{w\} + \gamma \{\tfrac{1}{2}(u_i + u_{i+1})\}, \end{aligned}$$

    which first can be solved for w to yield

    $$\begin{aligned} w = v - \tfrac{\gamma }{2} (u_i+u_{i+1}); \end{aligned}$$

    inserting this into \(w\in (u_i,u_{i+1})\) and simplifying then gives

    $$\begin{aligned} v \in \left( (1+\tfrac{\gamma }{2})u_i + \tfrac{\gamma }{2} u_{i+1}, \tfrac{\gamma }{2} u_i + (1+\tfrac{\gamma }{2})u_{i+1}\right) . \end{aligned}$$
  3. (iii)

    \(w = u_i\), \(1<i<m\): Proceeding as in the first case, we obtain

    $$\begin{aligned} v \in \left[ \tfrac{\gamma }{2} u_{i-1}+(1+\tfrac{\gamma }{2})u_i,(1+\tfrac{\gamma }{2}) u_i + \tfrac{\gamma }{2} u_{i+1}\right] . \end{aligned}$$
  4. (iv)

    \(w=u_m\): Similarly, this implies that

    $$\begin{aligned} v \in \left[ \tfrac{\gamma }{2} u_{m-1}+(1+\tfrac{\gamma }{2})u_m,\infty \right) . \end{aligned}$$
Fig. 1

Illustration of the proximal point mapping \(\mathrm {prox}_{\gamma g}\) of the scalar multi-bang penalty g for \((u_1,u_2,u_3)\) = (0, 1, 2) and \(\gamma =0.2\)

Since this is a complete and disjoint case distinction for \(v\in {\mathbb {R}}\), we obtain the proximal mapping for the scalar penalty g, see Fig. 1. By a standard argument, the proximal point mapping for \(G_h\) is thus given componentwise by

$$\begin{aligned}&{[}\mathrm {prox}_{\gamma \alpha G_h}(v)]_j \\&\quad ={\left\{ \begin{array}{ll} v_j-\frac{\alpha \gamma }{2} (u_{i} + u_{i-1}) &{} \text {if } v_j \in \left( \left( 1+\tfrac{\alpha \gamma }{2}\right) u_{i-1}+\tfrac{\alpha \gamma }{2} u_{i},\left( 1+\tfrac{\alpha \gamma }{2}\right) u_{i}+\tfrac{\alpha \gamma }{2} u_{i-1}\right) ,\\ u_i &{}\text {if } v_j\in \left[ \left( 1+\tfrac{\alpha \gamma }{2}\right) u_{i}+\tfrac{\alpha \gamma }{2} u_{i-1},\left( 1+\tfrac{\alpha \gamma }{2}\right) u_{i}+\tfrac{\alpha \gamma }{2} u_{i+1}\right] , \end{array}\right. } \end{aligned}$$

where we have set \(u_{0}=-\infty \) and \(u_{m+1}=\infty \) to avoid the need for further case distinctions. (Note that we compute the proximal mapping with respect to the inner product induced by the lumped mass matrix such that the weight \(d_i\) cancels.)
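The case distinction above translates directly into code. The following sketch (our own function, not the paper's implementation) evaluates the scalar proximal mapping for one component, with the argument `gamma` standing in for the product \(\alpha \gamma >0\) and the sentinel values \(u_0=-\infty \), \(u_{m+1}=\infty \) used as in the text:

```python
import math

def prox_multibang(vj, u, gamma):
    """Scalar prox of the multi-bang penalty: either stick at a desired
    value u_i or shift by gamma/2*(u_i + u_{i+1}) on the free segment
    between u_i and u_{i+1}. Assumes u sorted and gamma > 0."""
    c = 0.5 * gamma
    uu = [-math.inf] + list(u) + [math.inf]    # u_0 = -inf, u_{m+1} = +inf
    m = len(u)
    for i in range(1, m + 1):
        lo = (1 + c) * uu[i] + c * uu[i - 1]
        hi = (1 + c) * uu[i] + c * uu[i + 1]
        if lo <= vj <= hi:
            return uu[i]                        # prox sticks at u_i
        if i < m and hi < vj < (1 + c) * uu[i + 1] + c * uu[i]:
            return vj - c * (uu[i] + uu[i + 1])  # free segment in (u_i, u_{i+1})
    raise AssertionError("the intervals cover all of R")
```

For \((u_1,u_2,u_3)=(0,1,2)\) and \(\gamma =0.2\) this reproduces the staircase shape shown in Fig. 1.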

Finally, for \({\mathcal {F}}\), we first compute the Fenchel conjugate on \({\mathbb {R}}^{N_o}\times {\mathbb {R}}^{2M_c}\) (with respect to the same inner product as above) as

$$\begin{aligned} {\mathcal {F}}^* (r,\psi ) = \frac{1}{2}\Vert r \Vert _{{\mathcal {O}}}^2 + (r,y^d)_{{\mathcal {O}}} + \delta _{\{|\psi |_2\le \beta \}}(\psi ) \end{aligned}$$

to obtain the proximal point mapping (again, with respect to this inner product)

$$\begin{aligned} \mathrm {prox}_{\gamma {\mathcal {F}}^*}(r,\psi ) = \begin{pmatrix} \frac{1}{1+\gamma } (r - \gamma y^d)\\ \mathrm {proj}_{\{|\psi |_2\le \beta \}}(\psi ) \end{pmatrix}, \end{aligned}$$

where the projection can be computed elementwise for each \(K\in {\mathcal {T}}\cap \omega _c\) as

$$\begin{aligned}{}[ \mathrm {proj}_{\{|\psi |_2\le \beta \}}(\psi )]_K = \frac{\beta \psi }{\max \{\beta ,|\psi |_2\}}. \end{aligned}$$
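In vectorized form, this projection rescales every element row whose 2-norm exceeds \(\beta \) and leaves shorter rows unchanged. A sketch, assuming \(\psi \) is stored with one row of d = 2 components per element (our own convention):

```python
import numpy as np

def proj_dual_ball(psi, beta):
    """Rowwise projection onto {|psi|_2 <= beta}: rows with Euclidean norm
    at most beta are unchanged, longer rows are radially rescaled."""
    norms = np.linalg.norm(psi, axis=1, keepdims=True)
    return beta * psi / np.maximum(beta, norms)
```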

With these, (36) becomes the following explicit algorithm:

$$\begin{aligned} u^{k+1}&= \mathrm {prox}_{\gamma _G\alpha G_h} \left( u^k-\gamma _G S'(u^k)^*r^k - \gamma _G A_h^T\psi ^k\right) ,\\ {\hat{u}}^{k+1}&= 2 u^{k+1} - u^k,\\ y^{k+1}&= S({\hat{u}}^{k+1}),\\ r^{k+1}&= \frac{1}{1+\gamma _F} \left( r^k + \gamma _F(y^{k+1}- y^d)\right) ,\\ q^{k+1}&= \psi ^k + \gamma _F A_h{\hat{u}}^{k+1},\\ \psi ^{k+1}&= \mathrm {proj}_{\{|\psi |_2\le \beta \}}(q^{k+1}). \end{aligned}$$

Note that this requires two solutions of the forward wave equation (as well as one solution of the adjoint equation) in each iteration, since \(y^{k+1}\) is based on the extrapolated vector \({\hat{u}}^{k+1}\), while the state vector y required for the computation of \(S'(u^{k+1})^*r^{k+1}\) in the following iteration is based on the original update \(u^{k+1}\).
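To fix ideas, the explicit algorithm can be organized as follows. This is a structural sketch only: `S` and `dS_adj` stand for user-supplied solvers of the fully discrete state and adjoint wave equations, `prox_G` and `proj_beta` for the proximal mapping and projection derived above; all names are our own.

```python
import numpy as np

def pdps(u0, S, dS_adj, A_h, prox_G, proj_beta, y_d, gF, gG, maxit=100):
    """Primal-dual proximal splitting loop for the discrete problem.
    Per iteration: one prox step on the control, one extrapolation,
    one state solve, and the two dual updates."""
    u = u0.copy()
    r = np.zeros_like(y_d)           # dual variable for the tracking term
    psi = np.zeros(A_h.shape[0])     # dual variable for the TV term
    for _ in range(maxit):
        u_new = prox_G(u - gG * dS_adj(u, r) - gG * (A_h.T @ psi), gG)
        u_hat = 2.0 * u_new - u      # extrapolation step
        y = S(u_hat)                 # state solve with extrapolated control
        r = (r + gF * (y - y_d)) / (1.0 + gF)
        psi = proj_beta(psi + gF * (A_h @ u_hat))
        u = u_new
    return u
```

As a sanity check, with a linear toy "solver" S(u) = u and no TV term the iteration converges to the least-squares solution u = y_d.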

The iteration is terminated based on the residual norm in an equivalent reformulation of the optimality conditions for (33). Combining the approach of Sect. 4 with standard results from convex analysis (see, e.g., [50, 51]), any local minimizer \({{\bar{u}}}\) of (33) together with the corresponding Lagrange multiplier \({\bar{\psi }}\) and the residual \({{\bar{r}}} := S({{\bar{u}}})-y_d\) can be shown to satisfy

$$\begin{aligned} \left\{ \begin{aligned} {{\bar{u}}}&= \mathrm {prox}_{\gamma _G\alpha G_h} \left( {{\bar{u}}}-\gamma _G S'({{\bar{u}}})^*{{\bar{r}}} - \gamma _G A_h^T{\bar{\psi }}\right) ,\\ {{\bar{r}}}&= \frac{1}{1+\gamma _F} \left( {{\bar{r}}} + \gamma _F(S({{\bar{u}}}) - y_d)\right) ,\\ {\bar{\psi }}&= \mathrm {proj}_{\{|\psi |_2\le \beta \}}({\bar{\psi }}). \end{aligned} \right. \end{aligned}$$

For the first equation, which holds in \(U_h\), we measure the residual in the discrete norm induced by the lumped mass matrix as in the definition of \(G_h\). The second equation holds in \({\mathcal {O}}_h\), and hence we measure the residual in the norm induced by the corresponding mass matrix \(M_{\mathcal {O}}\). Finally, the last equation holds in \({\mathbb {R}}^{2M_c}\), so we use the standard Euclidean norm. The iteration is terminated once the sum of these residuals drops below a given tolerance. For the implementation, note that the residual in the first equation for \((u^k,r^k,\psi ^k)\) reduces to \(u^{k}-u^{k+1}\). On the other hand, the residual in the second equation requires an additional solution of the state equation, since here S is applied to \(u^k\) instead of the extrapolated \({\hat{u}}^k\). In practice, we thus do not evaluate the stopping criterion in every iteration.

6 Numerical Examples

We now illustrate the above presented approach with two numerical examples. The first is a transmission problem (where waves produced by external forcing pass through the control domain before being observed) loosely motivated by acoustic tomography. The second is a reflection problem (where only reflected, not transmitted, waves are observed) that more closely models seismic inversion. The implementation in Python (using DOLFIN [52, 53], which is part of the open-source computing platform FEniCS [54, 55]) used to generate the following results can be downloaded from https://github.com/clason/tvwavecontrol.

6.1 A Model Acoustic Tomography Problem

Fig. 2

Transmission example: exact coefficient \(u_e\)

For the first example, we take \(\Omega = (-1,1)\times (-1,2)\) and \(T=3\) and define the control and observation domains

$$\begin{aligned} \omega _c := \{ (x_1,x_2)\in \Omega \mid x_2 \in (0,1) \},\qquad \omega _o := \{ (x_1,x_2)\in \Omega \mid x_2 \in (1,2) \}; \end{aligned}$$

correspondingly, the observation operator is taken as the restriction operator \(By := y|_{{\mathcal {O}}}\) to the observation space \({\mathcal {O}}:= (0,T)\times \omega _o\). The initial conditions are chosen as \((y_0,y_1) = (0,0)\), thus satisfying Assumption 3.1. We now aim to recover a piecewise constant coefficient \(u_e\) with \(u_e(x) \in \{1.0,1.1,1.2,1.3,1.4\}\) almost everywhere (see Fig. 2) from noisy observations of the state in \({\mathcal {O}}\). Accordingly, we set \({\hat{u}} \equiv 1.0\) and \(u_i = (i-1)/10\), \(i=1,\dots ,5\). These observations are generated using a source term f that is constructed as a linear combination of point sources which act as Ricker wavelets in time, i.e.,

$$\begin{aligned} f(x,t):= & {} \sum _{i=-9}^9 (\delta _{(i/10,-9/10)}(x) \\&+\,\,\delta _{(0.05+i/10,-8/10)}(x)) 2(1-2(5\pi (t-0.1))^2)e^{-(5\pi (t-0.1))^2}. \end{aligned}$$

(The number and location of source points as well as the amplitude and frequency of the wavelet are chosen such as to obtain a complex enough wave pattern to recover the lateral and depth-wise variations in the coefficient.) The discretization is performed using 64 nodes in each space direction and 128 nodes in time, corresponding to \(h\approx 0.056\) and \(\tau \approx 0.23\). The stabilization constant is set to \(\sigma = 1/4\). The discretized exact solution is then perturbed componentwise by \(10\%\) relative Gaussian noise, i.e., we take

$$\begin{aligned} y_d = B(y(u)) + 0.1\Vert B(y(u)) \Vert _{\infty }\xi , \end{aligned}$$

where \(\xi \) is a vector of independent normally distributed random components with mean 0 and variance 1.
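The perturbation step can be sketched as follows (function name and seeding are our own; the scaling follows the formula above):

```python
import numpy as np

def add_relative_noise(clean, level=0.1, seed=None):
    """Componentwise Gaussian perturbation scaled by level * ||clean||_inf,
    i.e., y_d = B y(u) + level * ||B y(u)||_inf * xi with xi ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    return clean + level * np.max(np.abs(clean)) * rng.standard_normal(clean.shape)
```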

We now compute the reconstruction using the algorithm described in Sect. 5.2, comparing the effects of the total variation and the multi-bang penalty by taking \(\alpha \in \{0,10^{-5}\}\) and \(\beta \in \{0,10^{-4}\}\). In each case, we set the step sizes to \(\gamma _F = 10^{-1}\) and \(\gamma _G = 10^3\) and terminate when the residual norms (evaluated every 10 iterations) drop below \(10^{-6}\). Again, these parameters are chosen to achieve a reasonable reconstruction in as few iterations as possible. (A proper parameter choice rule depending on the noise level and the discretization is left for future work.) The results can be seen in Fig. 3. The case of pure multi-bang regularization (\(\alpha = 10^{-5}\) and \(\beta = 0\), 3680 iterations; see Fig. 3a) shows that indeed \(u(x)\in \{u_1,\dots ,u_5\}\) almost everywhere; however, there is a clear lack of regularity in the reconstruction, which is not surprising as the original function-space problem is not well-posed for \(\beta =0\). In contrast, the case of pure \(\mathrm {TV}\) regularization (\(\alpha = 0\) and \(\beta = 10^{-4}\), 1100 iterations; see Fig. 3b) shows a much more regular reconstruction that is constant on large regions; however, these constants are not necessarily from the admissible set \(\{u_1,\dots ,u_5\}\). Finally, combining both multi-bang and total variation regularization (\(\alpha = 10^{-5}\) and \(\beta = 10^{-4}\), 600 iterations; see Fig. 3c) allows recovering more admissible values at the price of penalizing the magnitude of the coefficient value, which prevents the largest value \(u_5=0.4\) from being attained. It is also noteworthy that in this case the tolerance for the residual norm is reached after significantly fewer iterations.

Fig. 3

Transmission example: effect of the multi-bang penalty (\(\alpha \)) and the total variation penalty (\(\beta \)) on the reconstruction

To illustrate the effects of variation of the desired values \(u_i\) on the reconstruction, we recompute the last example with the same parameters \(\alpha ,\beta \) but 10% increased values, i.e., \(u_i=1.1(i-1)/10\), \(i=1,\dots ,5\). The results are shown in Fig. 4, where we repeat the exact coefficient from Fig. 2 with adjusted labels in Fig. 4a for better comparison. As can be seen from Fig. 4b, the reconstruction is similar to that for \(\alpha =0\). In particular, the total variation penalty prevents the misspecified desired values from being enforced strongly. This demonstrates that while misspecified values clearly do not have the same positive influence on the reconstruction, they at least do not have a negative influence.

Fig. 4

Transmission example: effect of variation of \(u_i\) on the reconstruction

6.2 A Model Seismic Inverse Problem

We next consider an example inspired by seismic tomography. We assume that the data is given in the form of a time series of mean values of the reflected waves y over certain spatial regions \(O_i\). Thus we define the observation space \({\mathcal {O}}=L^2(I)^m\) for \(m\in {\mathbb {N}}\) and the observation operator

$$\begin{aligned} B:L^2(Q)\rightarrow {\mathcal {O}},\qquad y\mapsto \left( \frac{1}{|O_i|}\int _{O_i}y(\cdot ,x)\,\mathrm {d}x\right) _{i=1}^m, \end{aligned}$$

where the \(O_i\subset \Omega \) are the m spatial observation patches. Furthermore we assume that seismic sources are given by s point sources located on the surface \(\Gamma _s\subset \partial \Omega \) whose magnitudes are time dependent and follow a Ricker wavelet of the form

$$\begin{aligned} f_k(t)=a_k(1-2 \pi ^2 h_k^2 (t-t_k)^2) e^{-\pi ^2 h_k^2 (t-t_k)^2} \end{aligned}$$

with \(h,a,t\in {\mathbb {R}}^s\). This leads to the modified state equation

$$\begin{aligned} \left\{ \begin{aligned} \partial _{tt} y - {\mathrm {div}}(u\nabla y)&= 0&\text { in } Q, \\ {\partial _\nu y}&= \sum _{k=1}^sf_k\delta _{x_k}&\text { on } (0,T) \times \Gamma _s, \\ {\partial _\nu y}&= 0&\text { on } (0,T) \times \partial \Omega \setminus \Gamma _s, \\ y(0)&= 0,\quad {\partial _t} y(0) = 0,&\text { in }\Omega \end{aligned} \right. \end{aligned}$$

with \((x_k)_{k=1}^s\subset \Gamma _s\), \((f_k)_{k=1}^s\subset L^2(I)\), and \(\delta _{x_k}\) the Dirac measure supported at \(x_k\). In our concrete example, we choose \(\Omega =(-1,1)^2\), \(\Gamma _s=(-1,1)\times \{1\}\), and \(T=3\). We set \(\omega _c=(-1,1)\times (-1,0.7)\). The observation patches are chosen as

$$\begin{aligned} O_i= & {} (o_i,o_i+0.2)\times (0.8,1) \quad \text { with }\quad o_i\in \{-1,-0.8,-0.6,\\&-0.4,-0.2,0,0.2,0.4,0.6,0.8\}. \end{aligned}$$

The sources are located at \(x_k=(-1+k\cdot 0.1,1)\) for \(k=0,\ldots ,20\). The parameters of the Ricker wavelet are set to \(a_k=2\), \(h_k=5\), and \(t_k=0.1\). The offset \({\hat{u}}\) has the constant value 1. Finally, the exact velocity model is given by

$$\begin{aligned} u_e={\left\{ \begin{array}{ll} 3&{}x\in (0.4,0.6)\times (0.1,0.4),\\ 2&{}x\in (-0.8,-0.5)\times (0.2,0.6),\\ 1&{}x\in (-0.2,0.2)\times (0.3,0.5),\\ 0&{}\text {else,} \end{array}\right. } \end{aligned}$$

for the constant reference coefficient \({\hat{u}} \equiv 1\), cf. Fig. 5a.
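With the experiment's parameters \(a_k=2\), \(h_k=5\), and \(t_k=0.1\), the temporal source profile above can be evaluated as follows (a sketch; the function name and defaults are our own):

```python
import math

def ricker(t, a=2.0, h=5.0, t0=0.1):
    """Ricker wavelet a*(1 - 2*pi^2*h^2*(t-t0)^2) * exp(-pi^2*h^2*(t-t0)^2):
    peak value a at t = t0, decaying rapidly away from the peak."""
    q = (math.pi * h * (t - t0)) ** 2
    return a * (1.0 - 2.0 * q) * math.exp(-q)
```

Note that the same profile (with the same parameters) is the temporal factor of the source term in the transmission example of Sect. 6.1.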

The recorded data for our experiments are generated by solving the state equation with the exact velocity model \(u_e\), resulting in the exact state \(y_e\). Then we set \(y_d=By_e+\delta n\) with \(\delta \in [0,1]\). The function \(n\in L^\infty (I)^m\) is a disturbance which models measurement errors and exterior influences. In our case, we use a function of the form

$$\begin{aligned} n_k(t):=\eta _k r_k :=\eta _k\sum _{i=1}^M\frac{m_{i,k}}{i}\cos (4\pi t-s_{i,k}\pi ), \end{aligned}$$

where \(M\in {\mathbb {N}}\), \(\eta _k=\frac{\Vert (By_e)_k\Vert _{L^\infty (I)}}{\Vert r_k\Vert _{L^\infty (I)}}\), and \(m_{i,k},s_{i,k}\) are uniform random numbers in [0, 1]. Here we take \(M=10\).
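A sketch of this disturbance model for one channel (our own function; the normalization implements \(\eta _k\) so that \(\Vert n_k\Vert _{L^\infty }\) matches the clean channel's sup-norm on the sampling grid):

```python
import numpy as np

def channel_noise(t, clean_sup, M=10, seed=None):
    """n_k(t) = eta_k * sum_{i=1}^M (m_i / i) * cos(4*pi*t - s_i*pi) with
    m_i, s_i uniform in [0, 1] and eta_k = clean_sup / max|r_k|."""
    rng = np.random.default_rng(seed)
    m = rng.uniform(size=M)
    s = rng.uniform(size=M)
    r = sum(m[i] / (i + 1) * np.cos(4 * np.pi * t - s[i] * np.pi)
            for i in range(M))
    return clean_sup / np.max(np.abs(r)) * r
```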

For the discretization, we take a triangular mesh based on a tensor grid with \(N_h=129^2\) spatial nodes, \(N_\tau =129\) time steps, and \(\sigma =1/4\). The relative noise level is \(\delta =0.05\). An appropriate regularization parameter is given by \(\beta = 10^{-4}\); for simplicity, we set \(\alpha =0\). The iteration is initialized with \(u_0=0\), and the step sizes are again chosen as \(\gamma _F=10^{-1}\) and \(\gamma _G=10^3\). The iteration is stopped once the absolute residual is smaller than \(10^{-4}\); in this experiment, this tolerance was reached after 1068 iterations. Figure 6 shows the exact and noisy observations on \(O_1\), \(O_5\), and \(O_{10}\). At the onset, we note two high spikes (a negative and a positive one), which are caused by the source wave initiated at the boundary points on \(\Gamma _s\). The remaining oscillations are caused by the reflected waves originating from the discontinuities of \(u_e\) and from the reflecting boundary; only these carry information about the coefficient, which makes the reconstruction challenging. The results are shown in Fig. 5b, where each color map is scaled individually to show more details. We observe that the positions of the discontinuities in \(u_e\) that are close to the observation patches are well approximated in \({{\bar{u}}}\) and that the corresponding interfaces are quite sharp. However, the approximation quality of the discontinuities deteriorates farther away from the observation region. This is caused by the fact that reflected waves from lower sections of the discontinuities are more dispersed than those from the upper sections.

Fig. 5

Reflection example: results for \(\alpha =0\), \(\beta =10^{-4}\), 1068 iterations (note the different color bars)

Fig. 6

Reflection example: exact and noisy observations on \(O_i\), \(i=1,5,10\)

7 Conclusion

We showed existence of solutions to an optimal control problem for the wave equation with the control entering into the principal part of the operator using total variation regularization and a reformulation of pointwise constraints using a cutoff function. Preferential attainment of a discrete set of control values is incorporated through a multi-bang penalty. We also derived an improved regularity result for solutions of the wave equation under additional natural assumptions on the data and the control, which (for smooth cutoff functions) allows obtaining necessary optimality conditions that can be interpreted in a suitable pointwise fashion. Finally, we demonstrated that the optimal control problem can be solved numerically using a combination of a stabilized finite element discretization and a nonlinear primal–dual proximal splitting algorithm.

This work can be extended in several directions. Besides applying the proposed approach to more realistic models of acoustic tomography or seismic imaging for practical applications, it would be worthwhile to consider the case of boundary observations of the state [56], which however may lead to an unbounded observation operator B. A further challenging goal would be deriving sufficient second-order conditions. Such conditions could then be used for obtaining discretization error estimates for the optimal controls or for showing convergence of the nonlinear primal–dual proximal splitting algorithm based on the “three-point condition” on S from [18].