1 Introduction

In modern medical imaging and in embedded artificial intelligence systems for the automated investigation of anatomical structures or the classification of objects (human, tree, cloud, etc.), one of the fundamental steps is the division of a given image into distinct areas with characteristic properties; see, e.g., [35, Chapter 2] and [12, Part II]. In this context, image segmentation is concerned with the problem of identifying regions with approximately homogeneous features (such as color or gray values, and texture) within given image data. The boundary curve separating such homogeneous regions is called the edge set or simply the edge.

Over the years, different approaches have been proposed in order to accomplish this task by solving, e.g., suitably defined partial differential equations or by minimizing appropriate energies. The latter technique leads to variational models; see, e.g., [15,16,17, 22, 24,25,26, 30, 32] and the monographs [33, 5, Chapter 4] or [18, Chapter 7] as well as the references therein for an overview.

In this work, we focus on the Mumford–Shah energy (model)

$$\begin{aligned} MS^g(u,\varGamma ) := \frac{1}{2}\int _{\varOmega \backslash \varGamma } |\nabla u|^2 \mathrm {d}x + \alpha \mathcal {H}^{1}(\varGamma ) + \frac{\beta }{2}\Vert u - g\Vert _{L^2(\varOmega )}^2, \end{aligned}$$
(MS)

which was introduced in [32]. Here, \(g \in L^2(\varOmega )\) is a given grayscale image, possibly degraded by noise, where \(\varOmega \subset \mathbb {R}^2\) represents the image domain, and \(\alpha ,\beta >0\) are given parameters. Further, \(\mathcal {H}^{1}\) denotes the one-dimensional Hausdorff measure. In order to discuss the structure of \(MS^g\), we note that the third term is a least-squares data fitting term involving our desired reconstruction u and the data g. It is inspired by a maximum likelihood argument regarding the statistical properties of the underlying noise. Since the Mumford–Shah model aims to establish a division into monochromatic regions, the (generalized) gradient of the reconstructed image is minimized (penalized when viewed together with the choices of \(\alpha \) and \(\beta \)) over the complement of the edge set \(\varGamma \) in \(\varOmega \). This motivates the first term in \(MS^g\). In order to achieve a certain robustness in finding \(\varGamma \) with respect to noise, the length of the edge set is penalized in the second term. Moreover, penalizing the length of \(\varGamma \) secures a Caccioppoli property of the edge set.

The aim associated with the above Mumford–Shah model is typically to find a restored image \(u: \varOmega \rightarrow \mathbb {R}\) and an edge set \(\varGamma \), which separates homogeneous regions of interest, by minimizing \(MS^g\); see (MS) above. In this context, the Hausdorff term in \(MS^g\) is one ingredient yielding the existence of a minimizing pair \((u_g,\varGamma _g)\). We note that such a minimizer is attached to the specific realization of the noise \(\xi \) contained in the data, i.e., \(g=g_0 + \xi \) with \(g_0\) representing the underlying true image.

Obviously, the location of \(\varGamma _g\) within \(\varOmega \) is influenced by the specific realization \(\xi \) of an associated random process and might not properly reflect the location of the true edge \(\varGamma _{g_0}\) pertinent to \(g_0\). Indeed, rather than computing \(\varGamma _g\) itself, one is interested in estimating the uncertainty in the location of the reconstructed edge set, given statistical properties of the underlying random process. This brings us into the realm of uncertainty quantification of geometric objects.

In general, one of the fundamental components of predictive estimation in uncertainty quantification is to analyze how uncertainty propagates through a given mathematical model; also known as model prediction [38, Chap. 9]. We observe that the vast majority of the existing literature on the propagation of uncertainty is concerned with well-defined systems whose (unique) solutions are given by real numbers, vectors, matrices, or distributed parameters; see, e.g., [38, Chap. 9, Chap. 10] or for an introduction with emphasis on methods also [39].

However, within the context of image processing (segmentation, in particular), variational models of crack propagation, or free discontinuity problems in general, the output of the model is a geometric quantity, which itself does not even live in a linear space; see [6] and the references therein. Due to the associated complexity, we emphasize the role played by the geometric variable (the edge set). In fact, among the existing methods for the propagation of uncertainty (direct evaluation of the mean and variance of a quantity of interest, sampling methods, perturbation methods, and spectral representations), only sampling methods appear to be feasible.

In contrast to inverse problems, where Bayesian techniques are utilized that admit probability densities to characterize the true value, we are here confronted rather with a kind of forward problem. It is also worth mentioning that the model considered in this work yields solution pairs, consisting of the restored image and the associated edge set, which are not guaranteed to be unique due to a lack of strict convexity of the objective in the Mumford–Shah problem.

Our intention here is therefore to draw attention to an important but somewhat neglected class of problems for uncertainty quantification. As a means of investigating these models, in particular the edge set, we later propose the notion of a generalized cumulative distribution function and of pointwise quantiles. Of course, pointwise averages, variances, etc., are also conceivable. However, we believe that an estimation of the probability of an edge appearing within a given image patch (or a patch of pixels in the discrete setting) is the more valuable information.

Let us now return to the Mumford–Shah model and introduce the associated variational framework. Given sufficient Sobolev regularity of u on the set \(\varOmega \backslash \varGamma \), this leads us to the following minimization problem:

$$\begin{aligned}&\text {minimize } MS^g(u,\varGamma ) \text { over } (u,\varGamma ) \in \mathcal {A},\\&\text {with } \mathcal {A} := \{(u,\varGamma ) : \varGamma \subset \varOmega \text { closed and } u \in H^1(\varOmega \backslash \varGamma ) \}. \end{aligned}$$

Note that (MS) constitutes the renowned Mumford–Shah problem. The interested reader is referred to the original paper [32] for more details; for a condensed overview, including a discussion of the existence of solutions, we point to [6] and the references therein. Uniqueness of a solution cannot be expected in general, as can be seen from a one-dimensional counterexample in [18, Section 7.4.5]. The analytic difficulties arise due to the set variable \(\varGamma \) as well as u being an element of a space depending on this set. Moreover, the Hausdorff measure appears directly in the functional. These aspects not only significantly complicate the analysis of (MS), but they also challenge its numerical treatment.

An elegant way to address these issues is due to Ambrosio and Tortorelli (cf. [10, 11]). Driven by the analytical notion of \(\varGamma \)-convergence, the main idea is to treat the edge set along with its Hausdorff measure by a phase field approach. In fact, the so-called Ambrosio–Tortorelli functional \(\widetilde{AT}^g_\varepsilon : H^1(\varOmega ) \times K \rightarrow \mathbb {R}\) is defined by

$$\begin{aligned} \widetilde{AT}^g_\varepsilon (u,z)&:= \frac{1}{2} \int _\varOmega (z^2+\eta ) |\nabla u|^2 \mathrm {d}x \\&\quad + \frac{\alpha }{2}\left( \varepsilon \int _\varOmega |\nabla z|^2 \mathrm {d}x + \frac{1}{4\varepsilon } \int _\varOmega (z-1)^2 \mathrm {d}x\right) \\&\quad + \frac{\beta }{2} \int _\varOmega |u-g|^2 \mathrm {d}x, \end{aligned}$$

together with the constraint set

$$\begin{aligned} K:= \left\{ y \in H^1(\varOmega ) : 0 \le y \le 1 \text { almost everywhere (a.e.) on } \varOmega \right\} . \end{aligned}$$

As before, the quantity \(u \in H^1(\varOmega )\) denotes the restored image. The geometric quantity \(\varGamma \), on the other hand, is approximated by the function \(z \in H^1(\varOmega )\), which can be interpreted as an edge indicator. In this context, the zero level set of z is associated with an edge, while \(z = 1\) relates to the absence of such an edge. From this, we see that the first term in \(\widetilde{AT}_\varepsilon ^g\) corresponds to the first term of (MS). The two terms weighted by \(\alpha \) approximate the Hausdorff measure. The remaining terms are the data fitting term as well as an additional regularization of the restored image u through the weighting factor \(\eta > 0\) in the first integral; the latter ensures coercivity. In [11], it is shown that the sequence of minimization problems

$$\begin{aligned} \text {minimize } \widetilde{AT}^g_\varepsilon (u,z) \text { over } (u,z) \in H^1(\varOmega ) \times K, \end{aligned}$$
(AT)

\(\varGamma \)-converges in the strong \(L^1(\varOmega ;\mathbb {R}^2)\)-topology to the original Mumford–Shah problem (MS) as \(\varepsilon \rightarrow 0\) and \(\eta = o(\varepsilon )\), with \(\frac{o(t)}{t} \rightarrow 0\) as \(t \rightarrow 0\).

As motivated above, given noisy observations g, we are interested in studying the influence of random errors and other data transformation effects in the image data on the reconstruction of edges in the image. The degradation model underlying our study is given by

$$\begin{aligned} g = L g_0 + \xi , \end{aligned}$$

where L is a linear continuous operator from \(L^2(\varOmega )\) to \(L^2(\varOmega )\), i.e., \(L \in \mathcal {L}(L^2(\varOmega ))\). Note that L may model image blur, subsampling, or Fourier and wavelet transforms, to mention only a few. Moreover, \(\xi \in L^2(\varOmega )\) is an oscillating map with zero mean. From a stochastic point of view, L and \(\xi \) constitute realizations of stochastic processes. We also assume that L does not annihilate constant functions, i.e., \(L 1 \ne 0\).
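
Concretely, one realization of this degradation model can be generated as in the following minimal sketch (Python/NumPy); the one-dimensional test signal, the Gaussian blur standing in for L, and the noise level are hypothetical choices of ours and not part of the model above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1D "image": a piecewise constant signal with a single edge.
n = 256
x = np.linspace(0.0, 1.0, n)
g0 = (x > 0.5).astype(float)

# L: a Gaussian blur, i.e., a linear continuous operator on L^2(Omega),
# discretized as a row-normalized kernel matrix; note L(1) = 1 != 0.
sigma = 0.02
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / sigma) ** 2)
L = K / K.sum(axis=1, keepdims=True)

# One zero-mean noise realization and the resulting observation.
xi = 0.05 * rng.standard_normal(n)
xi -= xi.mean()          # enforce zero mean for this realization
g = L @ g0 + xi          # degradation model g = L g0 + xi
```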

Since \(\varGamma \) in (MS) is a set, a mathematically meaningful notion of uncertainty propagation of this object is not immediately clear. In view of this, the Ambrosio–Tortorelli formulation offers a transformation of this geometric feature to a functional variable which is more amenable to the study of the propagation of uncertainty. Taking account of the additional data transformation operator L in our model, \(\widetilde{AT}^g_\varepsilon \) becomes the functional \(AT^g_\varepsilon \) given by

$$\begin{aligned} AT^g_\varepsilon (u,z)&:= \frac{1}{2} \int _\varOmega z^2 |\nabla u|^2 \mathrm {d}x \\&\quad + \frac{\alpha }{2}\left( \varepsilon \int _\varOmega |\nabla z|^2 \mathrm {d}x + \frac{1}{4\varepsilon } \int _\varOmega (z-1)^2 \mathrm {d}x\right) \\&\quad + \frac{\beta }{2} \int _\varOmega |Lu-g|^2 \mathrm {d}x +\frac{\eta }{2} \int _\varOmega |\nabla u|^2 \mathrm {d}x, \end{aligned}$$

which constitutes the segmentation model under investigation in this paper. In what follows, (AT) refers to the minimization of this functional over \(H^1(\varOmega ) \times K\).
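
For intuition, the following sketch (our own discretization, not part of the paper) evaluates a one-dimensional finite-difference analogue of \(AT^g_\varepsilon \); forward differences, the rectangle rule, and the interface of at_energy are illustrative assumptions.

```python
import numpy as np

def at_energy(u, z, g, L, h, alpha, beta, eps, eta):
    """Discrete Ambrosio-Tortorelli energy on a uniform 1D grid (spacing h).

    Sketch only: gradients via forward differences (living on cell
    midpoints), integrals via the rectangle rule; u, z, g are nodal
    vectors and L a matrix discretizing the data transformation operator."""
    du = np.diff(u) / h                    # grad u on midpoints
    dz = np.diff(z) / h                    # grad z on midpoints
    zc = 0.5 * (z[:-1] + z[1:])            # z averaged to midpoints
    term_grad = 0.5 * h * np.sum(zc**2 * du**2)
    term_edge = 0.5 * alpha * (eps * h * np.sum(dz**2)
                               + h * np.sum((z - 1.0)**2) / (4.0 * eps))
    term_fit = 0.5 * beta * h * np.sum((L @ u - g)**2)
    term_reg = 0.5 * eta * h * np.sum(du**2)
    return term_grad + term_edge + term_fit + term_reg
```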

The rest of the paper is organized as follows. In Sect. 2, we provide a collection of important results on the Ambrosio–Tortorelli functional, concerning both properties of the functional itself and first-order conditions for its minimizers. These are used in Sect. 3 to give a mathematical justification of random processed images and edge indicators in the sense of measurable selections, based on the theorem of Kuratowski–Ryll-Nardzewski. In Sect. 4, we briefly discuss limitations of popular techniques in uncertainty quantification before turning our attention to sampling-based methods. We conclude our work in Sect. 5 with numerical examples.

2 Properties of the Deterministic Problem

Now we collect several properties of the Ambrosio–Tortorelli functional and its minimizers. In order to reduce technicalities in the subsequent proofs, we establish the following norm equivalence. Below, \(\Vert \cdot \Vert _{L^2(\varOmega )}\) denotes the usual \(L^2(\varOmega )\)-norm on a bounded domain \(\varOmega \subset \mathbb {R}^d\) and \(H^1(\varOmega )\) is the Sobolev space of functions in \(L^2(\varOmega )\) with generalized gradient in \(L^2(\varOmega )^d\); see, e.g., [3].

Lemma 1

Let \(L \in \mathcal {L}(L^2(\varOmega ))\) with \(L 1 \ne 0\). Then the norm

$$\begin{aligned} {\vert \vert \vert }u {\vert \vert \vert }^2 := \Vert \nabla u \Vert _{L^2(\varOmega )}^2 + \Vert L u \Vert _{L^2(\varOmega )}^2 \end{aligned}$$

and the usual \(H^1\)-norm, i.e., \(\Vert u\Vert _{H^1(\varOmega )}^2 = \Vert \nabla u\Vert _{L^2(\varOmega )}^2 + \Vert u\Vert _{L^2(\varOmega )}^2\), are equivalent on \(H^1(\varOmega )\).

Proof

Let \(u \in H^1(\varOmega )\) be chosen arbitrarily. At first, we see

$$\begin{aligned} {\vert \vert \vert }u {\vert \vert \vert }^2&= \Vert \nabla u\Vert _{L^2(\varOmega )}^2 + \Vert L u\Vert _{L^2(\varOmega )}^2 \\&\le \Vert \nabla u\Vert _{L^2(\varOmega )}^2 + \Vert L\Vert _{\mathcal {L}(L^2(\varOmega ))}^2 \Vert u\Vert _{L^2(\varOmega )}^2 \end{aligned}$$

and hence \({\vert \vert \vert }u {\vert \vert \vert }\le \max \left( 1, \Vert L\Vert _{\mathcal {L}(L^2)} \right) \Vert u\Vert _{H^1(\varOmega )}\), where \(\Vert \cdot \Vert _{\mathcal {L}(L^2)}\) denotes the operator norm for linear bounded operators from \(L^2(\varOmega )\) to \(L^2(\varOmega )\).

To prove the reverse inequality, we define the mean value \(m := \frac{1}{\lambda ^d(\varOmega )} \int _\varOmega u \mathrm {d}x\in \mathbb {R}\), where \(\lambda ^d\) denotes the d-dimensional Lebesgue measure, and let \(C_P > 0\) be the Poincaré constant, i.e., the smallest real number fulfilling the Poincaré inequality

$$\begin{aligned} \Vert u - m\Vert _{L^2(\varOmega )} \le C_P \Vert \nabla u\Vert _{L^2(\varOmega )} \text { for all } u \in H^1(\varOmega ), \end{aligned}$$

see, e.g., [2, Corollary 5.4.1]. Using the above, we find

$$\begin{aligned} \Vert u\Vert _{H^1(\varOmega )}^2&= \Vert \nabla u\Vert _{L^2(\varOmega )}^2 + \Vert u\Vert _{L^2(\varOmega )}^2 = \Vert \nabla u\Vert _{L^2(\varOmega )}^2 + \left\| u -m\right\| _{L^2(\varOmega )}^2 + \lambda ^d(\varOmega ) m^2\\&\le (1 + C_P^2) \Vert \nabla u\Vert _{L^2(\varOmega )}^2 + \frac{\lambda ^d(\varOmega )}{\Vert L1\Vert _{L^2(\varOmega )}^2} \Vert m L1\Vert _{L^2(\varOmega )}^2\\&\le (1 + C_P^2) \Vert \nabla u\Vert _{L^2(\varOmega )}^2 + \frac{2\lambda ^d(\varOmega )}{\Vert L1\Vert _{L^2(\varOmega )}^2} \left( \Vert L(u-m)\Vert _{L^2(\varOmega )}^2 + \Vert Lu\Vert _{L^2(\varOmega )}^2 \right) \\&\le \left( 1 + C_P^2 + 2 \lambda ^d(\varOmega ) C_P^2 \frac{\Vert L\Vert _{\mathcal {L}(L^2(\varOmega ))}^2}{\Vert L1\Vert _{L^2(\varOmega )}^2} \right) \Vert \nabla u\Vert _{L^2(\varOmega )}^2 + \frac{2\lambda ^d(\varOmega )}{\Vert L1\Vert _{L^2(\varOmega )}^2}\Vert Lu\Vert _{L^2(\varOmega )}^2\\&\le \max \left( 1 + C_P^2 + 2 \lambda ^d(\varOmega ) C_P^2 \frac{\Vert L\Vert _{\mathcal {L}(L^2(\varOmega ))}^2}{\Vert L1\Vert _{L^2(\varOmega )}^2},\ \frac{2\lambda ^d(\varOmega )}{\Vert L1\Vert _{L^2(\varOmega )}^2} \right) {\vert \vert \vert }u {\vert \vert \vert }^2, \end{aligned}$$

which completes the proof. \(\square \)
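
Although Lemma 1 is purely analytical, the asserted equivalence is easy to probe numerically. The following sketch (our own construction; a row-normalized blur matrix plays the role of L, so that \(L1 = 1 \ne 0\)) checks that the ratio of the two squared norms stays within fixed positive bounds over random trial functions on a 1D grid.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
h = 1.0 / (n - 1)
x = np.linspace(0.0, 1.0, n)
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.02) ** 2)
L = K / K.sum(axis=1, keepdims=True)            # blur with L(1) = 1 != 0

ratios = []
for _ in range(1000):
    u = rng.standard_normal(n)
    grad2 = h * np.sum((np.diff(u) / h) ** 2)   # ||grad u||^2
    l2 = h * np.sum(u ** 2)                     # ||u||^2
    Lu2 = h * np.sum((L @ u) ** 2)              # ||L u||^2
    ratios.append((grad2 + Lu2) / (grad2 + l2)) # |||u|||^2 / ||u||_{H^1}^2
print(min(ratios), max(ratios))  # stays bounded away from 0 and infinity
```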

From now on, we equip the Sobolev space \(H^1(\varOmega )\) for the processed image u with the new norm \({\left| \left| \left| \ \cdot \ \right| \right| \right| }\), whereas the norm of the z-component remains unchanged. Next we consider continuity properties of the Ambrosio–Tortorelli functional.

Theorem 2

The mapping \(AT^g_\varepsilon : {H^1(\varOmega )}\times K \rightarrow \mathbb {R}\) is weakly lower semi-continuous.

Proof

Let \((u_n,z_n) \rightharpoonup (u,z)\) be a weakly convergent sequence in \({H^1(\varOmega )}\times K\). Then there exists a subsequence (not relabeled) along which the limes inferior is attained and, by the compact embedding of \(H^1(\varOmega )\) into \(L^2(\varOmega )\) together with the Riesz–Fischer theorem, \(z_n \rightarrow z\) additionally converges pointwise almost everywhere. Using Lebesgue’s dominated convergence theorem, we obtain \(z_n w \rightarrow z w\) strongly in \(L^2(\varOmega ;\mathbb {R}^d)\) for every \(w \in L^2(\varOmega ;\mathbb {R}^d)\), and hence \(z_n \nabla u_n \rightharpoonup z \nabla u\) in \(L^2(\varOmega ;\mathbb {R}^d)\). Eventually, we obtain by the weak lower semi-continuity of the \(L^2\)-norm that

$$\begin{aligned} AT^g_\varepsilon (u,z)&= \frac{1}{2}\Vert z\nabla u\Vert _{L^2(\varOmega ;\mathbb {R}^d)}^2 + \frac{\eta }{2}\Vert \nabla u\Vert _{L^2(\varOmega ;\mathbb {R}^d)}^2 \\&\quad + \frac{\beta }{2} \Vert L u - g\Vert _{{L^2(\varOmega )}}^2 \\&\quad + \frac{\alpha }{8\varepsilon }\Vert z - 1\Vert _{{L^2(\varOmega )}}^2 + \frac{\alpha \varepsilon }{2} \Vert \nabla z\Vert _{L^2(\varOmega )}^2\\&\le \liminf _{n \rightarrow \infty } \frac{1}{2}\Vert z_n \nabla u_n\Vert _{L^2(\varOmega ;\mathbb {R}^d)}^2 \\&\quad + \liminf _{n \rightarrow \infty } \frac{\eta }{2}\Vert \nabla u_n\Vert _{L^2(\varOmega ;\mathbb {R}^d)}^2\\&\quad + \liminf _{n \rightarrow \infty } \frac{\beta }{2} \Vert L u_n - g\Vert _{{L^2(\varOmega )}}^2 \\&\quad + \liminf _{n \rightarrow \infty }\frac{\alpha }{8\varepsilon }\Vert z_n - 1\Vert _{{L^2(\varOmega )}}^2\\&\quad + \liminf _{n \rightarrow \infty }\frac{\alpha \varepsilon }{2} \Vert \nabla z_n\Vert _{L^2(\varOmega )}^2 \\&\le \liminf _{n \rightarrow \infty } AT^g_\varepsilon (u_n,z_n), \end{aligned}$$

which proves the lower-semicontinuity of \(AT^g_\varepsilon \). \(\square \)

Using the direct method of the calculus of variations in combination with Lemma 1 and Theorem 2, it is not difficult to derive the existence of a solution of (AT). Due to the nonconvexity of \(AT_\varepsilon ^g\), uniqueness cannot be expected in general.

For the sake of characterizing critical points, we turn our attention to necessary first-order optimality conditions. The derivation of a corresponding system is challenged by the nonsmoothness of \(AT^g_\varepsilon \) with respect to the \({H^1(\varOmega )}\times {H^1(\varOmega )}\)-topology together with the presence of the constraint set K. We address these issues by using a truncation argument. In fact, for all \(u, z \in H^1(\varOmega )\) with \(z \in L^\infty (\varOmega )\) it holds that \(AT^g_\varepsilon (u,z) < \infty \) as well as

$$\begin{aligned} AT^g_\varepsilon (u,\max (\min (z,1),0)) \le AT^g_\varepsilon (u,z). \end{aligned}$$

The constraint set is then substituted by \(z \in X\) with \(X:= {H^1(\varOmega )}\cap L^\infty (\varOmega )\) together with the norm \(\Vert \cdot \Vert _X^2 := \Vert \cdot \Vert _{{H^1(\varOmega )}}^2 + \Vert \cdot \Vert _{L^\infty (\varOmega )}^2\). Consequently, we obtain

$$\begin{aligned} \min _{(u,z) \in {H^1(\varOmega )}\times K} AT^g_\varepsilon (u,z) = \min _{(u,z) \in {H^1(\varOmega )}\times X} AT^g_\varepsilon (u,z), \end{aligned}$$
(1)

and the respective sets of minimizers coincide. The advantage of the right-hand side formulation in (1) lies in the fact that the Ambrosio–Tortorelli functional is continuously Fréchet differentiable on \(H^1(\varOmega ) \times X\) with derivative

$$\begin{aligned}&\left\langle D AT^g_\varepsilon (u,z),(\varphi ,\psi )\right\rangle _{H^1 \times X} \\&\quad = \int _\varOmega \left( (z^2 + \eta ) \nabla u \nabla \varphi + \beta (L u-g) L \varphi \right) \mathrm {d}x\\&\qquad + \int _\varOmega \left( z \psi |\nabla u|^2 + \alpha \varepsilon \nabla z \nabla \psi +\frac{\alpha }{4\varepsilon }(z-1)\psi \right) \mathrm {d}x\\&\quad =: a^g(u,z;\varphi ) + b(u,z;\psi ). \end{aligned}$$

Here, \(\langle \cdot , \cdot \rangle \) denotes the dual pairing between \({H^1(\varOmega )}\times X\) and its dual. Then the stationarity system associated with the problem on \(H^1(\varOmega ) \times X\) in (1) in its weak form reads

$$\begin{aligned} a^g(u,z; \varphi )&= \int _\varOmega (z^2 + \eta ) \nabla u \nabla \varphi \mathrm {d}x\nonumber \\&\quad + \beta \int _\varOmega (L u - g) L \varphi \mathrm {d}x= 0, \end{aligned}$$
(2a)
$$\begin{aligned} b(u,z;\psi )&= \int _\varOmega |\nabla u|^2 z \psi \mathrm {d}x+ \alpha \varepsilon \int _\varOmega \nabla z \nabla \psi \mathrm {d}x\nonumber \\&\quad + \frac{\alpha }{4 \varepsilon } \int _\varOmega (z - 1) \psi \mathrm {d}x= 0. \end{aligned}$$
(2b)

for all \(\varphi \in {H^1(\varOmega )}\) and \(\psi \in X\). Utilizing coercivity properties of the involved inner products, one readily obtains the a priori bounds

$$\begin{aligned}&\min (\eta ,\beta ) {\vert \vert \vert }u {\vert \vert \vert }\le \beta \Vert g\Vert _{L^2(\varOmega )},\quad \text {and} \end{aligned}$$
(3a)
$$\begin{aligned}&\min \left( \alpha \varepsilon , \frac{\alpha }{4\varepsilon }\right) \Vert z\Vert _{H^1(\varOmega )} \le \frac{\alpha }{4 \varepsilon } \lambda ^d(\varOmega )^\frac{1}{2}. \end{aligned}$$
(3b)

Moreover, we deduce \(z \in K\), which results from (2b) by testing with \(\psi = -(z-1)^+ + (-z)^+\) with \((\cdot )^+=\max \{0,\cdot \}\) in a pointwise sense; see also [14, Prop. 1.3].
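
For the reader’s convenience, we sketch the argument behind (3a): testing (2a) with \(\varphi = u\) and using \(z^2 \ge 0\) yields

$$\begin{aligned} \min (\eta ,\beta ) {\vert \vert \vert }u{\vert \vert \vert }^2 \le \int _\varOmega (z^2 + \eta ) |\nabla u|^2 \mathrm {d}x+ \beta \Vert L u\Vert _{L^2(\varOmega )}^2 = \beta \int _\varOmega g \, L u \,\mathrm {d}x\le \beta \Vert g\Vert _{L^2(\varOmega )} {\vert \vert \vert }u{\vert \vert \vert }. \end{aligned}$$

The bound (3b) follows analogously by testing (2b) with \(\psi = z\) and applying the Cauchy–Schwarz inequality to \(\int _\varOmega z \,\mathrm {d}x\).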

After clarifying the existence of solutions and first-order necessary conditions, we next turn our attention to the following convergence result.

Theorem 3

Suppose there exists a bounded sequence \(((u_j,z_j))_{j\in \mathbb {N}} \subset {H^1(\varOmega )}\times X\) such that

$$\begin{aligned} |a^g(u_j,z_j;\varphi )|&\le \mu _j {\vert \vert \vert }\varphi {\vert \vert \vert }&\forall \varphi \in {H^1(\varOmega )}, \end{aligned}$$
(4a)
$$\begin{aligned} |b(u_j,z_j;\psi )|&\le \nu _j \Vert \psi \Vert _{{H^1(\varOmega )}} &\forall \psi \in X. \end{aligned}$$
(4b)

Further, let \((\mu _j,\nu _j) \rightarrow 0\) and \(z_j \in K\) for all \(j\in \mathbb {N}\). Then there exist a point \((u^*,z^*)\) and a subsequence such that

$$\begin{aligned} (u_{j_k},z_{j_k}) \rightarrow (u^*,z^*) \text { in } {H^1(\varOmega ;\mathbb {R}^2)},\quad \text {and}\quad (u^*,z^*)\text { solves }(2). \end{aligned}$$

Proof

The result follows from a straightforward adaptation of the proof in [14, Theorem 4.1]. \(\square \)

The above result is useful when studying properties of solution sets of (2) and for the convergence analysis of a splitting-type algorithm in function space for iteratively solving (2). We continue here by studying such a splitting method and postpone the former aspect to Sect. 3.

[Algorithm: splitting scheme for (AT), alternating \(u_{j+1} = {\text {argmin}}_{u \in H^1(\varOmega )} AT^g_\varepsilon (u,z_j)\) and \(z_{j+1} = {\text {argmin}}_{z \in X} AT^g_\varepsilon (u_{j+1},z)\).]

We note that the algorithm splits the overall nonconvex minimization problem into the iterative solution of two subsequent problems. Indeed, within one iteration, each of the two steps requires the solution of a convex quadratic minimization problem, which is equivalent to solving a linear partial differential equation (PDE). The convergence of this iteration scheme can then be obtained by applying a result due to [23] concerning the decay of the distance of two consecutive image iterates.
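
The following sketch (ours, not the authors’ implementation; one spatial dimension, dense linear algebra, illustrative parameter values) makes the two half-steps concrete: with the respective other variable frozen, each step solves the linear system obtained by discretizing (2a) resp. (2b) with forward differences.

```python
import numpy as np

def at_splitting(g, L, h, alpha=1e-3, beta=10.0, eps=0.02, eta=1e-4,
                 max_iter=100, tol=1e-8):
    """Sketch of the alternating (splitting) scheme for (AT) on a uniform
    1D grid with spacing h; all parameter values are illustrative.

    D maps nodal values to cell gradients ((n-1) x n), M averages nodal
    values to cell midpoints."""
    n = g.size
    D = np.diff(np.eye(n), axis=0) / h
    M = 0.5 * (np.eye(n)[:-1, :] + np.eye(n)[1:, :])
    u, z = g.copy(), np.ones(n)
    for _ in range(max_iter):
        # u-step: (D^T diag(M z^2 + eta) D + beta L^T L) u = beta L^T g,
        # the discrete counterpart of a^g(u, z_j; phi) = 0 for all phi.
        w = M @ (z**2) + eta
        A = D.T @ (w[:, None] * D) + beta * (L.T @ L)
        u_new = np.linalg.solve(A, beta * (L.T @ g))
        # z-step: discrete counterpart of b(u_{j+1}, z; psi) = 0; linear,
        # since |grad u|^2 is now fixed. B is an M-matrix, so 0 <= z <= 1
        # holds automatically, mirroring the truncation argument above.
        du2 = (D @ u_new) ** 2
        B = (np.diag(M.T @ du2) + alpha * eps * (D.T @ D)
             + alpha / (4.0 * eps) * np.eye(n))
        z = np.linalg.solve(B, alpha / (4.0 * eps) * np.ones(n))
        # Stopping rule in the spirit of (5): distance of consecutive iterates.
        if np.linalg.norm(u_new - u) <= tol:
            u = u_new
            break
        u = u_new
    return u, z
```

Applied to data from the degradation sketch above, the edge indicator z typically dips toward zero near the jump of \(g_0\), provided \(\alpha \), \(\beta \), \(\varepsilon \), and \(\eta \) are tuned to the noise level.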

Corollary 4

Let \((u_j,z_j)_{j\in \mathbb {N}}\) be the sequence generated by the splitting algorithm. Then there exists a stationary point \((u^*,z^*) \in {H^1(\varOmega )}\times X\) and a subsequence such that

$$\begin{aligned} (u_{j_k},z_{j_k}) \rightarrow (u^*,z^*) \text { (strongly) in } {H^1(\varOmega ;\mathbb {R}^2)}. \end{aligned}$$

Proof

Adapting [23, Lemma 2.3.9 and Theorem 2.3.10], we obtain

$$\begin{aligned} \sum _{j = 1}^\infty {\vert \vert \vert }u_{j+1} - u_j {\vert \vert \vert }^2 < \infty , \text { which implies } \lim _{j \rightarrow \infty }{\vert \vert \vert }u_{j+1} - u_j {\vert \vert \vert }= 0. \end{aligned}$$

Since \(u_{j+1} ={\text {argmin}}_{u \in {H^1(\varOmega )}} AT^g_\varepsilon (u,z_j)\) it holds that

$$\begin{aligned} a^g(u_{j+1},z_j;\varphi ) = 0 \text { for all } \varphi \in {H^1(\varOmega )}. \end{aligned}$$

Hence, we obtain for every \(\varphi \in {H^1(\varOmega )}\) that

$$\begin{aligned}&|a^g(u_j,z_j;\varphi )| \nonumber \\&\quad = |a^g(u_{j+1},z_j;\varphi ) - a^g(u_j,z_j;\varphi )| \nonumber \\&\quad = \left| \int _\varOmega (z_j^2 + \eta ) (\nabla u_{j+1} - \nabla u_j) \nabla \varphi \mathrm {d}x + \beta \int _\varOmega (L u_{j+1} - L u_j) L \varphi \mathrm {d}x \right| \nonumber \\&\quad \le (1+\eta )\Vert \nabla u_{j+1} - \nabla u_j\Vert _{{L^2(\varOmega )}} \Vert \nabla \varphi \Vert _{{L^2(\varOmega )}}\nonumber \\&\qquad + \beta \Vert L u_{j+1} - L u_j\Vert _{{L^2(\varOmega )}} \Vert L \varphi \Vert _{{L^2(\varOmega )}} \nonumber \\&\quad \le \max (1+\eta ,\beta ) {\vert \vert \vert }u_{j+1} - u_j{\vert \vert \vert }\cdot {\vert \vert \vert }\varphi {\vert \vert \vert }. \end{aligned}$$
(5)

Since \(b(u_j,z_j;\psi ) = 0\) for all \(\psi \in X\), we see that the conditions of Theorem 3 are fulfilled with \(\mu _j = \max (1+\eta ,\beta ){\vert \vert \vert }u_{j+1} - u_j {\vert \vert \vert }\) and \(\nu _j = 0\). Hence, the assertion follows. \(\square \)

We point out that the last estimate in (5), leading to \(\max (1+\eta , \beta ){\vert \vert \vert }u_{j+1} - u_j{\vert \vert \vert }\le \mathrm {tol}\) for some tolerance \(\mathrm {tol}>0\), can also be used as a numerically useful stopping criterion for the splitting algorithm.

With these results at hand, we are now able to derive finer properties of the dependence of the solutions of the minimization problem and the first-order system on the given image data, respectively.

3 Existence of Measurable Selections

Next we introduce the concept of uncertain edges. As motivated earlier, we assume that the real image is inaccessible due to corruption by (random) noise and a deterministic degradation operator. The resulting data are therefore modeled by a random variable, which we denote in the following again by g for the sake of readability. Rather than addressing the issue of finding a reconstruction of the image along with an edge set, we seek to study the propagation of noise into the solution.

3.1 Measurable Selections for Ambrosio–Tortorelli

For the reasons mentioned above, we focus on the Ambrosio–Tortorelli approximation (AT) of the Mumford–Shah energy. The main issue now is to establish the existence of solution-valued random variables as well as their characterization. For this purpose, we introduce the set-valued solution operators \(\mathcal {S}_{\min } , \mathcal {S}_\mathrm {stat}: {L^2(\varOmega )}\rightrightarrows {H^1(\varOmega ;\mathbb {R}^2)}\) defined by

$$\begin{aligned} \mathcal {S}_{\min }(g)&:= {\text {argmin}}\{AT^g_\varepsilon (u,z) : (u,z) \in {H^1(\varOmega )}\times K\},\\ \mathcal {S}_{\mathrm {stat}}(g)&:= \{(u,z) \in H^1(\varOmega ) \times K : (u,z) \text { solves } (2) \text { for } g\}. \end{aligned}$$

We start by providing structural properties of these operators.

Theorem 5

Let \(\mathcal {S}\in \{\mathcal {S}_{\min },\mathcal {S}_\mathrm {stat}\}\). Then the following properties hold true:

  1. (i)

    \(\mathcal {S}\) has nonempty and compact values.

  2. (ii)

    For all closed sets \(C \subset {H^1(\varOmega )}\times K\), it holds that \(\mathcal {S}^{-1}(C) \subset {L^2(\varOmega )}\) is closed,

where we use \(\mathcal {S}^{-1}(C):= \{g \in {L^2(\varOmega )}: \mathcal {S}(g) \cap C \ne \emptyset \}\).

Proof

We split the proof into several steps. Below \(C \subset H^1(\varOmega ) \times K\) always denotes a closed set.

Step 1 (Nonemptiness). Since we know that the Ambrosio–Tortorelli problem has a minimizer for every \(g \in {L^2(\varOmega )}\), it holds that \(\mathcal {S}_{\min }(g) \ne \emptyset \). Since every minimizer fulfils the stationarity system, \(\mathcal {S}_\mathrm {stat}(g) \ne \emptyset \) follows, as well.

Step 2 (\(\mathcal {S}_{\min }(g)\) is closed). Let \(m := \min _{(u,z) \in {H^1(\varOmega )}\times K} AT^g_\varepsilon (u,z)\). Then we rewrite

$$\begin{aligned} \mathcal {S}_{\min }(g)&= {\text {argmin}}\{AT^g_\varepsilon (u,z) : (u,z) \in {H^1(\varOmega )}\times K\} \\&= \{(u,z) \in H^1(\varOmega ) \times K : AT^g_\varepsilon (u,z) \le m\} =: {\text {lev}}_m AT^g_\varepsilon . \end{aligned}$$

Since according to Theorem 2 the Ambrosio–Tortorelli functional is (weakly) l.s.c. on \({H^1(\varOmega )}\times K\) with respect to the \({H^1(\varOmega ;\mathbb {R}^2)}\)-topology, we find that \(\mathcal {S}_{\min }(g)\) is closed.

Step 3 (\(\mathcal {S}_\mathrm {stat}(g)\) and \(\mathcal {S}_{\min }(g)\) are compact). Let \((u_n,z_n)_{n\in \mathbb {N}} \subset \mathcal {S}_\mathrm {stat}(g)\) be a sequence of stationary points. Then this sequence is bounded in \({H^1(\varOmega )}\times X\) by the a priori estimate (3), and \(z_n \in K\) holds true, as well. We apply Theorem 3 with \(\mu _j = \nu _j = 0\) and get the existence of a stationary point together with a strongly \({H^1(\varOmega ;\mathbb {R}^2)}\)-convergent subsequence. Thus, the compactness of \(\mathcal {S}_\mathrm {stat}\) as well as the compactness of \(\mathcal {S}_{\min }(g)\) (using Step 2) are established.

Step 4 (Proof of (ii) for \(\mathcal {S}_\mathrm {stat}\)). Let \((g_n)_{n\in \mathbb {N}} \subset \mathcal {S}_\mathrm {stat}^{-1}(C)\) be a (strongly) \({L^2(\varOmega )}\)-convergent sequence with limit \(g \in {L^2(\varOmega )}\). We show \(g \in \mathcal {S}_\mathrm {stat}^{-1}(C)\). For this purpose, let \((u_n,z_n)_{n\in \mathbb {N}} \subset C\) be a sequence of stationary points for \(g_n\), \(n\in \mathbb {N}\). Then, \(0 \le z_n \le 1\) holds almost everywhere and

$$\begin{aligned} a^{g_n}(u_n,z_n;\varphi )&= 0 \ \ \forall \varphi \in {H^1(\varOmega )},\\ b(u_n,z_n;\psi )&= 0 \ \ \forall \psi \in X, \end{aligned}$$

for every \(n\in \mathbb {N}\). Again, we get the boundedness of \(((u_n,z_n))_{n \in \mathbb {N}}\) in \(H^1(\varOmega ) \times X\) by the uniform boundedness of \(\Vert g_n\Vert _{L^2(\varOmega )}\), \(n\in \mathbb {N}\), combined with the a priori bound (3). Moreover, we find

$$\begin{aligned} |a^g(u_n,z_n;\varphi )|&= |a^g(u_n,z_n;\varphi ) - a^{g_n}(u_n,z_n;\varphi )| \\&= \left| \beta \int _\varOmega (g_n - g) L \varphi \mathrm {d}x \right| \\&\le \beta \Vert g_n - g\Vert _{{L^2(\varOmega )}} \Vert L \varphi \Vert _{{L^2(\varOmega )}}. \end{aligned}$$

Since \(g_n \rightarrow g\) in \({L^2(\varOmega )}\), the conditions of Theorem 3 are fulfilled for g with \(\mu _n = \beta \Vert g_n - g\Vert _{{L^2(\varOmega )}}\) and \(\nu _n = 0\). Hence, there exists a stationary point \((u^*,z^*)\) together with a strongly \({H^1(\varOmega ;\mathbb {R}^2)}\)-convergent subsequence of \((u_n,z_n)_{n\in \mathbb {N}}\). Since \((u_n,z_n) \subset C\) for all \(n\in \mathbb {N}\) and C is closed, we obtain \((u^*,z^*) \in C\). Therefore, \(\mathcal {S}_\mathrm {stat}(g) \cap C \ne \emptyset \) and, hence, \(g \in \mathcal {S}_\mathrm {stat}^{-1}(C)\).

Step 5 (Proof of (ii) for \(\mathcal {S}_{\min }\)). Let \((g_n)_{n\in \mathbb {N}} \subset \mathcal {S}_{\min }^{-1}(C)\) be a strongly \({L^2(\varOmega )}\)-convergent sequence with limit \(g \in {L^2(\varOmega )}\). We show \(g \in \mathcal {S}_{\min }^{-1}(C)\). Take \((u_n,z_n)_{n\in \mathbb {N}} \subset C\) as a corresponding sequence of minimizers for \(g_n\), \(n\in \mathbb {N}\). For all \(n \in \mathbb {N}\) and all \((u,z) \in {H^1(\varOmega )}\times K\) it thus holds that

$$\begin{aligned} AT^{g_n}_\varepsilon (u_n,z_n) \le AT^{g_n}_\varepsilon (u,z). \end{aligned}$$

Since every minimizer is a stationary point, we obtain from the arguments of step 4 a stationary point \((u^*,z^*) \in C\) and a strongly \({H^1(\varOmega ;\mathbb {R}^2)}\)-convergent subsequence \((u_{n_k},z_{n_k})_{k\in \mathbb {N}}\). Then \((u^*,z^*)\) is also a minimizer. In fact, we have

$$\begin{aligned}&AT^g_\varepsilon (u^*,z^*) \le \liminf _{k \rightarrow \infty } AT^g_\varepsilon (u_{n_k},z_{n_k})\\&\quad = \liminf _{k \rightarrow \infty } \left( AT^{g_{n_k}}_\varepsilon (u_{n_k},z_{n_k}) + \beta (L u_{n_k} - g_{n_k},g_{n_k} - g)_{{L^2(\varOmega )}} + \frac{\beta }{2} \Vert g_{n_k} - g\Vert _{{L^2(\varOmega )}}^2 \right) \\&\quad = \liminf _{k \rightarrow \infty } AT^{g_{n_k}}_\varepsilon (u_{n_k},z_{n_k}) \le \lim _{k \rightarrow \infty } AT^{g_{n_k}}_\varepsilon (u,z) = AT^g_\varepsilon (u,z). \end{aligned}$$

Since \((u,z)\) was arbitrary, we conclude that \((u^*,z^*)\) is also a minimizer for g. Hence \(\mathcal {S}_{\min }^{-1}(C)\) is closed. This completes the proof. \(\square \)

The aim is to construct a random variable whose values are solutions of (AT) for the given image and for almost every realization of the noise. This quantity then defines an associated reconstructed image and edge indicator, respectively. Therefore we aim at using the Kuratowski–Ryll-Nardzewski measurable selection theorem, which we state here for ease of reference; for a proof, we refer the interested reader to [13, Theorem 6.6.7].

Theorem 6

(Kuratowski and Ryll-Nardzewski). Let Y be a Polish space, \((\mathcal {X},\mathcal {A})\) a measurable space and \(\varPsi : {\mathcal {X}} \rightrightarrows Y\) a set-valued operator such that

  1. (i)

    \(\varPsi ({x})\) is nonempty and closed for all \(x \in \mathcal {X}\), and

  2. (ii)

    for all \(U \subset Y\) open, it holds that \(\varPsi ^{-1}(U) = \{{x \in \mathcal {X}} : \varPsi ({x}) \cap U \ne \emptyset \} \in {\mathcal {A}}\).

Then there exists a measurable selection, i.e., a mapping \(\varPsi _\mathrm {sel}: {\mathcal {X}} \rightarrow Y\) such that \(\varPsi _\mathrm {sel}({x}) \in \varPsi ({x})\) for all \(x \in \mathcal {X}\).

Given the notation of Theorem 6, we note that the fulfilment of the condition \(\varPsi ^{-1}(C) \in {\mathcal {A}}\) for all closed subsets \(C \subset Y\) implies assumption (ii). Indeed, let \(U \subset Y\) be an arbitrary open set. By \(U^c\), we denote the complement of U in Y. Rewriting U gives

$$\begin{aligned} U&= \{y \in Y: {\text {dist}}(y, U^c) > 0\} \\&= \left\{ y \in Y : \exists n \in \mathbb {N} : {\text {dist}}(y,U^c) \ge \tfrac{1}{n}\right\} \\&= \bigcup _{n = 1}^\infty \left\{ y \in Y : {\text {dist}}(y,U^c) \ge \tfrac{1}{n}\right\} = \bigcup _{n = 1}^\infty C_n, \end{aligned}$$

with \(C_n := \{y \in Y: {\text {dist}}(y,U^c) \ge \tfrac{1}{n}\} = ({\text {dist}}(\ \cdot \ ,U^c))^{-1} ([\tfrac{1}{n},+\infty ))\) as a closed set for every \(n \in \mathbb {N}\). Then, we obtain

$$\begin{aligned} \varPsi ^{-1}(U)&= \{ x \in \mathcal {X} : \varPsi (x) \cap U \ne \emptyset \}\\&= \{ x \in \mathcal {X} : \exists n \in \mathbb {N} : \varPsi (x) \cap C_n \ne \emptyset \}\\&= \bigcup _{n = 1}^\infty \varPsi ^{-1}(C_n). \end{aligned}$$

Since \(\varPsi ^{-1}(C_n) \in {\mathcal {A}}\) by assumption, we have \(\varPsi ^{-1}(U) \in {\mathcal {A}}\).

Based on this observation, we are now able to prove the existence of measurable selections for the operators \(\mathcal {S}_{\min }\) and \(\mathcal {S}_\mathrm {stat}\).

Corollary 7

Let \(\mathcal {S}\in \{\mathcal {S}_{\min }, \mathcal {S}_\mathrm {stat}\}\). Then there exist Borel functions \(u_\mathrm {sel}: L^2(\varOmega ) \rightarrow H^1(\varOmega )\) and \(z_\mathrm {sel}: L^2(\varOmega ) \rightarrow K\) such that \((u_\mathrm {sel}(g), z_\mathrm {sel}(g)) \in \mathcal {S}(g)\) for all \(g \in L^2(\varOmega )\).

Proof

Let \(\varPsi \in \{\mathcal {S}_{\min }, \mathcal {S}_\mathrm {stat}\}\), set \({\mathcal {X}} = L^2(\varOmega )\), let \({\mathcal {A}} = \mathcal {B}(L^2(\varOmega ))\) be the Borel \(\sigma \)-algebra, and put \(Y = H^1(\varOmega ) \times K\). Since \(H^1(\varOmega ;\mathbb {R}^2)\) is a separable Hilbert space, it is a Polish space. As closed subsets of Polish spaces are Polish as well (see [40, Proposition A.1] or use [1, Corollary 3.5]), also \(Y \subset H^1(\varOmega )^2\) is a Polish space. In combination with Theorem 5 and the preceding observation, the assumptions of Theorem 6 are fulfilled. \(\square \)

3.2 Measurable Selections for Mumford–Shah

Similarly to the previous subsection, it is possible to derive an analogous result for the Mumford–Shah problem (MS). Here, we focus on the denoising case, i.e., \(L = \mathrm {id}_{L^2(\varOmega )}\). Due to the analytical difficulties concerning the edge set as a geometric quantity, we had turned our attention to the Ambrosio–Tortorelli model. In the following, we temporarily return to the original Mumford–Shah model; in fact, it is possible to derive a result similar to the one above for the processed image only. To this end, we first discuss a reformulation of the Mumford–Shah problem (MS) based on special functions of bounded variation.

For this purpose, we recall that the space of functions of bounded variation \(BV(\varOmega )\) is defined as the set of \(L^1\)-functions with bounded total variation. The latter quantity is given by

$$\begin{aligned} |Du|(\varOmega )&:= \sup \left\{ \int _\varOmega u {\text {div}} p \ \mathrm {d}x: p \in C_0^1(\varOmega ; \mathbb {R}^d),\right. \\&\quad \left. |p(x)|_\infty \le 1 \text { for all } x \in \varOmega \right\} . \end{aligned}$$

Accordingly, we have \(BV(\varOmega ) = \{ u \in L^1(\varOmega ) : |Du|(\varOmega ) < +\infty \}\). Equipped with the norm \(\Vert u\Vert _{BV} = \Vert u\Vert _{L^1(\varOmega )} + |Du|(\varOmega )\), the space \(BV(\varOmega )\) is a Banach space. The distributional derivative Du of \(u \in BV(\varOmega )\) is a Radon measure which can be decomposed according to

$$\begin{aligned} Du = \nabla u \, \lambda ^d + (u^+ - u^-)\, \nu _u \, \mathcal {H}^{d-1}\llcorner S_u + \mathcal {C}_u. \end{aligned}$$

Here, the first term is the part of Du that is absolutely continuous with respect to the Lebesgue measure, with \(\nabla u\) denoting its density. The terms \(u^+, u^-\) are the so-called upper, respectively, lower approximate limits of u, \(\nu _u\) denotes the measure-theoretic normal to the jump set, and \(S_u := \{x \in \varOmega : u^+(x) > u^-(x)\}\) represents the jump set. The second term is the part of the measure that is absolutely continuous with respect to the Hausdorff measure restricted to \(S_u\), which we denote here as \(\mathcal {H}^{d-1}\llcorner S_u\). The remaining part \(\mathcal {C}_u\) is singular with respect to both the Lebesgue and the restricted Hausdorff measure and is referred to as the Cantor part of Du. For the definition of approximate limits as well as further details, the reader is referred to [2, Section 10], [21, Section 5] as well as to the monograph [6].

The space of special functions of bounded variation is then defined as

$$\begin{aligned} SBV(\varOmega ) := \{u \in BV(\varOmega ) : \mathcal {C}_u = 0\}, \end{aligned}$$

which is the subspace of BV-functions with vanishing Cantor part.

Using the above notation, it is possible to provide the following equivalent reformulation of the Mumford–Shah problem, originally proposed in [20], as a problem with the processed image contained in the SBV-space, i.e.,

$$\begin{aligned}&\text {minimize } MS_{SBV}^g(u) \text { over } u \in SBV(\varOmega ) \text { with }\nonumber \\&MS_{SBV}^g(u) := \frac{1}{2}\int _{\varOmega } |\nabla u|^2 \mathrm {d}x + \alpha \mathcal {H}^{d-1}(S_u) + \frac{\beta }{2}\Vert u - g\Vert _{L^2(\varOmega )}^2. \end{aligned}$$
(MS-SBV)

We note that a solution must fulfil \(u \in L^2(\varOmega )\) and \(\nabla u \in L^2(\varOmega ;\mathbb {R}^d)\) for the value of the functional to be finite. Observe further that this formulation has the advantage that the edge set does not enter as an explicit variable, but it is rather encoded in the function u through a jump. On the other hand, a potential drawback of this approach is given by the fact that now u is contained in a space that is neither reflexive nor separable. For details on the equivalence of the two formulations as well as the existence proof for \(g \in L^\infty (\varOmega )\) the reader is referred to [20] or to [6, Section 7].

For (MS-SBV), we are now able to derive an analogous version of Theorem 5 when we consider the solution operator as a mapping from \(L^\infty (\varOmega )\) to \(L^2(\varOmega )\). Let us briefly discuss the requirement on g. In fact, from a practical viewpoint, an image signal is the product of a technical measurement process producing data in a fixed and finite range. Therefore the image signal will typically be essentially bounded. From the mathematical viewpoint, we seek to apply a compactness result by Ambrosio addressing SBV functions as well as Theorem 6. In order to do so, we need bounded sequences in \(L^\infty (\varOmega )\) and a separable solution space \(L^2(\varOmega ) \supset SBV(\varOmega )\).

Theorem 8

Let \(T:L^\infty (\varOmega ) \rightrightarrows L^2(\varOmega )\) be the solution mapping of the Mumford–Shah problem (MS-SBV), i.e.,

$$\begin{aligned} T(g) := {\text {argmin}}\left\{ MS_{SBV}^g(u) : u \in SBV(\varOmega ) \right\} . \end{aligned}$$

Then the following properties hold:

  1. (i)

    T has nonempty and compact values.

  2. (ii)

For all closed sets \(C \subset L^2(\varOmega )\), it holds that \(T^{-1}(C) \subset L^\infty (\varOmega )\) is closed.

Before we provide the proof, we also state that there exists a measurable selection of minimizers, i.e., a Borel function \(u_\mathrm {sel}:L^\infty (\varOmega ) \rightarrow L^2(\varOmega )\) with \(u_\mathrm {sel}(g) \in T(g)\) for all \(g \in L^\infty (\varOmega )\).

Proof

Let \(g \in L^\infty (\varOmega )\).

Step 1 (The set \(T(g) \subset L^2(\varOmega )\) is compact). We know that the Mumford–Shah problem has a solution (cf. [9, Example 5.2]). Hence \(T(g) \ne \emptyset \). Let \((u_n)_{n \in \mathbb {N}} \subset T(g) \subset SBV(\varOmega )\) be a sequence of solutions. By the same truncation argument as in [9], we obtain \(\Vert u_n\Vert _{L^\infty (\varOmega )} \le \Vert g\Vert _{L^\infty (\varOmega )}\) for all \(n \in \mathbb {N}\). Thus, \((u_n)_{n \in \mathbb {N}}\) is uniformly bounded and by using

$$\begin{aligned} m&:= \min _{v \in SBV(\varOmega )} MS_{SBV}^g(v) = MS_{SBV}^g(u_n) \\ \,\,\,&\ge \frac{1}{2} \int _\varOmega |\nabla u_n|^2 \mathrm {d}x + \alpha \mathcal {H}^{d-1}(S_{u_n}), \end{aligned}$$

we obtain that the conditions of the compactness result of Ambrosio [8] are fulfilled. Hence, there exist a subsequence \((u_{n_k})_{k \in \mathbb {N}}\) as well as \(u \in SBV(\varOmega ) \cap L^\infty (\varOmega )\) such that \(u_{n_k} \rightarrow u\) almost everywhere and the gradients \(\nabla u_{n_k} \rightharpoonup \nabla u\) converge weakly in \(L^2(\varOmega ; \mathbb {R}^d)\). Due to the \(L^\infty \)-boundedness, we obtain \(u_{n_k} \rightarrow u\) in \(L^2(\varOmega )\) by dominated convergence. Moreover, we get by the lower semi-continuity of the norm and the second part of Ambrosio’s compactness result that

$$\begin{aligned} m \le MS_{SBV}^g(u) \le \liminf _{n \rightarrow \infty } MS_{SBV}^g(u_n) = m \end{aligned}$$

and eventually \(u \in T(g)\). This proves the compactness of T(g) in \(L^2(\varOmega )\).

Step 2 (\(T^{-1}(C) \subset L^\infty (\varOmega )\) is closed for all closed subsets \(C \subset L^2(\varOmega )\)). Let \(C \subset L^2(\varOmega )\) be closed. We choose an arbitrary sequence \((g_n)_{n \in \mathbb {N}} \subset T^{-1}(C)\) convergent in \(L^\infty (\varOmega )\) towards \(g \in L^\infty (\varOmega )\). Then for all \(n \in \mathbb {N}\) there exists \(u_n \in T(g_n)\) that minimizes the corresponding Mumford–Shah functional. It thus holds that \(MS_{SBV}^{g_n}(u_n) \le MS_{SBV}^{g_n}(v)\) for all \(v \in SBV(\varOmega )\). By plugging in \(v = 0\) and using the uniform boundedness of \(g_n\) in \(L^\infty \), we recognize that the conditions of Ambrosio’s compactness result are fulfilled, and therefore a subsequence \((u_{n_k})_{k \in \mathbb {N}}\) as well as a limit u exist. By the above arguments, we see \(u_{n_k} \rightarrow u\) in \(L^2(\varOmega )\) and by the closedness assumption also \(u \in C\). Together with the convergence of \((g_{n_k})_{k \in \mathbb {N}}\), we find analogously to Step 1 that

$$\begin{aligned} MS^g_{SBV}(u)&\le \liminf _{k \rightarrow \infty } MS_{SBV}^{g_{n_k}}(u_{n_k}) \\&\le \liminf _{k \rightarrow \infty } MS_{SBV}^{g_{n_k}}(v) = MS_{SBV}^g(v) \end{aligned}$$

for all \(v \in SBV(\varOmega )\). Thus, \(u \in T(g) \cap C\), and therefore \(T^{-1}(C)\) is a closed subset of \(L^\infty (\varOmega )\). We obtain the existence of a measurable selection again by the theorem of Kuratowski–Ryll-Nardzewski. \(\square \)

While the above result provides some theoretical insights into the behavior of the computed image, it does not yield a way to treat the jump sets, i.e., the edge sets. We therefore return to the Ambrosio–Tortorelli approximation.

So far we have discussed measurable selections of operators on function space with respect to their topologies and induced Borel algebras. We consider now an abstract probability space \((\Xi ,\mathcal {F},\mathbb {P})\) together with a random variable \(g:\Xi \rightarrow L^2(\varOmega )\) being measurable with respect to \(\Xi \) equipped with the \(\sigma \)-algebra \(\mathcal {F}\) and \(L^2(\varOmega )\) equipped with the Borel algebra \(\mathcal {B}(L^2(\varOmega ))\). If we consider now a measurable selection \(g \mapsto (u_\mathrm {sel}(g),z_\mathrm {sel}(g))\) of minimizers (stationary points), then the composition \(\xi \mapsto (u_\mathrm {sel}(g(\xi )),z_\mathrm {sel}(g(\xi )))\) is a random variable of minimizers (stationary points). We refer to the respective components simply as \(\xi \mapsto (u(\xi ), z(\xi ))\).

3.3 Characterizing Measurable Selections for Ambrosio–Tortorelli

After establishing the existence of measurable selections, the next step is to derive a variational characterization, i.e., we seek to interpret selections of stationary points as solutions of a stochastic partial differential equation (SPDE) and selections of minimizers as minimizers of an optimization problem, respectively. For this purpose, we need to introduce elements of Bochner space theory.

First we equip the (abstract) measurable space \((\Xi , \mathcal {F})\) with a probability measure \(\mathbb {P}\). The resulting measure space induces a Lebesgue integral for \(\mathbb {R}\)-valued random variables denoted by

$$\begin{aligned} \mathbb {E}[ \ \cdot \ ] := \int _\Xi \left( \ \cdot \ \right) \mathrm {d}\mathbb {P}, \end{aligned}$$

provided the argument is Lebesgue integrable. For a measurable set \(M \in \mathcal {F}\), we define the characteristic function as

$$\begin{aligned} \mathbb {1}_M(\xi ) := \left\{ \begin{array}{ll} 1 &{}\quad \text {if } \xi \in M,\\ 0 &{}\quad \text {else,} \end{array} \right. \end{aligned}$$

and remark that \(\mathbb {E}\left[ {\mathbb {1}_M}\right] = \mathbb {P}[M]\) holds. Let now V be a separable Banach space and \(p \in [1, +\infty ]\). The space of Bochner p-integrable functions is denoted by

$$\begin{aligned} L^p(\Xi ,\mathcal {F},\mathbb {P};V)&= \{ v:\Xi \rightarrow V : v \text { is measurable with }\\&\qquad \Vert v\Vert _V \in L^p(\Xi , \mathcal {F}, \mathbb {P}) \}. \end{aligned}$$

Since we do not change \((\Xi , \mathcal {F})\), we also denote the above space by \(L^p_\mathbb {P}(V)\). Equipped with the norm \(\Vert v\Vert _{L^p_{\mathbb {P}}(V)} := \mathbb {E}[\Vert v\Vert _V^p]^{\frac{1}{p}}\) it is even a Banach space, and in the case of \(p = 2\) and V being a Hilbert space, it is a Hilbert space, as well. For all \(v \in L^p_\mathbb {P}(V)\) the Bochner integral

$$\begin{aligned} \mathbb {E}[v] := \int _\Xi v \ \mathrm {d}\mathbb {P} \end{aligned}$$

exists and is referred to as the expectation. Since the Lebesgue integral is just a special case of the Bochner integral with \(V = \mathbb {R}\), there is no risk of confusion. For an overview on Bochner spaces, the interested reader is referred to, e.g., [28, Section 1]. For \(V = {H^1(\varOmega )}\), we also write \({H_\mathbb {P}}:= {L^{2}_\mathbb {P}}({H^1(\varOmega )})\) as well as

$$\begin{aligned} X_\mathbb {P}:= \{v \in {H_\mathbb {P}}: \exists C \in (0,\infty ) : |v| \le C \text { holds } \lambda ^d \otimes \mathbb {P}\text {-a.e.}\}, \end{aligned}$$

where we write \(\lambda ^d \otimes \mathbb {P}\) for the product measure on \(\varOmega \times \Xi \), equipped with its product \(\sigma \)-algebra \(\mathcal {B}(\varOmega ) \otimes \mathcal {F}\). The spaces are equipped with the norms

$$\begin{aligned} \Vert \cdot \Vert _{H_\mathbb {P}}^2&:= \mathbb {E}\left[ {{\vert \vert \vert }\cdot {\vert \vert \vert }^2}\right] , \text { and}\\ \Vert \cdot \Vert _{X_\mathbb {P}}^2&:= \mathbb {E}\left[ {\Vert \cdot \Vert _{H^1(\varOmega )}^2}\right] + \Vert \cdot \Vert _{L^\infty (\varOmega \times \Xi , \lambda ^d \otimes \mathbb {P})}^2 \end{aligned}$$

respectively. In any case, for our random edge indicator \(z \in {X_\mathbb {P}}\) we obtain \(0 \le z \le 1\) holding \(\lambda ^d\otimes \mathbb {P}\)-almost everywhere, and in combination with the a priori estimate (3b) the boundedness in \({X_\mathbb {P}}\) and \(L^\infty _\mathbb {P}(H^1(\varOmega ))\). If we, moreover, assume that \(g \in {L^{p}_\mathbb {P}}(L^2(\varOmega ))\), then also \(u \in {L^{p}_\mathbb {P}}(H^1(\varOmega ))\). From now on let \(g \in {L^{2}_\mathbb {P}}(L^2(\varOmega ))\).

Our first target is the SPDE-characterization. For this purpose, we proceed, regarding the notation, as with the image g and denote by \((u, z)\) a random variable of stationary points. Choosing arbitrary \(\varphi \in {H_\mathbb {P}}\) and \(\psi \in {X_\mathbb {P}}\), we prove the measurability of \(\xi \mapsto a^{g(\xi )}(u(\xi ),z(\xi );\varphi (\xi ))\) as well as \(\xi \mapsto b(u(\xi ),z(\xi );\psi (\xi ))\). The proof is only carried out for the latter, as the proof for the former is analogous.

Indeed, for \(c \in \mathbb {R}\) we consider the preimage of \((-\infty , c]\) under b, which is

$$\begin{aligned} B := \{ (u',z',\psi ') \in H^1(\varOmega ) \times K \times B_R^{L^\infty }(0) : b(u',z';\psi ') \le c\}, \end{aligned}$$

where \(R>0\) and \(B_R^{L^\infty }(0) := \{ \tilde{\psi }\in X: \Vert \tilde{\psi }\Vert _{L^\infty } \le R\}\). We seek to show the closedness of B in \(H^1(\varOmega ;\mathbb {R}^3)\). Let \((u_n',z_n',\psi _n') \rightarrow (u',z',\psi ')\) converge in \(H^1(\varOmega ;\mathbb {R}^3)\). By the Riesz–Fischer theorem, there exists a subsequence (not relabeled) such that \(z_n'\) and \(\psi _n'\) converge pointwise almost everywhere. It also holds that

$$\begin{aligned}&\left| \int _\varOmega |\nabla u_n'|^2 z_n' \psi _n' \mathrm {d}x - \int _\varOmega |\nabla u'|^2 z' \psi ' \mathrm {d}x\right| \\&\quad = \left| \int _\varOmega \left( |\nabla u_n'|^2 - |\nabla u'|^2 \right) z_n' \psi _n' \mathrm {d}x \right. \\&\qquad \left. - \int _\varOmega |\nabla u'|^2 (z' \psi ' - z_n' \psi _n') \mathrm {d}x \right| \\&\quad \le R \Vert \nabla u_n' - \nabla u'\Vert _{L^2(\varOmega )}\cdot \Vert \nabla u_n' + \nabla u'\Vert _{L^2(\varOmega )} \\&\qquad + \left| \int _\varOmega |\nabla u'|^2 (z' \psi ' - z_n' \psi _n') \mathrm {d}x \right| . \end{aligned}$$

By the strong convergence of \(\nabla u_n'\) the first part tends to zero, and by the pointwise convergence of \(z_n' \psi _n'\) and \(|\nabla u'|^2 |z_n' \psi _n' - z' \psi '| \le 2R |\nabla u'|^2\) the second part approaches zero by dominated convergence, as well. Hence we obtain

$$\begin{aligned} b(u',z';\psi ') = \lim _{n \rightarrow \infty }b(u_n',z_n';\psi _n') \le c, \end{aligned}$$

yielding \((u',z',\psi ') \in B\). By the \(H^1\)-measurability of u, z, and \(\psi \), we get with \(R := \Vert \psi \Vert _{X_\mathbb {P}}\) that

$$\begin{aligned}&\{\xi \in \Xi : b(u(\xi ),z(\xi ),\psi (\xi )) \le c \} \\&\quad = \{\xi \in \Xi : (u(\xi ),z(\xi ),\psi (\xi )) \in B \} \in \mathcal {F} \end{aligned}$$

holds for all \(c \in \mathbb {R}\). This implies the measurability of \(\xi \mapsto b(u(\xi ), z(\xi ); \psi (\xi ))\). Analogously, we obtain the measurability of \(a^{g(\cdot )}(u(\,\cdot \,),z(\,\cdot \,);\varphi (\,\cdot \,))\).

Using the Cauchy–Schwarz inequality, we obtain for \(u,\varphi \in {H_\mathbb {P}}\) and \(z,\psi \in {X_\mathbb {P}}\)

$$\begin{aligned} \mathbb {E}\left[ {\left| a^g(u,z;\varphi )\right| }\right] , \mathbb {E}\left[ {\left| b(u,z;\psi )\right| }\right] < \infty . \end{aligned}$$

Hence, every measurable selection of stationary points fulfils the SPDE-system (in weak form)

$$\begin{aligned} \mathbb {E}\left[ {a^g(u,z;\varphi )}\right]&= 0 \ \ \text { for all } \varphi \in {H_\mathbb {P}}, \end{aligned}$$
(6a)
$$\begin{aligned} \mathbb {E}\left[ {b(u,z;\psi )}\right]&= 0 \ \ \text { for all } \psi \in {X_\mathbb {P}}. \end{aligned}$$
(6b)

Consequently, every measurable selection of stationary points is also a solution of the SPDE (6). The converse holds true as well. Let \(\varphi ^* \in H^1(\varOmega )\) and \(\psi ^* \in X\) be arbitrary. Then we define the following events

$$\begin{aligned} A_\Xi&:= \{\xi \in \Xi : a^{g(\xi )}(u(\xi ),z(\xi );\varphi ^*) \ge 0\},\\ B_\Xi&:= \{ \xi \in \Xi : b(u(\xi ),z(\xi ),\psi ^*) \ge 0 \}. \end{aligned}$$

Plugging \(\varphi = \varphi ^* \mathbb {1}_{A_\Xi } - \varphi ^* \mathbb {1}_{A_\Xi ^c}\) and \(\psi = \psi ^* \mathbb {1}_{B_\Xi } - \psi ^* \mathbb {1}_{B_\Xi ^c}\) into (6), we obtain

$$\begin{aligned} 0&= \mathbb {E}\left[ a^g(u,z;\varphi )\right] \\&= \mathbb {E}\left[ a^g(u,z;\varphi ^*)\mathbb {1}_{A_\Xi } - a^g(u,z;\varphi ^*)\mathbb {1}_{A_\Xi ^c}\right] = \mathbb {E}\left[ |a^g(u,z;\varphi ^*)|\right] \end{aligned}$$

and analogously \(0 = \mathbb {E}\left[ |b(u,z;\psi ^*)|\right] \), from which we deduce \(a^g(u,z;\varphi ^*) = 0\) \(\mathbb {P}\text {-a.s.}\) as well as \(b(u,z;\psi ^*) = 0\) \(\mathbb {P}\text {-a.s.}\) for all \(\varphi ^* \in H^1(\varOmega )\) and \(\psi ^* \in X\). From this, we see the equivalence of the realization-wise interpretation of (2) and the SPDE-system (6).

After the characterization of stationary points, we want to characterize measurable selections of minimizers in a manner similar to the interchange of minimization and integration in [37, Chapter 14, Section F]. For this purpose, we define the constraint set

$$\begin{aligned} K_\mathbb {P}:= \{v \in {H_\mathbb {P}}: 0 \le v \le 1 \text { holds } \lambda ^d \otimes \mathbb {P}\text {-almost everywhere} \} \end{aligned}$$

and formulate the following minimization problem

$$\begin{aligned} \text {minimize } \mathbb {E}\left[ {AT^g_\varepsilon (u,z)}\right] \text { over } (u,z) \in {H_\mathbb {P}}\times K_\mathbb {P}. \end{aligned}$$
(7)

Using the same arguments as for the deterministic Ambrosio–Tortorelli problem, we obtain the following result.

Theorem 9

The problem (7) admits a solution, and one can relax the constraint \(z \in K_\mathbb {P}\) to \(z \in {X_\mathbb {P}}\), i.e.,

$$\begin{aligned} \min _{(u,z) \in H_\mathbb {P} \times X_\mathbb {P}} \mathbb {E}\left[ AT^g_\varepsilon (u,z) \right] = \min _{(u,z) \in H_\mathbb {P} \times K_\mathbb {P}} \mathbb {E}\left[ AT^g_\varepsilon (u,z) \right] . \end{aligned}$$

Moreover, the functional \(\mathbb {E}[AT^g_\varepsilon (\cdot ,\cdot )]: H_\mathbb {P} \times X_\mathbb {P} \rightarrow \mathbb {R}\) is Fréchet differentiable, and for every minimizer \((u,z) \in H_\mathbb {P} \times X_\mathbb {P}\) the stationarity system (6) is satisfied.

Let \((u^*,z^*)\) be a measurable selection of minimizers and \((\bar{u},\bar{z})\) a minimizer of (7). From this, we obtain \(AT^g_\varepsilon (u^*,z^*) \le AT^g_\varepsilon (\bar{u}, \bar{z}) \ \mathbb {P}\text {-a.s.}\) Taking the expectation yields \(\mathbb {E}[AT^g_\varepsilon (u^*,z^*)] \le \mathbb {E}[AT^g_\varepsilon (\bar{u},\bar{z})]\). Since \((\bar{u}, \bar{z})\) minimizes (7), it holds that \(\mathbb {E}[AT^g_\varepsilon (\bar{u}, \bar{z})] \le \mathbb {E}[AT^g_\varepsilon (u^*,z^*)]\). Hence \((u^*,z^*)\) is a solution of (7), and it follows from the above scenario-wise inequality that \(AT^g_\varepsilon (u^*,z^*) = AT^g_\varepsilon (\bar{u}, \bar{z}) \ \mathbb {P}\text {-a.s.}\) Minimization over the space of Banach space-valued random variables is therefore equivalent to minimization in the sense of \(\mathbb {P}\)-almost every scenario. We summarize our findings on the characterization of measurable selections in the following theorem.

Theorem 10

Let \(g \in {L^{2}_\mathbb {P}}(L^2(\varOmega ))\). A random variable \((u,z) \in H_\mathbb {P} \times K_\mathbb {P}\) is a solution of (7) if and only if it is a \(\mathbb {P}\)-a.s. measurable selection of minimizers, and it solves (6) if and only if it is a \(\mathbb {P}\)-a.s. measurable selection of stationary points.

The purpose of this section is to give a mathematical meaning to random processed images and edges. The approximation of the Mumford–Shah problem yields the existence of random variables that solve the nondeterministic problem scenario-wise. In this sense, a measurable selection of minimizers serves the purpose of a random reconstruction and a random edge. Our numerical treatment of noise propagation in the following section is based on this observation.

4 Numerical Methods

As we are concerned with computing the influence of the signal’s randomness on the edge, our problem class falls into the realm of forward problems in the terminology of uncertainty quantification (UQ). For the sake of accessibility, we briefly summarize some of the main challenges, also in view of designing numerical solution schemes.

  1. 1.

    The solution may be non-unique.

  2. 2.

    Due to the lack of convexity of the solution sets, we are unable to use the Michael selection theorem (cf. [13, Theorem 6.6.3]) to establish the existence of continuous selections.

These facts challenge the use of many popular methods in UQ. Indeed:

  1. 3.

The Wiener chaos expansion is significantly troubled by the local constraint \(0 \le z \le 1\): the global orthonormal expansion is not compatible with the pointwise inequality constraints. For a computational approach in this direction we refer to [36].

  2. 4.

    Stochastic collocation is essentially an interpolation method and necessarily needs point evaluations and continuous dependence on the parameter.

The above techniques seek to approximate the entire random variable by discrete objects. A different approach is based on the definition of a quantity of interest and its approximation using the techniques of statistics and numerics. In this paper, we employ Monte Carlo-based techniques, which rely on the strong law of large numbers in a Hilbert space setting. The latter is stated here for the sake of convenience. For a proof see, e.g., [19].

Theorem 11

Let H be a separable Hilbert space and \((Z_n)_{n \in \mathbb {N}}\) a sequence of pairwise independent, identically distributed (iid.) random variables in \(L^1_\mathbb {P}(H)\) with \(\mathbb {E}[Z_n] = m\) for all \(n \in \mathbb {N}\). Then

$$\begin{aligned} \hat{m}_n := \frac{1}{n}\sum _{i = 1}^n Z_i \rightarrow m \quad \mathbb {P}\text {-a.s.}, \end{aligned}$$

i.e., \(\mathbb {P}\left[ \lim _{n \rightarrow \infty }\Vert \hat{m}_n - m\Vert _H = 0\right] = 1\) holds true. If, additionally, \((Z_n)_{n \in \mathbb {N}}\) is a family in \(L^2_\mathbb {P}(H)\), then \(\hat{m}_n \rightarrow m\) in \(L^2_\mathbb {P}(H)\), as well.
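As an illustration of how Theorem 11 enters our computations, the following minimal sketch forms the Monte Carlo sample mean of pixel-discretized \(L^2(\varOmega )\)-valued samples; the sampler name `draw_edge_indicator` is a placeholder for solving one Ambrosio–Tortorelli problem per noise realization, and all conventions here are our illustrative assumptions.

```python
import numpy as np

def mc_mean(draw_edge_indicator, n):
    """Estimate m = E[Z] by the sample mean (1/n) * sum_i Z_i as in Theorem 11.
    Each call to draw_edge_indicator() returns one iid sample Z_i as a pixel array."""
    acc = None
    for _ in range(n):
        z = draw_edge_indicator()
        acc = z.astype(float).copy() if acc is None else acc + z
    return acc / n
```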

4.1 Generalized Confidence Intervals

In our context, one important quantity of interest concerns the identification of regions where an edge is more likely to be encountered. In this sense, we would like to obtain a notion of confidence intervals for the edge indicator. To achieve this, we first provide a pointwise generalization of cumulative distribution functions (cdfs) in the following definition.

Definition 12

Let Z be an \({L^2(\varOmega )}\)-valued random variable. The function \(F: {L^2(\varOmega )}\rightarrow L^\infty (\varOmega )\) is defined pointwise almost everywhere by

$$\begin{aligned} F(\tau )(x) := \mathbb {P}[Z(x) \le \tau (x)] \end{aligned}$$

for \(\tau \in L^2(\varOmega )\) and is called the generalized cumulative distribution function (gcdf) of Z.

Note that the above pointwise definition is justified by the Fubini theorem on integration on product spaces. Using this, we choose a level of significance \(s \in (0,1)\) and define for z – interpreted as a random variable in \({L^{2}_\mathbb {P}}({L^2(\varOmega )})\) only – the pointwise quantile \(\tau _s\) for (almost all) \(x \in \varOmega \) by

$$\begin{aligned} \tau _s(x) := \inf \{ t \ge 0 : \mathbb {P}[z(x) \le t] \ge s\}. \end{aligned}$$

This defines a one-sided confidence interval (the lower bound is zero), and two questions immediately arise: (i) Why is this object measurable, and (ii) how can it be estimated in practice? Both questions are answered by the following bisection method.

Algorithm (figure d): pointwise bisection for the quantile \(\tau _s\), generating sequences \((t_n^\ell )\) and \((t_n^u)\) with \(F(t_n^\ell ) < s \le F(t_n^u)\).
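A minimal sketch of this bisection, assuming the samples \(z_i\) are stacked rowwise in an array `Z` of shape (n_samples, n_pixels) with values in [0, 1] (array layout and names are our illustrative choices), reads as follows.

```python
import numpy as np

def pointwise_quantile_bisection(Z, s, n_iter=30):
    """Approximate tau_s(x) = inf{t >= 0 : P[z(x) <= t] >= s} per pixel by bisection,
    using the empirical gcdf F_n(t)(x) = (1/n) * sum_i 1{Z_i(x) <= t}."""
    t_lo = np.zeros(Z.shape[1])              # t_0^l = 0
    t_hi = np.ones(Z.shape[1])               # t_0^u = 1
    for _ in range(n_iter):
        mid = 0.5 * (t_lo + t_hi)
        hit = (Z <= mid).mean(axis=0) >= s   # pixels where F(mid) >= s
        t_hi = np.where(hit, mid, t_hi)      # maintain F(t^u) >= s ...
        t_lo = np.where(hit, t_lo, mid)      # ... and F(t^l) < s
    return t_hi                              # pointwise gap |t^u - t^l| <= 2^{-n_iter}
```

After n_iter steps the pointwise gap is at most \(2^{-n_{iter}}\), in line with the estimate \(|t^\ell _n - t_n^u| \le 2^{-n}\) derived next.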

By construction, the sequence \((t^\ell _n)\) is nondecreasing and \((t^u_n)\) is nonincreasing, i.e., almost everywhere it holds that \(t^\ell _n \le t^\ell _{n + 1}\) and \(t_n^u \ge t^u_{n + 1}\), and, moreover, \(F(t_n^\ell ) < s \le F(t_n^u)\) for all \(n \in \mathbb {N}\). By the monotonicity, we also obtain the inequality \(0 \le t_n^\ell < t_n^u \le 1\) and hence the convergence \(t^\ell _n \nearrow t_*\) and \(t_n^u \searrow t^*\) pointwise almost everywhere on \(\varOmega \). Due to the measurability of \(t_n^\ell , t_n^u\) and the pointwise convergence, the limits are measurable as well. Moreover, by construction we get \(|t^\ell _{n + 1} - t^u_{n + 1}| \le \frac{1}{2}|t^\ell _n - t_n^u|\) and inductively \(|t^\ell _n - t_n^u| \le 2^{-n}\) almost everywhere. Therefore, \(t_* = t^*\).

In addition, \(t^*\) is the pointwise quantile of interest. This is equivalent to stating that for all \(\tilde{t} \le t^*\) such that \(A := \{x \in \varOmega : \tilde{t}(x) < t^*(x)\}\) satisfies \(\lambda ^d(A) > 0\), it also holds that \(\mathbb {P}[z< \tilde{t}] < s\) a.e. on A. Indeed, by the choice of \(\tilde{t}\), for every \(x \in A\) there exists a number \(n = n(x) \in \mathbb {N}\) with \(\tilde{t}(x) < t^\ell _n(x)\). Hence, defining \(A_n := \{ x \in \varOmega : \tilde{t} < t_n^\ell \}\), we see that \(A_n \subset A_{n+1}\) and \(A = \bigcup _{n=1}^\infty A_n\). By the construction of \((t^\ell _n)_{n \in \mathbb {N}}\), it holds that \(\mathbb {P}[z< \tilde{t}] \le \mathbb {P}[z \le t_n^\ell ] < s\) almost everywhere on \(A_n\). This gives the required inequality \(\mathbb {P}[z< \tilde{t}] < s\) almost everywhere on A, and we have thus shown \(t^* = \tau _s\).

The result of this discussion is summarized in the following proposition.

Proposition 13

Let Z be an \(L^2(\varOmega )\)-valued random variable and let a level of significance \(s \in (0,1)\) be given. Then, the pointwise quantile

$$\begin{aligned} \tau _s(x) := \inf \{ t \ge 0 : \mathbb {P}[Z(x) \le t] \ge s\} \end{aligned}$$

is a measurable function.

In practice, the exact gcdf is unknown and hence an approximation is needed. To this end, we define the empirical quantiles in full analogy to the classical case of \(\mathbb {R}\)-valued random variables.

Definition 14

Let Z be an \(L^2(\varOmega )\)-valued random variable. Consider an iid. family of samples \((Z_i)_{i \in \mathbb {N}}\) with distribution \(\mathbb {P}_Z\). The empirical gcdf is defined as the mapping

$$\begin{aligned} \tau \mapsto F_n(\tau ) := \frac{1}{n}\sum _{i = 1}^n \mathbb {1}_{\{Z_i \le \tau \}} \left( \in L^\infty (\varOmega ) \text { for every realization} \right) . \end{aligned}$$
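For concreteness, here is a sketch of the empirical gcdf under the same sample layout as above (rowwise samples in an array `Z`; an illustrative convention of ours, not prescribed by the paper).

```python
import numpy as np

def empirical_gcdf(Z, tau):
    """F_n(tau)(x) = (1/n) * sum_i 1{Z_i(x) <= tau(x)}, evaluated pixelwise;
    Z has shape (n, n_pixels) and tau has shape (n_pixels,)."""
    return (Z <= tau).mean(axis=0)
```

The pointwise empirical s-quantile is then obtained, e.g., by applying the bisection sketch from above to \(F_n\).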

For proving convergence of \((F_n)_{n \in \mathbb {N}}\) as introduced in Definition 14, we recall the statistical theory for \(\mathbb {R}\)-valued random variables and refer the reader to [41] for proofs and additional information.

Lemma 15

Let \((F_n)_{n \in \mathbb {N}}\) be a sequence of cdfs of associated \(\mathbb {R}\)-valued random variables and let F be the cdf of an \(\mathbb {R}\)-valued random variable. We assume \(F_n(t) \rightarrow F(t)\) for all \(t \in \mathbb {R}\). Let \(s \in (0,1)\) such that the quantile function \(F^{-1}\) is continuous in s. Then we have \(F_n^{-1}(s) \rightarrow F^{-1}(s)\).

Proof

See [41, Lemma 21.2], where a stronger form is proven. \(\square \)

Based on the preceding lemma, we now prove a convergence result for empirical s-quantiles, first for \(\mathbb {R}\)-valued random variables; it is subsequently lifted to the gcdf setting.

Theorem 16

Let Z be an \(\mathbb {R}\)-valued random variable with cdf \(F :\mathbb {R} \rightarrow [0,1]\) and let \((Z_i)_{i \in \mathbb {N}}\) be an iid. family of samples with distribution \(\mathbb {P}_Z\). For \(n \in \mathbb {N}\), we denote by \(F_n\) the empirical cdf. For a level of significance \(s \in (0,1)\) chosen such that the quantile function \(F^{-1}\) is continuous in s, the empirical quantiles converge, \(F_n^{-1}(s) \rightarrow F^{-1}(s)\), with probability 1. If, moreover, \(0 \le Z \le 1\) holds \(\mathbb {P}\)-a.s., then in addition \(F_n^{-1}(s) \rightarrow F^{-1}(s) \text { in } L^p(\Xi , \mathcal {F}, \mathbb {P})\) for all \(p \in [1, \infty )\).

Proof

Using the strong law of large numbers, we get \(F_n(t) \rightarrow F(t)\) \(\mathbb {P}\)-a.s. for all \(t \in \mathbb {R}\), noting that \(\mathbb {E}[F_n(t)] = F(t)\). By Lemma 15, for every such realization of \((F_n)\) the corresponding quantiles converge, and therefore \(F_n^{-1}(s) \rightarrow F^{-1}(s)\) with probability 1. If we assume \(0 \le Z \le 1\), we immediately get \(0 \le F_n^{-1}(s), F^{-1}(s) \le 1\). By dominated convergence, we see that \(\mathbb {E}[|F_n^{-1}(s) - F^{-1}(s)|^p] \rightarrow 0\) for all \(p \in [1, \infty )\). \(\square \)

After the above preparations, we can formulate a function space valued version of the previous theorem. For this, we need to extend the continuity assumption on the quantile function in Lemma 15 at the level of significance to our situation. Note further that the gcdf induces for almost all \(x \in \varOmega \) a cdf of an \(\mathbb {R}\)-valued random variable, namely \(\mathbb {R} \ni t \mapsto \mathbb {P}[z(x) \le t] = F(t)(x)\). Hence one can interpret the quantile function in a pointwise fashion as a function \(s \mapsto \left( F(\,\cdot \,)(x)\right) ^{-1}(s)\) for almost all \(x\in \varOmega \). This observation leads to the following assumption.

Assumption 17

Let \(s \in (0,1)\) be fixed such that

$$\begin{aligned} N:= \{x \in \varOmega : s \mapsto \left( F(\,\cdot \,)(x)\right) ^{-1}(s) \text { is not continuous in } s \} \end{aligned}$$

has Lebesgue measure zero.

Utilizing Theorem 16 in a pointwise fashion, we obtain the following result.

Theorem 18

Let \(z \in K_\mathbb {P}\) and let \((z_i)_{i \in \mathbb {N}}\) be iid. samples with distribution \(\mathbb {P}_z\). Moreover, fix the level of significance \(s \in (0,1)\) fulfilling Assumption 17 with respect to the gcdf of z. Then the sequence of pointwise empirical s-quantiles \(\tau _{s,n}\) converges to the s-quantile \(\tau _s\) almost everywhere with probability 1 as well as in \(L^p(\varOmega \times \Xi )\) for all \(p \in [1,\infty )\), i.e.,

$$\begin{aligned} \Vert \tau _{s,n} - \tau _s\Vert _{L^p(\varOmega \times \Xi )} \rightarrow 0. \end{aligned}$$

Proof

Since for almost all \(x \in \varOmega \) the function \(t \mapsto \mathbb {P}[z(x) \le t] = F(t)(x)\) is a cdf, we can apply Theorem 16 pointwise and obtain the convergence \(F_n^{-1}(s) \rightarrow F^{-1}(s)\) pointwise \(\lambda ^d \otimes \mathbb {P}\)-almost everywhere. Due to the pointwise constraint on z, we know almost surely that \(\tau _{s,n} \in [0,1]\) a.e. on \(\varOmega \). Since \(|\tau _{s,n} - \tau _s| \le 1\) holds \(\lambda ^d \otimes \mathbb {P}\)-a.e., we derive by dominated convergence

$$\begin{aligned} \Vert \tau _{s,n} - \tau _s\Vert _{L^p(\varOmega \times \Xi )}^p = \int _\varOmega \mathbb {E}\left[ {|\tau _{s,n} - \tau _s|^p}\right] \mathrm {d}x \rightarrow 0, \end{aligned}$$

which ends the proof. \(\square \)

In this subsection, we have introduced the pointwise quantile as a quantity of interest for the investigation of random edges. The associated convergence analysis provides a foundation for a sample-based approximation of the gcdf and the pointwise quantile. The bisection method from the beginning of this subsection can then be used to efficiently approximate the empirical quantile for a given sample size.

4.2 Variance Reduction by Control Variates

In contrast to the previous subsection, where we focused on quantiles, we discuss here the expected value as a quantity of interest. We therefore consider the random edge indicator as an object in \(L^2_\mathbb {P}(L^2(\varOmega ))\) and again employ the Monte Carlo method. Since the convergence order of Monte Carlo is always \(\tfrac{1}{2}\), the convergence speed depends solely on the variance. Associated acceleration methods aim at the formulation of a new random variable with the same expectation as the original one, but with a smaller variance. Several such methods exist, mainly in finite dimensions (cf. [34] for an overview). We pursue an approach based on control variates. For convenience, we first review the one-dimensional case, aiming at an infinite-dimensional generalization.

Let \(Z \in L^2(\Xi ,\mathcal {F},\mathbb {P})\) be an \(\mathbb {R}\)-valued random variable. Our aim is to estimate \(\mathbb {E}\left[ {Z}\right] \). Suppose we have another random variable Y, called the control variate, whose (exact) expected value is known. Then we sample from the variable

$$\begin{aligned} Z_{{\rho }} = Z + {{\rho }} \left( Y - \mathbb {E}\left[ {Y}\right] \right) \end{aligned}$$

for a yet to be chosen \({{\rho }} \in \mathbb {R}\). Obviously, it holds that \(\mathbb {E}\left[ {Z_{{\rho }}}\right] = \mathbb {E}\left[ {Z}\right] \) and the variance of \(Z_{{\rho }}\) is minimal with respect to the weighting parameter \({{\rho }}\) for the choice

$$\begin{aligned} {{\rho }} = -\frac{{\text {Cov}}\left( Z,Y \right) }{{\text {Var}}\left( Y \right) } \end{aligned}$$

where \({\text {Cov}}\left( Z,Y \right) := \mathbb {E}\left[ {(Z - \mathbb {E}\left[ {Z}\right] ) \cdot (Y - \mathbb {E}\left[ {Y}\right] )}\right] \) and \({\text {Var}}\left( Y \right) := \mathbb {E}\left[ {(Y - \mathbb {E}\left[ {Y}\right] )^2}\right] \). The minimal variance for \(Z_{{\rho }}\) then becomes

$$\begin{aligned} {\text {Var}}\left( Z_{{\rho }} \right) = {\text {Var}}\left( Z \right) - \frac{{\text {Cov}}\left( Z,Y \right) ^2}{{\text {Var}}\left( Y \right) }, \end{aligned}$$

and, when \({\text {Cov}}\left( Z,Y \right) \ne 0\), a variance reduction is achieved (when compared to Z). Of course, this concept can be extended to m control variates \((Y^{(j)})_{j = 1}^m\) by defining

$$\begin{aligned} Z_{{\rho }} = Z + \sum _{j = 1}^m {{\rho }}_j \left( Y^{(j)} - \mathbb {E}\left[ {Y^{(j)}}\right] \right) . \end{aligned}$$

The optimal \({{\rho }} = ({{\rho }}_1, \dots , {{\rho }}_m)^\top \) then reads

$$\begin{aligned} {{\rho }} = -C_Y^+ c_Z, \end{aligned}$$

where \(C_Y = \left( {\text {Cov}}\left( Y^{(i)},Y^{(j)} \right) \right) _{i,j = 1, \dots , m} \in \mathbb {R}^{m \times m}\) denotes the covariance matrix of Y and \(c_Z = ({\text {Cov}}\left( Z,Y^{(j)} \right) )_{j = 1, \dots , m} \in \mathbb {R}^m\) denotes the covariance vector of the control variates and the desired target. For a matrix \(A \in \mathbb {R}^{m \times m}\), we denote its Moore–Penrose inverse by \(A^+\). In the same way as in the previous subsection, we extend the above methodology in a pointwise fashion and introduce for this aim the following notation.

For \(U,V \in L_\mathbb {P}^2(L^2(\varOmega ))\) we define the pointwise covariance as

$$\begin{aligned} \mathbb {C}\left[ U,V \right] = \mathbb {E}\left[ {(U - \mathbb {E}\left[ {U}\right] )(V - \mathbb {E}\left[ {V}\right] )}\right] \in L^1(\varOmega ), \end{aligned}$$

where \(\mathbb {E}\) is the expectation defined via the Bochner integral. Further we define the pointwise variance as

$$\begin{aligned} \mathbb {V}\left[ {U}\right] := \mathbb {C}\left[ U,U \right] = \mathbb {E}\left[ {\left( U - \mathbb {E}\left[ {U}\right] \right) ^2}\right] \end{aligned}$$

in contrast to the variance of U, which reads

$$\begin{aligned} {\text {Var}}\left( U \right) = \mathbb {E}\left[ {\Vert U - \mathbb {E}\left[ {U}\right] \Vert _{L^2(\varOmega )}^2}\right] = \int _\varOmega \mathbb {V}\left[ {U}\right] \mathrm {d}x. \end{aligned}$$

For a random variable \(Z \in L^2_\mathbb {P}(L^2(\varOmega ))\), a family \((Y^{(j)})_{j = 1, \dots , m} \subset L^2_\mathbb {P}(L^2(\varOmega ))\) of control variates, and a measurable function \({{\rho }}: \varOmega \rightarrow \mathbb {R}^m\) we consider the random variable

$$\begin{aligned} Z_{{\rho }} = Z + \sum _{j = 1}^m {{\rho }}_j (Y^{(j)} - \mathbb {E}\left[ {Y^{(j)}}\right] ). \end{aligned}$$

The pointwise minimization of \(\mathbb {V}\left[ {Z_{{\rho }}}\right] \) with respect to \({{\rho }}\) gives us the minimizer of \({\text {Var}}\left( Z_{{\rho }} \right) \) due to the Fubini theorem. Next we prove its measurability.

Lemma 19

The mapping induced by the Moore–Penrose inverse \(( \ \cdot \ )^+: \mathbb {R}^{m \times m} \rightarrow \mathbb {R}^{m \times m}\) with \(A \mapsto A^+\) is measurable.

Proof

By Theorem 3.4 in [7], it holds that

$$\begin{aligned} A^+ = \lim _{\delta \searrow 0}(A^\top A + \delta ^2 I_m)^{-1} A^\top . \end{aligned}$$

Let \(GL_m(\mathbb {R})\) denote the set of invertible matrices. Since the mapping \((\ \cdot \ )^{-1}: GL_m(\mathbb {R}) \rightarrow GL_m(\mathbb {R})\) is continuous, the functions \(f_k: \mathbb {R}^{m \times m} \rightarrow \mathbb {R}^{m \times m}\) defined by

$$\begin{aligned} A \mapsto f_k(A) = \left( A^\top A + \frac{1}{k} I_m\right) ^{-1} A^\top \end{aligned}$$

are continuous and thus measurable. Due to the pointwise convergence \(A^+ = \lim _{k \rightarrow \infty } f_k(A)\) also the mapping \((\cdot )^+\) is measurable. \(\square \)
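The regularized limit used in this proof is easy to verify numerically. The following small sketch (with an arbitrarily chosen singular test matrix; all choices are ours) compares \(f_k(A)\) with NumPy's SVD-based Moore–Penrose inverse.

```python
import numpy as np

m = 4
rng = np.random.default_rng(3)
A = rng.standard_normal((m, m))
A[:, -1] = A[:, 0]                  # force a rank deficiency, so A is not invertible
for k in (1e2, 1e4, 1e6):
    # f_k(A) = (A^T A + (1/k) I)^{-1} A^T; the system matrix is SPD and hence solvable
    f_k = np.linalg.solve(A.T @ A + (1.0 / k) * np.eye(m), A.T)
    print(k, np.linalg.norm(f_k - np.linalg.pinv(A)))   # the error shrinks as k grows
```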

Combining the above, we obtain the following result.

Proposition 20

The mapping \({{\rho }}: \varOmega \rightarrow \mathbb {R}^m\), defined by \({{\rho }} = -C_Y^+ c_Z\), is (Borel) measurable.

Proof

Since compositions and products of measurable functions are measurable, we obtain the measurability of \({{\rho }}\) from the preceding lemma. \(\square \)
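With measurability settled, a sample-based sketch of the resulting pointwise control variate estimator might look as follows, assuming samples of Z are stored in an array of shape (n, p) (p pixels), samples of the m control variates in an array of shape (n, p, m), and the exact expectations \(\mathbb {E}[Y^{(j)}]\) in an array `EY` of shape (p, m); all names and layouts are our illustrative choices.

```python
import numpy as np

def control_variate_mean(Z, Y, EY):
    """Pointwise control variates: Z_rho = Z + sum_j rho_j (Y^(j) - E[Y^(j)])
    with rho(x) = -C_Y(x)^+ c_Z(x), estimated from the given samples."""
    n = Z.shape[0]
    Zc = Z - Z.mean(axis=0)                          # centered target samples
    Yc = Y - Y.mean(axis=0)                          # centered control variate samples
    C_Y = np.einsum('npi,npj->pij', Yc, Yc) / n      # pointwise covariance matrices C_Y(x)
    c_Z = np.einsum('np,npj->pj', Zc, Yc) / n        # pointwise covariance vectors c_Z(x)
    rho = -np.einsum('pij,pj->pi', np.linalg.pinv(C_Y), c_Z)
    Z_rho = Z + np.einsum('pi,npi->np', rho, Y - EY)
    return Z_rho.mean(axis=0)                        # pointwise estimate of E[Z]
```

In line with the experiments in Sect. 5.5, one would estimate \({{\rho }}\) on a separate batch of samples and evaluate the sample mean of \(Z_{{\rho }}\) on the remaining ones.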

5 Numerical Examples

Our numerical methods are next applied to a selection of real-world images. In this respect, the given image data form a (discrete) pixel matrix \({(G_{jk})}^m_{j,k = 1}\) of size \(m \times m\) with values between 0 (black) and 255 (white). In order to utilize our continuous framework, we interpret this pixel matrix as a function in \(L^\infty (\varOmega )\) with \(\varOmega := (0,m)^2\). This is done by decomposing \(\varOmega \) (up to null sets) into open squares of size 1 and writing

$$\begin{aligned} g_0 = \sum _{j, k = 1}^m G_{jk} \mathbb {1}_{Q_{jk}}, \end{aligned}$$

with \(Q_{jk}:= (0,1)^2 + (j-1, m-k)\). Concerning the underlying degradation models, we focus on two cases: (i) white noise and (ii) randomized motion blur.
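A minimal sketch of this piecewise-constant interpretation, under the illustrative convention (not prescribed by the paper) that the pixel matrix is stored as an array with `G[j-1, k-1]` \(= G_{jk}\):

```python
import numpy as np

def g0_eval(G, x, y):
    """Evaluate g0 = sum_{j,k} G_jk 1_{Q_jk} at a point (x, y) lying inside
    one of the open squares Q_jk = (0,1)^2 + (j-1, m-k)."""
    m = G.shape[0]
    j = int(np.floor(x)) + 1      # x in (j-1, j)
    k = m - int(np.floor(y))      # y in (m-k, m-k+1)
    return G[j - 1, k - 1]
```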

5.1 White Noise Model

In this setting, we assume \(L := id_{L^2(\varOmega )}\) and that the pixel values are corrupted by a family of independent, normally distributed random variables, whose standard deviation is \(\sigma > 0\) for all pixels. Then the random variable reads as

$$\begin{aligned} \xi = \sum _{j,k = 1}^m \sigma \cdot \xi _{jk} \mathbb {1}_{Q_{jk}}, \end{aligned}$$

where \(\xi _{jk} \sim \mathcal {N}(0,1)\) are iid. with standard normal distribution. In our experiments, we set \(\sigma = 40\), which is in fact a rather large standard deviation compared to practical situations. Since the Mumford–Shah model is designed to suppress the influence of noise, however, a more aggressive noise model aids the visualization of results in the subsequent experiments.
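Sampling from this degradation model is straightforward; a minimal sketch (function name and array conventions are ours):

```python
import numpy as np

def sample_white_noise(G, sigma=40.0, rng=None):
    """Draw one realization g = g0 + xi of the white noise model: every pixel
    value is perturbed by an independent N(0, sigma^2) random variable."""
    rng = np.random.default_rng() if rng is None else rng
    return G + sigma * rng.standard_normal(G.shape)
```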

5.2 Motion Blur Model

In addition to the noise model above, we now also consider the motion blur operator \(\widehat{L}_\ell \in \mathcal {L}(L^2(\varOmega ))\) with horizontal movement defined by

$$\begin{aligned} \widehat{L}_\ell g_0(x) := \frac{1}{2} \int _{-1}^1 g_0(x + t \ell v) \mathrm {d}t \text { with } \ell \sim \mathcal {U}(0,\bar{\ell }), \end{aligned}$$

and \(v = (1,0)^\top \) the direction of motion. The parameter \(\ell \) controls the velocity of the motion and thus the strength of the blurring effect. Since the integration domain exceeds \(\varOmega \), we extend the argument \(g_0\) by zero outside its domain. In our model, we assume the parameter \(\ell \sim \mathcal {U}(0,\bar{\ell })\) to be a uniformly distributed random variable, where we choose \(\bar{\ell }= \frac{m}{4}\) to be one fourth of the image size. In view of our underlying degradation framework, we decompose the above object into

$$\begin{aligned} g = \widehat{L}_\ell g_0 = L g_0 + \xi , \end{aligned}$$

with \(L \in \mathcal {L}(L^2(\varOmega ))\) the linear operator such that \(L g_0 = \mathbb {E}\left[ {\widehat{L}_\ell g_0}\right] \) holds for all \(g_0 \in L^2(\varOmega )\) and \(\xi := \widehat{L}_\ell g_0 - L g_0\) a centered random variable. The expected value operator L is made explicit in Lemma 21 below.
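Before that, we note that drawing a realization of \(\widehat{L}_\ell g_0\) is straightforward. The following sketch discretizes the t-integral by an equidistant rule and rounds displacements to whole pixels; the axis convention (horizontal shifts along the second array axis) and all names are illustrative assumptions.

```python
import numpy as np

def shift_columns_zero(G, s):
    """Shift an image array by s pixels along the horizontal axis,
    filling with zeros (the zero extension of g0 outside its domain)."""
    out = np.zeros_like(G, dtype=float)
    if s == 0:
        out[:] = G
    elif s > 0:
        out[:, s:] = G[:, :-s]
    else:
        out[:, :s] = G[:, -s:]
    return out

def sample_motion_blur(G, ell_bar, n_quad=65, rng=None):
    """Draw one realization of (L_hat_ell g0)(x) = 1/2 * int_{-1}^{1} g0(x + t*ell*v) dt
    with ell ~ U(0, ell_bar) and v = (1, 0)."""
    rng = np.random.default_rng() if rng is None else rng
    ell = rng.uniform(0.0, ell_bar)
    ts = np.linspace(-1.0, 1.0, n_quad)
    acc = np.zeros_like(G, dtype=float)
    for t in ts:
        acc += shift_columns_zero(G, int(round(t * ell)))
    return acc / n_quad    # the node average approximates (1/2) * the t-integral
```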

Lemma 21

Let \(\widehat{L}_\ell g(x) := \frac{1}{2} \int _{-1}^1 g(x + t\ell v) \mathrm {d}t\) as above and \(\ell \sim \mathcal {U}(0,\bar{\ell })\) be a uniformly distributed random variable with parameter \(\bar{\ell }> 0\). Then the operator L has the form

$$\begin{aligned} L g(x) = \mathbb {E}\left[ {\widehat{L}_\ell g}\right] (x) = \frac{1}{2}\int _{-1}^1 g(x + t \bar{\ell }v) (-\log |t|)\,\mathrm {d}t \quad \text { for } x \in \varOmega . \end{aligned}$$

Proof

It is easily seen that for each \(g \in C_0^\infty (\varOmega )\) the mapping \(\ell \mapsto \widehat{L}_\ell g\) is (Lipschitz-)continuous. Using \(\Vert \widehat{L}_\ell \Vert _{\mathcal {L}(L^2(\varOmega ))} \le 1\) for all \(\ell \in \mathbb {R}\) (shown similarly to the proof of \(\Vert L\Vert _{\mathcal {L}(L^2(\varOmega ))} \le 1\) below) and the density of \(C_0^\infty (\varOmega )\) in \(L^2(\varOmega )\), we obtain by the Banach–Steinhaus theorem the continuity of \(\ell \mapsto \widehat{L}_\ell g\) for all \(g \in L^2(\varOmega )\) and hence the measurability of the above random variable.

Fig. 1: Depiction of the set of integration M with a vertical slice at t

Fig. 2: Left: image of the cameraman; right: MRI of the human head

Fig. 3: Instances of samples. Top: cameraman; bottom: MRI. Left: white noise; right: motion blur

For the calculation of the expected value, choose again \(g \in C_0^\infty (\mathbb {R}^d)\) arbitrarily. Then we obtain by direct computation

$$\begin{aligned} \mathbb {E}\left[ {\widehat{L}_\ell g}\right] (x)&= \frac{1}{\bar{\ell }} \int _0^{\bar{\ell }} \frac{1}{2} \int _{-1}^1 g(x + t\ell v) \mathrm {d}t\mathrm {d}\ell \\&= \frac{1}{2} \int _0^1 \int _{-1}^1 g(x + t\ell \cdot \bar{\ell }v) \mathrm {d}t\mathrm {d}\ell \\&= \frac{1}{2} \int _0^1 \frac{1}{\ell } \int _{-\ell }^\ell g(x + t \cdot \bar{\ell }v) \mathrm {d}t\mathrm {d}\ell \\&= \frac{1}{2} \int _M \frac{1}{\ell }\, g(x + t \cdot \bar{\ell }v) \mathrm {d}^2(t,\ell )\\&=: (*). \end{aligned}$$

The set of integration M is depicted in Fig. 1. Now we change the order of integration by slicing the set vertically instead of horizontally. A slice at \(t \in (-1,1)\) then reads as the interval (|t|, 1). Hence, we proceed and obtain

$$\begin{aligned} (*)&= \frac{1}{2} \int _{-1}^1 \int _{|t|}^1 \frac{1}{\ell }\, g(x + t \cdot \bar{\ell }v) \mathrm {d}\ell \mathrm {d}t\\&= \frac{1}{2} \int _{-1}^1 g(x + t\cdot \bar{\ell }v) \int _{|t|}^1 \frac{1}{\ell } \mathrm {d}\ell \mathrm {d}t\\&= \frac{1}{2} \int _{-1}^1 g(x + t \cdot \bar{\ell }v) (-\log |t|) \mathrm {d}t. \end{aligned}$$

Finally, we obtain the boundedness of L by

$$\begin{aligned}&\Vert L g\Vert _{L^2(\varOmega )}^2 \le \int _{\mathbb {R}^d} |L g|^2(x) \mathrm {d}x\\&\quad = \frac{1}{4} \int _{\mathbb {R}^d} \int _{-1}^1 \int _{-1}^1 g(x + t \cdot \bar{\ell }v) g(x + s \cdot \bar{\ell }v) (-\log |t|) \\&\qquad (-\log |s|) \mathrm {d}t\mathrm {d}s \mathrm {d}x\\&\quad = \frac{1}{4} \int _{-1}^1 \int _{-1}^1 \left( \int _{\mathbb {R}^d} g(x + t \cdot \bar{\ell }v) g(x + s \cdot \bar{\ell }v) \mathrm {d}x\right) \\&\qquad (-\log |t|) (-\log |s|) \mathrm {d}s \mathrm {d}t\\&\quad \le \frac{1}{4}\Vert g\Vert _{L^2(\varOmega )}^2 \left( \int _{-1}^1 (-\log |t|) \mathrm {d}t\right) ^2 = \Vert g\Vert _{L^2(\varOmega )}^2, \end{aligned}$$

from which we deduce the relation for \(L^2(\varOmega )\) by using again the density of \(C_0^\infty (\varOmega )\) in \(L^2(\varOmega )\). \(\square \)
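Lemma 21 can also be checked numerically in one dimension. In the following sketch (test signal, evaluation point, and parameters are arbitrary illustrative choices), the Monte Carlo average of \(\widehat{L}_\ell g\) over \(\ell \sim \mathcal {U}(0,\bar{\ell })\) is compared with the log-kernel formula.

```python
import numpy as np

g = lambda x: np.exp(-x ** 2)                   # smooth 1-D test signal
x, ell_bar, n = 0.3, 2.0, 2000
# midpoint nodes in (-1, 1); t = 0 (the integrable log singularity) is never hit
ts = -1.0 + (np.arange(n) + 0.5) * (2.0 / n)

rng = np.random.default_rng(1)
ells = rng.uniform(0.0, ell_bar, size=2000)
# the node average over (-1, 1) already carries the prefactor 1/2 of the t-integral
mc = np.mean([np.mean(g(x + ts * l)) for l in ells])           # E[L_hat_ell g](x) via Monte Carlo
kernel = np.mean(g(x + ts * ell_bar) * (-np.log(np.abs(ts))))  # log-kernel formula from Lemma 21
print(mc, kernel)   # the two values agree up to sampling and quadrature error
```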

5.3 Experimental Setup

As test images, we take the image of the cameraman as well as an MRI scan of a human head in side view (see Fig. 2). We apply both instances of our degradation method to each test image. A selection of samples is depicted in Fig. 3. Since both of our methods are sample based, we need to generate a critical point of the Ambrosio–Tortorelli functional for each sample; this task can be parallelized. Of course, we are unable to solve the minimization problem exactly and hence employ the finite element method (FEM) proposed in [14] in combination with the splitting method for the corresponding discretization. In order to obtain an FE mesh, we divide each of the abovementioned pixel squares into two triangles by alternately inserting a diagonal; see Fig. 4. Since the FEM approximation requires resources that do not depend on the image (such as parts of the local stiffness matrices), we share these among the minimization processes in order to reduce computational overhead. Our implementation is done in MATLAB. For the numerical solution of the PDEs, we use the software package AFEM [4].

In the white noise case, we use the parameters \(\alpha = 75\), \(\beta = 1\), \(\varepsilon = 10^{-2}\) and \(\eta = \varepsilon ^{1.1}\); in the motion blur case, we change to \(\alpha = 50\), \(\beta = 10\) for the cameraman and \(\alpha = 20\), \(\beta = 10\) for the MRI. Since in the latter case the noise component \(\xi \) depends on the original image, a change of parameters is justified.

Moreover, we emphasize that, depending on the form of the operator L, it can be difficult to obtain the mass matrix with entries \((L {{\phi }}_i, L {{\phi }}_j)_{L^2(\varOmega )}\) exactly, where the \({{\phi }}_i\) are nodal basis functions. Hence we work with the interpolations \(I_\mathcal {T} L {{\phi }}_i\) instead, where \(I_\mathcal {T}: C(\bar{\varOmega }) \rightarrow C(\bar{\varOmega })\) denotes the nodal interpolation operator defined by

$$\begin{aligned} \left( I_\mathcal {T}y\right) (x) := \sum _{i = 1}^{(n+1)(m+1)} y(x_i) {{\phi }}_i, \end{aligned}$$

with \((x_i)_{i = 1, \dots , (n+1)(m+1)}\) being the coordinates of the nodes.

5.4 Pointwise Quantiles

First, we consider the pointwise quantiles in order to detect areas where an edge is likely to be found. As level of significance, we take \(s = 0.9\) for the noise model cases. For the motion blur case, we take \(s = 0.5\) (yielding the median) and \(s = 0.1\) to better study the behavior of the quantiles. The visual results are depicted in Fig. 5.

Fig. 4: Depiction of the used triangulation in the upper left corner of the domain \(\varOmega = (0,m) \times (0,n)\)

Fig. 5: Numerical results for the quantiles. Top row: white noise model with \(s = 0.9\); middle row: motion blur model with \(s = 0.5\); bottom row: motion blur model with \(s = 0.1\). Left: cameraman; right: MRI

We note that the quantiles for the white noise case do not differ significantly from the edge indicators of the original images. In fact, on samples of edge indicators one can see little dark spots generated by the algorithm. Since they are not very distinct and their position is random, the quantile does not react sensitively to them. A more sensitive reaction can be seen for the expected values in the following subsection.

In the motion blur model, the reconstruction behavior has an influence. Since our blurring operator L and the actually applied \(\widehat{L}_\ell \) differ, the reconstruction step tries to transport intensity to where it normally does not belong. This behavior leads to high oscillations (especially for small \(\ell \)) as well as doubling effects (especially for large \(\ell \)) in the corresponding reconstructed image. This results in the horizontal motion of strongly indicated edges as well as in weak, blurred edges. Only very small parts of the edges, mainly horizontal ones, remain at the same position.

In order to further clarify this effect, we resort to an additional synthetic test image similar to the one from [27] for the parameters \(\alpha = 20\), \(\beta = 10\) and evaluate the quantiles using only 200 samples for a descending series of levels of significance. The results can be found in Fig. 6.

Fig. 6: Numerical results of the pointwise quantiles applied to the synthetic test image in the upper left corner. Results for the levels of significance (from top to bottom, from left to right, starting in the upper right corner) \(s = 0.9, 0.75, 0.5, 0.4, 0.3, 0.2\) and 0.1

One observes that for high levels of significance (\(s = 0.9\) and \(s = 0.75\)) the only clearly recognizable edges are the horizontal ones (the upper and lower edges of the bar, the lower edge of the triangle, and the edges at the upper and lower poles of the circle). When reducing s, the moving edges gain importance, which results in gray areas around the places corresponding to the circle and the triangle. As can be seen in the result for \(s = 0.3\), the presence of distinct edges next to each other leads, due to the motion blur, to an overlap. This results in the detection of an edge at a position where no edge occurs in the original image at all. For the low levels of significance \(s = 0.2\) and \(s = 0.1\), the edges corresponding to the original image are duplicated in the horizontal direction around their original positions. In the last image, this results in a broad band of edges.

Coming back to the images considered in Fig. 5, we can build on the above observations to better understand the mechanisms leading to these results. As we have seen, edges in different directions occur in the quantile for very different values of s. Thus, we use the two levels \(s = 0.5\) and \(s=0.1\) here. Again, we observe for the first value only gray areas as well as prolonged horizontal edges (e.g., the shoulder of the cameraman and the top of the head in the MRI example). For the lower value \(s = 0.1\), we observe doubled edges around the ones of the original images. In the cameraman image, gray horizontal ‘oscillations’ can be observed on the left and right boundaries. This effect occurs since the image function has been extended by 0 outside of \(\varOmega \): as the background is originally gray, an edge is generated there (in contrast to the MRI image, where the background was already black in the first place). This also leads to some oscillatory effects on the left side due to “collisions” of the boundary and the man. We note that, since our synthetic test image is an extreme example, we do not observe such large bands of edges here, but we still find duplications of edges.

5.5 Expected Values

In this section, we utilize the control variates technique to obtain the expected values of the edge indicators. Besides the mean values themselves, we also want to assess the variance reduction effect. In our experiments, we use a total of 3 (respectively, 4) control variates \((y_j)\) for the noise-only model (respectively, the motion blur model). Their choice is heuristic and specified in Table 1. For the noise case, we choose a different edge detection approach using filter-based methods, which is far cheaper than the Mumford–Shah and Ambrosio–Tortorelli-based methods. Specifically, we use the so-called Marr–Hildreth filter, which reads

$$\begin{aligned} \Delta (k_\sigma \star g)(x)&= \Delta \left( \int _{\mathbb {R}^2} k_\sigma (\,\cdot \,-y)g(y) \mathrm {d}y \right) (x) \\&= \int _{\mathbb {R}^2} (\Delta k_\sigma )(x-y) g(y) \mathrm {d}y, \end{aligned}$$

where \(k_\sigma := \exp (-\frac{1}{2\sigma }|x|^2)\) denotes the Gaussian kernel. The idea is to apply a smoothing step in order, on the one hand, to reduce the possible influence of noise and, on the other hand, to increase the regularity from an image signal in \(L^2(\mathbb {R}^2)\) to a smoothed signal in \(C^\infty (\mathbb {R}^2)\). The application of the Laplacian seeks to find sources and valleys in the gradient map, which correspond to rapid changes in the gradient. Due to its structure, the Marr–Hildreth filter is also called Laplacian of Gaussian in the literature. For more details, the interested reader is referred to [18, Section 2.6.2] and the references therein.

Table 1: Overview of the applied control variates

As control variates, we then choose the absolute value of the filtered image as well as the exponential of its weighted negative. Especially the latter has a chance to behave, at least locally, similarly to the edge indicator obtained by our model. Additionally, we use the original image in order to capture the noise applied to the image itself.
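A sketch of the two filter-based control variates for the noise case, using SciPy's Laplacian-of-Gaussian filter; the kernel width `sigma` and the weight `w` are illustrative placeholders, not the values used in our experiments.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_control_variates(g, sigma=2.0, w=5.0):
    """Marr-Hildreth (Laplacian of Gaussian) based control variates:
    returns |LoG(g)| and exp(-w * |LoG(g)|); the latter is close to 1 away
    from edges and small near them, mimicking an edge indicator locally."""
    y1 = np.abs(gaussian_laplace(g.astype(float), sigma=sigma))
    y2 = np.exp(-w * y1)
    return y1, y2
```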

For motion blur, a direct application of the above technique would be inappropriate. Due to the presence of the operator L, we first apply a reconstruction step to the image, based on the following consideration: given the edge indicator, one would just need to solve the linear equation (2a) to obtain the restored image. Since the real edge indicator is not given and is expensive to calculate, we perform the reconstruction with respect to the two extreme cases: \(z = 1\), corresponding to the a priori assumption of no edge occurring in the image at all, and \(z = 0\), corresponding to the case of edges appearing everywhere in the image. Based on this, we consider the functions \(v_1,v_2 \in H^1(\varOmega )\) as solutions of the PDEs

$$\begin{aligned}&-\eta \Delta v_1 + \beta L^* L v_1 = \beta L^* g \ \text { in } \varOmega , \frac{\partial v_1}{\partial \nu } = 0 \ \text { on } \partial \varOmega , \\&-(1+\eta ) \Delta v_2 + \beta L^* L v_2 = \beta L^* g \ \text { in } \varOmega , \frac{\partial v_2}{\partial \nu } = 0 \ \text { on } \partial \varOmega , \end{aligned}$$

where \(L^*\) is the adjoint of L and \(\nu \) denotes the outward unit normal to \(\partial \varOmega \). In this way, we obtain two reconstructions of the image with respect to two different parameter choices. As in the noise model case, we apply the Marr–Hildreth filter to \(v_1,v_2\) and use the absolute values as well as the exponentials of their weighted negatives.

To study the variance reduction effect, we compare the variance of the edge indicator z with that of the modified variable \(z_{cv}\). Since our approach is based on the pointwise application of control variates, we moreover investigate where the variance has been reduced. To this end, we plot the difference of the pointwise variances \(\mathbb {V}\left[ {z}\right] - \mathbb {V}\left[ {z_{cv}}\right] \). We remark that mathematically this difference can never be negative; however, since our multiplier and our variances are only estimates, scattered negative values can occur. Hence the plots have been truncated at zero from below to give a clearer view. Note that in Fig. 7 the colormap is inverted for better visibility.

A total of 1000 samples was generated, 100 of which were used to approximate the multipliers for the control variates. The remaining 900 samples were then used for the estimation of the mean, the variance, and the difference of the pointwise variances.

For the noise case, we see in Fig. 8 that the background is considerably darker in comparison to the quantiles. This behavior is caused by the isotropic distribution of the little dark spots generated by the noise. Apart from that, there is again a strong similarity to the edge indicator of the original image. Moreover, a noticeable variance reduction can be observed. Considering the plots of the difference of the pointwise variances, we see a bigger improvement in regions where many edges are close to each other.

For the motion blur model, we see horizontal edges being elongated. On closer inspection, one finds several edges parallel and close to the original ones, forming a weak, blurred band. This is caused by the horizontal motion of edges described in the previous subsection. A significantly higher variance reduction effect can be observed in Table 2, which is not surprising, as the Mumford–Shah model has been designed to reduce the effect of noise. Regarding the difference of the pointwise variances, we find the highest reductions in close proximity to the edges of the original image.

Fig. 7: Differences of the variance operators \(\mathbb {V}\left[ {z}\right] - \mathbb {V}\left[ {z_{cv}}\right] \). Top: white noise model; bottom: motion blur model. Left: cameraman; right: MRI

Fig. 8: Numerical results for the expected values. Top: white noise model; bottom: motion blur model. Left: cameraman; right: MRI

6 Conclusion

In this paper, we studied the influence of errors in images on edges resulting from an image segmentation procedure. Due to the inherent difficulty of performing uncertainty quantification for a geometric quantity, we transitioned to the Ambrosio–Tortorelli problem to profit from the accessible vector space structure. Despite the lack of convexity or unique solutions, we were able to establish a rigorous theoretical foundation for selections of solutions. To perform a quantitative treatment, we proposed numerical methods addressing quantiles and the expected values and applied them to practically meaningful images and instances of the considered degradation model.

Since uncertainty quantification of geometric objects is a relatively new and scarcely investigated topic, we offer here a possible access point for researchers dealing with problems having a similar geometric structure. From the practical viewpoint, our results might be of importance to obtain a qualitative understanding of the uncertainty associated with (reconstructed) edges in certain medical or machine vision applications.

Our investigation opens several new research perspectives. On the one hand, one may incorporate more general degradation operators in the Mumford–Shah formulation. This leads to analytical difficulties concerning the existence of solutions as well as to numerical challenges.

On the numerical level, the use of adaptive mesh refinement is of interest due to the anisotropic nature of the error indicator as well as the phase-field approach. Further, in view of the large amount of data generated from images, the use of compression techniques such as low-rank approximations might be attractive. However, in both cases the interplay with the uncertainties might lead to difficulties and needs careful treatment.

7 A Word of Caution at the End

As an addendum, we want to emphasize that the white noise model considered in this work cannot be interpreted as a pixelwise projection of a random variable with values in \(L^2(\varOmega )\). In the following, we make use of techniques from probability theory, which can, for example, be found in [29]. In contrast to before, let \(\varOmega = (0,1)^2\) and define for \(n \in \mathbb {N}\) a uniform grid \(Q^n = {(Q^n_{jk})}^n_{j,k = 1}\) such that the area of the square around every pixel is \(\tfrac{1}{n^2}\). Precisely speaking, we are going to show that there exists no random variable \(\xi ^* \in {L^{2}_\mathbb {P}}\) such that the white noise \(\xi _Q\) with respect to this grid has the same distribution as \(\Pi _Q \xi ^*\), where the operator \(\Pi _Q \in \mathcal {L}(L^2(\varOmega ))\) is the piecewise projection onto the pixel grid defined by

$$\begin{aligned} \Pi _Q g := \sum _{j,k = 1}^n \frac{1}{|Q_{jk}|} \int _{Q_{jk}} g \mathrm {d}x \cdot \mathbb {1}_{Q_{jk}}. \end{aligned}$$

Having \(\lambda ^2(Q^n_{jk}) = \frac{1}{n^2}\), we introduce the orthonormal system \((e_{jk}^n)_{j,k = 1}^n\) with \(e_{jk}^n := n \mathbb {1}_{Q_{jk}^n}\) and rewrite the projection \(\Pi _{Q^n}\) as

$$\begin{aligned} \Pi _{Q^n} g = \sum _{j,k = 1}^n (e_{jk}^n,g)_{L^2(\varOmega )} \cdot e_{jk}^n. \end{aligned}$$

The white noise reads as

$$\begin{aligned} \xi ^n = \xi _{Q^n} = n \sum _{j,k = 1}^n \xi _{jk} \mathbb {1}_{Q_{jk}^n} = \sum _{j,k = 1}^n \xi _{jk} e_{jk}^n, \end{aligned}$$

with \((\xi _{jk})_{j, k = 1}^n\) a family of independent, identically distributed random variables with standard normal distribution. It is straightforward to show that \(\Pi _{Q^{n}}\xi ^{2n} = \xi ^n\) holds for all n. For proving the non-existence of \(\xi ^*\) as described above, we assume the contrary. Using the fact that \({\text {diam}}(Q_{jk}^n) = \frac{\sqrt{2}}{n}\) tends to zero as \(n \rightarrow \infty \), one can show, by the density of smooth functions in \(L^2(\varOmega )\) combined with a version of the Banach–Steinhaus theorem (cf. [42, Korollar IV 2.5]), that the operators \(\Pi _{Q^n}\) converge pointwise to the identity on \(L^2(\varOmega )\).
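The relation between the grids can be illustrated numerically: block-averaging white noise on the 2n-grid reproduces the pixel statistics of white noise on the n-grid (a sketch with an arbitrary grid size of our choosing):

```python
import numpy as np

n = 64
rng = np.random.default_rng(2)
xi_fine = 2 * n * rng.standard_normal((2 * n, 2 * n))      # pixel values of xi^{2n}
xi_proj = xi_fine.reshape(n, 2, n, 2).mean(axis=(1, 3))    # Pi_{Q^n}: average each 2x2 block
print(xi_proj.std())   # approximately n, the pixel standard deviation of xi^n
```

The pixel standard deviation grows like n, which is consistent with the non-existence result proven below.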

Table 2: Variance reduction effects

Hence \(\Pi _{Q^n} \xi ^* \rightarrow \xi ^*\) converges \(\lambda ^2 \otimes \mathbb {P}\)-a.e. In particular, this yields that \(\xi ^n\), having the same distribution as \(\Pi _{Q^n} \xi ^*\), must converge in distribution to \(\xi ^*\), meaning that the corresponding probability distributions \(\mu _n\) converge weakly in the sense of measures. This implies, in particular, the pointwise convergence of the respective characteristic functions, which read, for an arbitrary \(v \in L^2(\varOmega )\), as

$$\begin{aligned} \varphi _n(v) := \int _{L^2(\varOmega )} \exp (i_{\mathbb {C}}(w,v)_{L^2(\varOmega )}) \,\mathrm {d}\mu _{n}(w) = \mathbb {E}\left[ {\exp \left( i_{\mathbb {C}}(\xi ^n,v)_{L^2(\varOmega )}\right) }\right] , \end{aligned}$$

towards the characteristic function of \(\xi ^*\). The computation of \(\varphi _n\) yields

$$\begin{aligned} \varphi _n(v) :&= \mathbb {E}\left[ \exp \left( i_{\mathbb {C}}(\xi ^n,v)_{L^2(\varOmega )}\right) \right] \\&= \mathbb {E}\left[ \exp \left( i_{\mathbb {C}}\sum _{j,k = 1}^n \xi _{jk} (e_{jk}^n,v)_{L^2(\varOmega )} \right) \right] \\&= \mathbb {E}\left[ \prod _{j,k = 1}^n \exp \left( i_{\mathbb {C}}\xi _{jk} (e_{jk}^n,v)_{L^2(\varOmega )} \right) \right] \\&= \prod _{j,k = 1}^n \mathbb {E}\left[ \exp \left( i_{\mathbb {C}}\xi _{jk} (e_{jk}^n,v)_{L^2(\varOmega )}\right) \right] \\&= \prod _{j,k = 1}^n \exp \left( -\frac{1}{2}(e_{jk}^n, v)_{L^2(\varOmega )}^2 \right) \\&= \exp \left( -\frac{1}{2}\sum _{j,k = 1}^n (e_{jk}^n,v)_{L^2(\varOmega )}^2 \right) \\&= \exp \left( -\frac{1}{2}\Vert \Pi _{Q^n} v\Vert _{L^2(\varOmega )}^2 \right) . \end{aligned}$$

As we have seen above, the linear operators \(\Pi _{Q^n}\) converge pointwise to the identity, such that we deduce the convergence

$$\begin{aligned} \varphi _n(v) \rightarrow \exp \left( -\frac{1}{2}\Vert v\Vert _{L^2(\varOmega )}^2\right) \text { for all } v \in L^2(\varOmega ). \end{aligned}$$

Since we have assumed that the \(\mu _n\) converge weakly, this implies that the characteristic function of the distribution of \(\xi ^*\) must be \(\exp \left( -\tfrac{1}{2}\Vert v\Vert _{L^2(\varOmega )}^2\right) \). But according to [31, Proposition 1.2.11], there exists no probability measure on \(L^2(\varOmega )\) with this particular characteristic function. This yields the anticipated contradiction.

Hence we deduce that the term white noise is only justified with respect to a given pixel grid.