1 Introduction

Image inpainting refers to the filling in of a region in an image where information is missing or needs to be replaced (due to, for example, an occluding object), in such a way that the result looks plausible to the human eye. The region to be filled is subsequently referred to as the inpainting domain. Since the seminal work of Bertalmio et al. [6], image inpainting has become increasingly important, with applications ranging from removing an undesirable or occluding object from a photograph, to painting out a wire in an action sequence, to 3D conversion of film  [27], as well as 3D TV and novel viewpoint construction [12, 17, 29] in more recent years. See [21] for a recent survey of the field.

Image inpainting methods can loosely be categorized as exemplar-based and geometric. Exemplar-based methods generally operate by copying pieces of the undamaged portion of the image into the inpainting domain, in such a way as to make the result appear seamless. Examples include [3, 16, 43]. Exemplar-based methods also operate behind the scenes in Photoshop’s famous Content-Aware Fill tool. These methods excel at interpolating texture across large gaps, but may produce artifacts in structure-dominated images with strong edges, or be needlessly expensive if the inpainting domain is thin.

Fig. 1

Shell-based inpainting: here we illustrate the shell-based inpainting of an image including an undesirable human to be removed. In a we see the original image, including a human that is gradually eroded in b–d as we fill more shells. In this case the inpainting method is Guidefill [27], and the application is disocclusion for 3D conversion, which means that the human does not need to be removed entirely. See [27] or [34, Chapter 9] for more details on this application

Geometric inpainting methods attempt to smoothly extend image structure into the inpainting domain, typically using partial differential equations (PDEs) or variational principles—see [34] for a comprehensive survey. Examples based on PDEs include the seminal work of Bertalmio et al. [6] and methods based on anisotropic diffusion such as [40], while examples based on variational principles include TV, TV-\(\hbox {H}^{-1}\), Mumford–Shah, and Cahn–Hilliard inpainting [11, 14], Euler’s Elastica [13, 33], as well as the joint interpolation of image values and a guiding vector field in Ballester et al. [4]. These approaches are typically iterative and convergence is often slow (due to either the high order of the underlying PDE model or the difficult nature of the variational problem). On the other hand, Telea’s Algorithm [38], coherence transport [7, 31], and our previous work Guidefill [27] are based on the simple idea of filling the inpainting domain in successive shells from the boundary inward, setting the color of pixels on the current boundary equal to a weighted average of their already filled neighbors—Fig. 1 illustrates this process. In general, “neighbors” may include “ghost pixels” lying between pixel centers and defined using bilinear interpolation—this concept was introduced in [27]. By choosing the weights, fill order, and averaging neighborhood carefully, image isophotes may be extrapolated into the inpainting domain. Compared to their iterative counterparts, these methods have the advantage of being very fast. However, at the same time all of them create, to a greater or lesser degree, potentially disturbing visual artifacts. For example, extrapolated isophotes may “kink” (change direction), blur, or end abruptly. Although these problems have generally been reduced over time as newer and better algorithms have been proposed, they are still present to some degree even in the state of the art.

The present paper has two goals. Firstly, to analyze these algorithms as a class in order to understand the origins of these artifacts and what, if anything, can be done about them. Secondly, to propose a simple extension in which the pixels in a given shell are solved simultaneously as a linear system, and to analyze its benefits. We will refer to the original methods as “direct” and the extended methods as “semi-implicit”—see Fig. 2 for an illustration of the difference between them. To our knowledge, this extension has never before been proposed in the literature. We also propose a simple iterative method for solving the linear system arising in the semi-implicit extension, show that it is equivalent to damped Jacobi or successive over-relaxation (SOR), and analyze its convergence rates when applied to the linear system arising in semi-implicit Guidefill. Successive over-relaxation is shown to converge extremely quickly, provided the unknowns are ordered appropriately.

Fig. 2

The direct method and its semi-implicit extension: in this illustration, the filled portion of the image is highlighted in pale yellow, and the current inpainting domain is highlighted in both grey and blue, with the former denoting pixels in the interior of the inpainting domain and the latter pixels on its current boundary. The boundary is the “shell” that is currently being filled—note that the filled (known) portion of the image consists not only of the original undamaged region, but also of previously filled shells. At this stage, the grey and blue pixels are both unknown. In the direct method (a), the color of a given pixel (highlighted in red) is computed directly as a weighted average of pixels (in green) within a given neighborhood (outlined in white) that are already known. In the semi-implicit extension, the sum also includes pixels on the current boundary of the inpainting domain, but not pixels in its interior. This results in a linear system that needs to be solved, but this can be done relatively cheaply (see Sect. 6.2) and has benefits in terms of artifact reduction (see Sect. 7.1)

Remark 1

The original motivation for this paper comes from one of the authors’ experiences working in industry for the 3D conversion company Gener8 (see [27] or [34, Chapter 9] for a description of 3D conversion). The programmers there had implemented a series of shell-based inpainting methods of the form discussed above (all essentially variants of Telea’s algorithm [38], of which they were unaware) in order to solve an inpainting problem arising in their 3D conversion pipeline. This type of algorithm was attractive to them because it is fast and simple, while generally adequate for their application (where inpainting domains are typically very thin). However, they were puzzled by some of the artifacts they were seeing, most notably kinking artifacts. After a literature review it became clear that the existing theory [7, 31, 32] was enough to explain some, but not all, of what they observed. This paper is an attempt to fill the gap.

1.1 Organization

The remainder of this paper is organized as follows. First, in Sect. 2, we present the general form of the class of inpainting methods under consideration, including pseudocode for both the direct method and an implementation of our proposed semi-implicit extension (see Algorithm 1). We also cover in detail known artifacts created by the direct form of the method, how they have been reduced over time, related work, and our contributions. Next, in Sect. 3 we review the main variants of Algorithm 1 present in the literature. Section 4 describes an equivalence principle between weighted sums of ghost pixels and a sum over real pixels with different weights. This principle will be applied repeatedly throughout our theoretical analysis. Section 5 describes our semi-implicit extension of Algorithm 1, including a description of an implementation of two alternatives for solving the resulting linear system, namely damped Jacobi and successive over-relaxation (SOR). Section 6 contains our core analysis, and is divided into a number of subsections. The first of these, Sect. 6.1, describes the simplifying assumptions that we make throughout. Next, Sect. 6.2 presents a theoretical analysis of the convergence rates of damped Jacobi and SOR for solving the linear system arising in semi-implicit Guidefill, and compares with real experiments. In Sect. 6.3 we define the continuum and asymptotic limits of Algorithm 1. Section 6.4 describes the connection between Algorithm 1 and stopped random walks, while Sects. 6.5 and 6.7 use this connection to prove convergence to the asymptotic and continuum limits previously defined in Sect. 6.3. Next, in Sect. 7, we apply our analysis to gain a theoretical understanding of kinking and blur artifacts. In particular, Sect. 7.1 utilizes our continuum limit both to explain kinking artifacts and to show how to overcome them. Section 7.3 utilizes the asymptotic limit to make theoretical predictions quantifying the extent to which a given method introduces blur artifacts. Sections 7.1 and 7.3 also contain numerous numerical experiments demonstrating the accuracy of the predictions made by these limits. In Sect. 8 we show that our results—although derived in the context of image inpainting—are abstract results that can be applied to other problems. To demonstrate this, we apply our asymptotic limit to a problem in numerical linear algebra. Finally, in Sect. 9 we draw some conclusions and sketch some directions for potential future research. We also have a number of appendices giving technical details of certain proofs.

1.2 Notation

  • \(h\) denotes the width of one pixel.

  • \(\mathbb {Z}^2_h := \{ (nh,mh) : (n,m) \in \mathbb {Z}^2 \}.\)

  • Given \(\mathbf{v} \in \mathbb {R}^2\), we denote by \(L_\mathbf{v}:=\{ \lambda \mathbf{v} : \lambda \in \mathbb {R} \}\) the line through the origin parallel to \(\mathbf{v}\).

  • Given \(\mathbf{x} \in \mathbb {R}^2\), we denote by \(\theta (\mathbf{x}) \in [0,\pi )\) the counter-clockwise angle \(L_\mathbf{x}\) makes with the x-axis. \(\theta (\mathbf{x})\) can also be thought of as the counterclockwise angle \(\mathbf{x}\) makes with the x-axis, modulo \(\pi \).

  • Given \(\mathbf{v} \in \mathbb {R}^2\), we denote by \(\mathbf{v}^{\perp }\) the counterclockwise rotation of \(\mathbf{v}\) by \(90^{\circ }\).

  • \(\varOmega = [a,b] \times [c,d]\) and \(\varOmega _h = \varOmega \cap \mathbb {Z}^2_h\) are the continuous and discrete image domains.

  • \(D_h = D^{(0)}_h \subset \varOmega _h\) is the (initial) discrete inpainting domain.

  • \(D^{(k)}_h \subseteq D^{(0)}_h\) is the discrete inpainting domain on step k of the algorithm.

  • \(D := \{ \mathbf{x} \in \varOmega : \exists \mathbf{y} \in D_h \text{ s.t. } \Vert \mathbf{y}-\mathbf{x}\Vert _{\infty } < h \} \subset \varOmega \) is the continuous inpainting domain.

  • \(D^{(k)}\) is the continuous inpainting domain on step k of the algorithm, defined in the same way as D.

  • \(u_0 : \varOmega _h \backslash D_h \rightarrow \mathbb {R}^d\) is the given (discrete) image. In an abuse of notation, we also use \(u_0\) to denote an assumed underlying continuous image \(u_0 : \varOmega \backslash D \rightarrow \mathbb {R}^d\).

  • \(u_h : \varOmega _h \rightarrow \mathbb {R}^d\) is the inpainted completion of \(u_0\).

  • \(\mathbf{g}(\mathbf{x})\) is the guidance vector field used by coherence transport and Guidefill.

  • \(B_{\epsilon }(\mathbf{x})\) is the solid ball of radius \(\epsilon \) centered at \(\mathbf{x}\).

  • \(A_{\epsilon ,h}(\mathbf{x}) \subset B_{\epsilon }(\mathbf{x})\) denotes a generic discrete (but not necessarily lattice aligned) neighborhood of radius \(\epsilon \) surrounding the pixel \(\mathbf{x}\) and used for inpainting.

  • \(\text{ Supp }(A_{\epsilon ,h}(\mathbf{x})) \subset \mathbb {Z}_h^2\) denotes the set of real pixels needed to define \(A_{\epsilon ,h}(\mathbf{x})\) based on bilinear interpolation.

  • \(B_{\epsilon ,h}(\mathbf{x}) = \{ \mathbf{y} \in \varOmega _h : \Vert \mathbf{x}-\mathbf{y}\Vert \le \epsilon \}\), the discrete ball of radius \(\epsilon \) centered at \(\mathbf{x}\) and the choice of \(A_{\epsilon ,h}(\mathbf{x})\) used by coherence transport.

  • \(r = \epsilon /h\) is the radius of \(B_{\epsilon ,h}(\mathbf{x})\) measured in pixels.

  • \({\tilde{B}}_{\epsilon ,h}(\mathbf{x}) = R(B_{\epsilon ,h}(\mathbf{x}))\), where R is the rotation matrix taking (0, 1) to \(\mathbf{g}(\mathbf{x})\), the choice of \(A_{\epsilon ,h}(\mathbf{x})\) used by Guidefill.

  • \({\mathcal {N}}(\mathbf{x})=\{ \mathbf{x}+\mathbf{y} : \mathbf{y} \in \{-h,0,h\} \times \{-h,0,h\}\}\) is the 9-point neighborhood of \(\mathbf{x}\).

  • Given \(A_h \subset \mathbb {Z}^2_h\), we define the dilation of \(A_h\) by

    $$\begin{aligned} D_h(A_h) = \cup _{\mathbf{x} \in A_h} {\mathcal {N}}(\mathbf{x}). \end{aligned}$$

    If \(h=1\) we write D instead of \(D_1\).

  • Given \(A_h \subset \mathbb {Z}^2_h\), we define the discrete (inner) boundary of \(A_h\) by

    $$\begin{aligned} \partial A_h := \{ \mathbf{x} \in A_h : {\mathcal {N}}(\mathbf{x}) \cap \mathbb {Z}^2_h \backslash A_h \ne \emptyset \}. \end{aligned}$$

    For convenience we typically drop the word “inner” and refer to \(\partial A_h\) as just the boundary of \(A_h\).

  • O denotes the zero matrix.

  • Given \(c \in \mathbb {R}\), we define \(\{ y \le c \} := \{ (x,y) \in \mathbb {R}^2 : y \le c \}.\)

  • Given \( L > 0\), and \(0 \le x,y < L\) we define the circular distance between x and y by

    $$\begin{aligned} d^{{\text {circle}}}_L(x,y) = \min (x-y \mod L, y-x \mod L). \end{aligned}$$
  • Given \(\sigma , \mu \in \mathbb {R}\) we denote by \(G_{\sigma ,\mu }\) and \(g_{\sigma ,\mu }\) the Gaussians

    $$\begin{aligned} G_{\sigma ,\mu }(x) := \frac{e^{-\frac{(x-\mu )^2}{2\sigma ^2}}}{\sqrt{2\pi }\sigma } \quad g_{\sigma ,\mu }(x):=\frac{e^{-\frac{(x-\mu )^2}{2\sigma ^2}}}{\sum _{n \in \mathbb {Z}} e^{-\frac{(n-\mu )^2}{2\sigma ^2}}} \end{aligned}$$

    Similarly, given \(L > 0\) and \(N \in \mathbb {N}\) we denote by \(G^*_{\sigma ,\mu ,L}\) and \(g^*_{\sigma ,\mu ,N}\) the periodic Gaussians

    $$\begin{aligned} G^*_{\sigma ,\mu ,L}(x) := \frac{e^{-\frac{d^{{\text {circle}}}_L(x,\mu )^2}{2\sigma ^2}}}{\sqrt{2\pi }\sigma }, \qquad g^*_{\sigma ,\mu ,N}(x) := \frac{e^{-\frac{d^{{\text {circle}}}_N(x,\mu )^2}{2\sigma ^2}}}{\sum _{n =0}^{N-1} e^{-\frac{d^{{\text {circle}}}_N(n,\mu )^2}{2\sigma ^2}}} \end{aligned}$$

    Note that \(G_{\sigma ,\mu }\) and \(g_{\sigma ,\mu }\) are normalized with respect to integration over \(\mathbb {R}\) and summation over \(\mathbb {Z}\) respectively, while \(\frac{1}{\int _0^L G^*_{\sigma ,\mu ,L}(x)dx }\cdot G^*_{\sigma ,\mu ,L}\) and \(g^*_{\sigma ,\mu ,N}\) are normalized with respect to integration over [0, L) and summation over \(\{0,1,\ldots ,N-1\}\) respectively.
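
The circular distance and the discrete periodic Gaussian above are easy to sanity-check numerically. The following sketch (Python with numpy; the helper names d_circle and g_star are ours) implements \(d^{{\text {circle}}}_L\) and \(g^*_{\sigma ,\mu ,N}\) and verifies the claimed normalization over \(\{0,1,\ldots ,N-1\}\):

```python
import numpy as np

def d_circle(x, y, L):
    """Circular distance d^circle_L(x, y) between points of [0, L)."""
    return np.minimum((x - y) % L, (y - x) % L)

def g_star(x, sigma, mu, N):
    """Discrete periodic Gaussian g*_{sigma,mu,N}, normalized over {0,...,N-1}."""
    n = np.arange(N)
    Z = np.sum(np.exp(-d_circle(n, mu, N) ** 2 / (2 * sigma ** 2)))
    return np.exp(-d_circle(x, mu, N) ** 2 / (2 * sigma ** 2)) / Z

print(d_circle(1.0, 9.0, 10.0))                 # 2.0: wraps around the circle
print(g_star(np.arange(8), 1.5, 2.0, 8).sum())  # 1.0 up to rounding error
```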

2 Shell-Based Geometric Inpainting

Algorithm 1 (pseudocode)
Fig. 3

Illustration of a generic set \(A_{\epsilon ,h}(\mathbf{x})\) containing ghost pixels: in this illustration the overlaid grid is the lattice \(\mathbb {Z}^2_h\) with pixel centers at its vertices. The elements of a generic set \(A_{\epsilon ,h}(\mathbf{x})\) are represented as red dots—they do not need to occupy pixel centers, but they must all lie within distance \(\epsilon \) of \(\mathbf{x}\). Ghost pixels are defined based on bilinear interpolation of their real pixel neighbors. Here we have highlighted in green the squares whose vertices are the real pixels needed to define \(A_{\epsilon ,h}(\mathbf{x})\). We call this set of real pixels \(\text{ Supp }(A_{\epsilon ,h}(\mathbf{x}))\). Note that while \(A_{\epsilon ,h}(\mathbf{x}) \subset B_{\epsilon }(\mathbf{x})\), this inclusion is not in general true of \(\text{ Supp }(A_{\epsilon ,h}(\mathbf{x}))\)

In this paper we are interested in a simple class of shell-based geometric inpainting methods and their proposed semi-implicit extension. These methods fill the inpainting domain in successive shells from the boundary inwards, as illustrated in Fig. 1. In the direct form of the method, the color of a given pixel \(\mathbf{x}\) due to be filled is computed as a weighted average of its already filled neighbors within a discrete neighborhood \(A_{\epsilon ,h}(\mathbf{x}) \subset B_{\epsilon }(\mathbf{x})\). In the semi-implicit extension, this sum also includes unknown pixels within the current shell, resulting in a linear system (Fig. 2). We will cover the resulting linear system in detail in Sect. 5, where we also propose an iterative method for its solution. The direct method as well as this proposed iterative solution to the semi-implicit extension are illustrated in Algorithm 1 with pseudocode (the blue code indicates the parts relevant to the semi-implicit extension). The neighborhood \(A_{\epsilon ,h}(\mathbf{x})\) need not be axis aligned and may contain “ghost pixels” lying between pixel centers—see Fig. 3 for an illustration. Ghost pixels were introduced in [27] (where they were shown to be beneficial for reducing “kinking artifacts”—see Fig. 6), and the color of a given ghost pixel is defined as the bilinear interpolation of its four real pixel neighbors, but is undefined if one or more of them has not yet been assigned a color. We denote by \(\text{ Supp }(A_{\epsilon ,h}(\mathbf{x})) \subset \mathbb {Z}_h^2\) the set of real pixels needed to define \(A_{\epsilon ,h}(\mathbf{x})\) in this way. Here h and \(\epsilon \) denote respectively the width of one pixel and the radius of the bounding disk \(B_{\epsilon }(\mathbf{x}) \supset A_{\epsilon ,h}(\mathbf{x})\). The averaging weights \(w_{\epsilon }\) are non-negative and are allowed to depend on \(\mathbf{x}\), but must scale proportionally with the size of the neighborhood \(A_{\epsilon ,h}(\mathbf{x})\), like so:

$$\begin{aligned} w_{\epsilon }(\mathbf{x},\mathbf{y})={\hat{w}}\left( \mathbf{x}, \frac{\mathbf{y}-\mathbf{x}}{\epsilon }\right) \end{aligned}$$
(2)

for some function \({\hat{w}}(\cdot ,\cdot ) : \varOmega \times B_1(\mathbf{0}) \rightarrow [0,\infty ]\). Note that we will sometimes write \(w_r\) or \(w_1\) in place of \(w_{\epsilon }\)—in this case we mean (2) with \(\epsilon \) replaced by r or 1 in the denominator on the right hand side. As the algorithm proceeds, the inpainting domain shrinks, generating a sequence of inpainting domains

$$\begin{aligned} D_h = D^{(0)}_h \supset D^{(1)}_h \supset \ldots \supset D^{(K)}_h = \emptyset . \end{aligned}$$

We will assume the non-degeneracy condition

$$\begin{aligned} \sum _{\mathbf{y} \in A_{\epsilon ,h}(\mathbf{x}) \cap (\varOmega \backslash D^{(k)})}w_{\epsilon }(\mathbf{x},\mathbf{y}) > 0 \end{aligned}$$
(3)

holds at all times; this ensures that the average (1) in Algorithm 1 is always well defined. One trivial way of ensuring this is by having strictly positive weights \({\hat{w}}\), which all the methods considered do (see Sect. 3). At iteration k, only pixels belonging to the current boundary \(\partial D^{(k)}_h\) are filled, but moreover we fill only a subset \(\partial _{\text{ ready }} D^{(k)}_h \subseteq \partial D^{(k)}_h\) of pixels deemed to be “ready” (in Sect. 3 we will review the main methods in the literature and give their “ready” functions). The main inpainting methods in the literature of the general form given by Algorithm 1 are (in chronological order) Telea’s Algorithm [38], coherence transport [7], coherence transport with adapted distance functions [31], and our previous work Guidefill [27]. As we will see, these methods essentially differ only in the choice of weights (2), the choice of fill order as dictated by the “ready” function, and the choice of neighborhood \(A_{\epsilon ,h}(\mathbf{x})\). We will review these methods in Sect. 3.
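
To fix ideas, here is a minimal sketch of the direct method (Python with numpy). It is not the pseudocode of Algorithm 1: the array conventions and the generic weights callable are our own simplifying assumptions, the weights are taken independent of \(\mathbf{x}\), the neighborhood is the lattice-aligned ball \(B_{\epsilon ,h}(\mathbf{x})\) with no ghost pixels, and the fill order is the default onion shell ordering (\(\text{ready} \equiv \text{true}\)):

```python
import numpy as np

def direct_inpaint(u, mask, weights, r=3):
    """Direct shell-based fill. u: float image of shape (H, W, d); mask: bool
    array, True on the inpainting domain D_h; weights(dy, dx): non-negative
    weight for the offset (dy, dx); r = eps / h: neighborhood radius in pixels."""
    u, mask = u.copy(), mask.copy()
    H, W = mask.shape
    offsets = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
               if 0 < dy * dy + dx * dx <= r * r]  # B_{eps,h}(x) minus x itself
    while mask.any():
        # inner boundary of D_h^(k): masked pixels with a known 8-neighbor
        pad = np.pad(mask, 1, constant_values=True)
        known_nbr = np.zeros_like(mask)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                known_nbr |= ~pad[1 + dy:H + 1 + dy, 1 + dx:W + 1 + dx]
        shell = np.argwhere(mask & known_nbr)
        for y, x in shell:        # each pixel uses only data known at the start
            num, den = 0.0, 0.0   # of the shell, so this loop is parallelizable
            for dy, dx in offsets:
                yy, xx = y + dy, x + dx
                if 0 <= yy < H and 0 <= xx < W and not mask[yy, xx]:
                    w = weights(dy, dx)
                    num, den = num + w * u[yy, xx], den + w
            u[y, x] = num / den   # well defined by the non-degeneracy condition (3)
        mask[shell[:, 0], shell[:, 1]] = False  # shrink to D_h^(k+1)
    return u
```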

Remark 2

It is worth mentioning that this class of algorithms is essentially the same as the “generic single-pass algorithm” first systematically studied by Bornemann and März in [7]. The two main differences are

  1.

    They assume \(A_{\epsilon ,h}(\mathbf{x})=B_{\epsilon ,h}(\mathbf{x})\), while we allow for greater flexibility.

  2.

    They consider only the direct form of Algorithm 1, not the semi-implicit extension.

Beyond this, they also phrase things in terms of a pre-determined fill order, rather than a “ready” function, but the former may easily be seen, mathematically at least, to be a special case of the latter (Sect. 3).

2.1 Advantages of Algorithm 1

The main appeal of Algorithm 1 is its simplicity and parallelizability, which enable it to run very fast. A second advantage, first noted by Bornemann and März [7], is the stability property

$$\begin{aligned} \min _{\mathbf{y} \in B}u_0(\mathbf{y}) \le u_h(\mathbf{x}) \le \max _{\mathbf{y} \in B} u_0(\mathbf{y}) \qquad \text{ for } \text{ all } {} \mathbf{x} \in D_h, \end{aligned}$$
(4)

(which holds channelwise) where \(u_0 : \varOmega _h \backslash D_h \rightarrow \mathbb {R}^d\) is the given image and B is the band of width \(\epsilon \) pixels surrounding \(\partial D_h\). This property holds because we have chosen non-negative weights summing to one.

Remark 3

Although we have presented Telea’s algorithm [38] as an example of Algorithm 1, this is not strictly true as its update formula (5) (see Sect. 3) contains a gradient term that, after it has been approximated with finite differences, effectively violates the rule of non-negative weights summing to one. This means that Telea’s algorithm does not satisfy the stability property (4). See Fig. 7.

2.2 Disadvantages and Artifacts

The main disadvantage of Algorithm 1 is obviously loss of texture, and in cases where texture is important, these methods should not be used. However, beyond loss of texture, inpainting methods of the general form given in Algorithm 1 can also introduce a host of other artifacts, which we list below.

  • “kinking” of isophotes where extrapolated isophotes change direction at the boundary of the inpainting domain—see Figs. 6 and 8.

  • “blurring” of isophotes where edges that are sharp in the undamaged region may blur when extended into the inpainting domain—see Figs. 6 and 8.

  • “cutting off” of isophotes where isophotes extrapolated into the inpainting domain end abruptly—see Fig. 4.

  • formation of shocks where sharp discontinuities may form in the inpainting domain—see Fig. 4.

  • bright or dark spots, which occur only if the stability condition (4) is violated, as it is for Telea’s algorithm. See Figs. 7 and 8.

Fig. 4

Cut off isophotes and shocks: because Algorithm 1 fills the inpainting domain from many directions at once, “cut off isophotes” or shocks can sometimes be formed. In a this is due to the (superimposed) fill order, which is the default onion shell ordering and a bad choice in this case. In b we have chosen a new fill order better adapted to the image, and the problem is solved in this case. However, the shock in c is due to incompatible boundary conditions and it is unlikely any special fill order could solve the problem

Fig. 5

Special directions: for a given guidance direction \(\mathbf{g}=(\cos \theta ,\sin \theta )\), coherence transport [7, 31] can successfully extrapolate isophotes parallel to \(\mathbf{g}\) only if \(\mathbf{g}=\lambda \mathbf{v}\), for some \(\mathbf{v} \in B_{\epsilon ,h}(\mathbf{0})\). This is illustrated in b, where we have solved the inpainting problem posed in a multiple times using coherence transport with \(\epsilon = 3\)px with a sequence of guidance directions \(\mathbf{g}_k =(\cos \theta _k,\sin \theta _k)\) (\(\theta _k \in \{k^{\circ }\}_{k=1}^{179}\)) and superimposed the results together with \(B_{\epsilon ,h}(\mathbf{0})\) (the parameter \(\mu \) in (6) is \(\mu = 100\)). Instead of a smoothly varying line sweeping through the upper half plane and filling it with red, we see a superposition of finitely many lines, each passing through some \(\mathbf{v} \in B_{\epsilon ,h}(\mathbf{0})\). When we repeat the experiment in c using Guidefill [27], we see that it is not free of problems either. In this case Guidefill can extrapolate along \(\mathbf{g}=(\cos \theta ,\sin \theta )\) so long as \(0<\theta _c \le \theta \le \pi -\theta _c < \pi \), where \(\theta _c\) is a critical angle, and we get a red cone bounded on either side by \(\theta _c\). Here we have superimposed the dilated ball \(D_h(B_{\epsilon ,h}(\mathbf{0}))\), and it is evident that \(\theta _c\) is in some way related to this dilation—this will be explained in Sect. 7.1.1

Fig. 6

A tale of two inpainting problems: in a–d a line making an angle of \(\theta = 63^{\circ }\) with the horizontal is inpainted using each of Telea’s algorithm [38], coherence transport [7, 31], and Guidefill [27] (the inpainting domain is shown in yellow). In this case the radius of \(A_{\epsilon ,h}(\mathbf{x})\) is \(\epsilon = 3\)px, and since \(63^{\circ } \approx \arctan (2) \approx 63.44^{\circ }\) is close to one of the “special directions” in which coherence transport can extend isophotes successfully for this value of \(\epsilon \) (see Fig. 5), both coherence transport and Guidefill make a successful connection. In e–h we change the angle of the line slightly to \(\theta = 73^{\circ }\). This is not one of coherence transport’s admissible directions for \(\epsilon = 3\)px, so it fails to make the connection, while Guidefill continues to have no problems, at the expense of some blurring. Telea’s algorithm, on the other hand, propagates in the direction of the normal to the inpainting domain boundary regardless of the undamaged image content, and thus fails to make the connection in both cases while also introducing significant blur. In i, j, we examine horizontal cross sections (of the red channel) of all three methods at the midpoint of the inpainting domain. Here, a disadvantage of Guidefill in terms of blur becomes more apparent—coherence transport by contrast produces a much sharper result. The reasons for this are explored in Sect. 7.3

2.3 Related Work (Artifact Reduction)

Broadly speaking, there has been incremental progress as follows: Telea’s algorithm [38], the earliest variant to appear in the literature, suffers from strong artifacts of every type. In particular, the weights make no attempt to take into account the orientation of undamaged isophotes in \(\varOmega _h \backslash D_h\), and the result shows strong kinking artifacts (see Fig. 6). Bornemann and März identified and sought to address this problem with coherence transport [7], which proposed carefully chosen weights that are proven (in a high resolution and vanishing viscosity limit) to extend isophotes in any desired guidance direction \(\mathbf{g}\) not parallel to the inpainting domain boundary. This was combined with a method aimed at robustly measuring the orientation of isophotes at the boundary, so that a suitable \(\mathbf{g}\) allowing for a seamless transition could be found. With the problem of “kinking” ostensibly resolved, in a follow-up work März proposed coherence transport with adapted distance functions [31], designed to minimize the problem of “cut off” isophotes and shocks. This was accomplished by recognizing that artifacts such as the incomplete line in Fig. 4a are often the byproduct of a suboptimal fill order such as the one superimposed (in this case the default onion shell ordering). The situation can often be corrected, as in Fig. 4b, by using an ordering better adapted to the image such as the one illustrated there. Rather than filling pixels in an order proportional to their distance from the boundary, i.e. having the \(\text{ ready }\) function in Algorithm 1 always return “true”, März proposed a number of ways of generating improved orderings based on non-Euclidean distance-from-boundary maps. At the same time, recognizing that the presence of shocks was related to the “stopping set” [31] of the distance map, März was able to exert some measure of control over those as well, if not prevent them entirely. Guidefill [27] brought the focus back to the reduction of kinking artifacts, by noting that coherence transport is actually only able to propagate along a given guidance vector \(\mathbf{g}\) if it points in one of a finite set of special directions—see Fig. 5b. Whereas previous improvements to Algorithm 1 had focused first on improving the choice of weights, then the fill order (equivalently the choice of \(\text{ ready }\) function), Guidefill proposed for the first time to change the averaging neighborhood \(A_{\epsilon ,h}(\mathbf{x})\), which until then had always been the discrete ball \(B_{\epsilon ,h}(\mathbf{x})\) (Fig. 10a). Specifically, it proposed to replace \(A_{\epsilon ,h}(\mathbf{x})=B_{\epsilon ,h}(\mathbf{x})\) with \(A_{\epsilon ,h}(\mathbf{x})={\tilde{B}}_{\epsilon ,h}(\mathbf{x})\), where \({\tilde{B}}_{\epsilon ,h}(\mathbf{x})\) is the rotated discrete ball shown in Fig. 10b, aligned with the guidance direction \(\mathbf{g}\). Since \(A_{\epsilon ,h}(\mathbf{x})\) is in this case no longer axis aligned, it contains what the authors called “ghost pixels” lying between pixel centers, which they defined based on bilinear interpolation. This small change enabled Guidefill to propagate along most guidance directions, but it too has problems when the angle between \(\mathbf{g}\) and the boundary of the inpainting domain is too shallow—see Fig. 5c. However, Guidefill pays a price for its reduction in kinking artifacts in the form of an increase in blur artifacts. See Fig. 6, where coherence transport produces a sharp extension of image isophotes, albeit possibly in the wrong direction, whereas Guidefill extrapolates in the right direction, but the extrapolation suffers from blur. Guidefill also proposed its own “smart order” computed on the fly as an alternative to März’s adapted distance functions, but this does not have any serious advantage in terms of the quality of the results. Either approach will do for preventing “cut off” isophotes.

2.4 Related Theoretical Work

The direct form of Algorithm 1 has been studied from a theoretical point of view via two distinct continuum limits. The first of these is the high-resolution vanishing viscosity limit proposed by Bornemann and März, in which \(h \rightarrow 0 \) and then \(\epsilon \rightarrow 0\) [7]. The second is a fixed-ratio continuum limit proposed in our previous work [27] in which \((h,\epsilon ) \rightarrow (0,0)\) along the line \(\epsilon = r h\) (with \(r \in \mathbb {N}\) fixed). The non-negative integer r is simply the radius of \(A_{\epsilon ,h}(\mathbf{x})\) measured in pixels. Although both are perfectly valid mathematically, numerical experiments indicate that the high resolution viscosity limit gives a good approximation of the behaviour of Algorithm 1 in practice only when \(r \gg 1\), whereas our fixed-ratio limit gives a good approximation even when r is a small integer, as it typically is in practice (see Remark 4.3 in our previous work [27] for an explanation of why this is). There has also been significant work in studying the well-posedness of the high resolution and vanishing viscosity limit of Algorithm 1, both in [7] and especially in [32]. See Fig. 9 for an illustration of these two separate limits.

Motivation for ghost pixels In Sect. 4 we will prove that any weighted sum over a set \(A_{\epsilon ,h}(\mathbf{x})\) of ghost pixels is equivalent to a sum over the real pixels in \(\text{ Supp }(A_{\epsilon ,h}(\mathbf{x}))\) with equivalent weights. While this makes ghost pixels in some sense redundant, they are a useful concept. Specifically, in Theorem 6 we will prove that the fixed-ratio continuum limit described above and illustrated in Fig. 9 is a transport equation whose transport direction is a function of the weights \(w_{\epsilon }\) and the position vectors of the elements of \(A_{\epsilon ,h}(\mathbf{x})\). Avoiding “kinking” artifacts amounts to working backwards from a desired transport direction \(\mathbf{g}\) to the weights \(w_{\epsilon }\) and neighborhood \(A_{\epsilon ,h}(\mathbf{x})\) that yield \(\mathbf{g}\). This is easier to do when the elements of \(A_{\epsilon ,h}(\mathbf{x})\) can move continuously in \(\mathbb {R}^2\), rather than being constrained to the lattice \(\mathbb {Z}_h^2\).

2.5 Contributions

Our contributions are both theoretical and practical, aimed at a deeper understanding of the mathematical properties of both the direct and semi-implicit versions of Algorithm 1, and through that understanding, a better grasp of the underlying causes of the artifacts described above and what, if anything, can be done about them. Our main targets are “kinking” and “blurring” artifacts, the others having already been thoroughly analyzed in [7, 32] and being well understood. Broadly speaking, we have three main contributions:

  • We propose a semi-implicit variant of Algorithm 1, propose an efficient method for solving the linear system that arises in it, and derive analytical rates of convergence.

  • We propose two novel limits of Algorithm 1: a fixed-ratio continuum limit where the pixel width \(h \rightarrow 0\) with \(r =\epsilon /h\) fixed, and an asymptotic limit where \(r=\epsilon /h\) is again fixed but rather than taking h to zero, we explore the asymptotic dependence of the solution on h when it is very small.

  • We use the above two limits to explain the origins of kinking and blur artifacts in Algorithm 1, and how to rectify them. Among other things, we prove that any variant of the direct form of Algorithm 1 must exhibit kinking artifacts, whereas this is not the case for our proposed semi-implicit extension (Sect. 7.1.1).

While our present work focuses on inpainting, our results—in particular the asymptotic limit—are abstract results that can be applied more generally. An example of a situation where this comes up is the use of damped Jacobi iteration for solving a linear system

$$\begin{aligned} A\mathbf{x} = \mathbf{b} \end{aligned}$$

when A is an M-matrix (that is, has a positive diagonal and negative or zero off-diagonal elements). Section 8 demonstrates this by applying our asymptotic limit to predict the evolution of the error in damped Jacobi applied to a 1-d convection-diffusion problem with Dirichlet boundary conditions.
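
For concreteness, the sketch below builds the standard upwind finite-difference matrix for such a 1-d convection-diffusion operator and checks the M-matrix sign pattern just described (the grid size and coefficients are illustrative choices of ours, not values from Sect. 8):

```python
import numpy as np

# Upwind finite differences for -nu u'' + b u' on n interior points (b > 0).
# n, nu, b are illustrative values, not parameters taken from Sect. 8.
n, nu, b = 50, 1.0, 5.0
h = 1.0 / (n + 1)
A = (np.diag((2 * nu / h ** 2 + b / h) * np.ones(n))
     + np.diag((-nu / h ** 2 - b / h) * np.ones(n - 1), -1)
     + np.diag((-nu / h ** 2) * np.ones(n - 1), 1))
# M-matrix sign pattern: positive diagonal, non-positive off-diagonal entries
assert (np.diag(A) > 0).all()
assert (A - np.diag(np.diag(A)) <= 0).all()
```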

3 Review of Main Methods

Here we briefly review the main inpainting methods of the general form sketched in Algorithm 1.

Telea’s algorithm The earliest algorithm (to our knowledge) appearing in the literature and of the form sketched in Algorithm 1, Telea’s algorithm [38] is also the only such algorithm to use a different formula for \(u_h(\mathbf{x})\) than the expression (1) appearing in Algorithm 1 (see Remark 3). Instead of computing \(u_h(\mathbf{x})\) as a weighted average of \(u_h(\mathbf{y})\) evaluated at nearby already filled pixels \(\mathbf{y}\), it takes a weighted average of the predictions that each of these pixels makes, based on linear extrapolation, for \(u_h(\mathbf{x})\). That is,

$$\begin{aligned} u_h(\mathbf{x}) = \frac{\sum _{\mathbf{y} \in B_{\epsilon ,h}(\mathbf{x}) \cap (\varOmega _h \backslash D^{(k)}_h )} w_{\epsilon }(\mathbf{x},\mathbf{y}) u_{\text{ predicted }}(\mathbf{x})}{\sum _{\mathbf{y} \in B_{\epsilon ,h}(\mathbf{x})\cap (\varOmega _h \backslash D^{(k)}_h )} w_{\epsilon }(\mathbf{x},\mathbf{y})}, \end{aligned}$$
(5)

where

$$\begin{aligned} u_{\text{ predicted }}(\mathbf{x}) := u_h(\mathbf{y})+\nabla _h u_h(\mathbf{y}) \cdot (\mathbf{x}-\mathbf{y}) \end{aligned}$$

and \(\nabla _h u_h(\mathbf{y})\) denotes the centered difference approximation to the gradient of \(u_h\) at \(\mathbf{y}\), that is

$$\begin{aligned} \nabla _h u_h(\mathbf{y}) := \frac{1}{2}\big (u_h(\mathbf{y}+e_1)-u_h(\mathbf{y}-e_1),\; u_h(\mathbf{y}+e_2)-u_h(\mathbf{y}-e_2)\big ). \end{aligned}$$

As we have already noted in Remark 3, this approach has a disadvantage in that it results in the loss of the stability property (4). Moreover, the predictions based on linear extrapolation can become highly inaccurate when \(\mathbf{y}\) is on an edge in \(\varOmega _h \backslash D^{(k)}_h\), leading to significant over- or undershooting, visible as bright or dark spots as in Figs. 7 and 8. Perhaps in recognition of this, the gradient term was dropped from (5) in all subsequent algorithms. The weights in this case are

$$\begin{aligned} w_{\epsilon }(\mathbf{x},\mathbf{y}) = \text{ dir }(\mathbf{x},\mathbf{y}) \cdot \text{ dst }(\mathbf{x},\mathbf{y}) \cdot \text{ lev }(\mathbf{x},\mathbf{y}), \end{aligned}$$

where

$$\begin{aligned} \text{ dir }(\mathbf{x},\mathbf{y}) := \frac{\mathbf{x}-\mathbf{y}}{\Vert \mathbf{x}-\mathbf{y}\Vert } \cdot \mathbf{N}(\mathbf{x}), \quad \text{ dst }(\mathbf{x},\mathbf{y}) := \frac{d_0^2}{\Vert \mathbf{x}-\mathbf{y}\Vert ^2}, \quad \text{ lev }(\mathbf{x},\mathbf{y}) := \frac{T_0}{1+|T(\mathbf{y})-T(\mathbf{x})|}, \end{aligned}$$

and \(T(\mathbf{x})\) denotes the Euclidean distance from \(\mathbf{x}\) to the (original) boundary of the inpainting domain, and \(\mathbf{N}(\mathbf{x}) = \nabla _h T(\mathbf{x})\) (estimated based on central differences). T is precomputed using the fast marching method. Telea’s algorithm uses the default onion shell ordering, that is “\(\text{ ready }(\mathbf{x}) \equiv \text{ true }\)” (Fig. 9).
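
In code, the weight factors and the linear prediction might look as follows (a sketch under our own conventions: scalar-valued \(u\), integer pixel coordinates, the reference constants \(d_0\) and \(T_0\) set to 1, and the maps T and grad_u assumed precomputed):

```python
import numpy as np

def telea_weight(x, y, N_x, T, d0=1.0, T0=1.0):
    """w_eps(x, y) = dir * dst * lev for integer pixel coordinates x, y."""
    d = (np.asarray(x) - np.asarray(y)).astype(float)
    dir_ = np.dot(d / np.linalg.norm(d), N_x)           # alignment with the normal N(x)
    dst_ = d0 ** 2 / np.dot(d, d)                       # inverse-square distance falloff
    lev_ = T0 / (1.0 + abs(T[tuple(y)] - T[tuple(x)]))  # closeness in level sets of T
    return dir_ * dst_ * lev_

def telea_prediction(u, grad_u, x, y):
    """Linear extrapolation u(y) + grad_h u(y) . (x - y); this gradient term
    is what breaks the stability property (4)."""
    d = (np.asarray(x) - np.asarray(y)).astype(float)
    return u[tuple(y)] + grad_u[tuple(y)] @ d
```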

Fig. 7

Bright spots in Telea’s algorithm: in this example we consider the inpainting problem shown in a, consisting of a line separating a region of dark blue from a region of dark red. We inpaint using both Telea’s algorithm (b) and coherence transport (c). Coherence transport obeys the stability property (4) and hence the brightness of the inpainted solution remains bounded above by the brightness on the exterior of the inpainting domain. This is not true of Telea’s algorithm, which exhibits bright spots outside the original color range. These were not visible in Fig. 6, because the brightness of each color channel was already saturated, and Telea’s algorithm uses clamping to prevent the solution from going outside the admissible color range. This is further illustrated in d, e, where we plot horizontal cross sections of the red channel of each inpainted solution

Fig. 8

Blurring, kinking, and bright spots with Telea’s algorithm: even for problems such as a where the inpainting domain is very thin, Telea’s algorithm b–d still creates strong blurring artifacts and fails to connect isophotes effectively. Also, due to the presence of the gradient term in (5), Telea’s algorithm violates the stability condition (4) and as a result can “overshoot” when filling pixels close to edges in the filled area, where the (numerical) gradient changes rapidly. This leads to the bright spots near the reconstructed edges in c, d. In this case coherence transport e, f is a much better choice

Fig. 9

Two distinct continuum limits: Algorithm 1 has two distinct continuum limits, illustrated here for \(A_{\epsilon ,h}(\mathbf{x})=B_{\epsilon ,h}(\mathbf{x})\). The first, illustrated in a–f, is the high-resolution vanishing viscosity double limit proposed by Bornemann and März [7], in which \(h \rightarrow 0 \) (a)–(c) and then \(\epsilon \rightarrow 0\) (d)–(f). The second is the fixed-ratio single limit \((\epsilon ,h) \rightarrow (0,0)\) with \(r=\frac{\epsilon }{h}\) fixed proposed in our previous work [27], illustrated in g–i for \(r=4\). While they are both valid limits of Algorithm 1, they predict very different behaviour. In particular, while the high-resolution vanishing viscosity limit of Bornemann and März [7] is able to predict the “kinking” behaviour of Telea’s Algorithm [38], it fails to predict the kinking artifacts of their own method, coherence transport. Our fixed-ratio continuum limit, on the other hand, predicts both. See Theorem 6, as well as Sect. 7.1.2

Coherence transport Coherence transport  [7] improves upon Telea’s algorithm by adapting the weights in order to encourage extrapolation of isophotes in the direction of their tangent. This is done by calculating a “local coherence direction” \(\mathbf{g}(\mathbf{x})\) in terms of a modified structure tensor. Coherence transport calculates the color of a given pixel to be filled using the formula (1) in Algorithm 1 with weights

$$\begin{aligned} w_{\epsilon }(\mathbf{x},\mathbf{y}) = \frac{1}{\Vert \mathbf{y}-\mathbf{x}\Vert }\exp \left( -\frac{\mu ^2}{2\epsilon ^2}(\mathbf{g}^{\perp }(\mathbf{x}) \cdot (\mathbf{y}-\mathbf{x}))^2\right) , \end{aligned}$$
(6)

and with \(A_{\epsilon ,h}(\mathbf{x}) = B_{\epsilon ,h}(\mathbf{x})\)—see Fig. 10a and c. Like Telea’s algorithm, coherence transport uses the default onion shell ordering, that is “\(\text{ ready }(\mathbf{x}) \equiv \text{ true }\)”.
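
A direct transcription of the weight (6), assuming \(\mathbf{g}\) is given as a unit vector (the function name is ours):

```python
import numpy as np

def ct_weight(x, y, g, mu, eps):
    """Coherence transport weight (6): largest for y on the line through x
    parallel to g, decaying rapidly (for large mu) as y moves off that line."""
    d = np.asarray(y, float) - np.asarray(x, float)
    g_perp = np.array([-g[1], g[0]])  # counterclockwise rotation of g by 90 degrees
    return np.exp(-(mu ** 2) / (2 * eps ** 2) * np.dot(g_perp, d) ** 2) / np.linalg.norm(d)
```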

Coherence transport with adapted distance functions In a subsequent work [31], März made improvements to coherence transport by replacing the default onion shell ordering with one based on a variety of non-Euclidean distance functions. One such distance function defines an “active boundary” \(\Gamma _h \subseteq \partial D_h\) defined by

$$\begin{aligned} \Gamma _h := \{ \mathbf{x} \in \partial D_h : \langle \mathbf{g}(\mathbf{x}), \mathbf{N}(\mathbf{x}) \rangle ^2 > \gamma \} \end{aligned}$$

where \(\gamma > 0\) is a small constant. The non-Euclidean distance to boundary \(T^*_h\) is then computed as the Euclidean distance to the active boundary. The algorithm is modified so that at any given iteration, only a subset of boundary pixels are filled—namely those minimizing \(T^*_h\). That is

$$\begin{aligned} \text{ ready }(\mathbf{x}) = \text{ true } \Leftrightarrow \mathbf{x} \in {\text {argmin}}_{\mathbf{y} \in \partial D_h} T^*_h(\mathbf{y}). \end{aligned}$$

This adaptation leads to improvements in the long range extrapolation of isophotes, as in Fig. 4.

Fig. 10

Neighborhoods and weights for coherence transport and Guidefill: here we illustrate the neighborhoods \(A_{\epsilon ,h}(\mathbf{x})\) and weights (6) used by coherence transport and Guidefill. In each case \(\epsilon = 3\)px and \(\mathbf{g}(\mathbf{x})=(\cos 73^{\circ },\sin 73^{\circ })\). Coherence transport a uses the lattice-aligned discrete ball \(A_{\epsilon ,h}(\mathbf{x})=B_{\epsilon ,h}(\mathbf{x})\), while Guidefill b uses the rotated discrete ball \(A_{\epsilon ,h}(\mathbf{x})={\tilde{B}}_{\epsilon ,h}(\mathbf{x})\). The ball \({\tilde{B}}_{\epsilon ,h}(\mathbf{x})\) is rotated so that it is aligned with the line L (shown in red) passing through \(\mathbf{x}\) parallel to \(\mathbf{g}(\mathbf{x})\). In general \({\tilde{B}}_{\epsilon ,h}(\mathbf{x})\) contains “ghost pixels” lying between pixel centers, which are defined using bilinear interpolation of their “real” pixel neighbors. Both use the same weights (6) illustrated in c. The parameter \(\mu \) controls the extent to which the weights are biased in favor of points lying on or close to the line L

Guidefill Guidefill [27] is a recent inpainting algorithm designed to address, among other things, the kinking issues in Figs. 5b and 6. While coherence transport is able to extrapolate along guidance direction \(\mathbf{g}(\mathbf{x})\) only if \(\mathbf{g}(\mathbf{x}) = \lambda (\mathbf{v}-\mathbf{x})\) for some \(\mathbf{v} \in B_{\epsilon ,h}(\mathbf{x})\) (see Fig. 5b), Guidefill replaces the lattice aligned discrete ball \(B_{\epsilon ,h}(\mathbf{x})\) with the rotated discrete ball \({\tilde{B}}_{\epsilon ,h}(\mathbf{x})\) aligned with the local transport direction \(\mathbf{g}(\mathbf{x})\), so that \(\mathbf{g}(\mathbf{x}) = \lambda (\mathbf{v}-\mathbf{x})\) for some \(\mathbf{v} \in {\tilde{B}}_{\epsilon ,h}(\mathbf{x})\) is always true. The rotated ball \({\tilde{B}}_{\epsilon ,h}(\mathbf{x})\) contains “ghost pixels” lying between pixel centers which are defined using bilinear interpolation. See Sect. 4 for a deeper discussion of ghost pixels, as well as Fig. 10a–b for an illustration of \(B_{\epsilon ,h}(\mathbf{x})\) and \({\tilde{B}}_{\epsilon ,h}(\mathbf{x})\).

Guidefill uses the same weights (6) as coherence transport (illustrated in Fig. 10c) and, similarly to the latter’s extension [31], it has a way of automatically determining a good fill order. Unlike coherence transport, which computes \(\mathbf{g}(\mathbf{x})\) concurrently with inpainting, Guidefill computes a guide field \(\mathbf{g} : D_h \rightarrow \mathbb {R}^2\) prior to inpainting. The guide field is computed based on splines which the user may adjust in order to influence the results. It is used to automatically compute a good fill order by computing for each \(\mathbf{x} \in \partial D_h\) a confidence \(C(\mathbf{x}) \in [0,1]\) inspired by Criminisi et al. [16] and given by

$$\begin{aligned} C(\mathbf{x})=\frac{\sum _{\mathbf{y} \in {\tilde{B}}_{\epsilon ,h}(\mathbf{x}) \cap (\varOmega \backslash D^{(k)} )}w_{\epsilon }(\mathbf{x},\mathbf{y})}{\sum _{\mathbf{y} \in {\tilde{B}}_{\epsilon ,h}(\mathbf{x}) }w_{\epsilon }(\mathbf{x},\mathbf{y})}, \end{aligned}$$
(7)

and then only filling those pixels for which \(C(\mathbf{x}) > c\), where \( c \in (0,1)\) is a small constant. That is

$$\begin{aligned} \text{ ready }(\mathbf{x}) = 1( C(\mathbf{x}) > c) \end{aligned}$$
(8)

Guidefill was designed for use as part of a 3D conversion pipeline, and as such makes use of a set \(B_h\) of “bystander pixels” which are neither inpainted nor may be used for inpainting. However, this is not relevant to our current investigation and we will assume \(B_h = \emptyset \) throughout. As shown in Fig. 5c, Guidefill is able to largely, but not completely, eliminate kinking artifacts. It was in the hope of overcoming this that we designed the semi-implicit version of Algorithm 1 discussed in Sect. 5.
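
The two ingredients that distinguish Guidefill, the rotated ball \({\tilde{B}}_{\epsilon ,h}(\mathbf{x})\) and the confidence-based ready function (7)–(8), are sketched below (Python; names and calling conventions are ours, and is_known stands for membership of a ghost pixel in the already-filled region):

```python
import numpy as np

def rotated_ball(g, r):
    """Ghost-pixel offsets of the rotated ball B~_{eps,h}(0), in units of h:
    the lattice ball of radius r rotated so that (0, 1) maps to the unit
    vector g. The center is excluded, matching B~ \ {x} as used in the text."""
    g = np.asarray(g, float) / np.linalg.norm(g)
    R = np.array([[g[1], g[0]],
                  [-g[0], g[1]]])  # rotation taking (0, 1) to (g[0], g[1])
    pts = [(i, j) for i in range(-r, r + 1) for j in range(-r, r + 1)
           if 0 < i * i + j * j <= r * r]
    return np.array(pts, float) @ R.T

def confidence(x, offsets, weight, is_known):
    """Confidence C(x) from (7): the fraction of the ball's total weight mass
    lying over already-known ghost pixels; ready(x) from (8) is C(x) > c."""
    w = np.array([weight(x, x + v) for v in offsets])
    k = np.array([float(is_known(x + v)) for v in offsets])
    return float(np.sum(w * k) / np.sum(w))
```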

Remark 4

Note that we have deliberately excluded the point \(\mathbf{x}\) from the update formula (1) in Algorithm 1, even if the set \(A_{\epsilon ,h}(\mathbf{x})\) contains \(\mathbf{x}\). This is not done in any of the methods  [7, 27, 31, 38] we have just discussed, but it makes no difference to them or any other variant of the direct form of Algorithm 1, because the subroutine FillRow only involves sums taken over \(A_{\epsilon ,h}(\mathbf{x}) \cap (\varOmega \backslash D^{(k)})\), which never contains \(\mathbf{x}\). However, the semi-implicit extension of Algorithm 1 expresses \(u_h(\mathbf{x})\) as a sum of \(u_h(\mathbf{y})\) over a set of points that might include \(\mathbf{x}\). This creates problems with weights such as (6) for which \(w_{\epsilon }(\mathbf{x},\mathbf{x})=\infty \). See “Appendix A” for further details.

4 Ghost Pixels and Equivalent Weights

Because ghost pixels are defined using bilinear interpolation, any sum over a finite set of ghost pixels \(A(\mathbf{x})\) can be converted into a sum over an equivalent set of real pixels with equivalent weights, that is

$$\begin{aligned} \sum _{\mathbf{y} \in A(\mathbf{x})} w(\mathbf{x},\mathbf{y}) u_h(\mathbf{y}) = \sum _{\mathbf{y} \in \text{ Supp }(A(\mathbf{x}))} {\tilde{w}}(\mathbf{x},\mathbf{y}) u_h(\mathbf{y}) \end{aligned}$$

where \(\text{ Supp }(A(\mathbf{x}))\) denotes the set of real pixels needed to define \(u_h(\mathbf{y})\) for each \(\mathbf{y} \in A(\mathbf{x})\) and \({\tilde{w}}\) denotes a set of equivalent weights. This works because each \(u_h(\mathbf{y})\) is itself a weighted sum of the form

$$\begin{aligned} u_h(\mathbf{y}) = \sum _{\mathbf{z} \in \mathbb {Z}_h^2}\varLambda _{\mathbf{z},h}(\mathbf{y})u_h(\mathbf{z}), \end{aligned}$$

where \(\{ \varLambda _{\mathbf{z},h} \}_{\mathbf{z} \in \mathbb {Z}^2_h}\) denote the basis functions of bilinear interpolation associated with the lattice \(\mathbb {Z}^2_h\). This is illustrated in Fig. 11a, b, where we show a heat map of the weights (6) over the set \({\tilde{B}}_{\epsilon ,h}(\mathbf{x}) \backslash \{ \mathbf{x}\}\) for \(\mu = 100\) and \(\epsilon =3\)px, as well as a similar heat map of the modified weights over \(\text{ Supp }({\tilde{B}}_{\epsilon ,h}(\mathbf{x})\backslash \{ \mathbf{x}\}) \subseteq D_h(B_{\epsilon ,h}(\mathbf{x}))\). Note that even though \({\tilde{B}}_{\epsilon ,h}(\mathbf{x})\backslash \{ \mathbf{x}\}\) does not contain the point \(\mathbf{x}\), the support of this set does. This will be important in Sect. 5. Here we briefly list some properties of equivalent weights, including an explicit formula. Proofs are sketched, but details are deferred to “Appendix B”.

Fig. 11

Ghost pixels and equivalent weights: because ghost pixels are defined using bilinear interpolation, any weighted sum over a set of ghost pixels \(A_{\epsilon ,h}(\mathbf{x})\) is equivalent to a sum with equivalent weights over real pixels in the set \(\text{ Supp }(A_{\epsilon ,h}(\mathbf{x}))\), defined as the set of real pixels needed to define each ghost pixel \(\mathbf{y}\) in \(A_{\epsilon ,h}(\mathbf{x})\). We illustrate this in a–c using Guidefill with \(\epsilon = 3\)px, \(\mathbf{g}=(\cos 77^{\circ },\sin 77^{\circ })\), and \(\mu = 100\). In a, the (normalized) weights (6) are visualized as a heat map over \({\tilde{B}}_{\epsilon ,h}(\mathbf{x}) \backslash \{ \mathbf{x} \}\). In b we show the equivalent weights over \(\text{ Supp }({\tilde{B}}_{\epsilon ,h}(\mathbf{x}) \backslash \{ \mathbf{x} \} ) \subseteq D_h(B_{\epsilon ,h}(\mathbf{x}))\) (this containment comes from (14) in Remark 6). Note that even though \({\tilde{B}}_{\epsilon ,h}(\mathbf{x}) \backslash \{ \mathbf{x} \}\) does not contain the point \(\mathbf{x}\), the support of this set does. In c we visualize the equivalent weights over the set \(D_h(B_{\epsilon ,h}(\mathbf{x}))\), which is strictly larger than \(\text{ Supp }({\tilde{B}}_{\epsilon ,h}(\mathbf{x}) \backslash \{ \mathbf{x} \} )\). For reference, we include the line parallel to \(\mathbf{g}\) in green

Properties of equivalent weights Properties 1–3 deal with a general finite set \(A(\mathbf{x})\) and general weights \(w(\mathbf{x},\mathbf{y})\), while properties 4–6 deal with the specific set \(A_{\epsilon ,h}(\mathbf{x}) \subset B_{\epsilon }(\mathbf{x})\) and the specific weights \(w_{\epsilon }(\mathbf{x},\mathbf{y})\) obeying (3).

  1.

    Explicit formula:

    $$\begin{aligned} {\tilde{w}}(\mathbf{x},\mathbf{z}) = \sum _{\mathbf{y} \in A(\mathbf{x})} \varLambda _{\mathbf{y},h}(\mathbf{z})w(\mathbf{x},\mathbf{y}). \end{aligned}$$
    (9)
  2.

    Preservation of total mass:

    $$\begin{aligned} \sum _{\mathbf{y} \in A(\mathbf{x})} w(\mathbf{x},\mathbf{y}) = \sum _{\mathbf{y} \in \text{ Supp }(A(\mathbf{x}))} {\tilde{w}}(\mathbf{x},\mathbf{y}). \end{aligned}$$
    (10)
  3.

    Preservation of center of mass (or first moment):

    $$\begin{aligned} \sum _{\mathbf{y} \in A(\mathbf{x})} w(\mathbf{x},\mathbf{y})\mathbf{y} = \sum _{\mathbf{y} \in \text{ Supp }(A(\mathbf{x}))} {\tilde{w}}(\mathbf{x},\mathbf{y})\mathbf{y}. \end{aligned}$$
    (11)
  4.

    Inheritance of non-negativity:

    $$\begin{aligned} {\tilde{w}}_{\epsilon }(\mathbf{x},\mathbf{z}) \ge 0 \quad \text{ for } \text{ all } \mathbf{z} \in \text{ Supp }(A_{\epsilon ,h}(\mathbf{x})). \end{aligned}$$
    (12)
  5.

    Inheritance of non-degeneracy condition (3):

    $$\begin{aligned} \sum _{\mathbf{y} \in \text{ Supp }(A_{\epsilon ,h}(\mathbf{x}) \cap (\varOmega \backslash D^{(k)}))}{\tilde{w}}_{\epsilon }(\mathbf{x},\mathbf{y}) > 0. \end{aligned}$$
    (13)
  6.

    Universal support: For any \(n \in \mathbb {Z}\), we have

    $$\begin{aligned} \text{ Supp }(A_{\epsilon ,h}(\mathbf{x}) \cap \{ y \le nh\} ) \subseteq D_h(B_{\epsilon ,h}(\mathbf{x})) \cap \{ y \le nh\} \subseteq B_{\epsilon +2h,h}(\mathbf{x}) \cap \{ y \le nh\}, \end{aligned}$$

    where \(\{y \le nh \} := \{ (x,y) \in \mathbb {R}^2 : y \le nh\}\), and where \(D_h\) is the dilation operator defined in our section on notation.

Proof

Most of these properties are either obvious or are derived based on a simple exercise in changing the order of nested finite sums. Properties (10) and (11) are slightly more interesting—they follow from the fact that the bilinear interpolant of a polynomial of degree at most one is just the polynomial again. Note that an analogous formula for preservation of the second moment does not hold, because a quadratic function and its bilinear interpolant are not the same thing. The last identity is based on an explicit formula for the support of a point. Details are provided in “Appendix B”. \(\square \)

Remark 5

Although we have explicit formula (9) for the equivalent weights which will occasionally be useful, most of the time it is more fruitful to think about them in the following way: To compute \({\tilde{w}}_{\epsilon }(\mathbf{x},\mathbf{y})\) for some real pixel \(\mathbf{y}\), loop over the ghost pixels \(\mathbf{z}\) such that \(\mathbf{y} \in \text{ Supp }(\mathbf{z})\). Then, each such \(\mathbf{z}\) redistributes to \({\tilde{w}}_{\epsilon }(\mathbf{x},\mathbf{y})\) a fraction of its weight \(w_{\epsilon }(\mathbf{x},\mathbf{z})\) equal to the proportion of \(u_h(\mathbf{y})\) that went into \(u_h(\mathbf{z})\).
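
The redistribution described in Remark 5 takes only a few lines of code. The sketch below (cell and naming conventions are ours) converts ghost-pixel weights \(w(\mathbf{x},\mathbf{z})\) into equivalent real-pixel weights via formula (9); since the four bilinear coefficients of each ghost pixel sum to one, total mass (10) is preserved by construction:

```python
import numpy as np
from collections import defaultdict

def equivalent_weights(x, ghost_pts, w, h=1.0):
    """Equivalent weights (9): each ghost pixel z redistributes its weight
    w(x, z) onto the four lattice pixels defining it, in proportion to its
    bilinear coefficients. Returns a dict {real pixel: w~(x, pixel)}."""
    w_tilde = defaultdict(float)
    for z in ghost_pts:
        i, j = np.floor(np.asarray(z, float) / h).astype(int)
        s, t = z[0] / h - i, z[1] / h - j  # local coordinates within the cell
        for (di, dj), lam in [((0, 0), (1 - s) * (1 - t)), ((1, 0), s * (1 - t)),
                              ((0, 1), (1 - s) * t), ((1, 1), s * t)]:
            w_tilde[((i + di) * h, (j + dj) * h)] += lam * w(x, z)
    return dict(w_tilde)  # total mass is preserved since the lam's sum to 1
```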

Remark 6

An obvious corollary of the universal support property (14) is that we also have the containment

$$\begin{aligned} \text{ Supp }(A_{\epsilon ,h}(\mathbf{x}) ) \subseteq D_h(B_{\epsilon ,h}(\mathbf{x})) \subseteq B_{\epsilon +2h,h}(\mathbf{x}). \end{aligned}$$
(14)

Figure 11 illustrates an example where this containment is strict, and in fact it is not hard to show that this holds in general. However, (14) is tight enough for our purposes in this paper.

Remark 7

The main results of this paper, in particular Theorems 2 and 6, remain applicable if ghost pixels are defined using a different form of interpolation (rather than bilinear interpolation as above) so long as that interpolation scheme can also be expressed in terms of non-negative basis functions with compact support and also preserves polynomials of degree one. Moreover, the fixed-ratio continuum limit (Theorem 6) is identical for all such interpolation schemes. The asymptotic limit (Theorem 2), however, is not. One example of an alternative interpolation scheme obeying these properties is barycentric triangle interpolation [10]. In the future, it would be interesting to explore alternative interpolation schemes for ghost pixels; however, this is beyond the scope of the current work.

5 Semi-Implicit Extension of Algorithm 1

Here we present the semi-implicit form of Algorithm 1, to our knowledge not previously proposed in the literature, in which instead of computing \(u_h(\mathbf{x})\) for each \(\mathbf{x} \in \partial _{\text{ ready }} D^{(k)}_h\) independently, we solve for them simultaneously by solving a linear system. We call our method semi-implicit in order to distinguish it from fully implicit methods in which the entire inpainting domain \(\{u_h(\mathbf{x}) : \mathbf{x} \in D_h\}\) is solved simultaneously, as is typically the case for most inpainting methods based on PDEs or variational principles, e.g.  [6, 11, 13]. Specifically, we solve

$$\begin{aligned} {\mathcal {L}}\mathbf{u} = \mathbf{f} \quad \text{ where } \mathbf{u} = \{ u_h(\mathbf{x}) : \mathbf{x} \in \partial _{\text{ ready }} D^{(k)}_h \}, \end{aligned}$$
(15)

and \(\mathbf{f}\) is a vector of length \(|\partial _{\text{ ready }} D^{(k)}_h|\). The explicit entries of \({\mathcal {L}}\) are written in terms of the equivalent weights \({\tilde{w}}_{\epsilon }\) introduced in Sect. 4. Defining

$$\begin{aligned} S^{(k)}_{\epsilon ,h}(\mathbf{x}) := \text{ Supp }(A_{\epsilon ,h}(\mathbf{x})) \cap \partial _{\text{ ready }} D^{(k)}_h, \end{aligned}$$

it follows that \({\mathcal {L}}\) couples each \(\mathbf{x} \in \partial _{\text{ ready }} D^{(k)}_h\) to its immediate neighbors in \(S^{(k)}_{\epsilon ,h}(\mathbf{x})\). In particular, we have

$$\begin{aligned} ({\mathcal {L}}\mathbf{u})(\mathbf{x}) = \left( 1-\frac{{\tilde{w}}_{\epsilon }(\mathbf{x},\mathbf{x})}{W}\right) u_h(\mathbf{x}) -\sum _{\mathbf{y} \in S^{(k)}_{\epsilon ,h}(\mathbf{x}) \backslash \{ \mathbf{x} \}}\frac{{\tilde{w}}_{\epsilon }(\mathbf{x},\mathbf{y})}{W}u_h(\mathbf{y}), \end{aligned}$$

where W is the total mass and can be computed in one of two ways, using either the original weights \(w_{\epsilon }\) or the equivalent weights \({\tilde{w}}_{\epsilon }\), exploiting preservation of mass (10):

$$\begin{aligned} W := \sum _{\mathbf{y} \in S^{(k)}_{\epsilon ,h}(\mathbf{x}) \cup \left( \text{ Supp }(A_{\epsilon ,h}(\mathbf{x})) \cap (\varOmega _h \backslash D^{(k)}_h)\right) }{\tilde{w}}_{\epsilon }(\mathbf{x},\mathbf{y}) \end{aligned}$$
(16)
$$\begin{aligned} = \sum _{\mathbf{y} \in (A_{\epsilon ,h}(\mathbf{x}) \backslash \{ \mathbf{x}\}) \cap (\varOmega \backslash D^{(k)}) } w_{\epsilon }(\mathbf{x},\mathbf{y}). \end{aligned}$$
(17)

Generally, (17) is more convenient to work with than (16), but (16) combined with the inherited non-degeneracy condition (13) tells us that

$$\begin{aligned} \sum _{\mathbf{y} \in S^{(k)}_{\epsilon ,h}(\mathbf{x})} {\tilde{w}}_{\epsilon }(\mathbf{x},\mathbf{y}) < W, \end{aligned}$$

because the non-degeneracy condition (13) implies that a non-zero proportion of the total weight goes into the known pixels in \(\text{ Supp }(A_{\epsilon ,h}(\mathbf{x})) \cap (\varOmega \backslash D^{(k)}_h)\). From this it immediately follows that \({\mathcal {L}}\) is strictly diagonally dominant, as

$$\begin{aligned}&|{\mathcal {L}}_{\mathbf{x},\mathbf{x}}| - \sum _{\mathbf{y} \in \partial _{\text{ ready }} D^{(k)}_h \backslash \{ \mathbf{x} \}}|\mathcal L_{\mathbf{x},\mathbf{y}}| \\&\quad = \left( 1-\frac{{\tilde{w}}_{\epsilon }(\mathbf{x},\mathbf{x})}{W}\right) -\sum _{\mathbf{y} \in S^{(k)}_{\epsilon ,h}(\mathbf{x}) \backslash \{ \mathbf{x} \}}\frac{{\tilde{w}}_{\epsilon }(\mathbf{x},\mathbf{y})}{W} \\&\quad = 1-\sum _{\mathbf{y} \in S^{(k)}_{\epsilon ,h}(\mathbf{x})}\frac{{\tilde{w}}_{\epsilon }(\mathbf{x},\mathbf{y})}{W}> 0. \end{aligned}$$

To compute \(\mathbf{f}\), we do not need the concept of equivalent weights. We have

$$\begin{aligned} \mathbf{f}(\mathbf{x}) = \sum _{\mathbf{y} \in (A_{\epsilon ,h}(\mathbf{x}) \backslash \{ \mathbf{x} \}) \cap ( \varOmega \backslash D^{(k)})}\frac{w_{\epsilon }(\mathbf{x},\mathbf{y})}{W}u_h(\mathbf{y}). \end{aligned}$$
(18)

5.1 Solving the Linear System

Designing maximally efficient methods for solving (15) is beyond the scope of this paper—our main purpose lies in understanding the effect of this extension on the continuum limit that we will derive later. Therefore, in this paper we consider only two very simple methods: damped Jacobi and SOR (successive over-relaxation). These are natural choices for a number of reasons. First, since \({\mathcal {L}}\) is strictly diagonally dominant, these methods are both guaranteed to converge  [42, Theorem 3.10, p. 79], at least in the case \(\omega =1\), where they reduce to Jacobi and Gauss–Seidel. Second, at least for the semi-implicit extension of Guidefill, the performance of SOR is already satisfactory (see Sect. 6.2, Proposition 1). Third, both methods can be implemented with minimal changes to the direct form of Algorithm 1. In fact, changing the variable “semiImplicit” from “false” to “true” in Algorithm 1 and executing the “FillBoundary” subroutine in parallel is equivalent to solving (15) using damped Jacobi, with underrelaxation parameter

$$\begin{aligned} \omega ^* = \left( 1 - \frac{{\tilde{w}}_{\epsilon }(\mathbf{x},\mathbf{x})}{W}\right) \le 1. \end{aligned}$$
(19)

Similarly, executing the “FillBoundary” subroutine sequentially results in SOR with the same underrelaxation parameter—see Proposition 3 in “Appendix C” for a proof. Note that \(\omega ^*\) is typically very close to 1, so these methods are very similar to plain Jacobi and Gauss–Seidel.
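To make the preceding discussion concrete, here is a minimal sketch (ours, not the actual implementation of Algorithm 1) of the two stationary iterative methods applied to a generic strictly diagonally dominant system; setting \(\omega = 1\) recovers plain Jacobi and Gauss–Seidel, while \(\omega = \omega ^*\) of (19) mimics the behavior of Algorithm 1 with “semiImplicit” set to “true”.

```python
# A minimal sketch (not the paper's implementation) of damped Jacobi and
# SOR for a generic strictly diagonally dominant system L u = f.
import numpy as np

def damped_jacobi(L, f, omega, n_iter=100):
    """Damped Jacobi: u <- u + omega * D^{-1} (f - L u), all entries at once."""
    u = np.zeros_like(f)
    d = np.diag(L)                       # the diagonal part D of L
    for _ in range(n_iter):
        u = u + omega * (f - L @ u) / d
    return u

def sor(L, f, omega, n_iter=100):
    """SOR: sweep the unknowns sequentially, relaxing one equation at a time."""
    u = np.zeros_like(f)
    for _ in range(n_iter):
        for i in range(len(f)):          # one forward sweep
            u[i] += omega * (f[i] - L[i] @ u) / L[i, i]
    return u

# toy strictly diagonally dominant system standing in for (15)
rng = np.random.default_rng(0)
L = 0.01 * rng.random((50, 50))
np.fill_diagonal(L, 1.0)
f = rng.random(50)
u_exact = np.linalg.solve(L, f)
print(np.abs(damped_jacobi(L, f, 1.0, 30) - u_exact).max(),
      np.abs(sor(L, f, 1.0, 10) - u_exact).max())
```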

Remark 8

The reason Algorithm 1 with “semiImplicit” set to “true” results in damped Jacobi/SOR, rather than plain Jacobi/Gauss–Seidel, is that even though the update formula (1) for the nth iterate \(u_h^{(n)}(\mathbf{x})\) is expressed as a sum of the \((n-1)\)st iterate evaluated at ghost pixels that do not include \(\mathbf{x}\), some of those ghost pixels may indirectly depend on \(u^{(n-1)}_h(\mathbf{x})\) because they are defined using bilinear interpolation. The result is that the nth iterate \(u_h^{(n)}(\mathbf{x})\) depends on \(u_h^{(n-1)}(\mathbf{x})\), which is true of damped Jacobi/SOR but not of plain Jacobi/Gauss–Seidel.

Fig. 12
figure 12

Semi-implicit Guidefill and shallow inpainting directions: in a a line makes a very shallow angle of just \(2^{\circ }\) with the inpainting domain, shown in yellow. This line is then inpainted using first Guidefill (b) and then the semi-implicit extension thereof (c). The latter uses 5 iterations of SOR per shell to solve the linear system (15) (in this case, the original image is \(2000\times 2000\)px, so there are 1000 shells). Visually identical results can be obtained using damped Jacobi, but more than 100 iterations per shell are required—see Proposition 1. Both methods use relaxation parameter \(\omega = \omega ^*\) (19) and are given \(\mathbf{g}=(\cos 2^{\circ },\sin 2^{\circ })\), \(\mu = 100\), \(\epsilon =3\)px, and use the default onion shell ordering (smart-order is turned off). Guidefill kinks while its extension does not. In d we see the result of failing to solve the linear system (15) to sufficient accuracy by applying too few iterations of damped Jacobi: only 50 iterations per shell are used, and the extrapolated line gradually fades away

6 Analysis

This section contains our core analysis: first of the convergence properties of semi-implicit Guidefill, and then of the convergence of both Algorithm 1 and its semi-implicit extension to an asymptotic limit, in which \(r=\epsilon /h\) is fixed and \(0 < h \ll 1\), as well as to a continuum limit, in which \((h,\epsilon ) \rightarrow (0,0)\) along the ray \(\epsilon = r h\) (with \(r \in \mathbb {N}\) fixed). Next, in Sect. 7, we will apply our results to explain some of the artifacts discussed in Sect. 2. We begin with a discussion of some symmetry assumptions that will hold from now on (Fig. 12).

6.1 Symmetry Assumptions

We will assume throughout that the inpainting domain D is the unit square \([0,1) \times (0,1]\) while the image domain is \(\varOmega =[0,1) \times (-\delta , 1]\) equipped with Dirichlet or periodic boundary conditions at \(x=0\) and \(x=1\), and no condition at \(y=1\). We denote the undamaged portion of the image by

$$\begin{aligned} {\mathcal {U}} := [0,1) \times (-\delta ,0] \quad \text{ and } \quad {\mathcal {U}}_h := {\mathcal {U}} \cap \mathbb {Z}^2_h. \end{aligned}$$
(20)

We discretize \(D = [0,1) \times (0,1]\) as an \(N \times N\) array of pixels \(D_h=D \cap (h \cdot \mathbb {Z}^2)\) with pixel width \(h:=1/N\). Specifically, we have

$$\begin{aligned} D_h = \{(ih,jh) : 0 \le i < N, 1 \le j \le N\}. \end{aligned}$$

In order to ensure that the update formula (1) is well defined, we need \(\epsilon +2h < \delta \), which we achieve by assuming \(h < \frac{\delta }{r+2}\) [this follows from the inclusion (14)]. We assume that the default onion shell ordering is used, so that

$$\begin{aligned} \partial D^{(k)}_h = \{ (jh,kh) \}_{j=0}^{N-1}. \end{aligned}$$

We also assume that the sets \(A_{\epsilon ,h}(\mathbf{x})\) are translations of one another, and the weights \(w_{\epsilon }(\mathbf{x},\mathbf{y})\) depend only on \(\frac{\mathbf{y}-\mathbf{x}}{\epsilon }\), that is

$$\begin{aligned} w_{\epsilon }(\mathbf{x},\mathbf{y})={\hat{w}}\left( \frac{\mathbf{y}-\mathbf{x}}{\epsilon }\right) . \end{aligned}$$

For coherence transport and Guidefill this means that the guidance direction \(\mathbf{g}\) is a constant.

Remark 9

We make the above assumptions not because we believe they are necessary, but because they enable us to make our analysis as simple as possible while still capturing the phenomena we would like to capture. In particular, the above two assumptions on the weights \(w_{\epsilon }\) and neighborhood \(A_{\epsilon ,h}(\mathbf{x})\) ensure that the matrix \({\mathcal {L}}\) given by (15) is either Toeplitz or circulant (depending on the boundary conditions), and also ensures that the random walk we connect Algorithm 1 to in Sect. 6.3 has i.i.d. (independent identically distributed) increments. Without these simplifications, our already lengthy analysis would become even more technical. Numerical experiments (omitted for brevity) suggest that these assumptions can be weakened, but proving this is beyond the scope of the present work.

These assumptions give us a high level of symmetry with which we may rewrite the update formula (1) of Algorithm 1 in the generic form

$$\begin{aligned} u_h(\mathbf{x}) = \frac{\sum _{\mathbf{y} \in a^*_r} w_r(\mathbf{0},\mathbf{y}) u_h(\mathbf{x}+h\mathbf{y})}{\sum _{\mathbf{y} \in a^*_r} w_r(\mathbf{0},\mathbf{y})} \end{aligned}$$
(21)

where

$$\begin{aligned} a^*_r = \left( \frac{1}{h}A_{\epsilon ,h}(\mathbf{0}) \backslash \{ \mathbf{0} \}\right) \cap \{ y \le \delta \}, \end{aligned}$$

and \(\delta = -1\) for the direct method, while \(\delta = 0\) for the semi-implicit extension. In particular, for coherence transport we have \(a^*_r=b^-_r\) for the direct method and \(a^*_r = b^0_r\) for the extension, where

$$\begin{aligned} b^0_r:= & {} \{ (n,m) \in \mathbb {Z}^2 : 0<n^2+m^2 \le r^2, m \le 0 \}. \end{aligned}$$
(22)
$$\begin{aligned} b^-_r:= & {} \{ (n,m) \in \mathbb {Z}^2 : n^2+m^2 \le r^2, m \le -1 \}. \end{aligned}$$
(23)

Similarly, for Guidefill, we have \(a^*_r={\tilde{b}}^-_r\) for the direct method and \(a^*_r={\tilde{b}}^0_r\) for the semi-implicit extension, where

$$\begin{aligned} {\tilde{b}}^0_r:= & {} \{ n\hat{\mathbf{g}}+m\hat{\mathbf{g}}^{\perp } : (n,m) \in \mathbb {Z}^2, 0<n^2+m^2 \le r^2, \\&n\hat{\mathbf{g}}\cdot e_2+m\hat{\mathbf{g}}^{\perp } \cdot e_2 \le 0 \} \\ {\tilde{b}}^-_r:= & {} \{ n\hat{\mathbf{g}}+m\hat{\mathbf{g}}^{\perp } : (n,m) \in \mathbb {Z}^2, n^2+m^2 \le r^2, \\&n\hat{\mathbf{g}}\cdot e_2+m\hat{\mathbf{g}}^{\perp } \cdot e_2 \le -1 \}, \end{aligned}$$

and \(\hat{\mathbf{g}} := \mathbf{g}/\Vert \mathbf{g}\Vert \) (if \(\mathbf{g} = \mathbf{0}\) we set \({\tilde{b}}^-_r = b^-_r\)). The sets \(b^-_r\), \(b^0_r\), \({\tilde{b}}^-_r\), \({\tilde{b}}^0_r\) may be visualized by looking at the portion of Fig. 10a, b on or below the lines \(y=0\) and \(y=-1\) respectively. Also important are the dilated sets \({\bar{b}}^0_r=D(b^0_r) \cap \{ y \le 0\}\) and \({\bar{b}}^-_r=D(b^-_r) \cap \{ y \le -1\}\). These sets are given explicitly by

$$\begin{aligned} {\bar{b}}^0_r:= & {} \{ (n+\varDelta n,m+\varDelta m) : (n,m)\in b^0_r, (\varDelta n,\varDelta m) \nonumber \\\in & {} \{-1,0,1\} \times \{-1,0,1\}, m+\varDelta m \le 0\} \end{aligned}$$
(24)
$$\begin{aligned} {\bar{b}}^-_r:= & {} \{ (n+\varDelta n,m+\varDelta m) : (n,m) \in b^-_r, (\varDelta n,\varDelta m) \nonumber \\\in & {} \{-1,0,1\} \times \{-1,0,1\}, m+\varDelta m \le -1\}. \end{aligned}$$
(25)

This is because the universal support property (14) gives us the inclusion

$$\begin{aligned} \text{ Supp }(a^*_r) \subseteq {\left\{ \begin{array}{ll} {\bar{b}}^-_r \subseteq b^-_{r+2} &{} \text{ (direct form of Algorithm 1) } \\ {\bar{b}}^0_r \subseteq b^0_{r+2} &{} \text{ (semi-implicit form), } \end{array}\right. } \end{aligned}$$
(26)

which will be critical later. The sets \({\bar{b}}^-_r\) and \({\bar{b}}^0_r\) are illustrated in Fig. 13.
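The stencil sets above are simple to construct programmatically. The sketch below (ours; names and conventions are not from the paper's code) builds the sets (22)–(25) directly from their definitions and verifies the direct-form inclusion in (26) for \(r=3\); the rotated Guidefill stencil \({\tilde{b}}^0_r\) is included for completeness.

```python
import numpy as np

def b0(r):
    """b^0_r of (22): lattice points with 0 < n^2+m^2 <= r^2 and m <= 0."""
    return {(n, m) for n in range(-r, r + 1) for m in range(-r, 1)
            if 0 < n * n + m * m <= r * r}

def bminus(r):
    """b^-_r of (23): lattice points with n^2+m^2 <= r^2 and m <= -1."""
    return {(n, m) for n in range(-r, r + 1) for m in range(-r, 0)
            if n * n + m * m <= r * r}

def dilate_clip(S, ymax):
    """D(S) clipped to {y <= ymax}, as in (24)-(25)."""
    return {(n + dn, m + dm) for (n, m) in S
            for dn in (-1, 0, 1) for dm in (-1, 0, 1) if m + dm <= ymax}

def guidefill_b0(r, theta):
    """tilde b^0_r: ghost pixels on the lattice rotated to align with g."""
    g = np.array([np.cos(theta), np.sin(theta)])
    gp = np.array([-np.sin(theta), np.cos(theta)])   # g rotated by 90 degrees
    return [tuple(n * g + m * gp) for n in range(-r, r + 1)
            for m in range(-r, r + 1)
            if 0 < n * n + m * m <= r * r and (n * g + m * gp)[1] <= 0]

r = 3
print(dilate_clip(bminus(r), -1) <= bminus(r + 2))   # inclusion (26): True
print(len(b0(r)), len(bminus(r)), len(guidefill_b0(r, np.radians(73))))
```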

Fig. 13
figure 13

Visualization of \({\bar{b}}^-_r\) and \({\bar{b}}^0_r\): here we illustrate the sets \({\bar{b}}^-_r:=D(b^-_r) \cap \{ y \le -1\}\) and \({\bar{b}}^0_r:=D(b^0_r) \cap \{ y \le 0\}\) in the case \(r=3\). These sets are defined explicitly in (24) and (25), and D is the dilation operator defined in the notation section. These sets are important because, depending on whether we use the direct form of Algorithm 1 or its semi-implicit extension, \(\text{ Supp }(a^*_r)\) is always contained in one or the other. See (26) in the text

Definition 1

We call the set \(a^*_r\) and the weights \(\{ w_r(\mathbf{0},\mathbf{y}) : \mathbf{y} \in a^*_r\}\) the stencil and stencil weights of a method. The center of mass of \(a^*_r\) is defined in the following two equivalent ways:

$$\begin{aligned} \text{ C.M. }=\frac{\sum _{\mathbf{y} \in a^*_r} w_r(0,\mathbf{y})\mathbf{y}}{\sum _{\mathbf{y} \in a^*_r} w_r(0,\mathbf{y})}=\frac{\sum _{\mathbf{y} \in \text{ Supp }(a^*_r)} {\tilde{w}}_r(0,\mathbf{y})\mathbf{y}}{\sum _{\mathbf{y} \in \text{ Supp }(a^*_r)} {\tilde{w}}_r(0,\mathbf{y})}. \end{aligned}$$

Here \({\tilde{w}}_r\) denote the equivalent weights from Sect. 4, and the above identity follows from (10) and (11). The center of mass of \(a^*_r\) will play a critical role both in the continuum limit of Algorithm 1 (where it is the transport direction of the resulting transport PDE) and in the connection to random walks (where, after multiplication by h, it is the mean of the increments of the walk).

Under these assumptions, the matrix \({\mathcal {L}}\) from (15) is independent of k (that is, we solve the same linear system for every shell), and moreover \({\mathcal {L}}\) becomes a Toeplitz matrix (Dirichlet boundary conditions) or circulant matrix (periodic boundary conditions). For a given pixel \(\mathbf{x}\) at least \(r+2\) pixels away from the boundary at \(x=0\) and \(x=1\), that is \(\mathbf{x} \in \{ (jh,kh) : r+2 \le j \le N-r-2\}\), it takes on the form

$$\begin{aligned} ({\mathcal {L}}{} \mathbf{u})(\mathbf{x}) =\left( 1-\frac{{\tilde{w}}_r(\mathbf{0},\mathbf{0})}{W}\right) u_h(\mathbf{x}) - \sum _{\mathbf{y} \in s_r \backslash \{ \mathbf{0} \}}\frac{{\tilde{w}}_{r}(\mathbf{0},\mathbf{y})}{W}u_h(\mathbf{x}+h\mathbf{y}), \end{aligned}$$

where by (26) we have

$$\begin{aligned} s_r= & {} \text{ Supp }(a^*_r) \cap ( \mathbb {Z} \times \{ 0 \} ) \\\subseteq & {} \{-(r+2)e_1,-(r+1)e_1,\ldots ,(r+1)e_1,(r+2)e_1\}. \end{aligned}$$

If \(\mathbf{x}\) is not at least \(r+2\) pixels away from the boundaries, then the formula changes in the usual way for Toeplitz and circulant matrices—we assume the reader is familiar with this, so no further discussion is needed. Under the same assumptions, the vector \(\mathbf{f}\) becomes

$$\begin{aligned} \mathbf{f}(\mathbf{x}) = \sum _{\mathbf{y} \in \text{ Supp }(a^*_r) \backslash ( \mathbb {Z} \times \{ 0\} ) }\frac{{\tilde{w}}_{r}(\mathbf{0},\mathbf{y})}{W}u_h(\mathbf{x}+h\mathbf{y}) \end{aligned}$$

where \(\text{ Supp }(a^*_r) \backslash ( \mathbb {Z} \times \{ 0\} ) \subseteq b^-_{r+2}\). We also define

$$\begin{aligned} {\tilde{W}} := \sum _{j=-r-2}^{r+2} {\tilde{w}}_r(\mathbf{0},je_1) \quad \text{ and } \quad {\tilde{w}}_{0,0}:={\tilde{w}}_r(\mathbf{0},\mathbf{0}). \end{aligned}$$
(27)

For a given point \(\mathbf{x} \in \partial D^{(k)}_h\), (again, assuming \(\mathbf{x}\) is far enough from the boundary) the ratio \(\frac{{\tilde{W}}}{W}\) gives the fraction of the mass of the stencil (Definition 1) centered at \(\mathbf{x}\) that gets “leaked” to the unknown pixels in \(\partial D^{(k)}_h\), while \(\frac{{\tilde{w}}_{0,0}}{W}\) gives the fraction that gets leaked to \(\mathbf{x}\). Together, these give a measure of the diagonal dominance of \({\mathcal {L}}\), as we have

$$\begin{aligned} \frac{\sum _{j \ne i} |{\mathcal {L}}_{ij}|}{|{\mathcal {L}}_{ii}|} = \frac{{\tilde{W}}-{\tilde{w}}_{0,0}}{W-{\tilde{w}}_{0,0}}. \end{aligned}$$

The smaller this ratio is, the stronger the diagonal dominance of \({\mathcal {L}}\), and the faster damped Jacobi and SOR can be expected to converge—see Proposition 1 for explicit formulas.
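As a concrete illustration of this structure, the sketch below (ours; the within-row equivalent weights and the total mass are placeholder values, not derived from any particular method) assembles the interior rows of the Toeplitz matrix \({\mathcal {L}}\) and reports the diagonal-dominance ratio just described.

```python
# A sketch (with made-up weights) of the Toeplitz system for one shell:
# each unknown is coupled to unknowns at offsets j = -(r+2), ..., r+2
# within the same row, as in the formula above.
import numpy as np

N, r = 16, 3
# assumed within-row equivalent weights {w_tilde(0, j*e1)}; illustrative only
w_row = {j: 0.05 / (1 + abs(j)) for j in range(-(r + 2), r + 3)}
W = 2.0                                  # assumed total mass; > sum(w_row)
w00 = w_row[0]
L = np.zeros((N, N))
for i in range(N):
    L[i, i] = 1 - w00 / W
    for j, w in w_row.items():
        if j != 0 and 0 <= i + j < N:    # Toeplitz band, clipped at the ends
            L[i, i + j] = -w / W
W_tilde = sum(w_row.values())
print((W_tilde - w00) / (W - w00))       # < 1: strict diagonal dominance
```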

For semi-implicit Guidefill with guidance direction \(\mathbf{g}=(\cos \theta ,\sin \theta )\), it can be shown that \({\mathcal {L}}\) becomes a lower triangular matrix in the limit \(\mu \rightarrow \infty \), provided we order unknowns left to right if \(\cos \theta > 0\) and right to left otherwise (see “Appendix D”). This gives us a hint that Gauss–Seidel and SOR might be very effective for the solution of (15) in this case, and indeed Proposition 1 confirms this.

6.2 Convergence Rates of Damped Jacobi and SOR for Semi-Implicit Guidefill

Here we derive tight bounds on the convergence rates of damped Jacobi and SOR for solving (15) in the semi-implicit extension of Guidefill described in Sect. 5, under the symmetry assumptions discussed above, and in the limit \(\mu \rightarrow \infty \) (recall that \(\mu \) is the parameter from the weights (6) controlling the extent to which weights are biased in favor of pixels in the directions \(\pm \mathbf{g}\)). We will prove that in each case the parameter value \(\omega = 1\) is optimal, but also pay special attention to the case \(\omega = \omega ^*\) given by (19), since this is the value of \(\omega \) that our proposed implementation of the semi-implicit extension in Algorithm 1 uses. We consider general \(\omega \) mainly in order to demonstrate that the choice \(\omega = \omega ^*\), while not optimal, is close enough to optimal not to matter in practice.

We will assume that \(D=[0,1) \times (0,1]\) with Dirichlet boundary conditions, as this simplifies our analysis of SOR—for damped Jacobi, we could just as easily have assumed periodic boundary conditions. We will measure convergence rates with respect to the induced infinity norm, which obeys the identity

$$\begin{aligned} \Vert A\Vert _{\infty } = \max _{i=1}^N \sum _{j=1}^N |a_{ij}| \end{aligned}$$
(28)

for any \(N \times N\) matrix A. Note that the iterates of the error \(\mathbf{e}^{(0)}\), \(\mathbf{e}^{(1)}\), \(\ldots \) associated with any stationary iterative method with iteration matrix M obey the bounds

$$\begin{aligned} \Vert \mathbf{e}^{(n)}\Vert \le \Vert M\Vert ^n\Vert \mathbf{e}^{(0)}\Vert \text{ and } R(\mathbf{e}) := \root n \of {\frac{\Vert \mathbf{e}^{(n)}\Vert }{\Vert \mathbf{e}^{(0)}\Vert }} \le \Vert M\Vert \end{aligned}$$
(29)

for any vector norm \(\Vert \cdot \Vert \) and induced matrix norm. We will be interested in these identities in the particular case that the vector norm is \(\Vert \cdot \Vert _{\infty }\), and the stationary iterative method is damped Jacobi or SOR. Here

$$\begin{aligned} \mathbf{e}^{(n)} := u_h - u^{(n)}_h \end{aligned}$$

denotes the difference between the exact solution to (15), found by solving (15) to machine precision, and the approximate solution at iteration n.

Proposition 1

Suppose semi-implicit Guidefill with guidance direction \(\mathbf{g}=(\cos \theta ,\sin \theta )\) is used to inpaint the square \(D=[0,1) \times (0,1]\) under the assumptions above, using either damped Jacobi or SOR to solve (15). Suppose that in the case of SOR, \(\partial D^{(k)}_h= \{ (ih,kh)\}_{i=0}^{N-1}\) is ordered from left to right if \(\cos \theta \ge 0\) and from right to left otherwise. Let \({\mathcal {L}}\) be as in (15) and define \({\mathcal {L}} = D-L-U\), where D, \(-L\), and \(-U\) are the diagonal, strictly lower triangular, and strictly upper triangular parts of \({\mathcal {L}}\) respectively. Let \(J_{\omega }\) and \(G_{\omega }\) denote the iteration matrices of damped Jacobi and SOR respectively with relaxation parameter \(\omega \), that is

$$\begin{aligned} J_{\omega }= & {} I - \omega D^{-1} {\mathcal {L}} \\ G_{\omega }= & {} (I-\omega D^{-1}L)^{-1}((1-\omega )I+\omega D^{-1}U). \end{aligned}$$

Let \(r=\epsilon /h\) and define \(\theta _c = \arcsin (1/r)\). Define, for \(\theta _c \le \theta \le \pi - \theta _c\), \(j^* = \lfloor \frac{1}{\sin \theta } \rfloor \le r\). Let W be the total weight (17), let \({\tilde{W}}\) and \({\tilde{w}}_{0,0}\) be as in (27). Then, in the limit as \(\mu \rightarrow \infty \), we have

$$\begin{aligned} W= & {} \sum _{j=1}^r \frac{1}{j}, \qquad {\tilde{w}}_{0,0}=(1-\sin \theta )(1-|\cos \theta |)\\ {\tilde{W}}= & {} {\left\{ \begin{array}{ll} \sum _{j=1}^r \frac{1}{j} - r \sin \theta &{} \text{ if } \theta \in (0,\theta _c] \cup [\pi -\theta _c,\pi ) \\ \sum _{j=1}^{j^*} \frac{1}{j} - j^* \sin \theta &{} \text{ if } \theta \in (\theta _c , \pi - \theta _c) \end{array}\right. } \end{aligned}$$

and

$$\begin{aligned} \Vert J_{\omega }\Vert _{\infty }= & {} |1-\omega |+\omega \left( \frac{{\tilde{W}}-{\tilde{w}}_{0,0}}{W-{\tilde{w}}_{0,0}} \right) \quad \text{ for } \omega \in (0,2) \\ \Vert G_{\omega }\Vert _{\infty }= & {} \frac{|1-\omega |}{1-\omega \frac{{\tilde{W}}-{\tilde{w}}_{0,0}}{W-{\tilde{w}}_{0,0}}} \quad \text{ for } \omega \in (0,1], \\ \end{aligned}$$

where \(\Vert \cdot \Vert _{\infty }\) is the induced infinity matrix norm (28). The optimal \(\omega \in (0,2)\) is in both cases independent of \(\theta \) and equal to \(\omega = 1\), where we obtain

$$\begin{aligned} \Vert J_1\Vert _{\infty } = 1 - \frac{r\sin \theta }{\sum _{j=1}^r \frac{1}{j}-(1-\sin \theta )(1-|\cos \theta |)} \end{aligned}$$

if \(\theta \in (0, \theta _c] \cup [\pi -\theta _c,\pi ),\)

$$\begin{aligned} \Vert J_1\Vert _{\infty } =1 - \frac{\sum _{j=j^*+1}^r \frac{1}{j} + j^*\sin \theta }{\sum _{j=1}^r \frac{1}{j}-(1-\sin \theta )(1-|\cos \theta |)} \end{aligned}$$

if \(\theta \in (\theta _c, \pi - \theta _c)\), and

$$\begin{aligned} \Vert G_1\Vert _{\infty } = 0. \end{aligned}$$

Proof

“Appendix D”. \(\square \)

Corollary 1

For the special case of

$$\begin{aligned} \omega ^* = \left( 1 - \frac{{\tilde{w}}_{0,0}}{W}\right) = \left( 1-\frac{(1-\sin \theta )(1-|\cos \theta |)}{ \sum _{j=1}^r \frac{1}{j}}\right) \end{aligned}$$

that is, the parameter value equivalent to running Algorithm 1 with “semiImplicit” set to true, we obtain

$$\begin{aligned} \Vert G_{\omega ^*}\Vert _{\infty }= & {} \frac{{\tilde{w}}_{0,0}}{W-{\tilde{W}}+{\tilde{w}}_{0,0}}\\ \Vert J_{\omega ^*}\Vert _{\infty }= & {} \frac{{\tilde{W}}}{W} \end{aligned}$$

or

$$\begin{aligned} \Vert G_{\omega ^*}\Vert _{\infty }= & {} \frac{(1-\sin \theta )(1-|\cos \theta |)}{r\sin \theta +(1-\sin \theta )(1-|\cos \theta |)}\\ \Vert J_{\omega ^*}\Vert _{\infty }= & {} 1 - \frac{r\sin \theta }{\sum _{j=1}^r \frac{1}{j}} \end{aligned}$$

if \(\theta \in (0, \theta _c] \cup [\pi -\theta _c,\pi )\) and

$$\begin{aligned} \Vert G_{\omega ^*}\Vert _{\infty }= & {} \frac{(1-\sin \theta )(1-|\cos \theta |)}{\sum _{j=j^*+1}^r \frac{1}{j} + j^*\sin \theta +(1-\sin \theta )(1-|\cos \theta |)}\\ \Vert J_{\omega ^*}\Vert _{\infty }= & {} 1 - \frac{\sum _{j=j^*+1}^r \frac{1}{j} + j^*\sin \theta }{\sum _{j=1}^r \frac{1}{j}} \end{aligned}$$

if \(\theta \in (\theta _c, \pi - \theta _c)\). See Fig. 14 for a plot of \(\Vert G_{\omega ^*}\Vert _{\infty }\) and \(\Vert J_{\omega ^*}\Vert _{\infty }\) as a function of \(\theta \) for \(r=3\).

Proof

This follows from direct substitution of \(\omega ^*\) given by (19) into Proposition 1. \(\square \)
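The bounds of Corollary 1 are straightforward to evaluate numerically; the sketch below (our transcription of the formulas above, with r and theta as the only inputs) computes \(\Vert J_{\omega ^*}\Vert _{\infty }\) and \(\Vert G_{\omega ^*}\Vert _{\infty }\) for \(r=3\), yielding curves of the kind plotted in Fig. 14.

```python
# Evaluate the closed-form expressions of Corollary 1 (mu -> infinity limit).
import numpy as np

def corollary1_norms(theta, r=3):
    """Return (||J_{omega*}||_inf, ||G_{omega*}||_inf) for theta in (0, pi)."""
    s, c = np.sin(theta), abs(np.cos(theta))
    W = sum(1.0 / j for j in range(1, r + 1))
    w00 = (1 - s) * (1 - c)
    theta_c = np.arcsin(1.0 / r)
    if theta <= theta_c or theta >= np.pi - theta_c:
        leak = r * s                     # W - tilde(W) on the outer range
    else:
        jstar = int(np.floor(1.0 / s))   # j* = floor(1/sin(theta)) <= r
        leak = sum(1.0 / j for j in range(jstar + 1, r + 1)) + jstar * s
    return 1 - leak / W, w00 / (leak + w00)

for deg in (1, 2, 10, 45, 90, 170):
    J, G = corollary1_norms(np.radians(deg))
    print(f"theta = {deg:3d} deg:  ||J|| = {J:.4f}   ||G|| = {G:.4f}")
```

Evaluating over the full range of \(\theta \) shows \(\Vert G_{\omega ^*}\Vert _{\infty }\) staying uniformly small for \(r=3\), consistent with the bound \(\Vert G_{\omega ^*}\Vert _{\infty } \le 0.06\) cited below.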

Fig. 14
figure 14

Convergence rates of damped Jacobi and SOR for semi-implicit Guidefill: here we compare the experimentally measured convergence rates of the implementation of semi-implicit Guidefill outlined in the blue text of Algorithm 1 (\(r=3\), \(\mathbf{g}=(\cos \theta ,\sin \theta )\) and \(\mu = 100\)) with the theoretical bounds on \(\Vert J_{\omega ^*}\Vert _{\infty }\) and \(\Vert G_{\omega ^*}\Vert _{\infty }\) from Corollary 1. Note the excellent performance of SOR in comparison with damped Jacobi

As noted in Sect. 5, the implementation of semi-implicit Guidefill outlined in Algorithm 1 (blue text) is equivalent to solving the linear system (15) iteratively using damped Jacobi (parallel implementation) or SOR (sequential implementation), with relaxation parameter \(\omega ^*\) given by (19). Figure 14 compares the experimentally measured convergence rates of this implementation (\(r=3\), \(\mathbf{g}=(\cos \theta ,\sin \theta )\) and \(\mu = 100\)) with the theoretical bounds on \(\Vert J_{\omega ^*}\Vert _{\infty }\) and \(\Vert G_{\omega ^*}\Vert _{\infty }\) from Corollary 1. Specifically, (a) and (c) confirm experimentally the first bound in (29) in the cases \(M=J_{\omega ^*}\) and \(M=G_{\omega ^*}\), that is, damped Jacobi and SOR with relaxation parameter \(\omega ^*\), for the case \(\theta = 2^{\circ }\). The inpainting problem in this case is the same as in Fig. 12a, and all the parameters of semi-implicit Guidefill are the same. The “exact” solution \(u_h\) was found by first solving (15) to machine precision. In each case, we measured convergence rates only within the first “shell” of the inpainting problem. Next, (b) and (d) confirm experimentally the second bound in (29), as a function of \(\theta \). The inpainting problem is the same as the one in Fig. 5a, and all parameters are the same. In this case we vary \(\theta \) from \(1^{\circ }\) up to \(179^{\circ }\) in increments of one degree, in each case iteratively solving (15) (again, only for the first shell), computing \(R(\mathbf{e})\), and comparing with \(\Vert J_{\omega ^*}\Vert _{\infty }\) and \(\Vert G_{\omega ^*}\Vert _{\infty }\). Note the excellent performance of SOR in comparison with damped Jacobi.

Although our choice of \(\omega ^*\) is non-optimal, it is convenient to implement and the difference in performance is negligible. In particular, for \(r=3\) we have \(R(\mathbf{e}) \le \Vert G_{\omega ^*}\Vert _{\infty } \le 0.06\) independent of \(\theta \). Moreover, it follows from Corollary 1 that \(\Vert G_{\omega ^*}\Vert _{\infty }\) is a decreasing function of r for each fixed \(\theta \), so this bound holds for all \(r \ge 3\) as well.

6.3 Convergence of Algorithm 1 to Continuum and Asymptotic Limits

In this section we prove the convergence of the direct and semi-implicit forms of Algorithm 1 to an “asymptotic limit” and a “fixed-ratio continuum limit”. Then, in Sect. 7, we use these limits to explain the kinking and blur artifacts of Algorithm 1 observed in practice as outlined in Sect. 2. We begin by defining what these limits are:

Definition of the continuum limit We wish to prove convergence of the direct and semi-implicit forms of Algorithm 1 to a continuum limit u given by the transport equation

$$\begin{aligned} \nabla u \cdot \mathbf{g}_r^* = 0, \qquad u \Big |_{y = 0} = u_0 \Big |_{y = 0} \qquad u\Big |_{x = 0}=u\Big |_{x = 1} \end{aligned}$$
(30)

Specifically, we wish to prove convergence when we take \((h,\epsilon ) \rightarrow (0,0)\) along the path \(\epsilon = rh\) (with \(r \in \mathbb {N}\) fixed). Because of the assumptions we have made in Sect. 6.1, \(\mathbf{g}_r^*\) will turn out to be a constant equal to the center of mass of the stencil \(a^*_r\) with respect to the stencil weights \(\{ w_r(\mathbf{0},\mathbf{y}) : \mathbf{y} \in a^*_r \}\) (Definition 1), that is

$$\begin{aligned} \mathbf{g}_r^* = \frac{\sum _{\mathbf{y} \in a^*_r} w_r(0,\mathbf{y})\mathbf{y}}{\sum _{\mathbf{y} \in a^*_r} w_r(0,\mathbf{y})}. \end{aligned}$$
(31)

As we will allow discontinuous boundary data \(u_0\), the solution to (30) must be defined in a weak sense. However, since \(\mathbf{g}_r^*\) is a constant, this is simple. So long as \(\mathbf{g}_r^* \cdot e_2 \ne 0\), we simply define the solution to the transport problem (30) to be

$$\begin{aligned} u(\mathbf{x}) = u_0(\varPi _{\theta _r^*}(\mathbf{x})) \end{aligned}$$
(32)

where

$$\begin{aligned} \varPi _{\theta _r^*}(x,y)=(x-\cot (\theta _r^*)y \mod 1,0). \end{aligned}$$
(33)

We call \(\varPi _{\theta _r^*}: D \rightarrow \partial D\) the transport operator associated with (32). The \(\text{ mod }\) 1 is due to our assumed periodic boundary conditions and

$$\begin{aligned} \theta _r^* = \theta (\mathbf{g}_r^*) \in (0,\pi ) \end{aligned}$$

is the counterclockwise angle between the x-axis and the line \(L_{\mathbf{g}_r^*} := \{ \lambda \mathbf{g}_r^* : \lambda \in \mathbb {R} \}\).
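As a small worked example (ours; uniform weights over \(b^-_r\) are an arbitrary choice made for illustration), the transport direction (31) and the transport operator (33) can be computed directly:

```python
# A sketch of the continuum limit (30)-(33): the transport direction g*_r
# is the center of mass (31) of the stencil; boundary data is then
# extended into D along that direction by the operator Pi of (33).
import numpy as np

def transport_direction(stencil, weights):
    w = np.asarray(weights, float)
    y = np.asarray(stencil, float)
    return (w[:, None] * y).sum(axis=0) / w.sum()    # eq. (31)

def Pi(x, y, g):
    """x-coordinate of the transport operator (33); g[0]/g[1] = cot(theta*)."""
    return (x - (g[0] / g[1]) * y) % 1.0

r = 3
bm = [(n, m) for n in range(-r, r + 1) for m in range(-r, 0)
      if n * n + m * m <= r * r]                     # b^-_r, direct form
g_star = transport_direction(bm, np.ones(len(bm)))   # here (0, -18/11)
u0 = lambda x: 0.0 if (x % 1.0) < 0.5 else 1.0       # step boundary data
print(g_star, u0(Pi(0.25, 0.5, g_star)), u0(Pi(0.75, 0.5, g_star)))
```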

Remark 10

When \(u_0\) is smooth, convergence is straightforward to prove—we did so in [27, Theorem 1] for the direct form of Algorithm 1 using a simple argument based on Taylor series. However, in this paper we are interested in a more general setting where \(u_0\) may not be smooth and Taylor series may not be available. Instead of Taylor series, our main tool will be a connection to stopped random walks.

Definition of the asymptotic limit Unlike the continuum limit, which finds the limiting value of \(u_h\) when \((h,\epsilon ) \rightarrow (0,0)\) along the path \(\epsilon = rh\) with \(r \in \mathbb {N}\) fixed, the asymptotic limit gives the asymptotic dependence of \(u_h\) on h when \(r = \epsilon /h\) is fixed and \(h \ll 1\). For a fixed \(\mathbf{x }=(x,y) \in D_h\) the asymptotic limit is computed in the following two steps:

  1.

    Given discrete boundary data \(u_0 : {\mathcal {U}}_h \rightarrow \mathbb {R}\), generate new boundary data \({\tilde{u}}_0 : \partial {\mathcal {U}}_h \rightarrow \mathbb {R} \) from \(u_0\) by convolving it with a discrete convolution kernel

    $$\begin{aligned} {\tilde{g}}_{\sigma (y,h)}(x_1,x_2) = g_{\sigma (y,h)}(x_1)\varDelta (x_2), \end{aligned}$$

    where \(g_{\sigma (y,h)}\) is a one-dimensional Gaussian with variance dependent on both h and y, and \(\varDelta \) is a function that “mixes” the top r rows of \(u_0\) by taking a convex combination with weights independent of both h and y.

  2.

    \(u_h\) is now given by the solution to the same transport equation (30) as in the fixed-ratio continuum limit, but with the boundary data \(u_0\) replaced by \({\tilde{u}}_0 = {\tilde{g}}_{\sigma (y,h)} * u_0\). That is,

    $$\begin{aligned} u_h(x,y) = ({\tilde{g}}_{\sigma (y,h)} * u_0)(\varPi _{\theta _r^*}(x,y))+o(1). \end{aligned}$$

Remark 11

By \(({\tilde{g}}_{\sigma (y,h)} * u_0)(\mathbf{x})\) we mean the discrete circular convolution

$$\begin{aligned} ({\tilde{g}}_{\sigma (y,h)} * u_0)(\mathbf{x}):= & {} \sum _{\mathbf{y} \in {\mathcal {U}}_h} u_0(\mathbf{y}){\tilde{g}}_{\sigma (y,h)}(\mathbf{x}-\mathbf{y}), \end{aligned}$$

where the x-coordinate of \(\mathbf{x}-\mathbf{y}\) is defined modulo 1. Strictly speaking, for this definition to coincide with the usual definition of discrete convolution given in, for example, [35, Ch. 6], we must have \(\mathbf{x}-\mathbf{y} \in {\mathcal {U}}_h\). In our case \(\mathbf{x}\) may lie between pixel centers, so we do not make this restriction.
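To illustrate the smoothing step, here is a sketch (ours; \(\gamma ^2\), \(\mu _y\), and the boundary data are placeholder values, and the overshoot mixing \(\varDelta \) is omitted) of the one-dimensional Gaussian part of the convolution, using the variance scaling (45) converted to pixel units:

```python
# Circular convolution of one row of boundary data with the normalized
# discrete periodic Gaussian (48); the mollification width scales like
# sqrt(h), so the limit h -> 0 recovers sharp transport.
import numpy as np

N = 200; h = 1.0 / N
u0 = (np.arange(N) * h > 0.5).astype(float)   # one row of boundary data

def gauss_kernel(N, sigma):                   # normalized as in (48), mu = 0
    i = np.arange(N)
    d = np.minimum(i, N - i)                  # circular distance (46), pixels
    g = np.exp(-d ** 2 / (2 * sigma ** 2))
    return g / g.sum()

y, gamma2, mu_y = 0.5, 0.5, -1.0              # placeholder parameters
sigma_px = np.sqrt(gamma2 * y * h / abs(mu_y) ** 3) / h
u_tilde = np.real(np.fft.ifft(np.fft.fft(u0) * np.fft.fft(gauss_kernel(N, sigma_px))))
print(sigma_px, u_tilde[N // 2 - 3 : N // 2 + 3])   # the jump is now blurred
```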

6.4 Connection to Stopped Random Walks

The convergence of Algorithm 1 to both limits above is established based on a connection between Algorithm 1 and stopped random walks. The purpose of this section is to explain this connection.

To begin, note that the update formula (21) gives a relationship between \(u_h(\mathbf{x})\) and its immediate neighbors in \(a^*_r\), which for now we assume obeys \(a^*_r \subseteq b^-_r\) or \(a^*_r \subseteq b^0_r\) (if this is not the case, we can apply the method of equivalent weights from Sect. 4). Now suppose we modify (21) iteratively by repeated application of the following rule: for each \(\mathbf{y} \in a^*_r\), if \(\mathbf{x}+h\mathbf{y} \in D_h\), replace \(u_h(\mathbf{x}+h\mathbf{y})\) in the RHS of (21) with the RHS of a version of (21) where the LHS is evaluated at \(\mathbf{x}+h\mathbf{y}\) (in other words, we substitute (21) into itself). Otherwise, if \(\mathbf{x}+h\mathbf{y} \in {\mathcal {U}}_h\), we are already in the undamaged portion of the image, and we may replace \(u_h(\mathbf{x}+h\mathbf{y})\) with \(u_0(\mathbf{x}+h\mathbf{y})\). Repeat this procedure until \(u_h(\mathbf{x})\) is entirely expressed as a weighted sum of \(u_0 \Big |_{{\mathcal {U}}_h}\), that is

$$\begin{aligned} u_h(\mathbf{x}) = \sum _{\mathbf{y} \in {\mathcal {U}}_h} \rho (\mathbf{y}) u_0(\mathbf{y}), \end{aligned}$$
(34)

for some as yet unknown weights \(\rho \). Writing \(\mathbf{x}=(nh,mh)\), for the direct form of Algorithm 1 this procedure terminates after m steps, as in this case (21) expresses \(u_h(\mathbf{x})\) in terms of neighbors at least h units below it. For the semi-implicit extension, on the other hand, (34) has to be interpreted as a limit.

Fig. 15
figure 15

Connection to stopped random walks: here we illustrate the connection between the elimination procedure described in the text and stopped random walks. In a the pixel \(\mathbf{x}\) (colored black) is expressed as a weighted average (1) of its neighbors in \(b^-_r\) (colored in red). In this case we use the direct form of Algorithm 1 with \(r=3\) and uniform weights (which is why each neighbor in \(b^-_r\) is the same shade of red). In b–d we have applied \(k=4\), \(k=17\), and \(k=40\) steps of our elimination procedure (which essentially consists of substituting (1) into itself repeatedly—details in the text), and \(u_h(\mathbf{x})\) is now expressed as a weighted sum of \(u_h(\mathbf{y})\) for whichever pixels \(\mathbf{y}\) are colored red, with darker shades of red indicating greater weight. The purpose of this procedure is to express the color of a given pixel \(\mathbf{x}\) deep inside the inpainting domain entirely in terms of the colors of known pixels below the line \(y=0\) (shown in blue)

This elimination procedure has a natural interpretation in terms of stopped random walks. Since the weights \(\{ \frac{w(\mathbf{0},\mathbf{y})}{W} \}_{\mathbf{y} \in a^*_r}\) are non-negative and sum to 1, we can interpret them as the density of a two dimensional random vector \(\mathbf{Z} := (U,V)\) taking values in \(b^-_r\) or \(b^0_r\) with density

$$\begin{aligned} P(\mathbf{Z}_i =\mathbf{y})=\frac{w_r(\mathbf{0},\mathbf{y})}{W}. \end{aligned}$$
(35)

Moreover, defining the random walk

$$\begin{aligned} \mathbf{X}_j :=(X_j,Y_j) = (nh,mh) + h \sum _{i=1}^j \mathbf{Z}_i \end{aligned}$$
(36)

with \(\{ \mathbf{Z}_i \}\) i.i.d. and equal to \(\mathbf{Z}\) in distribution, and defining

$$\begin{aligned} \tau = \inf \{ j : \mathbf{X}_j \in {\mathcal {U}}_h \}=\inf \{ j : Y_j \le 0 \}, \end{aligned}$$
(37)

then after k steps of elimination, we have

$$\begin{aligned} u_h(\mathbf{x}) = \sum _{\mathbf{y} \in \varOmega _h} \rho _{\mathbf{X}_{k \wedge \tau }}(\mathbf{y}) u_h(\mathbf{y}), \end{aligned}$$

where \(\rho _{\mathbf{X}_{k \wedge \tau }}\) denotes the density of \(\mathbf{X}_{k \wedge \tau }\). Denoting the mean of \(\mathbf{Z}\) by \((\mu _x,\mu _y)\), note that by (31) we have the equivalence

$$\begin{aligned} (\mu _x, \mu _y) = \mathbf{g}^*_r. \end{aligned}$$

In other words, the mean of \(\mathbf{Z}\) is precisely the transport direction of our limiting Eq. (30). The condition \(\mathbf{g}^*_r \cdot e_2 \ne 0\), which we needed for (30) to be defined, implies \(\mu _y < 0\). In the nomenclature of random walks, this means that \(\mathbf{X}_k\) has negative drift in the y direction, while \(\tau \) is the first passage time through \(y=0\). Fortunately, this type of random walk and this type of first passage time have been studied and are well understood. See for example [24, Chapter 4], [22, 23, 25]. The book [24] also provides a good overview of stopped random walks in general. In particular, we know immediately that \(\tau \) is a stopping time, \(P(\tau = \infty )=0\), and \(\tau \) has finite moments of all orders [24, Chapter 3, Theorems 1.1 and 3.1]. It follows that

$$\begin{aligned} u_h(\mathbf{x}) = \sum _{\mathbf{y} \in {\mathcal {U}}_h} \rho _{\mathbf{X}_{\tau }}(\mathbf{y}) u_0(\mathbf{y}) = E[u_0(\mathbf{X}_{\tau })]. \end{aligned}$$
(38)

See Fig. 15 for an illustration of these ideas.
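The representation (38) can also be checked directly by Monte Carlo simulation; the sketch below (ours, with uniform weights on \(b^-_r\) standing in for a concrete method) estimates \(u_h(\mathbf{x}) = E[u_0(\mathbf{X}_{\tau })]\) for step boundary data.

```python
# Monte Carlo realization of (35)-(38): simulate the stopped walk from a
# pixel (x, y) and average the boundary data at the exit point.
import numpy as np

rng = np.random.default_rng(1)
r, h = 3, 1.0 / 200
bm = np.array([(n, m) for n in range(-r, r + 1) for m in range(-r, 0)
               if n * n + m * m <= r * r], dtype=float)
p = np.ones(len(bm)) / len(bm)           # increment density (35), uniform

def u0(x):                               # known image data on y <= 0
    return float((x % 1.0) > 0.5)

def sample(x, y):
    while y > 0:                         # stopping time tau of (37)
        n, m = bm[rng.choice(len(bm), p=p)]
        x, y = x + h * n, y + h * m      # one step of the walk (36)
    return u0(x)

est = np.mean([sample(0.6, 0.3) for _ in range(2000)])
print(est)    # approximates u_h(0.6, 0.3); transport here is vertical
```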

Lattice coordinates and absolute coordinates

Our derivation of the asymptotic limit leverages known results for stopped random walks. However, the results most relevant to us concern random walks on the integer lattice \(\mathbb {Z}^2\), whereas our random walk is on the scaled lattice \(h \cdot \mathbb {Z}^2\). In order to facilitate conversion between our setting and the setting of the results we quote, we make the following definition:

Definition 2

Given \(\mathbf{x}=(x,y) = (ih,jh) \in D_h \subset h \cdot \mathbb {Z}^2\), we refer to \((i,j) \in \mathbb {Z}^2\) as the lattice coordinates of \(\mathbf{x}\), while we refer to \((x,y)=(ih,jh)\) as the absolute coordinates of \(\mathbf{x}\).

Overshoot and the ladder process associated with \(\mathbf{X}_{\tau }\)

In general, we expect that the stopped random walk \(\mathbf{X}_{\tau }\) will overshoot the line \(y=0\). The (asymptotic) nature of the overshoot probability density is well understood and is described briefly here (we will need it in Sect. 6.5). However, in order to describe it, we first need to define the ladder process associated with \(\mathbf{X}_{\tau }\). The strict ladder process \(\{n_i, S_{n_i} \}_{i \ge 1}\) associated with the y-coordinate of \(\mathbf{X}_{\tau }\) is given by

$$\begin{aligned} S_n= & {} \sum _{i=1}^n V_i, \end{aligned}$$
(39)
$$\begin{aligned} n_1= & {} \inf \{n: V_n < 0\}\end{aligned}$$
(40)
$$\begin{aligned} n_{i+1}= & {} \inf \{n: S_n < S_{n_{i}}\}. \end{aligned}$$
(41)

Following the notation in [36], we denote by H the law of \(V_{n_1}\), and write \(\nu _y = E[V_{n_1}]\). The (asymptotic) overshoot function \(\varDelta \) is then given by the following identity  [36, Equation (1.4)]:

$$\begin{aligned} \varDelta (kh) = H((-\infty ,k))/\nu _y \end{aligned}$$
(42)

in absolute coordinates, or

$$\begin{aligned} \varDelta (k) = H((-\infty ,k))/\nu _y \end{aligned}$$
(43)

in lattice coordinates (Definition 2). Note that H here is a probability measure, so that \(H((-\infty ,k))\) refers to the measure of the interval \((-\infty ,k)\).

6.5 Derivation of the Asymptotic Limit

The key idea of our asymptotic limit is to combine (38) with known results on the asymptotic distribution of \(\rho _{\mathbf{X}_{\tau }}\) started from \((x,y)=(nh,mh) \in D_h\) when m is very large (for fixed y, this is equivalent to h very small). This problem has been considered by a number of authors, including [24, Chapter 4],  [22, 23, 25], where it is shown that \(\rho _{\mathbf{X}_{\tau }}\) obeys a central limit theorem, asymptotically approaching a normal distribution. However, for our purposes it is not enough to know the asymptotic distribution of the stopped walk for \(m \gg 1\), we must also know the rate of convergence as a function of m. While many authors have considered the former, to our knowledge only [30, 36] have considered the latter. Our results here are based on [36], specifically the corollary immediately following Theorem 3.3. The following theorem restates this result, along with a modified version adapted to our periodic boundary conditions.

Theorem 1

Let \(\mathbf{X}_{\tau }:=(X_{\tau },Y_{\tau }) \in \mathbb {Z}^2\) be a stopped random walk started from \((n,m) \in \mathbb {Z}^2\) and with i.i.d. increments \(\varDelta \mathbf{X}_i = \mathbf{Z}_i\). Assume that \(\mathbf{X}_{\tau }\) has negative drift, that is \((\mu _x,\mu _y) =E[\mathbf{Z}_i]\) obeys \(\mu _y < 0\) and define the stopping time \(\tau \) by

$$\begin{aligned} \tau := \inf \{ n : Y_n < 0 \}. \end{aligned}$$

Define the characteristic function \(\varphi _{\mathbf{Z}_i}\) of \(\mathbf{Z}_i\) by

$$\begin{aligned} \varphi _{\mathbf{Z}_i}(\mathbf{u}) = E[e^{\sqrt{-1}{} \mathbf{{Z_i}}\cdot \mathbf{u}}] \end{aligned}$$

and assume that \(|\varphi _{\mathbf{Z}_i}|<1\) unless each component of \(\mathbf{u}\) is an integer multiple of \(2\pi \). Then

$$\begin{aligned} \lim _{m \rightarrow \infty } \sqrt{m}|\rho _{\mathbf{X}_{\tau }}(i,j)-{\tilde{G}}_{\sigma (m),\mu }(i,j)| =0 \end{aligned}$$
(44)

uniformly in \((i,j) \in \mathbb {Z}^2\), where

$$\begin{aligned} {\tilde{G}}_{\sigma (m),\mu }(i,j)= & {} G_{\sigma (m),\mu }(i)\varDelta (j) \\ G_{\sigma (m),\mu }(i)= & {} \frac{e^{-\frac{(i-\mu )^2}{2\sigma (m)^2}}}{\sqrt{2\pi } \sigma (m)} \end{aligned}$$

with

$$\begin{aligned} \sigma (m)^2 = \frac{\gamma ^2 m}{|\mu _y|^3} \quad \text{ where } \gamma ^2 = {\text {Var}}(\mu _x V_1 - \mu _y U_1). \end{aligned}$$
(45)

and

$$\begin{aligned} \mu = n - \frac{\mu _x}{\mu _y}m \end{aligned}$$

and \(\varDelta (j)\) is given by (43). Moreover, if the domain of \(\mathbf{X}_{\tau }\) is changed to

$$\begin{aligned} \mathbf{X}_{\tau } \in \{0,1,\ldots ,N-1\} \times \{ -(r+2),-(r+1),\ldots ,N\} \end{aligned}$$

with periodic boundary condition \(0 \sim N\) in the x-direction, and if \(m \le N\) while \(\mathbf{Z}_i := (U_i,V_i)\) obeys \(V_i \le 0\), then the same result holds but with \(G_{\sigma (m),\mu }(i)\) replaced by

$$\begin{aligned} G^*_{\sigma (m),\mu ,N}(i) = \frac{e^{-\frac{d^{{\text {circ}}}_{N}(i,\mu ^*)^2}{2\sigma (m)^2}}}{\sqrt{2\pi } \sigma (m)} \end{aligned}$$

where

$$\begin{aligned} \mu ^* = n - \frac{\mu _x}{\mu _y}m \mod N. \end{aligned}$$

and \(d^{{\text {circ}}}_{N}\) denotes the circular distance defined for \(x,y \in \mathbb {R}\) by

$$\begin{aligned} d^{{\text {circ}}}_{N}(x,y) = \min (x-y \mod N,y-x \mod N) \end{aligned}$$
(46)

If \(\mu = 0\), we write \(G_{\sigma }\) and \(G^*_{\sigma ,N}\) in place of \(G_{\sigma ,0}\) and \(G^*_{\sigma ,0,N}\).

Proof

The first statement follows from the (un-numbered) Corollary immediately following Theorem 3.3 in [36]. The second statement is a simple corollary, the details of which may be found in “Appendix F”. \(\square \)

The asymptotic limit follows straightforwardly from Theorem 1 and (38) once we have completed the following tasks:

  1.

    The second statement of Theorem 1 gives the asymptotic probability that \(X_{\tau } = i\) for every \(i=0,\ldots ,N-1\) for our random walk with periodic boundary conditions. However, the Gaussian \(G^*_{\sigma (m),\mu ,N}\) that \(X_{\tau }\) converges to is not normalized with respect to summation over \(\{i\}_{i=0}^{N-1}\). Lemma 1 addresses this problem by allowing us to convert statements about \(G^*_{\sigma (m),N}\) into statements about \(g^*_{\sigma (m),N}\), a discrete Gaussian that is normalized with respect to summation over the set \(\{i\}_{i=0}^{N-1}\).

  2.

    The assumption in Theorem 1 on the characteristic function of \(\mathbf{Z}_i\) can be inconvenient to check. Definition 3 gives a geometric condition (which can typically be verified by inspection) that Lemma 2 proves to be equivalent.

  3.

    Theorem 1 is expressed with respect to the integer lattice \(\mathbb {Z}^2\), whereas our random walk is on the scaled lattice \(h \cdot \mathbb {Z}^2\). Lemma 3 uses Theorem 1 combined with Lemmas 1 and 2 to prove a version of Theorem 1 applicable to our setting.

  4.

    We need to derive an expression for \(\varDelta (x_2)\). This is done in Lemma 4.

  5.

    We need to derive a bound on the probability of \(\mathbf{X}_{\tau }\) deviating significantly from \(\varPi _{\theta _r^*}(\mathbf{x})\). This is done in Lemma 5.

Once the above items are taken care of, we prove convergence to the asymptotic limit in Theorem 2.

We begin with task one. For now, we work in lattice coordinates (Definition 2). Let us denote by \(G_{\sigma ,\mu }\) and \(G^*_{\sigma ,\mu ,N}\) the standard and periodic Gaussians with mean \(\mu \) and variance \(\sigma ^2\) defined for \(x \in \mathbb {R}\) and \(x \in [0,N)\) respectively by

$$\begin{aligned} G_{\sigma ,\mu }(x) = \frac{e^{-\frac{(x-\mu )^2}{2\sigma ^2}}}{\sqrt{2\pi }\sigma } \qquad G^*_{\sigma ,\mu ,N}(x) = \frac{e^{-\frac{d^{{\text {circ}}}_{N}(x,\mu )^2}{2\sigma ^2}}}{\sqrt{2\pi }\sigma } . \end{aligned}$$
(47)

These are the functions that \(X_{\tau }\) converges to in the two forms of Theorem 1 (infinite domain and periodic domain). However, neither of them is a probability density. This is because while

$$\begin{aligned} \int _{\mathbb {R}} G_{\sigma ,\mu }(x)dx=1 \end{aligned}$$

is true, the domain of \(X_{\tau }\) in the infinite case is not \(\mathbb {R}\), but rather \(\mathbb {Z}\), and

$$\begin{aligned} \sum _{i \in \mathbb {Z}} G_{\sigma ,\mu }(i) \ne 1. \end{aligned}$$

For \(G^*_{\sigma ,\mu ,N}\), the situation is worse: we have

$$\begin{aligned} \int _{0}^{N} G^*_{\sigma ,\mu ,N}(x)dx \ne 1 \quad \text{ and } \quad \sum _{i=0}^{N-1} G^*_{\sigma ,\mu ,N}(i) \ne 1. \end{aligned}$$

We therefore consider it more natural to work with the discrete Gaussians

$$\begin{aligned} g_{\sigma ,\mu }(i)= & {} \frac{e^{-\frac{(i-\mu )^2}{2\sigma ^2}}}{\sum _{k \in \mathbb {Z}} e^{-\frac{(k-\mu )^2}{2\sigma ^2}}} \nonumber \\ g^*_{\sigma ,\mu ,N}(i)= & {} \frac{e^{-\frac{d^{{\text {circ}}}_{N}(i,\mu )^2}{2\sigma ^2}}}{\sum _{k=0}^{N-1} e^{-\frac{d^{{\text {circ}}}_{N}(k,\mu )^2}{2\sigma ^2}}} \end{aligned}$$
(48)

which are normalized by construction (similarly to \(G_{\sigma ,\mu }\) and \(G^*_{\sigma ,\mu ,N}\), when \(\mu = 0\) we simply write \(g_{\sigma }\) and \(g^*_{\sigma ,N}\)). The following lemma allows us to convert statements about \(G^*_{\sigma ,\mu ,N}\) into statements about \(g^*_{\sigma ,\mu ,N}\). A nearly identical argument would allow us to do the same for \(G_{\sigma ,\mu }\) and \(g_{\sigma ,\mu }\), but we do not pursue this here. Lemma 1 also contains a result regarding discrete sums of Gaussians which is, to the best of our knowledge, novel and of interest in its own right.

Lemma 1

  i.

    There is a constant \(C>0\) such that for \(|\sigma | \ge 1\) we have

    $$\begin{aligned} \left| \sum _{n \in \mathbb {Z}} e^{-\frac{(n-\mu )^2}{2\sigma ^2}} - \sqrt{2\pi }\sigma \right| \le C \sigma e^{-2 \pi ^2 \sigma ^2}\end{aligned}$$
    (49)
  ii.

    Let \(G^*_{\sigma ,N}\) and \(g^*_{\sigma ,N}\) be defined as above, with common mean \(\mu \) and variance \(\sigma ^2\). Further suppose that \(\sigma ^2\) is proportional to \(m \in \mathbb {N}\), that is \(\sigma ^2(m) = {\tilde{\sigma }}^2 m\) for some constant \({\tilde{\sigma }}^2> 0\). Suppose further that \(m \le N\) (recall that \(N=1/h\)). Then there exist constants \(A > 0\) and \(b >0 \) such that

    $$\begin{aligned} |G^*_{\sigma (m),N}(x)-g^*_{\sigma (m),N}(x)|<Am^{\frac{1}{2}}e^{-bm} \end{aligned}$$

    uniformly in \(x \in \mathbb {R}\) for m sufficiently large.

Proof

We prove only the first statement. The second statement is then derived from the first together with routine tail estimates on Gaussians; the details are provided in “Appendix G”.

First we note that our sum is closely related to Jacobi theta functions of the third type, defined by [5]

$$\begin{aligned} \vartheta _3(q) = \sum _{n \in \mathbb {Z}} q^{n^2}. \end{aligned}$$

This function is known explicitly for a few values of q, for example

$$\begin{aligned} \vartheta _3(e^{-\pi })=\frac{\pi ^{\frac{1}{4}}}{\Gamma (\frac{3}{4})}. \end{aligned}$$
(50)

We will use this identity later on.

Our argument is based on the Poisson summation formula [2, pp. 322–333], which states that if \(f : \mathbb {R} \rightarrow \mathbb {R}\) is a Schwartz function (that is, f and all of its derivatives decay faster than \(|x|^{-N}\) for every \(N \in \mathbb {N}\) as \(|x| \rightarrow \infty \)), then

$$\begin{aligned} \sum _{n \in \mathbb {Z}} f(n) = \sum _{n \in \mathbb {Z}} \int ^{\infty }_{-\infty } f(x) e^{-i2\pi n x}dx. \end{aligned}$$

Choosing \(f(x) = e^{-\frac{(x-\mu )^2}{2\sigma ^2}}\) gives

$$\begin{aligned} \sum _{n \in \mathbb {Z}} e^{-\frac{(n-\mu )^2}{2\sigma ^2}}= & {} \sum _{n \in \mathbb {Z}} \int ^{\infty }_{-\infty } e^{-\frac{(x-\mu )^2}{2\sigma ^2}} e^{-i2\pi n x}dx \\= & {} \sum _{n \in \mathbb {Z}} \int ^{\infty }_{-\infty } e^{-\frac{(x-\mu )^2}{2\sigma ^2}} \cos (2\pi n x)dx, \end{aligned}$$

where on line two, noting that the LHS is real, we have eliminated the imaginary part of the RHS as well. Changing variables in the integral to \(y=x-\mu \) gives

$$\begin{aligned} \sum _{n \in \mathbb {Z}} e^{-\frac{(n-\mu )^2}{2\sigma ^2}}= & {} \sum _{n \in \mathbb {Z}} \int ^{\infty }_{-\infty } e^{-\frac{y^2}{2\sigma ^2}} \cos (2\pi n y+2\pi n \mu )dy \\= & {} \sum _{n \in \mathbb {Z}} \cos (2\pi n \mu ) \int ^{\infty }_{-\infty } e^{-\frac{y^2}{2\sigma ^2}} \cos (2\pi n y)dy \end{aligned}$$

Since the \(\sin \) terms in the expansion of \(\cos (2\pi n y+2\pi n \mu )\) are odd functions, they integrate to 0, and we have eliminated them in line two. But it is well known that

$$\begin{aligned} \int ^{\infty }_{-\infty } e^{-\frac{y^2}{2\sigma ^2}} \cos (2\pi n y)dy = \sqrt{2\pi }\sigma e^{-2\pi ^2 \sigma ^2 n^2} \end{aligned}$$

(this follows from the identity for the Fourier Transform of a Gaussian [8]). Hence

$$\begin{aligned} \sum _{n \in \mathbb {Z}} e^{-\frac{(n-\mu )^2}{2\sigma ^2}}=\sqrt{2\pi }\sigma \sum _{n \in \mathbb {Z}} e^{-2\pi ^2 \sigma ^2 n^2}\cos (2\pi n \mu ) \end{aligned}$$

and therefore (using the fact that \(|\sigma | \ge 1\) on line four):

$$\begin{aligned} \left| \sum _{n \in \mathbb {Z}} e^{-\frac{(n-\mu )^2}{2\sigma ^2}}-\sqrt{2\pi }\sigma \right|\le & {} 2\sqrt{2\pi }\sigma \sum ^{\infty }_{n=1} e^{-2\pi ^2 \sigma ^2 n^2}|\cos (2\pi n \mu )| \\\le & {} 2\sqrt{2\pi }\sigma \sum ^{\infty }_{n=1} e^{-2\pi ^2 \sigma ^2 n^2} \\= & {} 2\sqrt{2\pi }\sigma e^{-2\pi ^2\sigma ^2} \sum _{n=1}^{\infty } e^{-2\pi ^2\sigma ^2(n^2-1)} \\\le & {} 2\sqrt{2\pi }\sigma e^{-2\pi ^2\sigma ^2} \sum _{n=1}^{\infty } e^{-\pi (n^2-1)} \\= & {} \sqrt{2\pi }e^{\pi }\left( \vartheta _3(e^{-\pi })-1\right) \sigma e^{-2\pi ^2\sigma ^2}. \end{aligned}$$

Using (50), claim i. then follows with

$$\begin{aligned} C= & {} \sqrt{2\pi }e^{\pi }\left( \vartheta _3(e^{-\pi })-1\right) \\= & {} \sqrt{2\pi }e^{\pi }\left( \frac{\pi ^{\frac{1}{4}}}{\Gamma (3/4)}-1\right) \approx 5.014. \end{aligned}$$

\(\square \)

Remark 12

Although statement i. of Lemma 1 is stated asymptotically, in practice the convergence is so rapid that we have

$$\begin{aligned} \sum _{n \in \mathbb {Z}} e^{-\frac{n^2}{2\sigma ^2}} = \sqrt{2\pi }\sigma \end{aligned}$$

to machine precision for \(|\sigma |>2\). This has been noted empirically by other authors, e.g. [20, p. 288], but as far as we know (49) is the first explicit bound. Note also that our bound (49) is extremely tight. For \(\sigma =1\), the first value of \(\sigma \) for which our bound is valid, the left hand side and right hand side of (49) differ by \(O(10^{-12})\). For \(\sigma > 1\), the bound rapidly becomes even tighter.
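Statement i. is also easy to verify numerically; the script below (our own check, truncating the lattice sum at \(|n| \le 200\)) confirms (49) for values of \(\sigma \) where the difference is still resolvable in double precision.

```python
# Numerical check of the bound (49); beyond sigma ~ 1.3 both sides fall
# below double-precision rounding error, in line with the remark above.
import numpy as np
from math import gamma, pi, exp, sqrt

C = sqrt(2 * pi) * exp(pi) * (pi ** 0.25 / gamma(0.75) - 1)   # ~ 5.014
n = np.arange(-200, 201)
mu = 0.3
for sigma in (1.0, 1.1, 1.2):
    lhs = abs(np.exp(-(n - mu) ** 2 / (2 * sigma ** 2)).sum()
              - sqrt(2 * pi) * sigma)
    rhs = C * sigma * exp(-2 * pi ** 2 * sigma ** 2)
    print(sigma, lhs <= rhs, lhs, rhs)
```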

Having established a connection between discrete and continuous Gaussians, we now move on to task number two and derive a necessary and sufficient condition for the characteristic function \(\varphi _{\mathbf{Z}_i}\) of \(\mathbf{Z}_i\) to satisfy the conditions of Theorem 1.

Definition 3

We say that a stencil \(a^*_r\) is “confined to a sublattice of \(\mathbb {Z}^2\)” if there exist \((n,m)\in \mathbb {Z}^2\) and \(k_1,k_2 \in \mathbb {Z}\), with at least one of \(k_1,k_2\) greater than one in magnitude, such that the density of \(\text{ Supp }(a^*_r)\) is zero outside of \((n,m)+k_1\mathbb {Z}\times k_2\mathbb {Z}\). We say that \(a^*_r\) is not confined to any sublattice of \(\mathbb {Z}^2\) if no such \((n,m)\) and \(k_1,k_2\) exist.

Lemma 2

The characteristic function of \(\mathbf{Z}_i\), defined by

$$\begin{aligned} \varphi _{\mathbf{Z}_i}(\mathbf{u}) = E[e^{\sqrt{-1}{} \mathbf{{Z_i}}\cdot \mathbf{u}}] \end{aligned}$$

obeys \(|\varphi _{\mathbf{Z}_i}|<1\) unless each component of \(\mathbf{u}\) is an integer multiple of \(2\pi \) if and only if the associated stencil \(a^*_r\) is not confined to any sublattice of \(\mathbb {Z}^2\) (in the sense of Definition 3).

Proof

First assume that \(a^*_r\) is not confined to any sublattice of \(\mathbb {Z}^2\). For convenience, let us write \({\tilde{w}}_\mathbf{y} := {\tilde{w}}_r(\mathbf{0},\mathbf{y})\). By the definition of expectation we have

$$\begin{aligned} \varphi _{\mathbf{Z}_i}(\mathbf{u}) =E[e^{\sqrt{-1}{} \mathbf{{Z_i}}\cdot \mathbf{u}}]=\frac{\sum _{\mathbf{y} \in \text{ Supp }(a^*_r)} {\tilde{w}}_\mathbf{y}e^{\sqrt{-1}{} \mathbf{y} \cdot \mathbf{u}}}{W}. \end{aligned}$$

Since \(W \equiv \sum _{\mathbf{y} \in \text{ Supp }(a^*_r)}{\tilde{w}}_\mathbf{y}\) we have \(|\varphi _{\mathbf{Z}_i}(\mathbf{u})| \le 1\), with equality if and only if \(e^{\sqrt{-1}{} \mathbf{{y}}\cdot \mathbf{u}}\) have the same phase for all \(\mathbf{y} \in \text{ Supp }(a^*_r)\) such that \({\tilde{w}}_\mathbf{y} >0\). Since \(a^*_r\) is not confined to any sublattice of \(\mathbb {Z}^2\), there must exist a \(\mathbf{y}_1 \in \text{ Supp }(a^*_r)\) such that \({\tilde{w}}_{\mathbf{y}_1}>0\) and \({\tilde{w}}_{\mathbf{y}_1+e_1}>0\). Thus

$$\begin{aligned} |\varphi _{\mathbf{Z}_i}(\mathbf{u})| = 1\Rightarrow & {} e^{\sqrt{-1}{} \mathbf{{y}_1}\cdot \mathbf{u}}=e^{\sqrt{-1}{(\mathbf {y}_1}+e_1)\cdot \mathbf{u}} \\\Rightarrow & {} e^{\sqrt{-1}{} \mathbf{u} \cdot e_1}=1 \end{aligned}$$

after canceling a common factor of \(e^{\sqrt{-1}{} \mathbf{y}_1 \cdot \mathbf{u}}\). But then \(\mathbf{u} \cdot e_1\) must be an integer multiple of \(2\pi \), and a similar argument with \(e_2\) in place of \(e_1\) proves that the same must be true of \(\mathbf{u} \cdot e_2\).

Now assume that \(a^*_r\) is confined to some sublattice \((n,m)+k_1\mathbb {Z} \times k_2\mathbb {Z}\) of \(\mathbb {Z}^2\). Then defining \(\mathbf{u} := (u_1,u_2)\) and \(\mathbf{y}=(y_1,y_2)\) we can write

$$\begin{aligned} |\varphi _{\mathbf{Z}_i}(\mathbf{u})|= & {} \big |\sum _{\mathbf{y} \in \mathbb {Z}^2} \frac{{\tilde{w}}_\mathbf{y}}{W} e^{\sqrt{-1}(n+y_1k_1,m+y_2k_2)\cdot \mathbf{u}}\big | \\= & {} \big |e^{\sqrt{-1}(n,m) \cdot \mathbf{u}}\sum _{\mathbf{y} \in \mathbb {Z}^2}\frac{{\tilde{w}}_\mathbf{y}}{W} e^{\sqrt{-1}(y_1k_1,y_2k_2)\cdot \mathbf{u}}\big | \\= & {} \big |\sum _{\mathbf{y}\in \mathbb {Z}^2}\frac{{\tilde{w}}_\mathbf{y}}{W} e^{\sqrt{-1}(y_1k_1,y_2k_2)\cdot \mathbf{u}}\big | \end{aligned}$$

If \(u_1 = 2\pi /k_1\) and \(u_2 = 2\pi /k_2\), then we have

$$\begin{aligned} |\varphi _{\mathbf{Z}_i}(\mathbf{u})| = \sum _{\mathbf{y} \in \mathbb {Z}^2} \frac{{\tilde{w}}_\mathbf{y}}{W} = 1. \end{aligned}$$

Since at least one of \(k_1, k_2\) is greater than one in magnitude, at least one of \(u_1\), \(u_2\) is not an integer multiple of \(2\pi \). \(\square \)
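The dichotomy of Lemma 2 is easy to observe numerically; in the sketch below (ours; the two stencils and the test point \(\mathbf{u}=(\pi ,\pi )\) are arbitrary illustrative choices), a stencil supported on the even sublattice \(2\mathbb {Z} \times 2\mathbb {Z}\) attains \(|\varphi _{\mathbf{Z}_i}(\mathbf{u})| = 1\) off the \(2\pi \) lattice, while an unconfined one does not.

```python
# Evaluate |phi_Z(u)| for a stencil not confined to a sublattice and for
# one confined to 2Z x 2Z; only the latter attains 1 at u = (pi, pi).
import numpy as np

def abs_char_fn(points, u):
    p = np.full(len(points), 1.0 / len(points))     # uniform density (35)
    return abs(np.sum(p * np.exp(1j * (np.asarray(points, float) @ u))))

unconfined = [(0, -1), (1, -1), (-1, -1), (1, 0)]   # adjacent support points
confined = [(0, -2), (2, -2), (-2, 0)]              # inside 2Z x 2Z
u = np.array([np.pi, np.pi])
print(abs_char_fn(unconfined, u))   # 0.0  (< 1)
print(abs_char_fn(confined, u))     # 1.0
```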

Next, we use the above lemma together with Lemma 1 to prove a version of Theorem 1 expressed in terms of a discrete Gaussian \(g^*_{\sigma (y,h),\mu ^*,1}\) rather than \(G^*_{\sigma (m),\mu ^*,N}\).

Lemma 3

Let \(a^*_r\) be a stencil not confined to any sublattice of \(\mathbb {Z}^2\) in the sense of Definition 3; equivalently, assume that the characteristic function of \(\mathbf{Z}_i\), that is

$$\begin{aligned} \varphi _{\mathbf{Z}_i}(\mathbf{u}) = \sum _{\mathbf{y} \in \text{ Supp }(a^*_r)} \frac{w_r(\mathbf{0},\mathbf{y}) e^{\sqrt{-1}{} \mathbf{y} \cdot \mathbf{u}}}{W}, \end{aligned}$$

obeys \(|\varphi _{\mathbf{Z}_i}(\mathbf{u})| < 1\) unless both components of \(\mathbf{u}\) are integer multiples of \(2\pi \). Let \(\mathbf{x}=(x,y) \in D_h\), and let \(\mathbf{X}_{\tau }\) denote the stopped random walk started from \(\mathbf{x}\) described in Sect. 6.4. Then for y fixed, we have

$$\begin{aligned} \rho _{X_{\tau }}(ih,jh)={\tilde{g}}_{\sigma (y,h),{\hat{x}}}(ih,jh)+o(\sqrt{h}) \end{aligned}$$

where

$$\begin{aligned} {\tilde{g}}_{\sigma (y,h),{\hat{x}}}(ih,jh)=g^*_{\sigma (y,h),{\hat{x}},1}(ih)\varDelta (jh) \end{aligned}$$

with \(\varDelta (jh)\) given by (42) and where \(g^*_{\sigma (y,h),{\hat{x}},1}\) is the one dimensional discrete periodic gaussian (48) with mean

$$\begin{aligned} {\hat{x}}=x - \frac{\mu _x}{\mu _y}y \mod 1 \equiv \varPi _{\theta _r^*}(x,y), \end{aligned}$$

where \(\varPi _{\theta _r^*}\) is the transport operator (33), and with variance

$$\begin{aligned} \sigma (y,h)^2 = \frac{\gamma ^2 y h}{|\mu _y|^3} \quad \text{ where } \gamma ^2 = {\text {Var}}(\mu _x V_1 - \mu _y U_1). \end{aligned}$$

Proof

Let \(\mathbf{x}=(x,y) = (nh,mh)\). By Theorem 1 we have

$$\begin{aligned} \rho _{X_{\tau }}(ih,jh)=G^*_{\sigma (m),\mu ^*,N}(i)\varDelta (jh)+o\left( \frac{1}{\sqrt{m}}\right) , \end{aligned}$$

where \(G^*_{\sigma (m),\mu ^*,N}\) is defined by (47), has mean \(\mu ^* = {\hat{n}} := {\hat{x}}/h\) and variance \(\sigma (m)^2 = \frac{\gamma ^2 m}{|\mu _y|^3}:={\tilde{\sigma }}^2m\). By Lemma 1 statement ii, we have

$$\begin{aligned} G^*_{\sigma (m),{\hat{n}},N}(i) = g^*_{\sigma (m),{\hat{n}},N}(i)+O(m^{\frac{1}{2}}e^{-cm}), \end{aligned}$$

for some constant \(c>0\), and hence

$$\begin{aligned} \rho _{X_{\tau }}(ih,jh)=g^*_{\sigma (m),{\hat{n}},N}(i)\varDelta (jh)+o\left( \frac{1}{\sqrt{m}}\right) , \end{aligned}$$

since the asymptotic decay of \(m^{\frac{1}{2}}e^{-cm}\) is strictly faster than that of \(m^{-\frac{1}{2}}\) for all \(c > 0\). But \(h = y/m\) and y is a constant, so \(o(\frac{1}{\sqrt{m}}) = o(\sqrt{h})\). Finally, since

$$\begin{aligned} g^*_{\sigma (m),{\hat{n}},N}(i)= & {} \frac{e^{-\frac{d^{{\text {circ}}}_{N}(i,{\hat{n}})^2}{2{\tilde{\sigma }}^2m}}}{\sum _{j=0}^{N-1} e^{-\frac{d^{{\text {circ}}}_{N}(j,{\hat{n}})^2}{2{\tilde{\sigma }}^2m}} }\\= & {} \frac{e^{-\frac{d^{{\text {circ}}}_{1}(ih,{\hat{x}})^2}{2{\tilde{\sigma }}^2yh}}}{\sum _{j=0}^{N-1} e^{-\frac{d^{{\text {circ}}}_{1}(jh,{\hat{x}})^2}{2{\tilde{\sigma }}^2yh}} }=g^*_{\sigma (y,h),{\hat{x}},1}(ih), \end{aligned}$$

we conclude

$$\begin{aligned} \rho _{X_{\tau }}(ih,jh)=g^*_{\sigma (y,h),{\hat{x}},1}(ih)\varDelta (jh)+o\left( \sqrt{h}\right) . \end{aligned}$$

\(\square \)

Now we move on to task four and derive an expression for \(\varDelta (x_2)\) for the random walk \(\mathbf{X}_{\tau }\) under consideration:

Lemma 4

Let H denote the law of \(V_{n_1}\), where \(n_1 = \inf \{ k : V_k < 0 \}\), and let \(\nu _y = E[V_{n_1}]\). Let \(\varDelta \) denote the asymptotic law of the overshoot \(Y_{\tau }\). Then, for the direct method we have

$$\begin{aligned} \varDelta (kh) = \frac{1}{{\mu _y}}\frac{\sum _{\mathbf{j} \in {\bar{b}}^-_r} {\tilde{w}}_{r}(\mathbf{0},\mathbf{j}) 1(\mathbf{j} \cdot e_2 < k)}{W}. \end{aligned}$$

For the semi-implicit method we have

$$\begin{aligned} \varDelta (kh) = \frac{1}{{\nu _y}}\frac{\sum _{\mathbf{j} \in {\bar{b}}^0_r} {\tilde{w}}_{r}(\mathbf{0},\mathbf{j}) 1(\mathbf{j} \cdot e_2< k \text{ and } \mathbf{j} \cdot e_2 \ne 0 )}{\sum _{\mathbf{j} \in {\bar{b}}^0_r} {\tilde{w}}_{r}(\mathbf{0},\mathbf{j})1(\mathbf{j} \cdot e_2 \ne 0 )}, \end{aligned}$$

where \(\nu _y = (1-p_s)^{-1}\mu _y\) and \(p_s\) is the probability that the random walk in the implicit case does not move downwards in a given step:

$$\begin{aligned} p_s = \frac{1}{W}\sum _{\mathbf{j} \in {\bar{b}}^0_r} {\tilde{w}}_{r}(\mathbf{0},\mathbf{j})1(\mathbf{j} \cdot e_2 = 0 ). \end{aligned}$$

Proof

First, note that since the increments \(\{V_i\}\) obey \(-r-1\le V_i \le 0\) for both the direct and semi-implicit methods (so that \(\nu _y < 0\), whence the absolute value, which makes \(\varDelta \) a genuine probability law), we have

$$\begin{aligned} \varDelta (kh) = H((-\infty ,k))/|\nu _y| = H([-r-1,k))/|\nu _y| . \end{aligned}$$

For the direct method, the ladder process above is particularly simple, since \(\mathbf{X}_{\tau }\) is guaranteed to move at least one pixel down in the y direction every iteration. In this case we simply have \(n_i = i\) for all i. Hence \(\nu _y = \mu _y\) and H is equal to the law of \(V_1\). It follows that

$$\begin{aligned} \varDelta (kh)= & {} \frac{1}{|\mu _y|}H([-r-1,k))\\= & {} \frac{1}{|\mu _y|}\frac{\sum ^{k-1}_{\ell = -r-1} \sum _{\mathbf{j} \in {\bar{b}}^-_r} {\tilde{w}}_{r}(\mathbf{0},\mathbf{j})1(\mathbf{j}\cdot e_2 = \ell )}{W} \\= & {} \frac{1}{|\mu _y|} \frac{\sum _{\mathbf{j} \in {\bar{b}}^-_r}\sum ^{k-1}_{\ell = -r-1} {\tilde{w}}_{r}(\mathbf{0},\mathbf{j})1(\mathbf{j}\cdot e_2 = \ell )}{W}\\= & {} \frac{1}{{|\mu _y|}}\frac{\sum _{\mathbf{j} \in {\bar{b}}^-_r} {\tilde{w}}_{r}(\mathbf{0},\mathbf{j}) 1(\mathbf{j} \cdot e_2 < k)}{W}. \end{aligned}$$

The semi-implicit case is more complex, since although \(X_{\tau }\) can never move in the positive y direction, it can “pause” at a constant value of y for an arbitrarily long duration before resuming its downward march. Let \(p_s\) denote the probability that a given increment of the walk does not move in the y-direction, that is \(p_s = P(V_i=0)\). Then we clearly have

$$\begin{aligned} p_s = \frac{1}{W}\sum _{\mathbf{j} \in {\bar{b}}^0_r} {\tilde{w}}_{r}(\mathbf{0},\mathbf{j})1(\mathbf{j} \cdot e_2 = 0 ). \end{aligned}$$

By Wald’s Identity we have \(\nu _y = E[V_{n_1}]=\mu _y E[n_1]\) since \(n_1\) is a stopping time. We have

$$\begin{aligned} E[n_1]= & {} \sum _{n=1}^{\infty } p_s^{n-1}(1-p_s)n \\= & {} \frac{1-p_s}{p_s}\sum _{n=1}^{\infty } np_s^n\\= & {} \frac{1-p_s}{p_s}\cdot \frac{p_s}{(1-p_s)^2}=\frac{1}{1-p_s}, \end{aligned}$$

and hence

$$\begin{aligned} \nu _y = \frac{\mu _y}{1-p_s}. \end{aligned}$$

The law of \(V_{n_1}\) (and hence H) is given by:

$$\begin{aligned} p_{V_{n_1}}(\mathbf{j}) = \frac{{\tilde{w}}_{r}(\mathbf{0},\mathbf{j})1(\mathbf{j} \cdot e_2 \ne 0 )}{\sum _{\mathbf{j} \in {\bar{b}}^0_r} {\tilde{w}}_{r}(\mathbf{0},\mathbf{j})1(\mathbf{j} \cdot e_2 \ne 0 )}. \end{aligned}$$

It follows that

$$\begin{aligned} \varDelta (kh)= & {} \frac{1}{|\nu _y|}H([-r-1,k)) \\= & {} \frac{1}{|\nu _y|}\frac{\sum ^{k-1}_{\ell = -r-1} \sum _{\mathbf{j} \in {\bar{b}}^0_r} {\tilde{w}}_{r}(\mathbf{0},\mathbf{j})1(\mathbf{j}\cdot e_2 = \ell )}{\sum _{\mathbf{j} \in {\bar{b}}^0_r} {\tilde{w}}_{r}(\mathbf{0},\mathbf{j})1(\mathbf{j} \cdot e_2 \ne 0 )} \\= & {} \frac{1}{|\nu _y|} \frac{\sum _{\mathbf{j} \in {\bar{b}}^0_r}\sum ^{k-1}_{\ell = -r-1} {\tilde{w}}_{r}(\mathbf{0},\mathbf{j})1(\mathbf{j}\cdot e_2 = \ell )}{\sum _{\mathbf{j} \in {\bar{b}}^0_r} {\tilde{w}}_{r}(\mathbf{0},\mathbf{j})1(\mathbf{j} \cdot e_2 \ne 0 )}\\= & {} \frac{1}{{|\nu _y|}}\frac{\sum _{\mathbf{j} \in {\bar{b}}^0_r} {\tilde{w}}_{r}(\mathbf{0},\mathbf{j}) 1(\mathbf{j} \cdot e_2 < k )}{\sum _{\mathbf{j} \in {\bar{b}}^0_r} {\tilde{w}}_{r}(\mathbf{0},\mathbf{j})1(\mathbf{j} \cdot e_2 \ne 0 )}. \end{aligned}$$

\(\square \)
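To make Lemma 4 concrete, here is a small Python helper (our own; the name overshoot_law and the toy stencil are illustrative, not from the paper) that evaluates \(\varDelta (kh)\), together with \(p_s\) and \(\nu _y\) in the semi-implicit case, directly from the formulas just proved. Note the absolute value \(|\nu _y|\), which makes \(\varDelta \) sum to one.

```python
import numpy as np

def overshoot_law(pts, w, semi_implicit=False):
    """Evaluate the asymptotic overshoot law Delta(kh) of Lemma 4.

    pts: (M, 2) integer stencil offsets (rows j.e2 <= -1 for the direct
    method, j.e2 <= 0 for the semi-implicit one); w: non-negative weights
    w_r(0, j).  Helper name and conventions are ours, for illustration only.
    """
    pts = np.asarray(pts)
    w = np.asarray(w, dtype=float)
    W = w.sum()
    mu_y = (w @ pts[:, 1]) / W
    if semi_implicit:
        p_s = w[pts[:, 1] == 0].sum() / W   # probability of a "pause"
        nu_y = mu_y / (1.0 - p_s)           # Wald: nu_y = mu_y / (1 - p_s)
        mask = pts[:, 1] != 0
        denom = w[mask].sum()
    else:                                   # ladder process is trivial here
        nu_y = mu_y
        mask = np.ones(len(pts), dtype=bool)
        denom = W
    ks = np.arange(pts[:, 1].min(), 1)      # support of the overshoot Y_tau
    Delta = np.array([w[mask & (pts[:, 1] < k)].sum() for k in ks])
    return ks, Delta / (abs(nu_y) * denom)  # |nu_y| makes Delta a pmf

# toy semi-implicit stencil: uniform weights on the half ball of radius 2
# with the j.e2 = 0 row included (origin excluded)
pts = [(i, j) for i in range(-2, 3) for j in range(-2, 1)
       if i * i + j * j <= 4 and (i, j) != (0, 0)]
ks, Delta = overshoot_law(pts, np.ones(len(pts)), semi_implicit=True)
print(ks, Delta, Delta.sum())               # Delta.sum() == 1
```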

Our final task before proving our main result is to bound the probability of \(\mathbf{X}_{\tau }\) deviating significantly from \(\varPi _{\theta _r^*}(\mathbf{x})\). The following lemma accomplishes this.

Lemma 5

Let \(\mathbf{x} \in D_h\) and \(\hat{\mathbf{x}}:=({\hat{x}},0) := \varPi _{\theta ^*_r}(\mathbf{x})\). Let \(d^{{\text {circ}}}_{1}\) denote the circular distance (46). Then

$$\begin{aligned} P(d^{{\text {circ}}}_{1}({ X}_{\tau },{\hat{x}})> \delta )\le & {} 2\exp \left( -\frac{(\delta -rh)^2\sin ^2\theta ^*_r}{4\frac{r}{|\mu _y|}rh}\right) \\&\quad +\exp \left( -\frac{1}{4\frac{r}{|\mu _y|}rh}\right) . \end{aligned}$$

Proof

Our random walk on the periodic domain \([0,1) \times (-\delta ,1]\) is equivalent to the same walk on the infinite domain \(\mathbb {R} \times (-\delta ,1]\), modulo the equivalence relation \((x,y) \sim (x+k,y)\) for all \(k \in \mathbb {Z}\). Let us denote (within this lemma only) our periodic walk by \(\mathbf{X}^*_{\tau }\) while the walk on \(\mathbb {R} \times (-\delta ,1]\) is denoted by \(\mathbf{X}_{\tau }\). Then we have

$$\begin{aligned} \{ d^{{\text {circ}}}_1(X^*_{\tau },{\hat{x}})> \delta \}= & {} \bigcap _{k \in \mathbb {Z}} \{ |X_{\tau }-({\hat{x}}+k)|> \delta \} \\\subseteq & {} \{ |X_{\tau }-{\hat{x}}| > \delta \}. \end{aligned}$$

We therefore have

$$\begin{aligned} P( d^{{\text {circ}}}_1(X^*_{\tau },{\hat{x}})> \delta ) \le P( |X_{\tau }-{\hat{x}}| > \delta ), \end{aligned}$$

and hence it suffices to prove

$$\begin{aligned} P( |X_{\tau }-{\hat{x}}| > \delta )\le & {} 2\exp \left( -\frac{(\delta -rh)^2\sin ^2\theta ^*_r}{4\frac{r}{|\mu _y|}rh}\right) \\&\quad +\exp \left( -\frac{1}{4\frac{r}{|\mu _y|}rh}\right) . \end{aligned}$$

Step 1. \(P\left( \tau > \lceil \frac{2}{|\mu _y|h} \rceil \right) \le \exp \left( -\frac{1}{4\frac{r}{|\mu _y|}rh}\right) .\) We define

$$\begin{aligned} M_k := Y_{k \wedge \tau } - (\tau \wedge k) \mu _yh -mh \end{aligned}$$

which the reader may verify is a zero-mean martingale with bounded increments

$$\begin{aligned} |M_{k+1}-M_k| \le rh. \end{aligned}$$

Next we note that for any k the following events are equal:

$$\begin{aligned} \{ \tau \ge k \}= & {} \{Y_{\tau \wedge k} \ge 0\} = \{ M_k \ge -(k \wedge \tau ) \mu _y h - mh \} \\= & {} \{ M_k \ge -k\mu _y h - mh \}. \end{aligned}$$

Therefore

$$\begin{aligned} \left\{ \tau \ge \left\lceil \frac{2}{|\mu _y|h} \right\rceil \right\}= & {} \left\{ M_{ \left\lceil \frac{2}{|\mu _y|h} \right\rceil } \ge - \left\lceil \frac{2}{|\mu _y|h} \right\rceil \mu _y h - mh \right\} \\\subseteq & {} \left\{ M_{ \left\lceil \frac{2}{|\mu _y|h} \right\rceil } \ge 1 \right\} \end{aligned}$$

where we have used the inequality (valid since \(mh = y \le 1\))

$$\begin{aligned} -\left\lceil \frac{2}{|\mu _y|h} \right\rceil \mu _y h - mh = \left\lceil \frac{2}{|\mu _y|h} \right\rceil |\mu _y| h - mh \ge 2-mh \ge 1. \end{aligned}$$

Noting that \(M_0 = 0\) we apply Azuma’s inequality to find

$$\begin{aligned} P\left( \tau > \left\lceil \frac{2}{|\mu _y|h} \right\rceil \right)\le & {} P \left( M_{ \left\lceil \frac{2}{|\mu _y|h} \right\rceil } \ge 1 \right) \\\le & {} \exp \left( - \frac{1}{2\left\lceil \frac{2}{|\mu _y|h} \right\rceil r^2h^2} \right) \\\le & {} \exp \left( -\frac{1}{4\frac{r}{|\mu _y|}rh}\right) . \end{aligned}$$

Step 2. Let \(d(\mathbf{X}_{\tau },L_{\mathbf{x},\hat{\mathbf{x}}})\) denote the orthogonal distance from \(\mathbf{X}_{\tau }\) to the line passing through \(\mathbf{x}\) and \(\hat{\mathbf{x}}\). This time we define

$$\begin{aligned} M_k := (\mathbf{X}_{\tau \wedge k} - \mathbf{x}) \cdot (-\sin \theta ^*_r,\cos \theta ^*_r). \end{aligned}$$

Once again, \(M_k\) is a zero-mean martingale with bounded increments \(|M_{k+1}-M_k| \le rh\) obeying \(M_0 = 0\). Therefore, applying Azuma’s inequality again gives, for any \(k \in \mathbb {N}\),

$$\begin{aligned} P(|M_k| > \delta ) \le 2\exp \left( -\frac{\delta ^2}{2kr^2h^2}\right) . \end{aligned}$$

Moreover, if \(\tau \le k\), then we have the equality

$$\begin{aligned} |M_k|=d(\mathbf{X}_{\tau },L_{\mathbf{x},\hat{\mathbf{x}}}) \end{aligned}$$

For any \(k \in \mathbb {N}\) we clearly have

$$\begin{aligned} P(d(\mathbf{X}_{\tau },L_{\mathbf{x},\hat{\mathbf{x}}})> \delta )\le & {} P( \{ d(\mathbf{X}_{\tau },L_{\mathbf{x},\hat{\mathbf{x}}})> \delta \} \cap \{ \tau \le k \})\\&+P( \tau> k )\\\le & {} P(|M_k|> \delta ) + P(\tau > k). \end{aligned}$$

Taking \(k = \lceil \frac{2}{|\mu _y|h} \rceil \) and substituting the above bound together with the one from Step 1 gives

$$\begin{aligned} P(d(\mathbf{X}_{\tau },L_{\mathbf{x},\hat{\mathbf{x}}}) > \delta )\le & {} 2\exp \left( -\frac{\delta ^2}{4\frac{r}{|\mu _y|}rh}\right) \\&+\exp \left( -\frac{1}{4\frac{r}{|\mu _y|}rh}\right) . \end{aligned}$$

Finally, by the triangle inequality and simple trigonometry we have

$$\begin{aligned} | X_{\tau }-{\hat{x}}| \le \frac{d(\mathbf{X}_{\tau },L_{\mathbf{x},\hat{\mathbf{x}}})}{\sin \theta ^*_r}+rh. \end{aligned}$$

Hence

$$\begin{aligned} P(|X_{\tau }-{\hat{x}} |> \delta ) \le P(d(\mathbf{X}_{\tau },L_{\mathbf{x},\hat{\mathbf{x}}}) > (\delta -rh)\sin \theta ^*_r), \end{aligned}$$

from which the claimed result follows trivially. \(\square \)

Having accomplished all of our tasks, we are now ready to prove our main result.

Theorem 2

Let the inpainting domain D and undamaged area \({\mathcal {U}}\), as well as their discrete counterparts \(D_h\), \({\mathcal {U}}_h\) be as described in Sect. 6.1. Let \(u_0 : {\mathcal {U}} \rightarrow \mathbb {R}^d\) denote as usual the undamaged portion of the image, and assume \(u_0\) is bounded in norm, that is, there is an \(M > 0\) such that \(\Vert u_0\Vert \le M\) on \({\mathcal {U}}\). Suppose we inpaint \(D_h\) using Algorithm 1 or its semi-implicit extension, and denote the result by \(u_h : D_h \rightarrow \mathbb {R}^d\). Assume the assumptions of Sect. 6.1 hold and let \(a^*_r\) denote the stencil of our inpainting method. Let \(\mathbf{Z}_i = (U_i,V_i)\), taking values in \(\text{ Supp }(a^*_r)\) with mean \((\mu _x,\mu _y)\), denote the increments of the random walk described above, with probability density given by (35). Assume that \(a^*_r\) is not confined to any sublattice of \(\mathbb {Z}^2\) in the sense of Definition 3, or equivalently that the characteristic function \(\varphi _{\mathbf{Z}_i}\) of \(\mathbf{Z}_i\), that is

$$\begin{aligned} \varphi _{\mathbf{Z}_i}(\mathbf{u}) = \sum _{\mathbf{y} \in \text{ Supp }(a^*_r)} \frac{w_r(\mathbf{0},\mathbf{y}) e^{\sqrt{-1}{} \mathbf{y} \cdot \mathbf{u}}}{W}, \end{aligned}$$

obeys \(|\varphi _{\mathbf{Z}_i}(\mathbf{u})| < 1\) unless both components of \(\mathbf{u}\) are integer multiples of \(2\pi \). Let \(\varPi _{\theta ^*_r} : D \rightarrow \partial D\) denote the transport operator defined in (32) (recall that the transport direction \(\mathbf{g}^*_r\) obeys \(\mathbf{g}^*_r=(\mu _x,\mu _y)\)), and let \(\mathbf{x}=(x,y)=(nh,mh) \in D_h\), and assume y is fixed. Let

$$\begin{aligned} {\tilde{g}}_{\sigma (y,h)}(ih,jh)=g^*_{\sigma (y,h),1}(ih){\bar{\varDelta }}(jh) \end{aligned}$$

where \(g^*_{\sigma (y,h),1}\) is the one-dimensional discrete Gaussian

$$\begin{aligned} g^*_{\sigma (y,h),1}(ih)=\frac{e^{-\frac{d^{{\text {circ}}}_{1}(ih,0)^2}{2\sigma (y,h)^2}}}{\sum _{k=0}^{N-1} e^{-\frac{d^{{\text {circ}}}_{1}(kh,0)^2}{2\sigma (y,h)^2}}} \end{aligned}$$

with variance

$$\begin{aligned} \sigma (y,h)^2 = \frac{\gamma ^2 y h}{|\mu _y|^3} \quad \text{ where } \gamma ^2 = {\text {Var}}(\mu _x V_1 - \mu _y U_1) \end{aligned}$$

and where \({\bar{\varDelta }}\) denotes the even extension of \(\varDelta \) defined by

$$\begin{aligned} {\bar{\varDelta }}(jh)=\varDelta (|j|h) \end{aligned}$$

where \(\varDelta (jh)\) is given by Lemma 4. Then

$$\begin{aligned} u_h(x,y) = (u_0 * {\tilde{g}}_{\sigma (y,h)})(\varPi _{\theta ^*_r}(x,y)) +o(1) \end{aligned}$$
(51)

Proof

Our goal is to show that for any \(\epsilon > 0\), taking h sufficiently small gives

$$\begin{aligned} \Vert u_h(x,y) - (u_0 * {\tilde{g}}_{\sigma (y,h)})(\varPi _{\theta ^*_r}(x,y))\Vert < \epsilon . \end{aligned}$$

Let us define \(\hat{\mathbf{x}} :=({\hat{x}},0) := \varPi _{\theta ^*_r}(x,y)\). Then this is equivalent to proving

$$\begin{aligned} \left\| \sum _{\mathbf{z} \in {\mathcal {U}}_h}u_0(\mathbf{z})\left( \rho _{\mathbf{X}_{\tau }}(\mathbf{z} ) - {\tilde{g}}_{\sigma (y,h)}(\hat{\mathbf{x}}-\mathbf{z})\right) \right\| < \epsilon , \end{aligned}$$

where \(\mathbf{z}\) is a dummy summation variable (distinct from the fixed point \(\mathbf{x}\)) and the x-coordinate of \(\hat{\mathbf{x}}-\mathbf{z}\) is understood to be taken modulo 1 due to our periodic boundary conditions. Since \(\Vert u_0\Vert \le M\) it suffices to prove

$$\begin{aligned} S\equiv & {} \sum _{(ih,jh) \in {\mathcal {U}}_h} \left| \rho _{\mathbf{X}_{\tau }}(ih,jh) - \varDelta (jh)g^*_{\sigma (y,h),1}({\hat{x}}-ih)\right| \\< & {} \frac{\epsilon }{M}. \end{aligned}$$

We will prove shortly that the per-term error

$$\begin{aligned} \left| \rho _{\mathbf{X}_{\tau }}(ih,jh) - \varDelta (jh)g^*_{\sigma (y,h),1}({\hat{x}}-ih)\right| \end{aligned}$$

is of size \(o(1/\sqrt{N})\). However, this condition alone is not enough to prove the desired result, as S contains O(N) terms. To get around this, we introduce the sets \(I = \{0,1,\ldots ,N-1\}\) and

$$\begin{aligned} I_k=\{ i \mod N\}_{i=\lceil \frac{{\hat{x}}}{h}-k\sqrt{m}\rceil }^{\lfloor \frac{{\hat{x}}}{h}+k\sqrt{m} \rfloor }, \end{aligned}$$

where \(k \in \mathbb {N}\) is a constant to be determined. Note that \(I_k\) may equivalently be characterized as

$$\begin{aligned} I_k = \left\{ i \in I : d^{{\text {circ}}}_{N}\left( i,\frac{{\hat{x}}}{h}\right) \le k\sqrt{m}\right\} , \end{aligned}$$

which in turn means that

$$\begin{aligned} I \backslash I_k = \left\{ i \in I : d^{{\text {circ}}}_{N}\left( i,\frac{{\hat{x}}}{h}\right) > k\sqrt{m}\right\} , \end{aligned}$$
(52)

which will be critical later.

We break the sum into two pieces \(S = S_1 + S_2\) defined as

$$\begin{aligned} S_1\equiv & {} \sum _{j=-r-2}^0 \sum _{i \in I_k} \left| \rho _{\mathbf{X}_{\tau }}(ih,jh) - \varDelta (jh)g^*_{\sigma (y,h),1}({\hat{x}}-ih)\right| \\ S_2\equiv & {} \sum _{j=-r-2}^0 \sum _{i \in I \backslash I_k} \left| \rho _{\mathbf{X}_{\tau }}(ih,jh) - \varDelta (jh)g^*_{\sigma (y,h),1}({\hat{x}}-ih)\right| . \end{aligned}$$

Since \(m = yN\) and y is fixed, m is proportional to N, and hence \(S_1\) contains \(O(\sqrt{m})=O(\sqrt{N})\) terms concentrated around \({\hat{x}}\). We will control the size of \(S_1\) by using the \(o(\sqrt{h})=o(1/\sqrt{N})\) error estimate from Lemma 3. The second sum \(S_2\) will contain all remaining terms in S. We will bound \(S_2\) by making separate tail estimates for \(\rho _{\mathbf{X}_{\tau }}\) and \(g^*_{\sigma (y,h),1}({\hat{x}}-ih)\), both based on (52).

Claim 1

For \(k \ge \max \left( \frac{2r}{\sqrt{y}},4\right) \) there exist constants \(C_1,C_2,C_3,C_4 > 0\) (dependent on r and y) such that

$$\begin{aligned} S_2 < C_1e^{-C_2k^2}+C_3N^{\frac{3}{2}}e^{-C_4N}. \end{aligned}$$

We defer the proof of Claim 1 until the end. For now, note that it gives us the estimate

$$\begin{aligned} S \le S_1 + C_1e^{-C_2k^2}+C_3N^{\frac{3}{2}}e^{-C_4N}. \end{aligned}$$

Fixing k so that \(C_1e^{-C_2k^2} < \frac{\epsilon }{4M}\) and \(k \ge \frac{2r}{\sqrt{y}}\) are both true, and taking N large enough that

\(C_3N^{\frac{3}{2}}e^{-C_4N}< \frac{\epsilon }{4M}\), it remains to prove that \(S_1 < \frac{\epsilon }{2M}\). Defining \(C :=(r+3)3k\sqrt{y}\) we have

$$\begin{aligned} S_1\equiv & {} \sum _{j=-r-2}^0 \sum _{i \in I_k} \left| \rho _{\mathbf{X}_{\tau }}(ih,jh) - \varDelta (jh)g^*_{\sigma (y,h),1}({\hat{x}}-ih)\right| \\\le & {} C\sqrt{N}\sup _{i,j}\left| \rho _{\mathbf{X}_{\tau }}(ih,jh) - \varDelta (jh)g^*_{\sigma (y,h),1}({\hat{x}}-ih)\right| . \end{aligned}$$

By Lemma 3, we have

$$\begin{aligned}&\sqrt{N}\sup _{i,j}\left| \rho _{\mathbf{X}_{\tau }}(ih,jh) - \varDelta (jh)g^*_{\sigma (y,h),1}({\hat{x}}-ih)\right| \\&\rightarrow 0 \quad \text{ as } N \rightarrow \infty . \end{aligned}$$

Making N large enough (equivalently, h small enough) that

$$\begin{aligned} \sqrt{N}\sup _{i,j}\left| \rho _{\mathbf{X}_{\tau }}(ih,jh) - \varDelta (jh)g^*_{\sigma (y,h),1}({\hat{x}}-ih)\right| < \frac{1}{C}\frac{\epsilon }{2M}, \end{aligned}$$

we are done. \(\square \)

Proof of Claim 1

We break \(S_2\) into two pieces:

$$\begin{aligned} S_2\equiv & {} \sum _{j=-r-2}^0 \sum _{i \in I \backslash I_k} \left| \rho _{\mathbf{X}_{\tau }}(ih,jh) - \varDelta (jh)g^*_{\sigma (y,h),1}({\hat{x}}-ih)\right| \\\le & {} \sum _{j=-r-2}^0 \sum _{i \in I \backslash I_k} \rho _{\mathbf{X}_{\tau }}(ih,jh)\\&\quad + \sum _{j=-r-2}^0 \varDelta (jh) \sum _{i \in I \backslash I_k} g^*_{\sigma (y,h),1}({\hat{x}}-ih) \end{aligned}$$

But by (52), we have

$$\begin{aligned} S_2\le & {} P(d^{{\text {circ}}}_1(X_{\tau },{\hat{x}})> k\sqrt{y}\sqrt{N}h) \\&+ \sum _{i \in I \backslash I_k} g^*_{\sigma (y,h),1}({\hat{x}}-ih).\\ \end{aligned}$$

We already know how to bound the first of these. By Lemma 5, we have

$$\begin{aligned}&P(d^{{\text {circ}}}_1(X_{\tau },{\hat{x}})> k\sqrt{y}\sqrt{N}h) \\&\le 2\exp \left( -\frac{(k\sqrt{y}\sqrt{N}h-rh)^2\sin ^2\theta ^*_r}{4r^2/|\mu _y|h}\right) \\&\quad +\exp \left( -\frac{1}{4\frac{r}{|\mu _y|}rh}\right) \end{aligned}$$

Since \(k \ge \frac{2r}{\sqrt{y}}\) (so that \(k\sqrt{y}-r \ge \frac{1}{2}k\sqrt{y}\)) and \(h \le 1\), we have

$$\begin{aligned} \frac{(k\sqrt{y}\sqrt{N}h-rh)^2}{h} = (k\sqrt{y}-r\sqrt{h})^2 \ge (k\sqrt{y}-r)^2 \ge \frac{1}{4}yk^2, \end{aligned}$$

it follows that

$$\begin{aligned}&P(d^{{\text {circ}}}_1(X_{\tau },{\hat{x}})> k\sqrt{y}\sqrt{N}h) \nonumber \\&\quad \le 2\exp \left( -y\left\{ \frac{\sin ^2\theta ^*_r}{16r^2/|\mu _y|}\right\} k^2\right) \nonumber \\&\qquad +\exp \left( -\frac{1}{4\frac{r}{|\mu _y|}rh}\right) . \end{aligned}$$
(53)

For the second, we find it convenient to rewrite the sum in terms of \(G^*_{\sigma (m),N}(i-{\hat{x}}/h)\). To that end, we define \({\tilde{\sigma }}^2=\sigma ^2(y,h)/(yh) = \frac{\gamma ^2}{|\mu _y|^3}\) where \(\gamma ^2 > 0\) is a constant given by (45). Then

$$\begin{aligned} \sum _{i \in I \backslash I_k} g^*_{\sigma (y,h),1}(ih-{\hat{x}})= & {} \sum _{i \in I \backslash I_k} g_{\sigma (m),N}(i-{\hat{x}}/h) \nonumber \\\le & {} \sum _{i \in I \backslash I_k} G^*_{\sigma (m),N}(i-{\hat{x}}/h)\nonumber \\&\quad +\frac{A}{y}m^{\frac{3}{2}}e^{-bm}, \end{aligned}$$
(54)

where we have used Lemma 1 together with \(|I \backslash I_k| \le N=m/y\) in the rightmost inequality. Next, define

$$\begin{aligned} \varXi = \left\{ d^{{\text {circ}}}_{N}\left( i,\frac{{\hat{x}}}{h}\right) : i \in I \backslash I_k \right\} . \end{aligned}$$

Since \(I \backslash I_k \subseteq \mathbb {Z}\), every member of \(\varXi \) must be of the form

$$\begin{aligned} \xi + n \qquad \text{ where } \xi = \frac{{\hat{x}}}{h}-\left\lfloor \frac{{\hat{x}}}{h} \right\rfloor \in [0,1) \end{aligned}$$

for some \(n \in \mathbb {Z}\). But by (52), we must have \(n \ge \lfloor k \sqrt{m} \rfloor \). Finally, since the map \(i \in I \backslash I_k \rightarrow d^{{\text {circ}}}_{N}(i,\frac{{\hat{x}}}{h})\) is at most two-to-one, we have

$$\begin{aligned} \sum _{i \in I \backslash I_k} G^*_{\sigma (m),N}(i-{\hat{x}}/h)= & {} \sum _{i \in I \backslash I_k} G_{\sigma (m)}(d^{{\text {circ}}}_{N}(i,{\hat{x}}/h)) \\\le & {} 2\sum _{x \in \varXi } G_{\sigma (m)}(x) \\\le & {} 2\sum _{n = \lfloor k \sqrt{m} \rfloor }^{\infty } G_{\sigma (m)}(\xi + n) \end{aligned}$$

The following routine tail estimate gives us a bound on our sum.

$$\begin{aligned} \sum _{i \in I \backslash I_k} G^*_{\sigma (m),N}(i-{\hat{x}}/h)\le & {} 2\sum _{n = \lfloor k \sqrt{m} \rfloor }^{\infty } G_{\sigma (m)}(\xi + n) \\= & {} 2\sum _{n = \lfloor k \sqrt{m} \rfloor }^{\infty } \frac{e^{-\frac{(\xi + n)^2}{2\sigma (m)^2}}}{\sqrt{2\pi }\sigma (m)} \\\le & {} 2\sum _{n = \lfloor k \sqrt{m} \rfloor }^{\infty } \frac{e^{-\frac{n^2}{2\sigma (m)^2}}}{\sqrt{2\pi }\sigma (m)} \\\le & {} 2 \int ^{\infty }_{\lfloor k \sqrt{m} \rfloor -1} \frac{e^{-\frac{x^2}{2\sigma (m)^2}}}{\sqrt{2\pi }\sigma (m)}dx. \end{aligned}$$

Since \(k \ge 4\), we can find a \(c \in (0,1)\) independent of \(1 \le m \le N\) and k (for example \(c = 0.5\) will work) such that \(ck\sqrt{m} \le \lfloor k \sqrt{m} \rfloor -1\). At the same time, we have \(\sigma (m)^2 = {\tilde{\sigma }}^2m\) where \({\tilde{\sigma }}^2 = \gamma ^2/|\mu _y|^3\) with \(\gamma ^2 = {\text {Var}}(\mu _x V_1 - \mu _y U_1)\). Therefore

$$\begin{aligned} \sum _{i \in I \backslash I_k} G^*_{\sigma (m),N}(i-{\hat{x}}/h)\le & {} 2 \int ^{\infty }_{c k \sqrt{m}} \frac{e^{-\frac{x^2}{2{\tilde{\sigma }}^2m}}}{\sqrt{2\pi }{\tilde{\sigma }}\sqrt{m}}dx \\\le & {} 2 \int ^{\infty }_{c k \sqrt{m}} \frac{x}{ck\sqrt{m}} \frac{e^{-\frac{x^2}{2{\tilde{\sigma }}^2m}}}{\sqrt{2\pi }{\tilde{\sigma }}\sqrt{m}}dx\\= & {} \sqrt{\frac{2}{\pi }}\frac{{\tilde{\sigma }}}{ck}e^{-\frac{c^2k^2}{2{\tilde{\sigma }}^2}} \\\le & {} \sqrt{\frac{2}{\pi }}\frac{{\tilde{\sigma }}}{c}e^{-\frac{c^2k^2}{2{\tilde{\sigma }}^2}} \\ \end{aligned}$$

Putting this together with (54) and (53) gives

$$\begin{aligned} S_2\le & {} 2\exp \left( -y\left\{ \frac{\sin ^2\theta ^*_r}{16r^2/|\mu _y|}\right\} k^2\right) +\exp \left( -\frac{1}{4\frac{r}{|\mu _y|}rh}\right) \\&+\sqrt{\frac{2}{\pi }}\frac{{\tilde{\sigma }}}{c}e^{-\frac{c^2k^2}{2{\tilde{\sigma }}^2}}+\frac{A}{y}m^{\frac{3}{2}}e^{-bm} \\\le & {} \left( 2+\sqrt{\frac{2}{\pi }}\frac{{\tilde{\sigma }}}{c}\right) e^{-\alpha k^2}+(1+A)y^{\frac{1}{2}}N^{\frac{3}{2}}e^{-\beta y N} \end{aligned}$$

where \(\alpha \equiv \min (y\frac{\sin ^2\theta ^*_r}{16r^2/|\mu _y|},\frac{c^2}{2{\tilde{\sigma }}^2})\), \(\beta \equiv \min (\frac{1}{4r^2/|\mu _y|},b)\). Taking \(C_1 = \left( 2+\sqrt{\frac{2}{\pi }}\frac{{\tilde{\sigma }}}{c}\right) \), \(C_2=\alpha \), \(C_3=(1+A)y^{\frac{1}{2}}\), \(C_4=\beta y\) completes the proof of the claim. \(\square \)

6.6 Asymptotic Limit Version 2

An important special case of the general framework considered in the previous two sections occurs when only the x-component of \(\mathbf{X}_{\tau }\) is random. In particular, let us assume that the increments \(\varDelta \mathbf{X}_i = h \mathbf{Z}_i = h \cdot (U,V)\) obey \(U \in \{ -(r+2),-(r+1),\ldots ,r+1,r+2\}\) but \(V \equiv -1\). Then the assumptions of Theorem 1 are never satisfied, as

$$\begin{aligned} \varphi _{\mathbf{Z}_i}(u_1,u_2)=e^{-\sqrt{-1}u_2} \sum _{k=-(r+2)}^{(r+2)} P(U=k)e^{\sqrt{-1}ku_1} \end{aligned}$$

obeys \(|\varphi _{\mathbf{Z}_i}|=1\) for \(u_1\) an integer multiple of \(2\pi \) and any \(u_2 \in \mathbb {R}\). Since we cannot use Theorem 1, a different approach is required in this case.

Here we use the following local central limit theorem for U:

Theorem 3

Let \(X_1,X_2, \ldots \) be i.i.d. copies of an integer-valued random variable X of mean \(\mu \) and variance \(\sigma ^2\). Suppose furthermore that there is no infinite subprogression \(a+q\mathbb { Z}\) of \(\mathbb {Z}\) with \(q>1\) for which X takes values almost surely in \(a+q\mathbb {Z}\). Then one has

$$\begin{aligned} \mathbf{P}( S_n = m) = \frac{1}{\sqrt{2\pi n} \sigma } e^{-(m-n\mu )^2/2n \sigma ^2} + o(1/n^{1/2}) \end{aligned}$$

for all \(n \ge 1\) and all integers m, where the error term \(o(1/n^{1/2})\) is uniform in m.

Proof

This theorem may be found on Terence Tao’s blog [37]. \(\square \)
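The content of Theorem 3 is easy to visualize numerically. In the sketch below (ours; the law of X is an arbitrary choice satisfying the hypotheses) we compute the exact law of \(S_n\) by repeated convolution and compare it against the Gaussian prediction; the scaled sup-norm error should be small and decay as n grows.

```python
import numpy as np

# X takes values in {-2, -1, 0, 1, 3} with the (arbitrary) probabilities
# below; the support is not contained in any a + qZ with q > 1.
vals = np.array([-2, -1, 0, 1, 3])
probs = np.array([0.1, 0.3, 0.2, 0.25, 0.15])
mu = probs @ vals
sig2 = probs @ (vals - mu) ** 2

n = 400
offset = -vals.min()                       # shift so array indices are >= 0
step = np.zeros(vals.max() + offset + 1)
step[vals + offset] = probs
pmf = np.array([1.0])
for _ in range(n):                         # exact law of S_n by convolution
    pmf = np.convolve(pmf, step)
m = np.arange(len(pmf)) - n * offset       # support of S_n
gauss = np.exp(-(m - n * mu) ** 2 / (2 * n * sig2)) / np.sqrt(2 * np.pi * n * sig2)
print(np.sqrt(n) * np.abs(pmf - gauss).max())   # o(1) by the theorem
```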

The requirement in Theorem 3 “that there is no infinite subprogression \(a+q\mathbb { Z}\) of \(\mathbb {Z}\) with \(q>1\) for which X takes values almost surely in \(a+q\mathbb {Z}\)” is equivalent to the following one dimensional version of Definition 3:

Definition 4

We say that an integer valued random variable X is “confined to a sublattice of \(\mathbb {Z}\)” if there exists an \(n \in \mathbb {Z}\) and \(k \in \mathbb {Z}\) with \(|k| > 1\), such that the density of X is zero outside of \(n+k\mathbb {Z}\). We say that X is not confined to any sublattice of \(\mathbb {Z}\) if no such n and k exist.

Moreover, by an argument almost identical to that of Lemma 2, this is equivalent to the characteristic function \(\varphi _X(u)\) defined by

$$\begin{aligned} \varphi _X(u) = \sum _{j \in \mathbb {Z}} P(X = j) e^{\sqrt{-1}ju} \end{aligned}$$

obeying \(|\varphi _X(u)|<1\) unless u is an integer multiple of \(2\pi \).
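Definition 4 can be checked mechanically: X is confined to a sublattice precisely when the gcd g of the gaps between its support points exceeds 1, and in that case \(|\varphi _X|\) attains 1 at \(u = 2\pi /g\). The following sketch (ours; helper names are ad hoc) illustrates both characterizations.

```python
import math
import numpy as np

def confined_to_sublattice(support):
    """Definition 4 check: returns (confined?, g) where g is the gcd of the
    gaps between support points.  Helper name is ours."""
    support = sorted(set(support))
    g = 0
    for s in support[1:]:
        g = math.gcd(g, s - support[0])
    return g > 1, g

def char_fn(support, probs, u):
    """Characteristic function of X, as in the display above."""
    return sum(p * np.exp(1j * j * u) for j, p in zip(support, probs))

# {1, 4, 7} = 1 + 3Z is confined (g = 3), and accordingly |phi_X(2 pi/3)| = 1
# even though 2 pi/3 is not an integer multiple of 2 pi.
confined, g = confined_to_sublattice([1, 4, 7])
print(confined, g, abs(char_fn([1, 4, 7], [0.2, 0.5, 0.3], 2 * np.pi / g)))
```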

Following a sequence of steps almost identical to the previous section, we arrive at a version of the asymptotic limit applicable to this case.

Theorem 4

Let the inpainting domain D and undamaged area \({\mathcal {U}}\), as well as their discrete counterparts \(D_h\), \({\mathcal {U}}_h\) be as described in Sect. 6.1. Let \(u_0 : {\mathcal {U}} \rightarrow \mathbb {R}^d\) denote as usual the undamaged portion of the image, and assume \(u_0\) is bounded in norm, that is, there is an \(M > 0\) such that \(\Vert u_0\Vert \le M\) on \({\mathcal {U}}\). Suppose we inpaint \(D_h\) using Algorithm 1 or its semi-implicit extension, and denote the result by \(u_h : D_h \rightarrow \mathbb {R}^d\). Assume the assumptions of Sect. 6.1 hold and let \(a^*_r\) denote the stencil of our inpainting method. Assume that the stencil weights \({\tilde{w}}_\mathbf{j}\) assign zero mass outside of the line \(\mathbb {Z} \times \{ -1 \}\), and assume that the x-coordinate U of \(\mathbf{Z}_i = (U,V)\) is not confined to any sublattice of \(\mathbb {Z}\) in the sense of Definition 4, or equivalently that the characteristic function \(\varphi _U\) of U defined by

$$\begin{aligned} \varphi _U(u) = \sum _{j = -(r+2)}^{r+2} P(U = j) e^{\sqrt{-1}ju} \end{aligned}$$

obeys \(|\varphi _U(u)| <1\) unless u is an integer multiple of \(2\pi \). Let \(\varPi _{\theta ^*_r} : D \rightarrow \partial D\) denote the transport operator defined in (32) (recall that the transport direction \(\mathbf{g}^*_r\) obeys \(\mathbf{g}^*_r=(\mu _x,\mu _y)\)), and let \(\mathbf{x}=(x,y) \in D_h\). Let \(g_{\sigma (y,h)}\) denote the one-dimensional discrete Gaussian

$$\begin{aligned} g_{\sigma (y,h)}(ih)=\frac{e^{-\frac{d^{{\text {circ}}}_1(ih,0)^2}{2\sigma (y,h)^2}}}{\sum _{k=0}^{N-1} e^{-\frac{d^{{\text {circ}}}_1(kh,0)^2}{2\sigma (y,h)^2}}} \end{aligned}$$

with mean 0 and variance

$$\begin{aligned} \sigma (y,h)^2 = \sigma ^2 yh \end{aligned}$$

where \(\sigma ^2\) denotes the variance of U. Then

$$\begin{aligned} u_h(x,y) = (u_0 * g_{\sigma (y,h)})(\varPi _{\theta ^*_r}(x,y)) +o(1) \end{aligned}$$
(55)

Proof

The proof is nearly identical to that of Theorem 2 and is left as an exercise to the reader. \(\square \)
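In this special case the exit time is deterministic, so the variance formula is easy to verify directly. The following sketch (ours; the law of U and all sizes are arbitrary choices) confirms \({\text {Var}}(X_{\tau }) \approx \sigma ^2 yh\) by simulation.

```python
import numpy as np

rng = np.random.default_rng(1)

# An arbitrary law for U (it need only have integer support and not be
# confined to a sublattice); grid and trial sizes below are ad hoc.
U_vals = np.array([-2, -1, 0, 1, 2])
U_p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
var_U = U_p @ U_vals ** 2 - (U_p @ U_vals) ** 2

N, y = 128, 0.5
h = 1.0 / N
m = round(y * N)                      # with V == -1 the walk exits in exactly m steps
steps = rng.choice(U_vals, size=(50_000, m), p=U_p)
print(np.var(h * steps.sum(axis=1)),  # empirical Var(X_tau - x)
      var_U * y * h)                  # sigma(y,h)^2 = sigma^2 * y * h
```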

6.7 Derivation of Fixed-Ratio Continuum Limit

In this section, we prove pointwise convergence of the discrete solution \(u_h(\mathbf{x})\) to the fixed ratio continuum limit \(u(\mathbf{x})\), for those points \(\mathbf{x} \in D\) for which \(\varPi _{\theta _r^*}(\mathbf{x}) \in \partial D\) is a continuity point of \(u_0\). Theorem 2 tells us that

$$\begin{aligned} u_h(\mathbf{x}) = {\tilde{g}}_{\sigma (y,h)} * u_0(\varPi _{\theta _r^*}(\mathbf{x})) + o(1), \end{aligned}$$

where \({\tilde{g}}_{\sigma (y,h)}\) is the blur kernel from Lemma 3. Since \(u(\mathbf{x}) = u_0(\varPi _{\theta _r^*}(\mathbf{x}))\), our job amounts to proving that

$$\begin{aligned} {\tilde{g}}_{\sigma (y,h)} * u_0(\mathbf{x}) \rightarrow u_0(\mathbf{x}) \qquad \text{ as } h \rightarrow 0 \end{aligned}$$

when \(\mathbf{x}\) is a continuity point of \(u_0\).

Before tackling this problem, it is worth digressing briefly to consider what would happen had we considered the original high-resolution vanishing viscosity limit of März and Bornemann, in which first \(h \rightarrow 0\) and then \(\epsilon \rightarrow 0\). Writing \(u_{\epsilon ,h}\) in place of \(u_h\) to make the dependence on \(\epsilon \) explicit, after \(h \rightarrow 0\), we are left with

$$\begin{aligned} u_{\epsilon } = \lim _{h \rightarrow 0} u_{\epsilon ,h} = G_{\sigma (y,\epsilon )} * u_0 \end{aligned}$$

where \(G_{\sigma (y,\epsilon )}\) is an integral convolution operator. For \(u_0 \in BV({\mathcal {U}})\) the limit

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} u_{\epsilon }(\mathbf{x}) \end{aligned}$$

can be characterized almost completely by well-known results for functions of bounded variation. For completeness, we quote these results and the relevant definitions here:

Definition 5

(Approximate limit) We say that \(u \in BV(\varOmega )\) has an approximate limit at \(\mathbf{x} \in \varOmega \) if there is a \(z \in \mathbb {R}\) such that

$$\begin{aligned} \lim _{\rho \downarrow 0} \frac{1}{|B_{\rho }(\mathbf{x})|}\int _{B_{\rho }(\mathbf{x})} |u(y)-z|dy = 0. \end{aligned}$$

The set \(S_u\) of points where this property does not hold is called the approximate discontinuity set. For any \(\mathbf{x} \in \varOmega \backslash S_u\), we define \({\tilde{u}}(\mathbf{x}) = z\).

Definition 6

(Approximate jump points) Given \(\mathbf{x} \in \varOmega \subseteq \mathbb {R}^n\), \(\mathbf{v} \in S^{n-1}\), and \(\rho > 0\), define:

$$\begin{aligned} B^+_{\rho }(\mathbf{x},\mathbf{v}):= & {} \{ \mathbf{y} \in B_{\rho }(\mathbf{x}) : (\mathbf{y}-\mathbf{x})\cdot \mathbf{v} > 0\} \\ B^-_{\rho }(\mathbf{x},\mathbf{v}):= & {} \{ \mathbf{y} \in B_{\rho }(\mathbf{x}) : (\mathbf{y}-\mathbf{x})\cdot \mathbf{v} < 0\}. \end{aligned}$$

Let \(u \in BV(\varOmega )\) with \(\varOmega \subseteq \mathbb {R}^n\), and let \(\mathbf{x} \in \varOmega \). We say that \(\mathbf{x}\) is an approximate jump point of u if there exist \(a,b \in \mathbb {R}\) and \(\mathbf{v} \in S^{n-1}\) such that \(a \ne b\) and

$$\begin{aligned} \lim _{\rho \downarrow 0} \frac{1}{|B^+_{\rho }(\mathbf{x},\mathbf{v})|}\int _{B^+_{\rho }(\mathbf{x},\mathbf{v})} |u(y)-a|dy = 0. \\ \lim _{\rho \downarrow 0} \frac{1}{|B^-_{\rho }(\mathbf{x},\mathbf{v})|}\int _{B^-_{\rho }(\mathbf{x},\mathbf{v})} |u(y)-b|dy = 0. \end{aligned}$$

The set of \(\mathbf{x} \in \varOmega \) where this property holds is called the approximate jump set \(J_u\) of u. For any \(\mathbf{x} \in J_u\), we define \(u^+(\mathbf{x}) =a\) and \(u^-(\mathbf{x}) =b\).

Theorem 5

If \(\rho _{\epsilon }\) is a family of mollifiers and if \(\mathbf{x} \in \varOmega \backslash S_u\), then the functions \(u * \rho _{\epsilon }(\mathbf{x})\) converge to \({\tilde{u}}(\mathbf{x})\) as \(\epsilon \downarrow 0\). Moreover, if \(\mathbf{x} \in J_u\), then

$$\begin{aligned} \lim _{\epsilon \downarrow 0} u * \rho _{\epsilon }(\mathbf{x}) = \frac{u^+(\mathbf{x})+u^-(\mathbf{x})}{2}. \end{aligned}$$

Proof

See Proposition 3.64 and Corollary 3.80 on pages 160 and 175 respectively of [1]. \(\square \)

März and Bornemann’s high-resolution vanishing viscosity limit is also the solution to a transport equation, differing from our fixed-ratio limit only in its transport operator, which we denote here by \(\varPi _{M}\) (see [7, 31] for details). If \(u_0 \in SBV({\mathcal {U}})\), the approximate discontinuity set \(S_{u_0}\) equals the approximate jump set \(J_{u_0}\), and in this case we have

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} u_{\epsilon }({\mathbf{x}})=u_{M}({\mathbf{x}}) \end{aligned}$$

where

$$\begin{aligned} u_{M}(\mathbf{x}) = u_0(\varPi _{M}({\mathbf{x}})) \quad \text{ if } \varPi _{M}({\mathbf{x}}) \notin J_{u_0} \end{aligned}$$

and

$$\begin{aligned} u_{M}(\mathbf{x}) = \alpha u^+_0(\varPi _{M}(\mathbf{x}))+(1-\alpha )u^-_0(\varPi _{M}(\mathbf{x})) \quad \text{ if } \varPi _{M}(\mathbf{x}) \in J_{u_0} \end{aligned}$$

for some \(\alpha \in [0,1]\). This is a result that März and Bornemann appear to be unaware of, as their approach is very different from ours.

In our case, however, \({\tilde{g}}_{\sigma (y,h)}\) is discrete and \(BV({\mathcal {U}})\) (or \(SBV({\mathcal {U}})\)), where functions are only defined up to a set of measure zero, does not give us the right setup to prove our result. Moreover, even for piecewise continuous boundary data with simple jump-type discontinuities, the behavior of \(u_h(\mathbf{x})\) when \(\varPi _{\theta _r^*}(\mathbf{x})\) is not a continuity point of \(u_0\) is in general rather complex. Therefore, we content ourselves with proving convergence in the case where \(\varPi _{\theta _r^*}(\mathbf{x})\) is a continuity point of \(u_0\).

Theorem 6

Assume \(u_0\) obeys the same conditions as in Theorem 2. Then for every \(\mathbf{x} \in D\) such that \(\varPi _{\theta _r^*}(\mathbf{x})\) is a continuity point of \(u_0\) we have

$$\begin{aligned} u_h(\mathbf{x}) \rightarrow u(\mathbf{x}) = u_0(\varPi _{\theta _r^*}(\mathbf{x})) \qquad \text{ as } h \rightarrow 0. \end{aligned}$$

where \(\varPi _{\theta _r^*}(\mathbf{x})\) is given by

$$\begin{aligned} \varPi _{\theta _r^*}(x,y)=(x-\cot (\theta _r^*)y \mod 1,0), \end{aligned}$$

and

$$\begin{aligned} \theta _r^* = \theta (\mathbf{g}_r^*) \in (0,\pi ) \end{aligned}$$

is the counterclockwise angle between the x-axis and the line \(L_{\mathbf{g}_r^*} := \{ \lambda \mathbf{g}_r^* : \lambda \in \mathbb {R} \}\), and \(\mathbf{g}^*_r\) is the center of mass of the stencil \(a^*_r\) which can be computed in two equivalent ways (Definition 1)

$$\begin{aligned} \mathbf{g}^*_r =\frac{\sum _{\mathbf{y} \in a^*_r} w_r(0,\mathbf{y})\mathbf{y}}{\sum _{\mathbf{y} \in a^*_r} w_r(0,\mathbf{y})}=\frac{\sum _{\mathbf{y} \in \text{ Supp }(a^*_r)} {\tilde{w}}_r(0,\mathbf{y})\mathbf{y}}{\sum _{\mathbf{y} \in \text{ Supp }(a^*_r)} {\tilde{w}}_r(0,\mathbf{y})}. \end{aligned}$$

Proof

Assume that \(\varPi _{\theta _r^*}(\mathbf{x}):=\hat{\mathbf{x}}\) is a continuity point of \(u_0\), and let \(\epsilon > 0\) be given. Note that

$$\begin{aligned} \Vert u_h(\mathbf{x}) - u_0(\hat{\mathbf{x}})\Vert\le & {} \Vert u_h(\mathbf{x}) - ({\tilde{g}}_{\sigma (y,h)} * u_0)(\hat{\mathbf{x}})\Vert \\&+ \Vert ({\tilde{g}}_{\sigma (y,h)} * u_0)(\hat{\mathbf{x}}) - u_0(\hat{\mathbf{x}})\Vert . \end{aligned}$$

By Theorem 2, for h sufficiently small the first term in the above sum is at most \(\frac{\epsilon }{2}\). It suffices to show that taking h sufficiently small bounds the second term by \(\frac{\epsilon }{2}\) as well. For convenience, within this proof, given \(\mathbf{x}:=(x_1,x_2), \mathbf{y}:=(y_1,y_2) \in [0,1) \times \mathbb {R}\), we define \(B_{r,h}(\mathbf{x})\) and \(\mathbf{x}-\mathbf{y}\) in the circular sense, that is

$$\begin{aligned} B_{r,h}(\mathbf{x}):= & {} \{ \mathbf{y} : d^{{\text {circ}}}_1(x_1,y_1)^2+(x_2-y_2)^2 \le r^2 \}\\ \mathbf{x}-\mathbf{y}:= & {} (x_1-y_1 \mod 1,x_2-y_2). \end{aligned}$$

For any \(r > 0\), since \({\tilde{g}}_{\sigma (y,h)}\) is non-negative and sums to one over \({\mathcal {U}}_h\), we have

$$\begin{aligned}&\Vert ({\tilde{g}}_{\sigma (y,h)} * u_0)(\hat{\mathbf{x}}) - u_0(\hat{\mathbf{x}})\Vert \\&\quad = \left\| \sum _{\mathbf{y} \in {\mathcal {U}}_h} {\tilde{g}}_{\sigma (y,h)}(\hat{\mathbf{x}}-\mathbf{y})\left( u_0(\mathbf{y}) - u_0(\hat{\mathbf{x}})\right) \right\| \\&\quad \le \sum _{\mathbf{y} \in {\mathcal {U}}_h \cap B_{r,h}(\hat{\mathbf{x}})} {\tilde{g}}_{\sigma (y,h)}(\hat{\mathbf{x}}-\mathbf{y})\Vert u_0(\mathbf{y}) - u_0(\hat{\mathbf{x}})\Vert \\&\qquad + \sum _{\mathbf{y} \in {\mathcal {U}}_h \backslash B_{r,h}(\hat{\mathbf{x}})} {\tilde{g}}_{\sigma (y,h)}(\hat{\mathbf{x}}-\mathbf{y})\Vert u_0(\mathbf{y}) - u_0(\hat{\mathbf{x}})\Vert . \end{aligned}$$

Take r small enough that \(\mathbf{y} \in B_{r,h}(\hat{\mathbf{x}})\) implies \(\Vert u_0(\mathbf{y})-u_0(\hat{\mathbf{x}})\Vert <\frac{\epsilon }{4}\), which is possible because \(\hat{\mathbf{x}}\) is a continuity point of \(u_0\). At the same time, since \(\sigma (y,h) \rightarrow 0\) as \(h \rightarrow 0\), taking h sufficiently small ensures that the mass of \({\tilde{g}}_{\sigma (y,h)}(\hat{\mathbf{x}}-\mathbf{y})\) outside of \(B_{r,h}(\hat{\mathbf{x}})\) is no more than \(\frac{\epsilon }{8M}\). Noting that \(\Vert u_0(\mathbf{y}) - u_0(\hat{\mathbf{x}})\Vert \le 2M\), this gives

$$\begin{aligned} \Vert ({\tilde{g}}_{\sigma (y,h)} * u_0)(\hat{\mathbf{x}}) - u_0(\hat{\mathbf{x}})\Vert\le & {} \sum _{\mathbf{y} \in {\mathcal {U}}_h \cap B_{r,h}(\hat{\mathbf{x}})} {\tilde{g}}_{\sigma (y,h)}(\hat{\mathbf{x}}-\mathbf{y})\frac{\epsilon }{4} \\&\quad + \sum _{\mathbf{y} \in {\mathcal {U}}_h \backslash B_{r,h}(\hat{\mathbf{x}})} {\tilde{g}}_{\sigma (y,h)}(\hat{\mathbf{x}}-\mathbf{y})\, 2M \\\le & {} 1 \cdot \frac{\epsilon }{4} +\frac{\epsilon }{8M} \cdot 2M \le \frac{\epsilon }{2} \end{aligned}$$

as required. \(\square \)
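The limit just proved is easy to compute with in practice: one only needs the center of mass of the stencil and the induced transport map. The sketch below (ours; the toy stencil is an arbitrary example) evaluates \(\mathbf{g}^*_r\), \(\theta ^*_r\), and \(\varPi _{\theta _r^*}\) as they appear in Theorem 6.

```python
import numpy as np

def limiting_transport(pts, w):
    """Center of mass g_r^* of a stencil (Definition 1) and the angle
    theta_r^* = theta(g_r^*) of the line it spans, taken modulo pi."""
    g = (w[:, None] * pts).sum(axis=0) / w.sum()
    return g, np.arctan2(g[1], g[0]) % np.pi

def transport_to_boundary(x, y, theta):
    """Pi_{theta_r^*}(x, y) = (x - cot(theta) y mod 1, 0), as in Theorem 6."""
    return (x - y / np.tan(theta)) % 1.0, 0.0

# toy stencil: three downward offsets with equal weight (an ad hoc example)
pts = np.array([(-2.0, -1.0), (-1.0, -1.0), (0.0, -1.0)])
g, th = limiting_transport(pts, np.ones(3))
print(g, np.rad2deg(th), transport_to_boundary(0.3, 0.2, th))
```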

Remark 13

Here we have derived the fixed-ratio continuum limit as a corollary of the asymptotic limit. This was done for the sake of efficiency; the fixed-ratio limit can also be derived as a standalone result. Moreover, the standalone result is much more general, covering convergence in \(L^p\) rather than just pointwise convergence, and the assumption that the stencil \(a^*_r\) not be confined to any sublattice of \(\mathbb {Z}^2\) (Definition 3) can be eliminated. Convergence rates are also given, which turn out to depend on both the choice of \(L^p\) norm and the regularity of the boundary data \(u_0\). However, the proof is very long and technical, so we have published it online only—please see [26].

7 Consequences

In this section we apply the fixed-ratio continuum limit and asymptotic limit of the previous section, that is, Theorems 2 and 6, in order to explain:

  1.

    The “kinking” and “blur” artifacts listed in Sect. 2.2 and illustrated in Figs. 5, 6, and 8.

  2.

    The differences between the direct form of Algorithm 1 and its semi-implicit extension in terms of coping with these artifacts.

We begin by considering kinking artifacts, and then go on to blur. We will see that the fixed-ratio continuum limit from Theorem 6 is all that is required to understand kinking artifacts, while blur artifacts require the asymptotic limit from Theorem 2. Shocks and cut-off isophotes are not considered, as they have already been adequately explained elsewhere [31, 32].

7.1 Kinking

We begin by exploring kinking artifacts. First we prove a fundamental distinction between the direct form of Algorithm 1 and the semi-implicit extension. Then we go on to look at the limiting transport directions associated with three specific methods: coherence transport, Guidefill, and semi-implicit Guidefill.

7.1.1 The Direct Form of Algorithm 1 Kinks, the Semi-Implicit Extension Need Not

An immediate corollary of Theorem 6 is that the direct form of Algorithm 1 is limited in terms of what directions it can extrapolate along, while this is not true of the semi-implicit extension. This follows from the geometric interpretation of the limiting transport direction \(\mathbf{g}^*_r\) as the center of mass of the stencil \(a^*_r\) with respect to the stencil weights \(\{ w_r(\mathbf{0},\mathbf{y})/W : \mathbf{y} \in a^*_r \}\). Specifically, since the original weights \(\{ \frac{w_r(\mathbf{0},\mathbf{y})}{W} \}_{\mathbf{y} \in a^*_r}\) are non-negative and sum to one, by the preservation of total mass (10) and inheritance of non-negativity (12) properties of equivalent weights, the same is true of the equivalent weights \(\{ \frac{{\tilde{w}}_r(\mathbf{0},\mathbf{y})}{W} \}_{\mathbf{y} \in \text{ Supp }(a^*_r)}\). Hence

$$\begin{aligned} \mathbf{g}^*_r = \sum _{\mathbf{y} \in \text{ Supp }(a^*_r)} \frac{{\tilde{w}}_r(\mathbf{0},\mathbf{y})}{W} \mathbf{y} \in \text{ Conv }(\text{ Supp }(a^*_r)), \end{aligned}$$

where \(\text{ Conv }(\text{ Supp }(a^*_r))\) denotes the convex hull of \(\text{ Supp }(a^*_r)\). But by (26) we have

$$\begin{aligned} \text{ Supp }(a^*_r) \subseteq {\left\{ \begin{array}{ll} {\bar{b}}^-_r &{} \text{ direct } \text{ form } \text{ of } \text{ Algorithm } \text{1 } \\ {\bar{b}}^0_r &{} \text{ semi-implicit } \text{ form } \end{array}\right. } \end{aligned}$$
(56)

where \({\bar{b}}^0_r\) and \({\bar{b}}^-_r\) are the dilated sets defined by (24) and (25) respectively. It follows that \(\mathbf{g}^*_r\) has to lie in the convex hull of \({\bar{b}}^-_r\) if the direct form of Algorithm 1 is used, or the convex hull of \({\bar{b}}^0_r\) if the semi-implicit form is used instead. That is

$$\begin{aligned} \mathbf{g}^*_r \in {\left\{ \begin{array}{ll} \text{ Conv }({\bar{b}}^-_r) &{} \text{ direct } \text{ form } \text{ of } \text{ Algorithm } \text{1. } \\ \text{ Conv }({\bar{b}}^0_r) &{} \text{ semi-implicit } \text{ form. } \end{array}\right. } \end{aligned}$$
(57)

This in turn immediately implies that if we use the direct form of Algorithm 1, then \(\theta (\mathbf{g}^*_r)\) is restricted to the cone

$$\begin{aligned} \theta _c \le \theta (\mathbf{g}^*_r) \le \pi - \theta _c \end{aligned}$$
(58)

where

$$\begin{aligned} \theta _c = \arcsin \left( \frac{1}{r}\right) \end{aligned}$$
(59)

is the smallest possible angle one can make given the restriction (57), while for the semi-implicit method we have

$$\begin{aligned} 0< \theta (\mathbf{g}^*_r) < \pi \end{aligned}$$
(60)

where the angles \(\theta = 0\) and \(\theta = \pi \) are omitted not because of the restriction (57), but because Theorem 6 does not apply in either case (indeed, the continuum limit u given by (30) is not even defined). The cone (58) is exactly what we saw in Fig. 5c. At the same time, the lack of restrictions implied by (60) is consistent with our experience in Figs. 12 and 14, where semi-implicit Guidefill was able to successfully extrapolate lines making an angle as small as \(1^{\circ }\) with the inpainting domain boundary for \(r=3\), well under the critical angle \(\theta _c= \arcsin (\frac{1}{3}) \approx 19.5^{\circ }\) where standard Guidefill breaks down. Figure 16 illustrates this result.
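For reference, the cone restriction (58)-(59) is trivial to tabulate; the snippet below (ours) prints the admissible range of \(\theta (\mathbf{g}^*_r)\) for the direct form at a few radii.

```python
import numpy as np

# The cone (58)-(59) of attainable transport directions for the direct form:
for r in (3, 5, 10):
    theta_c = np.degrees(np.arcsin(1.0 / r))
    print(f"r = {r:2d}: theta restricted to [{theta_c:.2f}, {180 - theta_c:.2f}] deg")
# r = 3 gives theta_c ~ 19.47 deg, the Guidefill breakdown angle noted above;
# the semi-implicit extension is subject to no such restriction (60).
```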

Fig. 16
figure 16

Convex hulls and transport directions: An immediate corollary of Theorem 6 is that the limiting transport direction \(\mathbf{g}^*_r\) lies in the convex hull of \({\bar{b}}^-_r\) for the direct form of Algorithm 1, and the convex hull of \({\bar{b}}^0_r\) for the semi-implicit extension. Since \(\text{ Conv }({\bar{b}}^-_r)\) contains only a cone of directions while \(\text{ Conv }({\bar{b}}^0_r)\) contains the full arc from 0 to \(\pi \), this explains why Guidefill (and more generally all direct methods of the general form given by Algorithm 1) fails for shallow angles, while this is not true of the semi-implicit extension. To illustrate this, we have repeated the experiment from Fig. 5 using Guidefill (a) and semi-implicit Guidefill (b), while superimposing the sets \(\text{ Conv }(-{\bar{b}}^-_r)\) and \(\text{ Conv }(-{\bar{b}}^0_r)\) respectively (we have negated the sets for convenience, which we may do since \(\mathbf{g}^*_r\) and \(-\mathbf{g}^*_r\) define the same transport equation). Note that in the case of semi-implicit Guidefill, the lines appear to get fainter as \(\theta (\mathbf{g}^*_r) \rightarrow 0\) and \(\theta (\mathbf{g}^*_r) \rightarrow \pi \). This likely has two causes. For one, the angular footprint of the red dot in Fig. 5a, i.e. the width of the dot multiplied by \(\sin \theta ^*_r\) (which tells you how much of the dot is “visible” from direction \(\theta ^*_r\)), goes to zero. Secondly, we will see in Sect. 7.3 (see in particular Fig. 23) that semi-implicit Guidefill has blur artifacts that become arbitrarily bad as \(\theta (\mathbf{g}^*_r) \rightarrow 0\) or \(\theta (\mathbf{g}^*_r) \rightarrow \pi \). Nevertheless, Fig. 12 demonstrates that semi-implicit Guidefill can successfully extrapolate lines with \(\theta \) very close to 0 without serious issues of faintness. All parameters are the same as in Fig. 5

7.1.2 Limiting Transport Directions for Coherence Transport, Guidefill, and Semi-Implicit Guidefill

Fig. 17
figure 17

Limiting transport direction \(\theta ^*_r=\theta (\mathbf{g}^*_r)\) as a function of the guidance direction \(\theta = \theta (\mathbf{g})\) for the three main methods: The theoretical limiting curves \(\theta ^*_r =F(\theta )\) obtained for coherence transport (a), (b), Guidefill (c), (d), and semi-implicit Guidefill (e), (f). The desired curve \(F(\theta )=\theta \) is highlighted in red. Coherence transport exhibits a staircase pattern with \(F(\theta ) \ne \theta \) for all but finitely many \(\theta \), Guidefill obeys \(F(\theta )=\theta \) for all \(\theta \in (\theta _c, \pi - \theta _c)\) where \(\theta _c\) is the critical angle (59), and semi-implicit Guidefill obeys \(F(\theta ) = \theta \) for all \(\theta \ne 0\). All angles are in degrees

Here we derive formulas for the limiting transport directions of coherence transport, Guidefill, and semi-implicit Guidefill in the limit as \(\mu \rightarrow \infty \). For convenience, in this section we rescale the limiting transport direction \(\mathbf{g}^*_r\) by a factor of \(W = \sum _{\mathbf{y} \in a^*_r} w_r(\mathbf{0},\mathbf{y})\), giving

$$\begin{aligned} \mathbf{g}^*_r = \sum _{\mathbf{y} \in a^*_r} w_r(\mathbf{0},\mathbf{y})\mathbf{y}. \end{aligned}$$
(61)

This is more convenient to work with and defines the same transport equation. In fact, the ability to rescale \(\mathbf{g}^*_r\) by any \(\lambda \ne 0\) without changing the underlying transport equation is a tool that we will use repeatedly in our arguments throughout this section. To make the dependence of the transport direction on \(\mu \) explicit, within this section we write this direction as \(\mathbf{g}^*_{r,\mu }\), and define

$$\begin{aligned} \mathbf{g}^*_r := \lim _{\mu \rightarrow \infty } \mathbf{g}^*_{r,\mu }. \end{aligned}$$

In order to resolve the ambiguity that \(\mathbf{g}^*_r\) and \(-\mathbf{g}^*_r\) define the same transport equation, we will frequently be taking angles modulo \(\pi \). This is reflected in the definition we have selected for \(\theta (\mathbf{v})\)—recall that this refers to the angle that the line \(L_{\mathbf{v}}=\{ \lambda \mathbf{v} : \lambda \in \mathbb {R} \}\) makes with the x-axis, and hence always lies in the range \([0,\pi )\). We define \(\mathbf{g}=(\cos \theta ,\sin \theta )\), \(\theta ^*_{r,\mu } :=\theta (\mathbf{g}^*_{r,\mu })\), \(\theta ^*_r :=\theta (\mathbf{g}^*_r)\) and consider the function \(\theta ^*_r = F(\theta )\). Our main concern is to determine when \(\theta ^*_r = \theta \) (no kinking) and when \(\theta ^*_r \ne \theta \) (kinking). Results are illustrated in Fig. 17.

Coherence transport We have already stated in Sect. 2 and illustrated in Fig. 5 that coherence transport in the limit \(\mu \rightarrow \infty \) kinks unless \(\mathbf{g}=\lambda \mathbf{v}\) for some \(\mathbf{v} \in B_{\epsilon ,h}(\mathbf{0})\) and some \(\lambda \in \mathbb {R}\). Under the assumptions of Sect. 6.1, a more precise statement is that coherence transport in the limit \(\mu \rightarrow \infty \) fails to kink if and only if \(\mathbf{g}=\lambda \mathbf{v}\) for some \(\mathbf{v} \in b^-_r\) and \(\lambda \in \mathbb {R}\). Before we prove this, let us make some definitions. We define the angular spectrum of \(b^-_r\) as

$$\begin{aligned} \varTheta (b^-_r) := \{ \theta _1,\ldots ,\theta _n \in [0,\pi ) : \exists {} \mathbf{y} \in b^-_r \text{ s.t. } \theta _i = \theta (\mathbf{y}) \}. \end{aligned}$$
(62)

In other words, \(\varTheta (b^-_r)\) is the collection of angles modulo \(\pi \) that are representable using members of \(b^-_r\) (or, to be more mathematically precise, the set of elements of the projective space \(\mathbb {RP}^1\) that are representable); see Fig. 18 for an illustration. The angular spectrum may be defined similarly for more general sets, and we do so in “Appendix E”. We will show that for coherence transport, when \(0< \theta < \pi \), we either have

$$\begin{aligned} \theta ^*_r \in \varTheta (b^-_r) \end{aligned}$$

or

$$\begin{aligned} \theta ^*_r = \frac{\theta _i+\theta _{i+1}}{2} \quad \text{ for } \text{ consecutive } \theta _i, \theta _{i+1} \in \varTheta (b^-_r). \end{aligned}$$

To begin, note that in this case (61) becomes

$$\begin{aligned} {\mathbf{g}}^*_{r,\mu } := \sum _{{\mathbf{y}} \in b^-_r} e^{-\frac{\mu ^2}{2r^2}({\mathbf{y}} \cdot {\mathbf{g}}^{\perp })^2}\frac{\mathbf{y}}{\Vert {\mathbf{y}}\Vert }, \end{aligned}$$

where \(b^-_r\) is the discrete half ball (23). Denote by \(\varPsi \) the set of minimizers of \(|\mathbf{y} \cdot \mathbf{g}^{\perp }|\) for \(\mathbf{y} \in b^-_r\), meaning that \(|\mathbf{y} \cdot \mathbf{g}^{\perp }|:=\varDelta \ge 0\) for all \(\mathbf{y} \in \varPsi \) and \(|\mathbf{y} \cdot \mathbf{g}^{\perp }|>\varDelta \) for all \(\mathbf{y} \in b^-_r \backslash \varPsi \). After rescaling by \(e^{\frac{\mu ^2}{2r^2}\varDelta ^2}\), the transport direction \(\mathbf{g}^*_{r,\mu }\) becomes

$$\begin{aligned} {\mathbf{g}}^*_{r,\mu }= & {} \sum _{{\mathbf{y}} \in \varPsi } \frac{\mathbf{y}}{\Vert {\mathbf{y}}\Vert }+\sum _{{\mathbf{y}} \in b^-_r \backslash \varPsi } e^{-\frac{\mu ^2}{2r^2}\left\{ ({\mathbf{y}} \cdot {\mathbf{g}}^{\perp })^2 - \varDelta ^2 \right\} }\frac{\mathbf{y}}{\Vert {\mathbf{y}}\Vert } \nonumber \\\rightarrow & {} \sum _{{\mathbf{y}} \in \varPsi } \frac{\mathbf{y}}{\Vert {\mathbf{y}}\Vert } \qquad \text{ as } \mu \rightarrow \infty . \end{aligned}$$
(63)

Note that \(|\mathbf{y} \cdot \mathbf{g}^{\perp }|\) represents the distance from the point \(\mathbf{y}\) to the line through the origin \(L_\mathbf{g}\). Thus computing the set \(\varPsi \) is equivalent to finding the set of points in \(b^-_r\) closest to a given line through the origin. In “Appendix E” we prove that as \(\theta \) sweeps out an arc from 0 to \(\pi \), the set \(\varPsi \) is a singleton for all but finitely many \(\theta \); the resulting lone minimizers form a sequence that we enumerate (in order of occurrence) as \(\mathbf{y}_1, \mathbf{y}_2,\ldots \mathbf{y}_{n'}\) (for some finite \(n'\)). Now, it turns out that \(n'=n\) and moreover for every \(\theta _i \in \varTheta (b^-_r)\) we have

$$\begin{aligned} \theta _i = \theta (\mathbf{y}_i) \end{aligned}$$
Fig. 18
figure 18

Illustration of the angular spectrum: the angular spectrum \(\varTheta (b^-_r)\) tells us which angles (modulo \(\pi \)), are representable using elements of \(b^-_r\). Here we have illustrated \(\varTheta (b^-_r)\), measured in degrees, for \(r=1,2,\ldots ,10\)

(“Appendix E”, Proposition 4). In other words, we have a 1-1 correspondence between (singleton) minimizers of \(|\mathbf{y} \cdot \mathbf{g}^{\perp }|\) and the angular spectrum \(\varTheta (b^-_r)\). Moreover, it can be shown that if \(\theta _i< \theta < \theta _{i+1}\) for some \(\theta _i, \theta _{i+1} \in \varTheta (b^-_r)\), then either \(\varPsi = \{ \mathbf{y}_i \}\) (for \(\theta \) close to \(\theta _i\)) or \(\varPsi = \{ \mathbf{y}_{i+1} \}\) (for \(\theta \) close to \(\theta _{i+1}\)) or \(\varPsi = \{\mathbf{y}_i, \mathbf{y}_{i+1} \}\) if \(\theta =\theta _{i,i+1}\), where \(\theta _{i,i+1} \in (\theta _i,\theta _{i+1})\) is a critical angle we refer to as a transition angle.

Definition 7

Given minimizers \(\{ \mathbf{y}_i \}_{i=1}^n\) defined as above we define the set of transition angles \(\{ \theta _{i,i+1} \}_{i=1}^{n-1}\) by

$$\begin{aligned} \theta _{i,i+1} = \theta (\mathbf{y}_i + \mathbf{y}_{i+1}) \quad \text{ for } 1 \le i \le n-1, \end{aligned}$$

where \(\theta (\mathbf{y})\) denotes as usual the angle between the line \(L_\mathbf{y}\) and the x-axis.

For convenience, we also define \(\theta _{0,1}=0\) and \(\theta _{n,n+1}=\pi \). Since one can also prove \(\varPsi = \{ \mathbf{y}_1 \}\) for \(0:=\theta _{0,1}<\theta <\theta _1\) and \(\varPsi = \{\mathbf{y}_n\}\) for \(\theta _n< \theta < \theta _{n,n+1}:=\pi \), with this notation we have the general result

$$\begin{aligned} \varPsi ={\left\{ \begin{array}{ll} \{\mathbf{y}_i\} \text{ if } \theta _i< \theta< \theta _{i,i+1} \\ \text{ for } \text{ some } i=1,\ldots ,n \\ \{\mathbf{y}_i,\mathbf{y}_{i+1}\} \text{ if } \theta =\theta _{i,i+1} \\ \text{ for } \text{ some } i=1,\ldots ,n-1 \\ \{\mathbf{y}_{i+1}\} \text{ if } \theta _{i,i+1}< \theta < \theta _{i+1} \\ \text{ for } \text{ some } i=0,\ldots ,n-1 \\ \end{array}\right. } \end{aligned}$$
Fig. 19
figure 19

Closest points and shallowest angles: we claim that the closest point in the set \(b^-_r \subset \mathbb {Z}^2\) to a given line \(L_{\mathbf{g}}\) (\(\mathbf{g}=(\cos \theta ,\sin \theta )\)) is always one of the two points casting the shallowest angle with \(L_\mathbf{g}\) on either side, as illustrated in a for \(\theta = 53^{\circ }\), \(r=4\). This statement does not hold if \(b^-_r\) is replaced with a generic \(A \subset \mathbb {Z}^2\), as demonstrated by the counterexample \(A=\{(1,2),(4,6),(6,4)\}\) and \(\theta = 45^{\circ }\) in b

(“Appendix E”, Proposition 5—note that we have carefully excluded \(\theta _{0,1}:=0\) and \(\theta _{n,n+1}=\pi \) from the middle case). In words, this means that the element(s) of \(b^-_r\) closest to the line \(L_\mathbf{g}\) are also the member(s) of \(b^-_r\) that cast the shallowest angles with \(L_\mathbf{g}\) from above and below. This statement is not true if \(b^-_r\) is replaced with a generic subset of \(\mathbb {Z}^2\)—see Fig. 19. The remaining cases are \(\theta =\theta _i \in \varTheta (b^-_r)\) and \(\theta =0\). We deal with the former first: in this case we have

$$\begin{aligned} \varPsi = \{ \mathbf{y} \in b^-_r : \theta (\mathbf{y}) = \theta _i \}, \end{aligned}$$

a set containing up to r members, all of which are parallel to each other and to \(\mathbf{g}\). In order to make these ideas more concrete, Example 1 gives explicit expressions for \(\varTheta (b^-_r)\) and \(\varPsi \) in the case \(r=3\). Also, in “Appendix E” Remark 20, we give an algorithm for computing \(\varTheta (b^-_r)\) and \(Y(b^-_r):=\{\mathbf{y}_1,\mathbf{y}_2,\ldots ,\mathbf{y}_n\}\) for any r.

Example 1

When \(r=3\) we have

$$\begin{aligned} \varTheta (b^-_3):= & {} \{\theta _1,\theta _2,\theta _3,\theta _4,\theta _5,\theta _6,\theta _7\}\\= & {} \{ \arctan (1/2), \frac{\pi }{4}, \arctan (2), \frac{\pi }{2},\\&\frac{\pi }{2}+\arctan (1/2),\frac{3\pi }{4},\frac{\pi }{2}+\arctan (2)\}. \end{aligned}$$

For \(0 < \theta \le \frac{\pi }{2}\) (we omit \(\frac{\pi }{2}< \theta < \pi \) for brevity), the set of minimizers \(\varPsi \) is given by

$$\begin{aligned} \varPsi = {\left\{ \begin{array}{ll} \{(-2,-1)\}:=\{\mathbf{y}_1\} \\ \text{ if } 0< \theta< \theta _{1,2}. \\ \{(-2,-1),(-1,-1)\}:=\{\mathbf{y}_1,\mathbf{y}_2\} \\ \text{ if } \theta =\theta _{1,2}. \\ \{(-1,-1)\}:=\{\mathbf{y}_2\} \\ \text{ if } \theta _{1,2}< \theta< \theta _2. \\ \{(-1,-1),(-2,-2)\}:=\{\mathbf{y}_2,2\mathbf{y}_2\} \\ \text{ if } \theta = \theta _2. \\ \{(-1,-1)\}:=\{\mathbf{y}_2\} \\ \text{ if } \theta _2< \theta< \theta _{2,3}. \\ \{(-1,-1),(-1,-2)\}:=\{\mathbf{y}_2,\mathbf{y}_3\} \\ \text{ if } \theta = \theta _{2,3}. \\ \{(-1,-2)\}=\{ \mathbf{y}_3\} \\ \text{ if } \theta _{2,3}< \theta< \theta _{3,4}. \\ \{(-1,-2),(0,-1)\}=\{\mathbf{y}_3,\mathbf{y}_4 \} \\ \text{ if } \theta =\theta _{3,4}. \\ \{(0,-1)\}:=\{\mathbf{y}_4\} \\ \text{ if } \theta _{3,4}< \theta < \theta _4. \\ \{(0,-1),(0,-2),(0,-3)\}:=\{\mathbf{y}_4,2\mathbf{y}_4,3\mathbf{y}_4\} \\ \text{ if } \theta = \theta _4. \end{array}\right. } \end{aligned}$$

where \(\theta _{0,1}=0\), \(\theta _{1,2}=\arctan (2/3)\), \(\theta _{2,3}=\arctan (3/2)\), \(\theta _{3,4}=\arctan (3)\).

When \(\varPsi \) is a singleton set, that is \(\varPsi = \{ \mathbf{y}_i \}\) for some \(1 \le i \le n\), (63) becomes \(\mathbf{g}^*_r = \frac{\mathbf{y}_i}{\Vert \mathbf{y}_i\Vert }\) and we have

$$\begin{aligned} \theta (\mathbf{g}^*_r) = \theta \left( \frac{\mathbf{y}_i}{\Vert \mathbf{y}_i\Vert }\right) = \theta _i \end{aligned}$$

On the other hand, if \(\theta = \theta _i \in \varTheta (b^-_r)\), \(\mathbf{g}^*_r\) is a sum of vectors all parallel to one another and to \(\mathbf{g}\), and we get \(\theta ^*_r = \theta _i\) again. This is the lone case in which coherence transport does not kink. Next, at the transition angles \(\theta _{i,i+1}\) where \(\varPsi = \{ \mathbf{y}_i, \mathbf{y}_{i+1} \}\), we have \(\mathbf{g}^*_r = \frac{\mathbf{y}_i}{\Vert \mathbf{y}_i\Vert }+\frac{\mathbf{y}_{i+1}}{\Vert \mathbf{y}_{i+1}\Vert }\), so that

$$\begin{aligned} \theta (\mathbf{g}^*_r) = \theta \left( \frac{\mathbf{y}_i}{\Vert \mathbf{y}_i\Vert }+\frac{\mathbf{y}_{i+1}}{\Vert \mathbf{y}_{i+1}\Vert }\right) = \frac{\theta _i+\theta _{i+1}}{2}, \end{aligned}$$

where we have used the observation (proved in “Appendix E”, Observation 7) that \(\theta (\mathbf{v}+\mathbf{w}) = \frac{\theta (\mathbf{v}) + \theta (\mathbf{w})}{2}\) holds for all unit vectors \(\mathbf{v}, \mathbf{w} \in S^1\). Finally, suppose \(\theta = 0\). Here we have \(\varPsi = \{ (i,-1) : -r+1 \le i \le r-1 \}\), giving

$$\begin{aligned} \mathbf{g}^*_r = e_2 \qquad \text{ for } \theta = 0 \end{aligned}$$

after rescaling. In summary, for \(\theta \in [0,\pi )\) we then have

$$\begin{aligned} \theta ^*_r = {\left\{ \begin{array}{ll} \frac{\pi }{2} &{} \text{ if } \theta = 0 \\ \theta _i &{} \text{ if } \theta _{i-1,i}< \theta < \theta _{i,i+1} \text{ for } \text{ some } i = 1,\ldots n \\ \frac{\theta _i+\theta _{i+1}}{2} &{} \text{ if } \theta = \theta _{i,i+1} \text{ for } \text{ some } i = 1,\ldots n \end{array}\right. } \end{aligned}$$
(64)

See Fig. 17a, b for an illustration of (64) for \(r=3\) and \(r=5\). In “Appendix E”, Corollary 2 we prove a generalization of this result that applies if, for example, the discrete ball used by coherence transport is replaced with a discrete square.
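The staircase (64) can also be reproduced numerically by implementing (63) directly: find the minimizers \(\varPsi \) of \(|\mathbf{y} \cdot \mathbf{g}^{\perp }|\) over \(b^-_r\) and sum their unit vectors. The sketch below is our own illustration (the function name and sample angles are ad hoc choices).

```python
import numpy as np

def limit_direction_ct(theta, r):
    """theta_r^* for coherence transport as mu -> infinity, via (63): sum the
    unit vectors of the minimizers of |y . g_perp| over b_r^-.  Ours."""
    pts = np.array([(i, j) for i in range(-r, r + 1) for j in range(-r, 0)
                    if i * i + j * j <= r * r], dtype=float)
    g_perp = np.array([-np.sin(theta), np.cos(theta)])
    d = np.abs(pts @ g_perp)
    Psi = pts[np.isclose(d, d.min(), atol=1e-12)]       # minimizer set Psi
    g = (Psi / np.linalg.norm(Psi, axis=1, keepdims=True)).sum(axis=0)
    return np.arctan2(g[1], g[0]) % np.pi               # angle modulo pi

for deg in (10, 30, 45, 60, 80):                        # sample angles (ours)
    print(deg, "->", round(np.rad2deg(limit_direction_ct(np.deg2rad(deg), 3)), 2))
# climbs the staircase of Fig. 17a: 26.57, 26.57, 45.0, 63.43, 90.0
```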

Guidefill. In this case (61) becomes

$$\begin{aligned} {\mathbf{g}}^*_{r,\mu } := \sum _{{\mathbf{y}} \in {\tilde{b}}^-_r} e^{-\frac{\mu ^2}{2r^2}({\mathbf{y}} \cdot {\mathbf{g}}^{\perp })^2}\frac{\mathbf{y}}{\Vert {\mathbf{y}}\Vert }, \end{aligned}$$

where \({\tilde{b}}^-_r\) is given by (23). It is useful to partition \({\tilde{b}}^-_r\) into a disjoint union of sets \(\ell ^-_k\) such that each \(\ell ^-_k\) is the collection of points in \({\tilde{b}}^-_r\) at distance k from the line \(L_\mathbf{g} = \{ \lambda \mathbf{g} : \lambda \in \mathbb {R}\}\), that is

$$\begin{aligned} \ell ^-_k =\{ n\mathbf{g}+m\mathbf{g}^{\perp } \in {\tilde{b}}^-_r : m = \pm k\}. \end{aligned}$$

Since the weights (6) decay exponentially with distance from \(L_\mathbf{g}\), so long as \(\ell ^-_{0}\) is non-empty we expect the contribution of the other \(\ell ^-_k\) to vanish in the limit. For Guidefill, \(\ell ^-_0\) can be explicitly parametrized as

$$\begin{aligned} \ell ^-_0:= & {} \{ n\mathbf{g} \in {\tilde{b}}^-_r : n\mathbf{g} \cdot e_2 \le -1\} \\= & {} \{ n\mathbf{g} : n=-r,\ldots ,-\lceil \csc \theta \rceil \}. \end{aligned}$$

For \(\ell ^-_0\) to be non-empty, we need \(\lceil \csc \theta \rceil \le r\), which occurs only if \(\theta _c \le \theta \le \pi -\theta _c\), where \(\theta _c\) is the same critical angle (59) from Sect. 7.1.1. If \(\ell ^-_0 \ne \emptyset \), we have

$$\begin{aligned} {\mathbf{g}}^*_{r,\mu }= & {} \sum _{{\mathbf{y}} \in \ell ^-_0} \frac{\mathbf{y}}{\Vert {\mathbf{y}}\Vert }+\sum _{k=1}^r \sum _{{\mathbf{y}} \in \ell ^-_k} e^{-\frac{\mu ^2}{2r^2}k^2 \Vert {\mathbf{g}}\Vert ^2 }\frac{\mathbf{y}}{\Vert {\mathbf{y}}\Vert } \\\rightarrow & {} \sum _{{\mathbf{y}} \in \ell ^-_0} \frac{\mathbf{y}}{\Vert {\mathbf{y}}\Vert } \qquad \text{ as } \mu \rightarrow \infty . \\= & {} \sum _{n=-r}^{-\lceil \csc \theta \rceil } n {\mathbf{g}} \\= & {} {\mathbf{g}} \qquad \text{ after } \text{ rescaling. } \end{aligned}$$

Hence we transport in the correct direction in this case.

On the other hand, if \(\ell ^-_0 = \emptyset \) but \(\theta \ne 0\), then the weights (6) concentrate all their mass into \(\ell ^-_1\), with the contribution of all other \(\ell ^-_k\) vanishing. Unfortunately, unlike \(\ell ^-_0\), the set \(\ell ^-_1\) is not parallel to \(\mathbf{g}\), so we expect kinking in this case. In general \(\ell ^-_1\) can consist of two parallel components on either side of \(\ell ^-_0\). But in this case, since the segment \(L_\mathbf{g} \cap {\tilde{b}}_r\) lies entirely above the line \(y=-1\), we know \(\ell ^-_1\) consists of just one component and we can write it explicitly as

$$\begin{aligned} \ell ^-_1 =\{ n\mathbf{g}-\text{ sgn }(\cos \theta )\mathbf{g}^{\perp } : n=-r+1,\ldots ,-1 \} \end{aligned}$$

(remember that \(\mathbf{g}^{\perp }\) denotes the counterclockwise rotation of \(\mathbf{g}\) by \(90^{\circ }\)). After rescaling by \(e^{\frac{\mu ^2}{2r^2}\Vert \mathbf{g}\Vert ^2}\) we obtain

$$\begin{aligned} {\mathbf{g}}^*_{r,\mu }= & {} \sum _{{\mathbf{y}} \in \ell ^-_1} \frac{\mathbf{y}}{\Vert {\mathbf{y}}\Vert }+\sum _{k=2}^r \sum _{{\mathbf{y}} \in \ell ^-_k} e^{-\frac{\mu ^2}{2r^2}(k^2-1) \Vert {\mathbf{g}}\Vert ^2 }\frac{\mathbf{y}}{\Vert {\mathbf{y}}\Vert } \\\rightarrow & {} \sum _{{\mathbf{y}} \in \ell ^-_1} \frac{\mathbf{y}}{\Vert {\mathbf{y}}\Vert } \qquad \text{ as } \mu \rightarrow \infty . \\= & {} \left( \sum _{n=-r+1}^{-1} \frac{n}{\sqrt{1+n^2}} \right) \mathbf{g}\\&\quad - \text{ sgn }(\cos \theta ) \left( \sum _{n=-r+1}^{-1} \frac{1}{\sqrt{1+n^2}} \right) {\mathbf{g}}^{\perp } \\= & {} {\mathbf{g}}+\text{ sgn } (\cos \theta )\alpha _r {\mathbf{g}}^{\perp } \qquad \text{ after } \text{ rescaling. } \end{aligned}$$

where

$$\begin{aligned} \alpha _r = \frac{\sum _{n=1}^{r-1} \frac{1}{\sqrt{1+n^2}}}{\sum _{n=1}^{r-1} \frac{n}{\sqrt{1+n^2}}}. \end{aligned}$$

Finally, if \(\theta =0\) we have \({\tilde{b}}^-_r = b^-_r\) and we obtain \(\mathbf{g}^*_r = e_2\) as for coherence transport. Defining \(\varDelta \theta _r = \arctan (\alpha _r)\), for \(\theta \in [0,\pi )\) we obtain

$$\begin{aligned} \theta ^*_r = {\left\{ \begin{array}{ll} \frac{\pi }{2} &{} \text{ if } \theta = 0 \\ \theta +\varDelta \theta _r &{} \text{ if } 0< \theta< \theta _c \\ \theta &{} \text{ if } \theta _c \le \theta \le \pi -\theta _c \\ \theta -\varDelta \theta _r &{} \text{ if } \pi - \theta _c< \theta < \pi . \end{array}\right. } \end{aligned}$$
(65)

In other words, aside from the exceptional case \(\theta =0\), we have \(\theta _r^*=\theta \) for the “well behaved” range of values \(\theta _c \le \theta \le \pi -\theta _c\), but \(\theta _r^*\) jumps by a constant angle \(\varDelta \theta _r\) near \(\theta = 0\) and \(\theta =\pi \). The first few values of \(\varDelta \theta _r\) are \(\varDelta \theta _3 \approx 35.8^{\circ }\), \(\varDelta \theta _4 \approx 30.0^{\circ }\), \(\varDelta \theta _5 \approx 25.9^{\circ }\). See Fig. 17c, d for an illustration of (65) for \(r=3\) and \(r=5\).
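The constants \(\varDelta \theta _r\) are straightforward to tabulate. A minimal sketch (taking \(\theta _c = \arcsin (1/r)\), as read off from the condition \(\lceil \csc \theta \rceil \le r\)) follows:

```python
# A minimal sketch: the Guidefill kink angle Delta theta_r = arctan(alpha_r)
# and the limiting direction (65). theta_c = arcsin(1/r) is read off from the
# condition ceil(csc(theta)) <= r.
import numpy as np

def delta_theta(r):
    n = np.arange(1, r)
    alpha_r = np.sum(1 / np.sqrt(1 + n**2)) / np.sum(n / np.sqrt(1 + n**2))
    return np.arctan(alpha_r)

def theta_star(theta, r):
    """Limiting transport direction (65) of Guidefill for theta in [0, pi)."""
    theta_c = np.arcsin(1 / r)
    if theta == 0:
        return np.pi / 2
    if theta < theta_c:
        return theta + delta_theta(r)
    if theta <= np.pi - theta_c:
        return theta
    return theta - delta_theta(r)

for r in (3, 4, 5):
    print(r, round(np.degrees(delta_theta(r)), 1))                # 35.8, 30.0, 25.9
print(round(np.degrees(theta_star(np.radians(10), r=3)), 1))      # 10 deg kinks to 45.8 deg
```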

Semi-implicit Guidefill. The analysis of semi-implicit Guidefill is the same as for Guidefill except that the set \({\tilde{b}}^-_r\) is replaced by \({\tilde{b}}^0_r\). Defining \(\ell ^0_k\) as the collection of points in \({\tilde{b}}^0_r\) at distance k from the line \(L_\mathbf{g} = \{ \lambda \mathbf{g} : \lambda \in \mathbb {R}\}\) as before, we find that in this case

$$\begin{aligned} \ell ^0_0 := \{ n\mathbf{g} \in {\tilde{b}}^0_r : n \mathbf{g} \cdot e_2 \le 0\} \end{aligned}$$

is never empty. In fact, for \(0 \le \theta < \pi \) we have

$$\begin{aligned} \ell ^0_0 = {\left\{ \begin{array}{ll} \{ n\mathbf{g} : -r \le n \le -1\} &{} \text{ if } 0<\theta <\pi \\ \{ n\mathbf{g} : -r \le n \le r, n \ne 0\} &{} \text{ if } \theta = 0. \end{array}\right. } \end{aligned}$$

If \(\theta > 0\), we proceed as before and find

$$\begin{aligned} {\mathbf{g}}^*_{r,\mu }= & {} \sum _{{\mathbf{y}} \in \ell ^0_0} \frac{\mathbf{y}}{\Vert {\mathbf{y}}\Vert }+\sum _{k=1}^r \sum _{{\mathbf{y}} \in \ell ^0_k} e^{-\frac{\mu ^2}{2r^2}k^2 \Vert {\mathbf{g}}\Vert ^2 }\frac{\mathbf{y}}{\Vert {\mathbf{y}}\Vert } \\\rightarrow & {} \sum _{{\mathbf{y}} \in \ell ^0_0} \frac{\mathbf{y}}{\Vert {\mathbf{y}}\Vert } \qquad \text{ as } \mu \rightarrow \infty . \\= & {} \sum _{n=-r}^{-1} n {\mathbf{g}} \\= & {} {\mathbf{g}} \qquad \text{ after } \text{ rescaling. } \end{aligned}$$

However, if \(\theta =0\), this argument doesn’t work because the elements of \(\ell ^0_0\) all cancel each other out. In fact, in this case we have

Hence, in this case \(\mathbf{g}^*_r = e_2\) yet again, just like for Guidefill and coherence transport. In general, for \(0 \le \theta < \pi \), we have

$$\begin{aligned} \theta ^*_r = {\left\{ \begin{array}{ll} \frac{\pi }{2} &{} \text{ if } \theta = 0 \\ \theta &{} \text{ if } 0< \theta < \pi \end{array}\right. } \end{aligned}$$
(66)

In other words, semi-implicit Guidefill kinks only if \(\mathbf{g}\) is exactly parallel to the boundary of the inpainting domain. See Fig. 17e, f for an illustration of (66) for \(r=3\) and \(r=5\) (the curves are of course the same, since (66) is independent of r).

Remark 14

In contrast to Guidefill and coherence transport, (66) tells us that for semi-implicit Guidefill in the limit \(\mu \rightarrow \infty \) we have \(\theta ^*_r = \theta \) for all \(\theta \in (0,\pi )\), independent of r. This is in fact exactly the same prediction (albeit under stronger simplifying assumptions) that März and Bornemann obtained for coherence transport in their own continuum limit \(u_{M}\) as \(\mu \rightarrow \infty \) [7, p. 14]. We have in some sense come full circle—the original predictions of [7] for coherence transport under their high resolution and vanishing viscosity limit are the same as ours for semi-implicit Guidefill under our fixed ratio limit.

7.2 Numerical Validation of Limiting Transport Directions for Coherence Transport, Guidefill, and Semi-Implicit Guidefill

Fig. 20
figure 20

Stretching a dot into a line: In a we have an inpainting problem consisting of a red dot on a blue background, with the inpainting domain in yellow. In b we see the result of inpainting using coherence transport with \(\mu =40\), \(r=3\), and \(\mathbf{g}=(\cos 45^{\circ },\sin 45^{\circ })\). The dot is now stretched into a line, the orientation of which may be measured to deduce \(\mathbf{g}^*_r\)

Fig. 21
figure 21

Validation of limiting transport directions for coherence transport, Guidefill, and semi-implicit Guidefill: here we compare the limiting transport directions \(\theta ^*_r=\theta (\mathbf{g}^*_r)\) as a function of \(\theta = \theta (\mathbf{g})\) derived in Sect. 7.1.2 for coherence transport, Guidefill, and semi-implicit Guidefill as \(\mu \rightarrow \infty \) with the orientation of extrapolated isophotes in the inpainting problem shown in Fig. 20a (where \(\mu = 40\)). The theoretical curves are shown in blue, while the measured curves are shown in red. In every case we have \(r=3\)

In this experiment we compare the limiting transport directions derived in Sect. 7.1.2 for coherence transport, Guidefill, and semi-implicit Guidefill as \(\mu \rightarrow \infty \) with the orientation of extrapolated isophotes in an actual inpainted image \(u_h\) obtained in practice with finite \(\mu \). In each case we choose as our boundary data the image shown in Fig. 20a, consisting of a red dot on a blue background, with the inpainting domain shown in yellow. We run each algorithm with

$$\begin{aligned} \mathbf{g}=(\cos (k^{\circ }),\sin (k^{\circ })) \end{aligned}$$

for \(k=0,1,\ldots ,90\), with \(\mu =40\) fixed and for various values of r. The dot is then stretched into a line as in Fig. 20b, whose orientation gives \(\mathbf{g}^*_{r,\mu }\) and can be measured numerically.
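The orientation measurement can be sketched as follows; this is a hypothetical procedure (the precise measurement code is not specified here): isolate the stretched red dot by thresholding and take the principal component of the resulting pixel coordinates.

```python
# A minimal sketch of one way to measure the orientation of the stretched dot
# in Fig. 20b (hypothetical; not necessarily the procedure used for Fig. 21).
import numpy as np

def measure_orientation(img):
    """img: (H, W, 3) array in [0, 1]; returns an angle in degrees in [0, 180)."""
    red = (img[..., 0] > 0.5) & (img[..., 2] < 0.5)          # predominantly red pixels
    rows, cols = np.nonzero(red)
    coords = np.stack([cols, -rows.astype(float)], axis=1)   # image rows grow downward
    coords -= coords.mean(axis=0)
    # The leading eigenvector of the 2x2 covariance matrix gives the line direction.
    _, vecs = np.linalg.eigh(np.cov(coords.T))
    v = vecs[:, -1]
    return np.degrees(np.arctan2(v[1], v[0])) % 180.0
```

Comparing the angle measured this way against \(\theta (\mathbf{g}^*_{r,\mu })\) for \(k=0,1,\ldots ,90\) produces curves analogous to the red (measured) curves of Fig. 21.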

Results are shown in Fig. 21 for \(r=3\) (similar results may be found for different values of r, but these are omitted for brevity). The theoretical curves are shown in blue, while the measured curves are shown in red. While we see some smoothing out of the jump discontinuities in the case of coherence transport, this is expected as one can easily show that the convergence to \(\theta ^*=\lim _{\mu \rightarrow \infty } \theta ^*_{r,\mu }\) is pointwise but not uniform, becoming arbitrarily slow in the vicinity of the jumps. On the other hand, for Guidefill and semi-implicit Guidefill we see excellent agreement with theory even in the vicinity of jump discontinuities. This is to be expected as well, since one can easily show that the relevant limits are uniform in this case.

7.3 Consequences of the Asymptotic Limit

In this section, we apply the asymptotic limit from Theorem 2 to study the blur artifacts associated with Algorithm 1 and illustrated in Sect. 2.2. As a motivating example, we consider the situation illustrated in Fig. 22, where we extrapolate a vertical line using Guidefill with two different guidance directions: \(\mathbf{g}_1 = e_2\) and \(\mathbf{g}_2 = e_1\). We have already shown in Sect. 7.1.2 that these lead to the same continuum limit with transport direction \(\mathbf{g}^*_r = e_2\) (Fig. 17). However, \(\mathbf{g}_1 = e_2\) leads to a clean extrapolation (Fig. 22b) while the extrapolation using \(\mathbf{g}_2 = e_1\) suffers from heavy blur (Fig. 22c). Evidently, the fixed-ratio continuum limit from Theorem 6 is inadequate to explain this discrepancy. Instead we turn to the asymptotic limit—plugging the relevant parameters into Theorem 2, we obtain a theoretical result that is indistinguishable from the result obtained in reality (Fig. 22d, e). The asymptotic limit thus appears to yield excellent predictions of the blur obtained in practice.

Fig. 22
figure 22

Transport is not the whole story: in this experiment, the problem shown in a of inpainting \(D=[0,1]^2\) (\(200 \times 200\)px) given data on \([0,1] \times [-0.3,0)\) is solved using Guidefill with \(\mu = 100\), \(r=3\) and \(\mathbf{g}=(\cos \theta ,\sin \theta )\) for \(\theta = \frac{\pi }{2}\) (b) and \(\theta = 0\) (c). While Sect. 7.1.2 tells us that \(\theta = 0\) and \(\theta = \frac{\pi }{2}\) have the same continuum limit, evidently they are very different in reality. This is reflected in the asymptotic limit (Theorem 2), which is able to predict differences not reflected in the continuum limit. In d, e we compare horizontal slices of c at \(y=0.1\) and \(y=1\) respectively with the predictions of Theorem 2. In this case the predictions are accurate to within an error of 1/255, the minimum increment of an image on our machine

Although our asymptotic limit more closely approximates the real behavior of Algorithm 1 than the fixed-ratio continuum limit, the latter remains valid in the high resolution limit \(h \rightarrow 0\). This is illustrated in Fig. 24, where we repeatedly solve the same inpainting problem with \(D = [0,1) \times (0,0.5]\) but at increasingly high resolutions. By examining sections of constant y for differing values of h, we see explicitly the convergence of the discrete solution \(u_h\) to the fixed-ratio continuum limit u.

Remark 15

Since \(u_h\) converges to u obeying the transport PDE (30), Algorithm 1 may be viewed as a numerical method for solving (30). It is well known (see for example [28]) that numerical methods for transport equations tend to introduce blur, and therefore we should not be surprised that this is also the case for Algorithm 1. One way of studying this blur is the technique of modified equations [28, p. 117]. However, this is not the approach we have adopted in this paper.

Angular dependence of blur artifacts Here we briefly explore how the variance \(\sigma (y,h)^2\) depends on \(\mathbf{g}=(\cos \theta ,\sin \theta )\) as \(\theta \) changes. Since \(\sigma (y,h)^2=\frac{\gamma ^2yh}{|\mu _y|^3}\) can be quite complex in general, we explore this dependence numerically. Figure 23 illustrates the angular dependence of \(\sigma (y,h)\) as a function of \(\theta \in [0,\pi ]\) with \(y=1\) and \(h=0.01\) fixed, for the three main methods of interest—coherence transport, Guidefill, and semi-implicit Guidefill (note the log scale). In every case we have \(r=3\) and \(\mu = 100\).

Note that for semi-implicit Guidefill, since \(\mu _y \rightarrow 0\) as \(\theta \rightarrow 0\), we expect blur artifacts to become arbitrarily bad as \(\theta \rightarrow 0\), unless we also have \(\gamma ^2 \rightarrow 0\) as \(\theta \rightarrow 0\) in such a way that the ratio remains bounded. In fact,

$$\begin{aligned} \gamma ^2 = {\text {Var}}(\mu _x V_1 - \mu _y U_1)={\text {Var}}[(\mu _x,\mu _y)^{\perp } \cdot (U_1,V_1)] \end{aligned}$$

is the variance of the increments \(\mathbf{Z}_1=(U_1,V_1)\) orthogonal to the mean \((\mu _x,\mu _y)\), which does go to zero as \(\theta \rightarrow 0\). However, as Fig. 23 illustrates, it does not do so fast enough to compensate for the blow up of \(1/|\mu _y|^3\). Indeed, numerical experiments (omitted for brevity) demonstrate that the blur artifacts for semi-implicit Guidefill do become arbitrarily bad as \(\theta \rightarrow 0\) (Fig. 24).
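The numerical exploration itself is simple. A minimal sketch for coherence transport (again assuming weights proportional to \(e^{-\frac{\mu ^2}{2r^2}(\mathbf{y}\cdot \mathbf{g}^{\perp })^2}\) over \(b^-_r\)) evaluates \(\sigma (y,h)^2\) on a grid of angles:

```python
# A minimal sketch of the computation behind Fig. 23, shown here for coherence
# transport: sigma(y,h)^2 = gamma^2 * y * h / |mu_y|^3 as a function of theta.
# Weights proportional to exp(-mu^2/(2r^2) (y . g_perp)^2) over b^-_r are assumed.
import numpy as np

def sigma2(theta, r=3, mu=100.0, y=1.0, h=0.01):
    g_perp = np.array([-np.sin(theta), np.cos(theta)])
    pts = np.array([[i, j] for i in range(-r, r + 1) for j in range(-r, 0)
                    if i * i + j * j <= r * r], dtype=float)
    w = np.exp(-mu**2 / (2 * r**2) * (pts @ g_perp) ** 2)
    w /= w.sum()                          # stencil weights = law of the increment Z_1
    mx, my = w @ pts                      # mean increment (mu_x, mu_y)
    perp = pts @ np.array([-my, mx])      # mu_x * V_1 - mu_y * U_1, per lattice point
    gamma2 = w @ perp**2                  # orthogonal variance (E[perp] = 0 here)
    return gamma2 * y * h / abs(my) ** 3

# Essentially zero except at the transition angles (here arctan(2/3)), matching
# the discussion of degenerate stencils below.
for th in (np.radians(5), np.radians(30), np.arctan(2 / 3), np.radians(45), np.radians(85)):
    print(round(np.degrees(th), 2), sigma2(th))
```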

Note that for coherence transport, \(\sigma ^2(y,h) = 0\) for all but finitely many angles, explaining the method's apparent lack of blur artifacts (Fig. 6). These special angles correspond precisely to the jumps in Fig. 17a, where coherence transport puts its mass into more than one \(\mathbf{y} \in b^-_r\) (Sect. 7.1.2).

Fig. 23
figure 23

Angular variation of blurring artifacts: here we plot the angular dependence of the variance \(\sigma ^2(y,h)\) from the asymptotic blur kernel \(g_{\sigma (y,h)}\) in Theorem 2, for the three methods coherence transport, Guidefill, and semi-implicit Guidefill. We fix \(r=3\), take \(\mu \rightarrow \infty \), vary \(\mathbf{g}=(\cos \theta ,\sin \theta )\), and plot \(\sigma ^2(y,h)\) as a function of \(\theta \). We fix \(y=1\) and \(h=0.01\)

Degenerate stencils As noted above, coherence transport suffers from no blur artifacts for all but finitely many angles. To understand why this is, recall from the analysis of Sect. 7.1.2 that coherence transport “kinks” because in the limit as \(\mu \rightarrow \infty \), the stencil weights \(w_r(0,\mathbf{y})\) are zero for all but a single \(\mathbf{y}^* \in b^-_r\) which receives stencil weight one (except when \(\theta (\mathbf{g})\) is a transition angle or \(\theta (\mathbf{g})=0\), in which case—as observed in numerical experiments omitted for brevity—coherence transport does blur). We call such stencils degenerate.

Theorem 2 cannot be applied to degenerate stencils as we have

$$\begin{aligned} \varphi _{\mathbf{Z}_i}(\mathbf{u}) = e^{\sqrt{-1}{} \mathbf{y}^* \cdot \mathbf{u}} \end{aligned}$$

which has magnitude one irrespective of \(\mathbf{u}\). In this case, the “random” walk \(\mathbf{X}_{\tau }\) has become deterministic, as the increments \(\varDelta \mathbf{X}_i\) now obey \(\varDelta \mathbf{X}_i= h\mathbf{Z}_i = h\mathbf{y}^*\) with probability one. One may readily show that \(\rho _{\mathbf{X}_{\tau }}\) assigns mass 1 to a single pixel in \({\mathcal {U}}_h\), and hence there is no blur.
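The dichotomy is easy to see numerically: iterating a degenerate (one-point) stencil leaves the law of the walk a point mass, while any stencil with two or more support points spreads mass and hence blurs. A minimal sketch:

```python
# A minimal sketch: a degenerate (one-point) stencil yields a deterministic
# walk whose law stays a point mass, while a stencil with two or more support
# points spreads its mass over many pixels, producing blur.
import numpy as np

def walk_law(stencil, m):
    """Law of the m-step walk: m-fold convolution of the one-step stencil."""
    law = np.array([1.0])
    for _ in range(m):
        law = np.convolve(law, stencil)
    return law

print(walk_law(np.array([1.0]), 20))             # degenerate: still [1.]
print(walk_law(np.array([0.5, 0.5]), 20).max())  # binomial: max mass ~ 0.18
```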

Fig. 24
figure 24

Convergence to the fixed-ratio continuum limit: the continuum problem of inpainting the line \(\tan (73^{\circ })x - 0.1 \le y \le \tan (73^{\circ })x + 0.1\) with image domain \(\varOmega =[-1,1] \times [-0.5,0.5]\) and inpainting domain \(D = [-0.8,0.8] \times [-0.3,0.3]\) is rendered at a variety of resolutions and inpainted each time using Guidefill. Examining cross-sections of \(u_h\) at \(y=0.3\) (on the boundary of \(D_h\)), \(y=0.25\) (just inside), and \(y=0\) (in the middle of \(D_h\)) we notice a gradual deterioration of the initially sharp signal. This deterioration is to be expected in light of Theorem 2, as \(u_h(\mathbf{x})\) is related to a mollified version of \(u_0\) with a Gaussian mollifier \(g_{\sigma (y,h)}\). However, in light of Theorem 6, we also expect that as \(h \rightarrow 0\) and we approach the fixed-ratio continuum limit, this signal degradation should disappear. By examining the same slices at differing resolutions, we see that this is indeed the case

Fig. 25
figure 25

Extrapolating a diagonal line: similarly to Fig. 22, here we consider again the problem of inpainting \(D=[0,1]^2\) given data on \([0,1] \times [-0.3,0)\). This time \(u_0\) consists of a line making an angle of \(10^{\circ }\) with the horizontal, but the slice \(u_0(x,0)\) is the same step function as in Fig. 22. This time \(D_h\) is \(1000 \times 1000\)px. Inpainting is done using semi-implicit Guidefill (\(r=3\), \(\mathbf{g}=(\cos 10^{\circ },\sin 10^{\circ })\), \(\mu = 100\)). In c we show the initially sharp signal at \(y=0\), while d–f compare horizontal slices at \(y = \tan (10^{\circ }) \approx 0.18\), \(y = 2\tan (10^{\circ }) \approx 0.35\) and \(y=3\tan (10^{\circ }) \approx 0.53\) with the predictions of Theorem 2, once again obtaining a very good prediction. Compared with Fig. 22, notice that despite the fact that h has decreased by an order of magnitude, our loss of signal is much more rapid. This is consistent with the divergence \(\sigma (y,h) \rightarrow \infty \) as \(\theta \rightarrow 0\) we will encounter later in Fig. 23

Alternative interpolation schemes

As noted in Remark 7, if we were to modify semi-implicit Guidefill so that ghost pixels are defined not using bilinear interpolation, but by some other interpolation scheme that also

  1.

    can be expressed in terms of non-negative basis functions summing to one

  2.

    preserves polynomials of degree 1

we would obtain an identical continuum limit as for semi-implicit Guidefill. In the future, we would like to explore whether or not another interpolation scheme can be found such that \(\sigma (y,h)\) remains bounded as \(\theta \rightarrow 0\). This would amount to finding a scheme for which the orthogonal variance \(\gamma ^2 \rightarrow 0\) as \(\theta \rightarrow 0\) fast enough to compensate for the blow up of \(1/|\mu _y|^3\).
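For reference, bilinear interpolation itself satisfies both conditions; the following minimal check (with an arbitrary degree-1 polynomial standing in for the image) confirms this numerically:

```python
# A minimal check that bilinear interpolation on the unit cell satisfies the
# two conditions above: non-negative basis functions summing to one, and exact
# reproduction of polynomials of degree 1.
import numpy as np

rng = np.random.default_rng(0)
corners = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
p = lambda q: 2.0 - 3.0 * q[..., 0] + 0.5 * q[..., 1]   # an arbitrary degree-1 polynomial

for _ in range(1000):
    x, y = rng.random(2)
    w = np.array([(1 - x) * (1 - y), x * (1 - y), (1 - x) * y, x * y])
    assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)           # condition 1
    assert np.isclose(w @ p(corners), p(np.array([x, y])))       # condition 2
print("bilinear interpolation satisfies conditions 1 and 2")
```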

Summary

Theorem 2 and the experiments of this section have four main takeaway messages:

  • Blur gets worse as one moves further into the inpainting domain (i.e., as y increases)—Figs. 22 and 25.

  • Blur gets better as \(h \rightarrow 0\)—Fig. 24.

  • Blur is non-existent if the stencil weights are degenerate—that is, put all of their mass into a single real pixel \(\mathbf{y}\) (since in this case \(\mathbf{Z}_i\) is deterministic and all variances are 0).

  • For semi-implicit Guidefill, blur gets arbitrarily bad as \(\theta (\mathbf{g}) \rightarrow 0\).

8 Applications to Numerical Linear Algebra

Although we have derived our asymptotic and continuum limits in the context of image inpainting, they are abstract results that apply more generally. In particular, the asymptotic limit applies to any situation in which a sequence of vectors \(\{\mathbf{x}^{(m)} \}_{m \in \mathbb {N}}\) is generated recursively (for some \(r \in \mathbb {N}\)) by

$$\begin{aligned} A_0\mathbf{x}^{(m)}= A_1\mathbf{x}^{(m-1)}+A_2\mathbf{x}^{(m-2)}+\ldots +A_r\mathbf{x}^{(m-r)} \end{aligned}$$

where \(A_0,A_1,A_2,\ldots ,A_r\) are circulant or Toeplitz matrices with bandwidth at most \(2r+1\), with \(A_0\) an M-matrix (positive diagonal and non-positive off-diagonal entries) while \(A_1,A_2,\ldots ,A_r\) are element-wise non-negative. An example of a situation where this comes up is the application of damped Jacobi for iteratively solving the linear system

$$\begin{aligned} A\mathbf{x} = \mathbf{b} \end{aligned}$$

where A is a Toeplitz or circulant M-matrix. In this case the evolution of the error is given by

$$\begin{aligned} e^{(m)}={\mathcal {J}}_{\omega } e^{(m-1)}={\mathcal {J}}^m_{\omega } e^{(0)} \end{aligned}$$
(67)

where \(\omega \) is the damping parameter and the iteration matrix \({\mathcal {J}}_{\omega }\) is given by

$$\begin{aligned} {\mathcal {J}}_{\omega } = I - \omega D^{-1}A, \end{aligned}$$

with D denoting the diagonal of A. This can be viewed as a special case of the above with r determined by the bandwidth of A, \(A_0 = I\), \(A_1 = {\mathcal {J}}_{\omega }\), and

$$\begin{aligned} A_2=A_3=\ldots =A_r = O. \end{aligned}$$

One may easily verify that \({\mathcal {J}}_{\omega }\) is non-negative so long as \(\omega \in (0,1)\). Here we demonstrate how our asymptotic limit may be used to predict the evolution of the error of damped Jacobi applied to the one-dimensional convection-diffusion equation

$$\begin{aligned} -u_{xx}+\alpha u_x = f \end{aligned}$$
(68)

with Dirichlet boundary conditions.
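Before proceeding, note that the non-negativity of \({\mathcal {J}}_{\omega }\) claimed above is easy to verify numerically; a minimal sketch (the tridiagonal Toeplitz M-matrix below is an arbitrary illustrative choice) follows:

```python
# A minimal check that J_omega = I - omega D^{-1} A is element-wise non-negative
# for omega in (0, 1) when A is an M-matrix; the tridiagonal Toeplitz A used
# here is an arbitrary illustrative choice.
import numpy as np

N = 50
A = (np.diag(np.full(N, 2.0))
     + np.diag(np.full(N - 1, -1.2), -1)    # positive diagonal,
     + np.diag(np.full(N - 1, -0.8), 1))    # non-positive off-diagonals
D_inv = np.diag(1.0 / np.diag(A))
for omega in (0.1, 0.5, 0.9):
    J = np.eye(N) - omega * D_inv @ A
    assert J.min() >= 0.0
print("J_omega is element-wise non-negative for all tested omega")
```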

Boundary Conditions Technically, our asymptotic limit is only valid for periodic boundary conditions. To make our result rigorous for Dirichlet boundary conditions, we would need to change the stopping time \(\tau \) of the stopped random walk \(\mathbf{X}_{\tau }:=(X_{\tau },Y_{\tau })\) from Sect. 6.4 to

$$\begin{aligned} \tau := \inf \{n: X_n< 0 \text{ or } X_n > 1 \text{ or } Y_n < 0\}. \end{aligned}$$

However, we know that as \(h \rightarrow 0\) the asymptotic limit approaches the continuum limit, and one may easily see that simply padding the boundary data \(u_0\) on either side with an infinite strip of zeros leads to the same continuum limit—and hence this should also lead to a reasonable approximation for the asymptotic limit, at least away from the boundary at \(x=0\) and \(x=1\).

Discretization We consider two discretizations: second-order centered differences, leading to

$$\begin{aligned} A_{\text{ centered }} = {\text {tri}}\left[ -1-\frac{\alpha h}{2},2,-1+\frac{\alpha h}{2}\right] \end{aligned}$$

and first-order upwind differences

$$\begin{aligned} A_{\text{ upwind }} = {\text {tri}}\left[ -1,2-\alpha h,-1+\alpha h\right] \end{aligned}$$

Centered differences In this case, we obtain the iteration matrix

$$\begin{aligned} {\mathcal {J}}_{\omega ,\text{ centered }} = {\text {tri}}\left[ \frac{\omega }{2}\left( 1+\frac{\alpha h}{2}\right) ,1-\omega ,\frac{\omega }{2}\left( 1-\frac{\alpha h}{2}\right) \right] . \end{aligned}$$

Assuming \(0 \le \omega \le 1\) and \(-2 \le \alpha h \le 2\) we have \({\mathcal {J}}_{\omega ,\text{ centered }}\) non-negative, and hence by Sect. 6.4 we have the associated random walk with increments \((U,V)\) given by

$$\begin{aligned} U = {\left\{ \begin{array}{ll} 1 &{} \text{ with } \text{ probability } \frac{\omega }{2}\left( 1-\frac{\alpha h}{2}\right) \\ 0 &{} \text{ with } \text{ probability } 1-\omega \\ -1 &{} \text{ with } \text{ probability } \frac{\omega }{2}\left( 1+\frac{\alpha h}{2}\right) \\ \end{array}\right. } \end{aligned}$$

and \(V = -1\) with probability 1. Since V is deterministic, the second form of the asymptotic limit (Theorem 4) applies so long as \(\omega \ne 1\). If \(\omega = 1\), the support of U is constrained to the sublattice \(-1+2\cdot \mathbb {Z} \subsetneq \mathbb {Z}\) and hence the conditions of Theorem 4 are violated. We will see in Fig. 26 that in this case the asymptotic limit of Theorem 4 does not accurately predict the error evolution (67). To compute the asymptotic limit, we need only consider the mean \(\mu \) and variance \(\sigma ^2\) of U. We have

$$\begin{aligned} \mu = -\omega \frac{\alpha h}{2} \qquad \sigma ^2 = \omega - \frac{\omega ^2 \alpha ^2 h^2}{4} \end{aligned}$$

Thus, defining

$$\begin{aligned} g_{\sigma ,\mu }(n) := \frac{e^{-\frac{(n-\mu )^2}{2\sigma ^2}}}{\sum _{n'=1}^N e^{-\frac{(n'-\mu )^2}{2\sigma ^2}}} \end{aligned}$$

we have

$$\begin{aligned} e^{(m)}_k \approx (e^{(0)} * g_{m\sigma ,m \mu })_k. \end{aligned}$$
(69)

Figure 26 presents a comparison of the predicted error (69) with the real measured error, for \(N = 100\), \(\alpha h = 1\) fixed and for \(\omega = 0.5\) and \(\omega = 1\). Note that even in the case \(\omega = 1\), where the assumptions of Theorem 4 do not apply, (69) is still able to predict the overall shape of the error, up to high-frequency oscillations. For \(\omega \ne 1\), (69) is an excellent prediction of the error, except near the boundaries \(x=0\) and \(x=1\).
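The comparison behind Fig. 26 is easy to reproduce in outline. The following minimal sketch (the spike initial error and iteration count are illustrative choices, and boundaries are handled by zero padding) runs damped Jacobi on \(A_{\text{ centered }}\) and compares against the prediction (69):

```python
# A minimal sketch of the experiment of Fig. 26: damped Jacobi on the
# centered-difference matrix vs. the Gaussian prediction (69). The initial
# error (a spike) and the iteration count are illustrative choices.
import numpy as np

N, alpha_h, omega, m = 100, 1.0, 0.5, 40
sub, diag, sup = -1 - alpha_h / 2, 2.0, -1 + alpha_h / 2    # A = tri[sub, diag, sup]

e0 = np.zeros(N); e0[N // 2] = 1.0                          # initial error e^(0)
e = e0.copy()
for _ in range(m):                                          # e <- (I - omega D^{-1} A) e
    pad = np.concatenate(([0.0], e, [0.0]))                 # zero Dirichlet padding
    e = e - (omega / diag) * (sub * pad[:-2] + diag * e + sup * pad[2:])

mu = -omega * alpha_h / 2                                   # mean of U
sigma2 = omega - (omega * alpha_h / 2) ** 2                 # variance of U
n = np.arange(-N, N + 1)
g = np.exp(-(n - m * mu) ** 2 / (2 * m * sigma2)); g /= g.sum()
e0_pad = np.concatenate((np.zeros(N), e0, np.zeros(N)))
pred = np.array([g @ e0_pad[k:k + 2 * N + 1] for k in range(N)])
print(np.abs(e - pred).max())          # small: CLT error, which shrinks as m grows
```

The upwind case is identical apart from the stencil entries and the expressions for \(\mu \) and \(\sigma ^2\) given below.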

Fig. 26
figure 26

Error evolution of damped Jacobi: here we plot the evolution of the error of damped Jacobi applied to the one-dimensional convection-diffusion Eq. (68), discretized using centered differences with \(\alpha h = 1\) and on a grid of size \(N=100\). Also included is the theoretical error as predicted by the asymptotic limit (Theorem 4). We consider both \(\omega = 1\) and \(\omega =0.5\). In the former case, the asymptotic limit captures the overall trend of the error evolution, but fails to capture some high-frequency oscillations. This is because when \(\omega = 1\), the assumptions of Theorem 4 are not satisfied. On the other hand, when \(\omega = 0.5\), the conditions of Theorem 4 are satisfied and the asymptotic limit gives an excellent approximation of the true error evolution

Upwind In this case, we obtain the iteration matrix

$$\begin{aligned} {\mathcal {J}}_{\omega ,\text{ upwind }} = {\text {tri}}\left[ \frac{\omega }{2-\alpha h},1-\omega ,\omega \frac{1-\alpha h}{2-\alpha h}\right] . \end{aligned}$$

This time we assume \(0 \le \omega , \alpha h \le 1\) and consider the associated random walk with increments (UV) given by

$$\begin{aligned} U = {\left\{ \begin{array}{ll} 1 &{} \text{ with } \text{ probability } \omega \frac{1-\alpha h}{2-\alpha h} \\ 0 &{} \text{ with } \text{ probability } 1-\omega \\ -1 &{} \text{ with } \text{ probability } \frac{\omega }{2-\alpha h} \\ \end{array}\right. } \end{aligned}$$

and \(V = -1\) with probability 1. Again, the second form of the asymptotic limit (Theorem 4) applies so long as \(\omega \ne 1\). The mean \(\mu \) and variance \(\sigma ^2\) of U in this case are given by

$$\begin{aligned} \mu = -\omega \frac{\alpha h}{2-\alpha h} \qquad \sigma ^2 = \omega - \frac{\omega ^2 \alpha ^2 h^2}{(2-\alpha h)^2} \end{aligned}$$

Once again the predicted error is given by (69). Similar results (omitted for brevity) are obtained as in the centered differences case.

Remark 16

The multigrid algorithm [9, 39] is based in part on the observation that simple iterative methods such as damped Jacobi or Gauss–Seidel—while not necessarily effective at eliminating the error—can be very effective at smoothing it. It is well known that while Jacobi iteration (\(\omega =1\)) is typically not an effective smoother, damped Jacobi with \(\omega \ne 1\) often is, and indeed it is widely employed in the multigrid method [39]. The analysis of this section corroborates this well-known fact from the multigrid community—for \(\omega \ne 1\), Theorem 4 gives us an explicit formula for the asymptotic error at iteration m as a “smoothed” version of the initial error. However, when \(\omega = 1\), the assumptions of Theorem 4 are violated and this fails.

Remark 17

In its present form, our asymptotic limit cannot be used to analyze the error evolution of damped Jacobi applied to the linear systems arising from the discretization of partial differential equations in dimension \(d \ge 2\). To handle this case, we must first generalize our asymptotic limit to \(\mathbb {R}^{d+1}\). This is straightforward: the original form of Theorem 1, as presented in [36], is applicable to random walks in \(\mathbb {R}^d\); we quoted the two-dimensional version in Theorem 1 simply because this was the version relevant to the problem at hand. Since the asymptotic limit is derived from Theorem 1, generalizing it to arbitrary dimensions presents no difficulty.

9 Conclusions and Future Work

9.1 Conclusions

In this paper we have presented a detailed analysis of a class of geometric inpainting algorithms based on the idea of filling the inpainting domain in successive shells from the boundary inwards, where every pixel to be filled is assigned a color equal to a weighted average of its already filled neighbors. These methods are all special cases of a generic framework sketched in Algorithm 1. Methods in the literature falling within this framework include

  • Telea’s Algorithm [38].

  • Coherence Transport [7, 31].

  • Guidefill [27].

A subtle but important point about these methods is that pixels in the current inpainting boundary are filled independently. Noting this, we have proposed a semi-implicit extension of these methods in which pixels in the current shell are instead filled simultaneously by solving a linear system. We have also sketched in Algorithm 1 a straightforward extension of the original framework that is equivalent to solving this linear system using damped Jacobi or successive over-relaxation (SOR). A theoretical convergence analysis of these methods is presented for the semi-implicit extension of Guidefill, where we show that SOR is extremely effective. This analysis is backed up by numerical experiments.

As all of the algorithms listed above (with the exception of semi-implicit Guidefill, which is presented for the first time in this paper) are known to create some disturbing visual artifacts, the main objective of our analysis is to understand why these occur and whether or not they are inevitable. We focus specifically on kinking artifacts and blur artifacts. Other artifacts including the formation of shocks and extrapolated isophotes that end abruptly are discussed but not analyzed, as they have already been studied elsewhere [7, 32] and are well understood. Our analysis is based on three main ideas:

  • A continuum limit, which we use to explain kinking artifacts.

  • An asymptotic limit, which we use to analyze blur artifacts.

  • A connection to the theory of stopped random walks, from which both limits are derived.

Similarly to the earlier work of Bornemann and März  [7], our continuum limit is a transport equation. However, our limit process is different and so are the coefficients of the resulting transport equation. Moreover, numerical experiments show that our transport equation is a better reflection of the behaviour of Algorithm 1 (the direct form and our proposed semi-implicit extension) in practice, capable of accurately predicting kinking phenomena that are not captured by the alternative continuum limit proposed in [7]. The third core idea of our analysis, which is to relate Algorithm 1 and its extension to stopped random walks, is critical for two reasons. Firstly, it allows us to prove convergence to our continuum limit even for boundary data with low regularity, such as jump discontinuities. By contrast, the analysis in [7] demonstrates convergence under the assumption of smooth boundary data, which is an unrealistic assumption for images. Secondly, this connection is central to our analysis of blur artifacts, which we analyze based not on a continuum limit where \(h \rightarrow 0\), but rather on an asymptotic limit where h is very small but nonzero. Our asymptotic limit allows us to make quantitative predictions regarding blur that are in excellent agreement with numerical experiments, even for relatively low resolution images (e.g. Fig. 22, which is only \(200\times 200\)px). Although our analysis operates in a highly idealized setting (Sect. 6.1), our conclusions are far reaching. In particular, we prove the following:

  1.

    In the direct form of Algorithm 1, kinking artifacts will always be present. That is, certain isophotes cannot be extended into the inpainting domain without bending (Sect. 7.1.1).

  2.

    This is not true of the semi-implicit extension of Algorithm 1. In particular, semi-implicit Guidefill can extrapolate isophotes with any orientation, and moreover is able to do so efficiently by using SOR to solve the required linear system (Sects. 7.1.1, 7.1.2, Corollary 1).

  3.

    Blur artifacts exhibit an angular dependence, which for semi-implicit Guidefill becomes arbitrarily bad as the angle an extrapolated isophote makes with the boundary of the inpainting domain goes to zero. Thus, semi-implicit Guidefill pays a heavy price (on top of the need to solve a linear system for every shell) for its ability to successfully extrapolate such isophotes (Theorem 2, Fig. 23).

  4.

    Blur artifacts become less significant as the resolution of the image goes up, and get worse the further into the inpainting domain one extrapolates (Theorem 2).

  5.

    Methods that put all of their weight into a single pixel exhibit no blur, but can only extrapolate without kinking in finitely many directions (Sect. 7.3).

In addition, we have demonstrated that our asymptotic limit has applications outside of image processing. In particular, in Sect. 8 we showed how it could be used to predict the error evolution of damped Jacobi applied to a discretized one-dimensional convection-diffusion problem.

9.2 Future Work

There are three main questions we would like to explore in the future.

  1.

    Does there exist an algorithm within the framework we have described that avoids blurring artifacts and kinking artifacts at the same time? If not, is it at least possible to design an algorithm that, like semi-implicit Guidefill, avoids kinking artifacts so long as the guidance direction \(\mathbf{g}=(\cos \theta ,\sin \theta )\) obeys \(\theta \ne 0\), but for which the severity of blur as a function of \(\theta \) remains bounded? Could this be done, for example, by replacing the bilinear interpolation used to define ghost pixels with a more sophisticated interpolation scheme?

  2.

    What happens if the semi-implicit version of Algorithm 1 is generalized to a fully-implicit extension in which pixel colors are computed as a weighted average not only of their (known) already filled neighbors in \(A_{\epsilon ,h}(\mathbf{x}) \cap \varOmega \backslash D^{(k)}\) and (unknown) neighbors within the same shell \(\partial D^{(k)}_h\), but of all neighbors in \(A_{\epsilon ,h}(\mathbf{x})\)? Are there additional benefits in terms of artifact reduction and, if so, can the resulting linear system be solved efficiently enough to make this worthwhile?

  3.

    We would like to explore the connection of our asymptotic limit to problems in numerical linear algebra more deeply. In particular, we would like to generalize our asymptotic limit to arbitrary dimensions so that it can be applied to discretized partial differential equations in dimension \(d \ge 2\) (we have already sketched how to do this in Remark 17). We would also like to see whether our results can be applied to more sophisticated methods than damped Jacobi and—if so—whether or not any new insights can be gained by doing so.