1 Introduction

Distance transforms (DTs) [6, 7, 46, 47] are standard tools in image analysis and processing and are used for a wide range of tasks such as image registration [8, 20, 44], template matching [37], classification [37], segmentation [5] and skeletonization [49]. In many scenarios when DTs are used, the input image may be noisy or the processing step preceding the DT may yield an unreliable output, manifested as noise. DTs are based on extremal (minimal) distance values and are therefore very sensitive to noise; robust estimation of the DT is a challenging issue which is only partly addressed in the literature.

Discrete random sets (DRSs) [25, 26, 39] are random variables taking values as subsets of some reference space. Modeling noisy (corrupted) binary images as DRSs lets us apply a rich set of probabilistic tools to analyze and mitigate the impact of the noise. In the theory of DRSs, the coverage probability function (CPF) of a DRS gives the probability of each element being included in the set, which enables probabilistic reasoning about the individual elements of the set.

One method for computing a robust Hausdorff distance between noisy sets of points, utilizing the distance transform, is based on random removal of points to compute weighted distance histograms from user-provided point-wise reliability coefficients [42]. Continuing this research track on how the accuracy of DTs can be improved, starting from the theory of DRSs rather than from distance histograms, we propose a Stochastic Distance Transform (SDT). The SDT is based on the expected value of minimal point-to-set distances in noisy images. An initial study of a special case of this concept was introduced in [45], establishing the ability of this approach to reduce the impact of additive noise on the DT. In this extended work, we develop and analyze a more general and flexible framework. We also present and investigate a method, based on kernel density estimation (KDE), to estimate the coverage probability function. We evaluate the performance of the proposed framework on corrupted images (with both additive and subtractive noise), both in terms of the accuracy of the computed DT values and in terms of its performance when utilized for image segmentation.

A number of theoretical frameworks for computing robust distances between sets and distributions have been proposed. One such framework relies on distances based on optimal transport (OT), which models the amount of displacement of mass required to transform one distribution into another, subject to an underlying spatial distribution [2, 21, 55]. Optimal transport problems (and related distances) suffer from a heavy computational cost that is prohibitive in large-scale applications. Recent approximate solutions, building on the use of Sinkhorn iterations [21], reduce the computational load substantially, albeit at the cost of numerical stability challenges. Distances based on optimal transport may be inherently unstable and noise-sensitive; regularization techniques, e.g., based on entropy, can mitigate this issue [2], if a certain level of smoothing is acceptable in the computed approximation. OT typically considers the transformation of one distribution into another and is as such not directly applicable as a point-to-set distance (or DT).

One research track toward robust point-to-set distances and the reconstruction of geometric objects subject to noise starts with [12]. Instead of assuming an underlying compact set that is sampled by the points, the authors assume an underlying probability measure from which a point sample is drawn. The distance function to the measure depends on a mass parameter which acts as a smoothing term. This work is further developed into the k-distance [18, 27], which provides a practical algorithm for a noise-insensitive distance measure by utilizing an (unweighted) aggregation of the distances to the k nearest points. The here-proposed deterministic method for computing the SDT resembles a refined, weighted version of the k-distance. In this work, we observe that, by allowing distinct weights for the aggregated distances, the proposed SDT becomes a more general tool, which can treat both subtractive and additive noise successfully.

Persistent homology [13, 14, 22, 64] is another rich framework for the extraction of multi-scale, noise-robust topological structures. By observing the stability of the topology as a function of the increasing size of, e.g., disks growing from each point, topologically salient (noise-insensitive) representations of the raw data can be obtained. The resulting persistence diagrams are useful for comparing images with respect to topology but do not directly provide geometric information. In a very recent work [10], a non-trivial relation between the persistent homology and the curvature of an object is shown. The authors propose to learn this relation using standard tools from statistical and machine learning.

Figure 1 provides an illustrative example of the challenge that noise imposes on the accuracy of the DT and different ways to address it. It can be difficult to formulate a rule for removing noise and filling holes such that all noise is successfully removed and all holes are filled. Considering that the DT is based on distances to the nearest point of a set, any remaining noise can corrupt large regions of the DT, while incorrectly removed object points can lead to over-estimations of distances and loss of structural information. Figure 1e shows the result of applying a morphological opening (a common denoising method) on the noisy image, which has (i) failed to remove all noise and (ii) removed substantial parts of the object of interest. Figure 1f shows the DT of the image after opening, which displays severe corruption compared to the desired result in Fig. 1b. Figure 1g, h shows the DT resulting from the k-distance as well as the proposed SDT method with a uniform certainty map [45]. The two results are similar; the impact of noise is reduced compared to the results of using a classic DT (Fig. 1d, f), but the distances are substantially underestimated. Figure 1i shows the DT resulting from the here-proposed method with a non-uniform coverage probability function (estimated by a Kernel Density Estimator), which visibly corresponds well to the noise-free desired output in Fig. 1b.

2 Contributions of the Paper

This work extends [45], where we introduced the SDT, a mathematical tool based on the theory of DRSs, for computing robust distances in the presence of noise. In the here-presented study, we include general definitions of the transform as well as two methods for its efficient computation: one based on Monte Carlo simulation and one based on ordered sequences of nearest points. We analyze the approximation accuracy and computational complexity of both methods. We expand the discussion of advantages and limitations of the special case of the SDT presented in [45], where coverage probability functions are restricted to be uniform, and incorporate this case into the more general version of the framework. Further, we propose a method, based on KDE, to estimate coverage probability functions from noisy input images.

Fig. 1 Impact of noise on the DT. a Example image with a single line segment; b DT of a; c the image a corrupted by salt and pepper noise; d the DT of the noisy image c; e the noisy image after a morphological opening with a disk of radius 1; f the DT of the filtered image in e; g the k-distance of c; h the SDT of c with a uniform coverage probability function [45]; and i the result of the here-proposed SDT of c, based on kernel density estimation, exhibiting only small amounts of corruption caused by the noise (to be compared with b)

We perform empirical analysis of the accuracy of the DT in the presence of both additive and subtractive (salt and pepper) noise, comparing performances in combination with a number of common kernels for density estimation. In addition, we evaluate the SDT on localization microscopy images which are generated by a software simulator [43, 53], providing known ground-truth and therefore direct evaluation while still considering realistic data. We observe that the proposed method has excellent performance compared to several other candidate methods. We also insert the SDT into an image segmentation framework based on watersheds and evaluate the robustness of segmentation of partially overlapping objects in the presence of border noise and discretization artifacts.

3 Background and Previous Work

In this work, we rely on existing concepts and results related to random sets, distance transforms, kernel density estimation, fuzzy set theory, and the k-distance.

3.1 Discrete Random Sets

Discrete random sets (DRSs) [25, 26, 39] are random variables taking values as subsets of some discrete reference space. A DRS \({\mathcal {Y}}\) on a domain (reference space) \(I\) is characterized by its probability-mass function (PMF),

$$\begin{aligned} f_{{\mathcal {Y}}}(X) = {\mathbb {P}}({\mathcal {Y}}= X), \end{aligned}$$

where \(X \subseteq I\) is a standard (non-random) set. DRSs have been shown to be useful in image processing [1], especially to model (random) binary images, to which they can be directly related through binary random fields [25]. The theory of DRSs provides a foundation and suitable tools for the modeling and analysis of shapes in images, allowing exploration of both their structural and statistical characteristics. In particular, representing (binary) image objects as (finite) random subsets of an image domain \(I\) (a bounded subset of \({\mathbb {Z}}^n\)) facilitates their structural analysis in the presence of noise.

The coverage probability function (CPF) [1, 35] of a DRS \({\mathcal {Y}}\) expresses the probability that \({\mathcal {Y}}\) contains x and is defined as

$$\begin{aligned} p_{\mathcal {Y}}(x) = {\mathbb {P}}(x \in {\mathcal {Y}}). \end{aligned}$$
(1)

If the inclusions of the elements of the domain in a DRS \({\mathcal {Y}}\) are mutually independent, then the CPF uniquely defines the DRS, with the following PMF:

$$\begin{aligned} f_{{\mathcal {Y}}}(X) = \prod _{x \in X} p_{\mathcal {Y}}(x) \cdot \!\!\prod _{x \in I {\setminus } X} (1-p_{\mathcal {Y}}(x)). \end{aligned}$$

The avoidance functional [41] of a DRS and a subset \(K \subseteq I\) is

$$\begin{aligned} Q_{{\mathcal {Y}}}(K) = {\mathbb {P}}({\mathcal {Y}}\cap K = \emptyset ). \end{aligned}$$
(2)

The void probability [25] is the probability that a DRS takes on the empty set, \(f_{\mathcal {Y}}(\emptyset ) = {\mathbb {P}}({\mathcal {Y}}= \emptyset )\).
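
To make these definitions concrete, the following is a minimal numerical sketch (in Python; the function names and the toy CPF are ours) of the PMF, the avoidance functional, and the void probability for an element-wise independent DRS on a four-element domain:

```python
import numpy as np

# Toy domain I = {0, 1, 2, 3}; an element-wise independent DRS is fully
# characterized by its CPF p, with p[i] = P(i in Y).
p = np.array([0.9, 0.5, 0.0, 0.2])

def pmf(X, p):
    """PMF f_Y(X) of an element-wise independent DRS (product form above)."""
    inside = np.zeros(len(p), dtype=bool)
    inside[list(X)] = True
    return float(np.prod(np.where(inside, p, 1.0 - p)))

def avoidance(K, p):
    """Avoidance functional Q_Y(K) = P(Y does not intersect K), Eq. (2)."""
    return float(np.prod([1.0 - p[k] for k in K]))

print(pmf({0, 1}, p))        # 0.36: probability that Y == {0, 1}
print(avoidance({1, 3}, p))  # 0.40: probability that Y avoids {1, 3}
print(pmf(set(), p))         # 0.04: the void probability f_Y(emptyset)
```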

3.2 Distance Transforms

The initial works on distance transforms focused on raster-scan methods on binary images for computing Manhattan and Chessboard distances [46, 47], followed by more refined weighted (Chamfer mask) distance transforms [7, 15] which approximate the Euclidean distance transform (EDT) with smaller error than previous methods. Fast, separable algorithms have since then been developed which eliminate approximation errors and compute the exact Euclidean distance transform in linear time [16, 24, 40] and which are parallelizable and fast in practice.

There are two major themes in the DT-related literature: (i) methods that introduce more sophisticated models for the shape and location of the objects to which the distance is being computed and (ii) methods that consider different media through which the distances are propagated.

Works belonging to category (i) include the anti-aliased distance transform [28, 30], which improves the sub-pixel error of distances to continuous structures (discretized in a grid), and the exact EDT for grid-sampled signals [38]. Another study investigates models for imprecision in the location of digital objects [54]. A different but related aspect, concerning distance transforms defined on irregular grids, was explored in [56,57,58]. The SDT proposed in this work also belongs to category (i).

Works belonging to category (ii) include the gray-scale and fuzzy distance transform methods [29, 34, 50] which use available information-rich representations to obtain more accurate distance transforms.

Definitions of distances between points and sets commonly build on a distance measure between points. The Euclidean point-to-point distance \(d_{E}\) is a natural and common choice and is used as the default in this study, unless explicitly specified otherwise. Note that the framework discussed in this study does not rely on metric properties of the distances, though explicit methods for computation may.

Given a point-to-point distance d, the standard point-to-set distance between a point x and a set Y is defined as

$$\begin{aligned} d(x, Y) = \inf \limits _{y \in Y}{d(x, y)}. \end{aligned}$$
(3)

The (unsigned, external) distance transform (DT) with respect to a set (image object) Y evaluated on the image domain I is

$$\begin{aligned} \texttt {DT}[Y](x) = \min \limits _{y \in Y}{d(x, y)}, \qquad \text {for} \quad x \in I. \end{aligned}$$
(4)
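
As an illustration of Eq. (4), the EDT of a binary image can be computed with an off-the-shelf routine; a minimal sketch in Python follows. Note that scipy.ndimage.distance_transform_edt measures the distance to the nearest zero-valued element, so the object mask is inverted:

```python
import numpy as np
from scipy import ndimage

# Binary image: True where the object Y is present.
Y = np.zeros((8, 8), dtype=bool)
Y[2, 3] = Y[5, 6] = True

# Eq. (4): DT[Y](x) = min over y in Y of d_E(x, y), for all x in I.
dt = ndimage.distance_transform_edt(~Y)
print(dt[0, 0])  # distance from (0, 0) to its nearest object point: sqrt(13)
```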

3.3 Distances to Discrete Random Sets

Point-to-set distances can also be extended to DRSs [1] and hence generate probability distributions of distance values. The distance \(d(x, {\mathcal {Y}})\) is a random variable with values in \([0,\infty ]\), taking on the value \(\infty \) when \({\mathcal {Y}}=\emptyset \).

If \(d(x, {\mathcal {Y}})\) is integrable for all \(x \in I\) (i.e., for distances which take on the value \(\infty \) for empty sets, the void probability is 0), then a mean distance function for DRSs [1, 35] can be defined as

$$\begin{aligned} {\bar{d}}(x) = {\mathbb {E}}d(x, {\mathcal {Y}}). \end{aligned}$$
(5)

The requirement of a zero void probability is too restrictive for the context of this work, since it is reasonable to consider scenarios where the presence of no single element is completely certain. By saturating the distance to a maximum value \(d_{\texttt {MAX}}\), the mean distances are well-defined and finite even for empty sets.

Given a set of realizations of a DRS, \(Y_1, Y_2, Y_3 \ldots Y_n\), an empirical sample mean (or some other statistic, such as the median or sample variance) of the distances \(d(x,Y_1), \ldots , d(x,Y_n)\) can be computed [1, 35]. This is a useful tool when our input consists of multiple images, representing realizations of the same underlying DRS [1], or when we are able to sample images from a DRS, which forms the basis of the proposed Monte Carlo method. In this work, we propose methods for efficiently computing expected distances to DRSs under specific assumptions, such that we can compute noise-robust DTs given a single binary image.

3.4 The k-Distance

Let \(n_{Y}^{(i)}(x)\) denote the i-th point nearest to x in Y (if multiple points are equidistant to x, the tie is broken arbitrarily), and let \({\mathcal {N}}_{Y}^{(k)}(x)\) denote the set of the k nearest points to x in Y, \({\mathcal {N}}_{Y}^{(k)}(x) = \{n_{Y}^{(1)}(x),\ldots ,n_{Y}^{(k)}(x)\}\).

The k-distance [18, 27] is the (unweighted) root-mean-square of the distances from the k nearest points \(n_{Y}^{(i)}(x)\) to x:

$$\begin{aligned} d_{k}\left[ Y\right] (x) = \sqrt{\frac{1}{k} \sum _{i=1}^k d^2(x, n_{Y}^{(i)}(x))}. \end{aligned}$$
(6)

The k-distance exhibits a degree of robustness to additive noise, where the appearance of a few noise points has a limited impact on the average. However, it offers no remedy for subtractive (pepper) noise, since points incorrectly excluded from the set Y (due to noise) remain excluded from the mean of Eq. (6). The k-distance also exhibits a bias inside dense objects, where the distance is overestimated; it takes on a mean of the distances to the k nearest grid-points, which is greater than zero.
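
A minimal sketch of Eq. (6), using a kd-tree for the nearest-neighbor queries (the function name is ours):

```python
import numpy as np
from scipy.spatial import cKDTree

def k_distance(points, x, k):
    """Eq. (6): root-mean-square of the distances from x to its k
    nearest points in the set (points: array of shape (n, dim))."""
    dists, _ = cKDTree(points).query(x, k=k)  # sorted, nearest first
    return float(np.sqrt(np.mean(np.asarray(dists) ** 2)))

Y = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])  # last point: noise
print(k_distance(Y, [0.5, 0.0], k=2))  # 0.5; the distant point is ignored
```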

3.5 Fuzzy Set Theory and Distances

We recall a few basic concepts related to fuzzy sets [62], a theoretical framework for representing partial membership/belongingness of elements to sets/objects. Fuzzy set theory is one of the major theoretical frameworks that enable the encoding of imprecision and is the foundation of many noise-robust methods.

A fuzzy set \(A\) on a reference set \(X_A\) is a set of ordered pairs, \(A = \{(x, \mu _A(x)) \, : \, x \in X_A\}\), where \(\mu _A :X_A \rightarrow \left[ 0, 1\right] \) is the membership function of \(A\).

A crisp set \(C \subseteq X_C\) (a binary image) is a special case of a fuzzy set, with its characteristic function as membership function

$$\begin{aligned} \mu _C(x) = \left\{ \begin{array}{ll} 1, \;\; &{} \quad \text {for } \,\, x \in C\\ 0, \;\; &{}\quad \text {for } \,\, x \notin C\,. \end{array}\right. \end{aligned}$$
(7)

The height of a fuzzy set \(A\) is \(h(A) = \max _{x \in X_A} \mu _A(x)\). An \(\alpha \)-cut of a fuzzy set \(A\) is a crisp set defined as \({}^{\alpha }A = \left\{ x \in X_A \, : \, \mu _A(x) \ge \alpha \right\} \), i.e., a thresholded image.

A fuzzy point-to-set distance measure [37] which has shown excellent performance on a number of tasks such as template matching and classification is given by the integral over \(\alpha \)-cuts,

$$\begin{aligned} d^{\alpha }(x, A) = \int _{0}^{h(x)}{d(x, {}^{\alpha }A) \, \mathrm{d}\alpha }, \end{aligned}$$
(8)

where h(x) is the height of point x, and \(h(x) \in \left[ 0, 1\right] \). In this work, all heights are unit-valued, since we compute distances from the (crisp) domain.

Fuzzy sets and element-wise independent DRSs are superficially similar in that fuzzy sets associate each element with a partial membership to a reference set, while an element-wise independent DRS associates each element with a coverage probability (CP). However, the meanings of the two concepts differ substantially, since a fuzzy set is not a random variable [62], but rather a tool for encoding deterministic partial inclusion in a set.

3.6 Kernel Density and Mass Estimation

Kernel density estimation (KDE) [23, 31, 51] is a nonparametric method where a (weighted) kernel, or window function, is used to estimate the probability density function of a continuous random variable or probability mass function of a discrete random variable.

Methods based on KDE have been successfully applied in denoising of noisy point-clouds [9, 63], clustering (of noisy point-cloud data) [17] and different types of outlier detection [33, 36], which are all based on separating salient elements from non-salient elements based on density.

There are many KDE kernel functions in common use [32]; a subset of the univariate kernels is listed in Table 1, all of which have finite support on \(\left| z\right| \le 1\). There are multiple ways of transforming such univariate kernels into multivariate kernels. Given a univariate kernel K and a suitable norm, a radially symmetric multivariate kernel [32, 51] with radius r is given by

$$\begin{aligned} K_{\circ }({\mathbf {z}}) = \beta K\left( \frac{\left\Vert {\mathbf {z}}\right\Vert }{r}\right) , \end{aligned}$$
(9)

where \(\beta \) is a normalization constant. For this study, when considering images on rectangular grid domains, the normalization constant is suitably chosen so that the kernel is normalized to a unit sum. For the discretized versions of the continuous kernels, the procedure becomes kernel mass estimation.
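
As an example, a discretized, radially symmetric Triweight kernel (univariate profile \(K(z) \propto (1-z^2)^3\) for \(\left| z\right| \le 1\)), normalized to unit sum as described above, can be constructed as follows (a sketch; the function name is ours):

```python
import numpy as np

def triweight_kernel_2d(r):
    """Discrete radially symmetric Triweight kernel, cf. Eq. (9), with the
    normalization constant beta chosen so that the kernel sums to one."""
    ax = np.arange(-int(np.ceil(r)), int(np.ceil(r)) + 1)
    xx, yy = np.meshgrid(ax, ax)
    z = np.sqrt(xx ** 2 + yy ** 2) / r
    K = np.where(z <= 1.0, (1.0 - z ** 2) ** 3, 0.0)
    return K / K.sum()  # kernel mass estimation on the grid

K = triweight_kernel_2d(4.0)
print(K.sum())  # 1.0
```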

Table 1 Common univariate kernels

4 The Stochastic Distance Transform

4.1 Notation and Definitions

Let \({\hat{Y}}= f(Y)\) be an observed binary image on a domain \(I\), resulting from a true image \(Y\) which is corrupted by a random field (stochastic process) f. In other words, the observed binary image \({\hat{Y}}= f(Y)\) can be seen as one sample of a discrete random set \({\mathcal {Y}}\) on \(I\).

The DRS \({\mathcal {Y}}\) can be characterized by its CPF \(p_{{\mathcal {Y}}}(y) = {\mathbb {P}}(y \in {\mathcal {Y}}), \, \forall y \in I\), which is either known or estimated. For all DRSs in this work, we assume mutual independence of inclusion of elements. This simplifies definitions and algorithms significantly and is also reasonable in many imaging scenarios.

We now introduce the SDT, a mathematical tool for computing the distance map on an image domain w.r.t. an object represented by a DRS \({\mathcal {Y}}\).

Definition 1

Given an image domain \(I\), a DRS \({\mathcal {Y}}\) on \(I\), a maximal distance \(d_{\texttt {MAX}}\in {\mathbb {R}}_{+}\), and a point-to-set distance d, the (unsigned, external) Stochastic Distance Transform (SDT) is

$$\begin{aligned} \begin{aligned} \texttt {SDT}\left[ \mathcal {Y}\right] (x) = {{\,\mathrm{{\mathbb {E}}}\,}}(\min \left[ d(x, {\mathcal {Y}}), d_{\texttt {MAX}}\right] ), \quad \text {for} \; \; x\in I. \end{aligned} \end{aligned}$$
(10)

\(d_{\texttt {MAX}}\) is typically taken to be the diameter of the domain \(I\). If \({\mathcal {Y}}\) has a nonzero void probability, the saturation to a maximal distance yields a finite and well-defined expectation in Definition 1.

We also define the variance of the point-to-set distance distribution underlying the SDT.

Definition 2

Given an image domain I, a DRS \({\mathcal {Y}}\) on I, a maximal distance \(d_{\texttt {MAX}}\in {\mathbb {R}}_{+}\), and a point-to-set distance d, the (unsigned, external) Stochastic Distance Variance Transform (SDVT) is

$$\begin{aligned} \begin{aligned} \texttt {SDVT}\left[ \mathcal {Y}\right] (x) = {{\,\mathrm{\texttt {Var}}\,}}(\min \left[ d(x, {\mathcal {Y}}), d_{\texttt {MAX}}\right] ), \quad \text {for} \; \; x\in I. \end{aligned} \end{aligned}$$
(11)

For scenarios where each element in the set of interest is associated with a coverage probability, from, e.g., a prior distribution of an imaging process, the SDT can be used directly without further pre-processing. If no such probabilities are readily available, they can be estimated using some established method, e.g., KDE.

For the further analysis of the SDT, we borrow some terminology from fuzzy set theory and adapt it to DRSs. The support of \({\mathcal {Y}}\) is

$$\begin{aligned} \text {supp}({\mathcal {Y}}) = \left\{ x \, :x \in I \; \texttt {and} \; p_{{\mathcal {Y}}}(x) > 0 \right\} . \end{aligned}$$
(12)

The core of \({\mathcal {Y}}\) is

$$\begin{aligned} \text {core}({\mathcal {Y}}) = \left\{ x \, :x \in I \; \texttt {and} \; p_{\mathcal {Y}}(x) = 1 \right\} . \end{aligned}$$
(13)

We denote the size of \({\mathcal {Y}}\) as the cardinality of the support:

$$\begin{aligned} \vert {\mathcal {Y}}\vert = \vert \text {supp}({\mathcal {Y}})\vert . \end{aligned}$$
(14)

The minimum nonzero probability of \(p_{{\mathcal {Y}}}\) is denoted \(\min \nolimits _{+}({\mathcal {Y}})\), which is equal to 0 only when \(\text {supp}({\mathcal {Y}}) = \emptyset \).

4.2 Monte Carlo Method

An estimate of \(\texttt {SDT}\left[ \mathcal {Y}\right] (x)\) can be obtained by a Monte Carlo method, denoted MC-SDT, by drawing N random samples (sets) from \({\mathcal {Y}}\), (\(Y_1, \ldots , Y_N\)), computing the corresponding DTs using a suitable algorithm and then computing their empirical mean,

$$\begin{aligned} \begin{aligned} \texttt {MC-SDT}_N\left[ \mathcal {Y}\right] (x) = \frac{1}{N} \sum \limits _{i=1}^{N}{\min \left[ d(x, Y_i), d_{\texttt {MAX}}\right] }\,. \end{aligned} \end{aligned}$$
(15)

Due to the mutual independence of inclusions of elements, each sample \(Y_i\) can be generated by one independent Bernoulli trial per element in \(\text {supp}({\mathcal {Y}})\).
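
A minimal sketch of the estimator in Eq. (15) for CPF images on a rectangular grid (the function and parameter names are ours):

```python
import numpy as np
from scipy import ndimage

def mc_sdt(p, n_samples, d_max, seed=0):
    """Monte Carlo estimate of the SDT, Eq. (15), for an element-wise
    independent DRS with CPF image p (values in [0, 1])."""
    rng = np.random.default_rng(seed)
    acc = np.zeros(p.shape, dtype=float)
    for _ in range(n_samples):
        Y = rng.random(p.shape) < p     # one Bernoulli trial per element
        dt = (ndimage.distance_transform_edt(~Y)
              if Y.any() else np.full(p.shape, np.inf))
        acc += np.minimum(dt, d_max)    # saturate at d_MAX
    return acc / n_samples
```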

4.2.1 Error Analysis

The Monte Carlo method is a sample mean estimator (with finite mean and variance); by the Central Limit Theorem, the margin of error is given by

$$\begin{aligned} ME=\frac{2 \sqrt{\texttt {SDVT}\left[ \mathcal {Y}\right] (x)}}{\sqrt{N}}, \end{aligned}$$
(16)

for sufficiently large N, and asymptotically the margin of error decreases as \({\mathcal {O}}(\frac{1}{\sqrt{N}})\). The error is dependent on \({\mathcal {Y}}\) (which may depend on the image content) and is potentially different for each element x.

4.2.2 Complexity Analysis

We recall that with a fast separable algorithm the EDT can be computed in \({\mathcal {O}}(\vert I\vert )\) time [24, 40]. The Bernoulli sampling for each element using a pseudo-random number generator (PRNG) has complexity \({\mathcal {O}}(\vert {\mathcal {Y}}\vert )\), and the accumulation of each DT into a running sum (over the N sampling iterations) has complexity \({\mathcal {O}}(\vert I\vert )\). The complexity of the whole process is therefore \({\mathcal {O}}(N \vert I\vert )\).

The method is highly parallelizable, since each random sampling and subsequent DT can be computed independently and are only aggregated at the end.

4.3 Deterministic Method

A deterministic estimation method for the SDT can be derived from an expansion of the expectation in Definition 1, observing an ordered sequence of distances to the nearest set points. For a given point in such a sequence, let its rank denote its index in the sequence.

Let \({\mathcal {N}}_{{\mathcal {Y}}}^{(k)}(x)\) denote \({\mathcal {N}}_{\text {supp}({\mathcal {Y}})}^{(k)}(x)\), and \(n_{{\mathcal {Y}}}^{(i)}(x)\) denote \(n_{\text {supp}({\mathcal {Y}})}^{(i)}(x)\), for convenience.

The probability that all nearest points in \(\text {supp}({\mathcal {Y}})\) of rank up to and including k do not belong to the set is the avoidance functional [41] which is given by

$$\begin{aligned} \begin{aligned} Q_{{\mathcal {Y}}}({\mathcal {N}}_{{\mathcal {Y}}}^{(k)}(x)) = \prod _{i=1}^{k}{\left[ 1-p_{\mathcal {Y}}(n_{{\mathcal {Y}}}^{(i)}(x))\right] }. \end{aligned} \end{aligned}$$
(17)

The probability that the i-th nearest point in \(\text {supp}({\mathcal {Y}})\) is the nearest point belonging to the set (i.e., all nearest points of lower rank than i do not belong to the set, and the i-th nearest point does) is the selection functional, which is given by

$$\begin{aligned} \begin{aligned} C_{{\mathcal {Y}}}(x; i) = \left\{ \begin{array}{lr} p_{{\mathcal {Y}}}(n_{{\mathcal {Y}}}^{(i)}(x)), &{} \quad \text {for } i = 1\\ p_{{\mathcal {Y}}}(n_{{\mathcal {Y}}}^{(i)}(x))\cdot Q_{{\mathcal {Y}}}({\mathcal {N}}_{{\mathcal {Y}}}^{(i-1)}(x)), &{}\quad \text {for } 1 < i \le \vert {\mathcal {Y}}\vert \\ 0, &{} \quad \text {for } i > \vert {\mathcal {Y}}\vert \\ \end{array}\right\} . \end{aligned} \end{aligned}$$
(18)

Definition 3

Given an image domain I, a DRS \({\mathcal {Y}}\), a maximal distance \(d_{\texttt {MAX}}\in {\mathbb {R}}_{+}\), and a point-to-set distance d, a deterministic method for computing the SDT is

$$\begin{aligned} \begin{aligned} \texttt {DET-SDT}\left[ \mathcal {Y}\right] (x)&= Q_{{\mathcal {Y}}}({\mathcal {N}}_{{\mathcal {Y}}}^{(\vert {\mathcal {Y}}\vert )}(x)) d_{\texttt {MAX}}\, \\&\quad + \sum \limits _{i = 1}^{\vert {\mathcal {Y}}\vert }{C_{{\mathcal {Y}}}(x; i) \min \left[ d(x, n_{{\mathcal {Y}}}^{(i)}(x)), d_{\texttt {MAX}}\right] }. \end{aligned} \end{aligned}$$
(19)

In practice, it may be computationally infeasible, or not required for the application at hand, to compute the SDT exactly. Due to the diminishing weight of the high-rank nearest points, the \(\text {DET-SDT}\) can be well approximated by truncating the distance probability distribution: either to a given probability mass \(p_{\texttt {MAX}}\) (accumulated in order of low-to-high rank), by thresholding all elements with probability below some threshold \(p_\text {min}\), or by considering a fixed number k of nearest points. The k nearest points and their distances can be computed using an efficient space-partitioning data structure, e.g., kd-trees [3], or by a separable algorithm [19].

We introduce some notions to aid the reasoning about the accumulated probability mass of sets of k-nearest neighbors. Let the accumulated probability mass up to and including the first i points be denoted as

$$\begin{aligned} p_{\text {acc}}\left[ {\mathcal {Y}}, x\right] (i) = \sum \limits _{j=1}^{i}{C_{{\mathcal {Y}}}(x; j)}. \end{aligned}$$
(20)

For given \({\mathcal {Y}}, x, p_{\texttt {MAX}}\), let \(k_B\) be the smallest positive integer such that

$$\begin{aligned} p_{\texttt {MAX}}\le p_{\text {acc}}\left[ {\mathcal {Y}}, x\right] (k_B). \end{aligned}$$
(21)

\(k_B\) encodes the smallest number of nearest neighbors which must be included for their accumulated probability to surpass or equal \(p_{\texttt {MAX}}\). \(k_B\) will be used for computing lower and upper bounds on the SDT.

An upper bound on the SDT, with \(k_B\) as defined by Eq. (21), is

$$\begin{aligned} \begin{aligned} d_{U}\left[ {\mathcal {Y}}\right] (x)&= \sum \limits _{i = 1}^{k_B-1}{C_{{\mathcal {Y}}}(x; i) \min \left[ d(x, n_{{\mathcal {Y}}}^{(i)}(x)), d_{\texttt {MAX}}\right] } \\&\quad + (p_{\texttt {MAX}}- p_{\text {acc}}\left[ {\mathcal {Y}}, x\right] (k_B-1)) \min \left[ d(x, n_{{\mathcal {Y}}}^{(k_B)}(x)), d_{\texttt {MAX}}\right] \\&\quad + (1-p_{\texttt {MAX}}) d_{\texttt {MAX}}. \end{aligned} \end{aligned}$$
(22)

The upper bound is based on truncating the tail of the distance distribution by substituting the distances of all higher rank (than \(k_B\)) nearest points with \(d_{\texttt {MAX}}\). The \(k_B\)-th nearest point is only partially considered, in order to avoid spurious variations in distance across the DT due to variable probability mass being substituted with \(d_{\texttt {MAX}}\). The upper bound is guaranteed to be tight if \(p_{\text {MAX}}=1\).

A lower bound on the SDT can be given by (assuming non-empty support and \(p_{\texttt {MAX}}> 0\))

$$\begin{aligned} \begin{aligned}&d_{L}^{*}\left[ {\mathcal {Y}}\right] (x) = \sum \limits _{i = 1}^{k_B-1}{C_{{\mathcal {Y}}}(x; i) \min \left[ d(x, n_{{\mathcal {Y}}}^{(i)}(x)), d_{\texttt {MAX}}\right] } \\&\quad +(p_{\texttt {MAX}}- p_{\text {acc}}\left[ {\mathcal {Y}}, x\right] (k_B-1)) \min \left[ d(x, n_{{\mathcal {Y}}}^{(k_B)}(x)), d_{\texttt {MAX}}\right] , \end{aligned} \end{aligned}$$
(23)

and based on \(d_{L}^{*}\), a (tighter) normalized lower bound is given by

$$\begin{aligned} \begin{aligned} d_{L}\left[ {\mathcal {Y}}\right] (x) = \frac{1}{p_{\texttt {MAX}}}d_{L}^{*}\left[ {\mathcal {Y}}\right] (x). \end{aligned} \end{aligned}$$
(24)

The lower bound is based on truncating the tail of the distance distribution by substituting the distances of all nearest points of higher rank than \(k_B\) (which, by definition, are at greater distances) with \(d_{L}^{*}\), using normalization by division by \(p_{\texttt {MAX}}\). The \(k_B\)-th nearest point is only partially considered, in order to avoid spurious variations in distance across the DT due to variable probability mass being substituted with \(d_{L}^{*}\). The lower bound is guaranteed to be tight if \(p_{\text {MAX}}=1\) and \(\text {core}({\mathcal {Y}}) \ne \emptyset \).

Algorithm 1 computes both the upper bound and the lower bound jointly.

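A minimal Python sketch of this joint bound computation, following Eqs. (22)-(24) for a single query point, is shown below; the kd-tree-based neighbor query and all names are illustrative, not the exact listing of Algorithm 1:

```python
import numpy as np
from scipy.spatial import cKDTree

def det_sdt_bounds(x, support_pts, probs, p_max, d_max):
    """Lower and upper bounds, Eqs. (22)-(24), on SDT[Y](x), for a DRS
    with support points support_pts and coverage probabilities probs."""
    dists, order = cKDTree(support_pts).query(x, k=len(support_pts))
    q = 1.0      # avoidance probability accumulated so far, Eq. (17)
    p_acc = 0.0  # accumulated selection probability mass, Eq. (20)
    acc = 0.0    # weighted sum of saturated distances
    for d_i, i in zip(np.atleast_1d(dists), np.atleast_1d(order)):
        d_i = min(d_i, d_max)
        c_i = probs[i] * q              # selection functional, Eq. (18)
        if p_acc + c_i >= p_max:        # rank k_B: partially considered
            acc += (p_max - p_acc) * d_i
            p_acc = p_max
            break
        acc += c_i * d_i
        p_acc += c_i
        q *= 1.0 - probs[i]
    lower = acc / p_max                  # normalized lower bound, Eq. (24)
    upper = acc + (1.0 - p_acc) * d_max  # upper bound, Eq. (22)
    return lower, upper
```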

4.4 The Uniform Probability Model

If \(p_{{\mathcal {Y}}}\) is of the form

$$\begin{aligned} \begin{aligned} p_{\mathcal {Y}}(x) = \left\{ \begin{array}{lr} \rho , &{}\quad \text {for } x \in {\hat{Y}}\\ 0, &{} \quad \text {for } x \notin {\hat{Y}}\\ \end{array}\right\} , \end{aligned} \end{aligned}$$
(25)

given \(\rho \in (0, 1]\) and an input image \({\hat{Y}}\), then \(p_{\mathcal {Y}}\) is a uniform CPF and we denote the corresponding DRS \({\mathbb {U}}{[}{\hat{Y}}; \, \rho {]}\).

The uniform model is noteworthy first and foremost for its simplicity: a single probability value, together with an input binary image \({\hat{Y}}\) on domain \(I\), characterizes the entire DRS. Our initial study of the SDT [45] focused on this special case. Inserting Eq. (25) into Eq. (10) yields the distance transform

$$\begin{aligned} \begin{aligned} \texttt {SDT}\left[ \mathbb {U}\left[ \hat{Y}; \, \rho \right] \right] (x) = {{\,\mathrm{{\mathbb {E}}}\,}}\left( \min \left[ d\left( x, {\mathbb {U}}\left[ {\hat{Y}}; \, \rho \right] \right) , d_{\texttt {MAX}}\right] \right) , \quad \text {for} \; \; x\in I. \end{aligned} \end{aligned}$$
(26)

The uniform model provides a degree of robustness to additive (salt) noise, but treats all points in \({\hat{Y}}\) equally, which limits how much noise it can handle before corruption of the DT becomes severe. An illustration of the limitations of this model can be seen in Fig. 1h. Furthermore, it is ineffective in the presence of subtractive (pepper) noise, since only points in the input set \({\hat{Y}}\) are considered and missing points (points present in \(Y\) and absent in \({\hat{Y}}\)) remain missing. Inside large connected components, the uniform model yields an overestimation of the distance with a bias of, at least,

$$\begin{aligned}&\text {Bias}\left\{ \texttt {SDT}\left[ \mathbb {U}\left[ \hat{Y}; \, \rho \right] \right] \right\} (x) = (1-\rho )^{\vert {\hat{Y}}\vert }d_{\texttt {MAX}}\nonumber \\&\quad + \sum \limits _{i = 2}^{\vert {\hat{Y}}\vert }{\rho (1-\rho )^{i-1}} \min \left[ d(x, n_{I}^{(i)}(x)), d_{\texttt {MAX}}\right] , \end{aligned}$$
(27)

since the true distance is 0, and even if all object points are positioned as near to point x as possible, the ordered distance sequence of all the points in the domain from x will be included in the expectation with a nonzero probability.

The avoidance functional Q and the selection functional C for the uniform model reduce to

$$\begin{aligned} \begin{aligned} Q_{{\mathbb {U}}\left[ {\hat{Y}}, \rho \right] }({\mathcal {N}}_{{\mathbb {U}}\left[ {\hat{Y}}, \rho \right] }^{(i)}(x)) = (1-\rho )^i \end{aligned} \end{aligned}$$
(28)

and

$$\begin{aligned} \begin{aligned} C_{{\mathbb {U}}\left[ {\hat{Y}}, \rho \right] }(x; i) = \left\{ \begin{array}{ll} \rho (1-\rho )^{i-1}, &{} \quad \text {for } 1 \le i \le \vert {\mathcal {Y}}\vert \\ 0, &{}\quad \text {for } i > \vert {\mathcal {Y}}\vert \end{array}\right\} \end{aligned} \end{aligned}$$
(29)

where \(0^0=1\). The deterministic method therefore simplifies to

$$\begin{aligned} \begin{aligned} {\texttt {DET-SDT}}\left[ {\mathbb {U}}\left[ {\hat{Y}}, \rho \right] \right] (x) = (1-\rho )^{\vert {\hat{Y}}\vert } d_{\texttt {MAX}}\, + \\ \sum \limits _{i = 1}^{\vert {\hat{Y}}\vert }{\rho (1-\rho )^{i-1} \min \left[ d(x, n_{{\mathbb {U}}\left[ {\hat{Y}}, \rho \right] }^{(i)}(x)), d_{\texttt {MAX}}\right] }, \end{aligned} \end{aligned}$$
(30)

in the uniform case, which is similar to the form of the deterministic formulation introduced in [45].
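
A minimal sketch of Eq. (30), truncated to the k nearest points with the remaining geometric tail mass assigned to \(d_{\texttt {MAX}}\) (names are ours):

```python
import numpy as np
from scipy.spatial import cKDTree

def det_sdt_uniform(x, points, rho, d_max, k):
    """Truncated Eq. (30): geometric weights rho*(1-rho)**(i-1) over the
    k nearest points; the remaining mass (1-rho)**k is assigned d_max."""
    dists = np.minimum(np.atleast_1d(cKDTree(points).query(x, k=k)[0]), d_max)
    w = rho * (1.0 - rho) ** np.arange(k)  # selection weights, Eq. (29)
    return float(np.sum(w * dists) + (1.0 - rho) ** k * d_max)
```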

4.4.1 Approximation Error Analysis

The lower and upper bounds of the deterministic method both approach the true \({\texttt {SDT}}\left[ {\mathcal {Y}}\right] (x)\) as \(p_{\texttt {MAX}}\rightarrow 1\). In this section, we investigate how fast the error decreases as a function of increasing \(p_{\texttt {MAX}}\), i.e., what rate of improvement is obtained when considering a larger part of the probability mass. We note that the magnitude of the error and the rate of improvement are highly image-dependent (and not constant over the image domain), but general error functions can be expressed in terms of the probabilities and distances.

For this error analysis, we use \(p_{\text {acc}}(i)\) as shorthand for \(p_{\text {acc}}\left[ {\mathcal {Y}}, x\right] (i)\).

If \(p_{\texttt {MAX}}< p_{\text {acc}}(\vert {\mathcal {Y}}\vert )\), then the error of the upper bound of the deterministic method as a function of the parameter \(p_{\texttt {MAX}}\) is

$$\begin{aligned} \begin{aligned}&\epsilon _{d_U}\left[ {\mathcal {Y}}, x\right] (p_{\texttt {MAX}}) = (p_{\text {acc}}(k_B) - p_{\texttt {MAX}}) (d_{\texttt {MAX}}- d(x, n_{{\mathcal {Y}}}^{(k_B)}(x))) \\&\quad +\,\sum \limits _{i = k_B+1}^{\vert {\mathcal {Y}}\vert }{C_{\mathcal {Y}}(x; \,i) (d_{\texttt {MAX}}-d(x, n_{{\mathcal {Y}}}^{(i)}(x) ))}, \end{aligned} \end{aligned}$$
(31)

and otherwise the error is zero. Only the term concerning point \(k_B\) varies continuously as a function of \(p_{\texttt {MAX}}\). If the error is not 0, the slope of the error function w.r.t. \(p_{\texttt {MAX}}\) is

$$\begin{aligned} \frac{d \epsilon _{d_U}\left[ {\mathcal {Y}}, x\right] }{dp_{\texttt {MAX}}}(p_{\texttt {MAX}}) = d(x, n_{{\mathcal {Y}}}^{(k_B)}(x)) - d_{\texttt {MAX}}, \end{aligned}$$
(32)

which is non-positive since \(d_{\texttt {MAX}}\ge d(x, n_{{\mathcal {Y}}}^{(k_B)}(x))\). The error function is thus a piece-wise linear, non-increasing function, where the difference between the maximum distance and the distance to point \(k_B\) yields the linear rate of improvement at a given \(p_{\texttt {MAX}}\).

For \(p_{\texttt {MAX}}< p_{\text {acc}}(\vert {\mathcal {Y}}\vert )\), the (signed) error of the lower bound is given by a piece-wise reciprocal function

$$\begin{aligned} \begin{aligned} \epsilon _{d_L}\left[ {\mathcal {Y}}, x\right] (p_{\texttt {MAX}}) =&\frac{1}{p_{\texttt {MAX}}}{\sum \limits _{i=1}^{k_B-1} C_{{\mathcal {Y}}}(x;\, i) d(x, n_{{\mathcal {Y}}}^{(i)}(x))}\\&\quad +\,\frac{1}{p_{\texttt {MAX}}}(p_{\texttt {MAX}}- p_{\text {acc}}(k_B-1)) d(x, n_{{\mathcal {Y}}}^{(k_B)}(x))\\&\quad -{\texttt {SDT}}\left[ {\mathcal {Y}}\right] (x), \end{aligned} \end{aligned}$$
(33)

which has the derivative

$$\begin{aligned} \begin{aligned}&\frac{d \epsilon _{d_L}\left[ {\mathcal {Y}}, x\right] }{dp_{\texttt {MAX}}}(p_{\texttt {MAX}}) = \frac{1}{{p_{\texttt {MAX}}}^2}{(p_{\text {acc}}(k_B-1) \, d(x, n_{{\mathcal {Y}}}^{(k_B)}(x)))} \\&\quad -\, \frac{1}{{p_{\texttt {MAX}}}^2}{\sum \limits _{i=1}^{k_B-1} C_{{\mathcal {Y}}}(x; \,i) \, d(x, n_{{\mathcal {Y}}}^{(i)}(x))}. \end{aligned} \end{aligned}$$
(34)

The derivative of the error of the lower bound can be interpreted as the change in error due to moving probability mass from the weighted sum of distances at \(p_{\texttt {MAX}}\) into the \(k_B\)-th point, weighted by the amount of probability mass already accumulated.

4.4.2 Complexity Analysis

Algorithm 1 is guaranteed to terminate, since I (and therefore also \(\text {supp}({\mathcal {Y}})\)) is finite. It may also terminate early if the accumulated probability mass surpasses the chosen limit, \(p_{\texttt {MAX}}\).

As was observed in [45] for the uniform model, an upper bound on the number of points k required to cover at least probability mass \(p_{\texttt {MAX}}\) for a DRS with minimum nonzero coverage probability \(\min \nolimits _{+}({\mathcal {Y}})\) is given by

$$\begin{aligned} \kappa \left[ {\mathcal {Y}}\right] (p_{\texttt {MAX}}) = \Bigl \lceil \frac{\log (1-p_{\texttt {MAX}})}{\log (1-\min \nolimits _{+}({\mathcal {Y}}))} \Bigr \rceil . \end{aligned}$$
(35)
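
For example, with \(p_{\texttt {MAX}}= 0.95\) and \(\min \nolimits _{+}({\mathcal {Y}}) = 0.25\), Eq. (35) gives \(\kappa = \lceil \log (0.05) / \log (0.75) \rceil = \lceil 10.41 \rceil = 11\), i.e., at most 11 nearest points need to be considered for any element.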

The computational complexity of Algorithm 1 depends on the probabilities of the sequence of the k-nearest points, where k is unknown before executing the algorithm, and not constant over I. However, with the assumption that \({\mathcal {Y}}\) satisfies

$$\begin{aligned} \min \nolimits _{+}({\mathcal {Y}}) \ge \rho , \quad \text {for} \; \; \rho \in \left( 0, \,1\right] , \end{aligned}$$
(36)

the complexity becomes \({\mathcal {O}}(\kappa + D_{\kappa })\), where \(\kappa = \kappa \left[ {\mathcal {Y}}\right] (p_{\texttt {MAX}})\). In the worst case, \(p_{{\mathcal {Y}}}(x) = \min \nolimits _{+}({\mathcal {Y}}), \, \forall x \in \text {supp}({\mathcal {Y}})\). Computing the DET-SDT for all points in the domain \(I\) then has complexity \({\mathcal {O}}(\vert I\vert (\kappa + D_\kappa ))\), where \(D_\kappa \) denotes the computational complexity of finding the \(\kappa \) nearest points.

The deterministic method is highly parallelizable, requiring only the prerequisite data structures (e.g., a kd-tree) to be initialized for fast computation of \(n_{{\mathcal {Y}}}^{(i)}(x)\) and \(d(x, n_{{\mathcal {Y}}}^{(i)}(x))\), since the evaluation of each point is an independent process.

5 Coverage Probability Estimation Using Kernel Density Estimation and Soft Thresholding

In this section, we introduce a method for constructing DRSs with non-uniform CPFs for the SDT based on density estimation or mass estimation. We propose using a variation of KDE for binary images on rectangular grids, to obtain a CPF based on the local density of object/foreground points.

Let \(K :I \rightarrow \left[ 0, 1\right] \) be a discrete kernel, satisfying [48] (in the discrete case)

$$\begin{aligned} \sum _{x_1=-\infty }^{+\infty } \ldots \sum _{x_d=-\infty }^{+\infty } K(x_1, \ldots , x_d) = 1. \end{aligned}$$
(37)

A DRS \({\mathcal {Y}}_K\) can then be formulated, which is characterized by a CPF given by the resulting kernel density estimate on an input set \({\hat{Y}}\),

$$\begin{aligned} p_{{\mathcal {Y}}_K}(x) = (F_{{\hat{Y}}} * K)(x), \end{aligned}$$
(38)

where \(*\) denotes the convolution operation, which is used in the literature for fast KDE on discrete grids [51, 52, 60], and \(F_{{\hat{Y}}}\) denotes the indicator function of \({\hat{Y}}\). If a kernel K satisfies the normalization property (37) and has finite support, then each point x centered in a locally dense region, where all points of \(I\) falling within the kernel radius are object points in \({\hat{Y}}\), belongs to \(\text {core}({\mathcal {Y}}_K)\). If \(x \in \text {core}({\mathcal {Y}}_K)\), then \({\texttt {SDT}}\left[ {\mathcal {Y}}_K\right] (x) = 0\). The method therefore has zero bias within sufficiently large connected components.
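
A minimal sketch of Eq. (38) on a rectangular grid, assuming a unit-sum discrete kernel such as the triweight_kernel_2d sketched in Sect. 3.6:

```python
import numpy as np
from scipy import ndimage

Y_hat = np.zeros((64, 64))
Y_hat[30:34, 10:50] = 1.0  # indicator function F_Y of the observed set

# Eq. (38): CPF as the kernel mass estimate (F_Y * K)(x).
K = triweight_kernel_2d(4.0)
p = ndimage.convolve(Y_hat, K, mode='constant')
```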

The result of the KDE may not always be a CPF that is ideal for providing maximal accuracy of the DT. If objects of interest are small or thin, the maximal estimated density may be much lower than one, resulting in an empty core. In addition, sparse noise may not be fully suppressed, but kept to some degree, which can, in aggregate, have a large effect on the computed DT. We therefore propose to extend the KDE method with a soft thresholding operator for images on rectangular grids (it could be generalized to other grids and to point-clouds by replacing the convolution with a point-cloud density estimation framework [9, 63]). The soft thresholding operator we propose has a parameterized crossover point \(c\), transition width \(w\), and steepness \(s\), and is defined as

$$\begin{aligned} \begin{aligned} \tau _{p}\left[ {c, w, s}\right] (x) = \text {SMOOTH-STEP}_s\left( \frac{p(x)-c}{2w} + \frac{1}{2}\right) , \end{aligned} \end{aligned}$$
(39)

where

$$\begin{aligned} \begin{aligned} \text {SMOOTH-STEP}_{s}(z) = \left\{ \begin{array}{ll} 0, &{}\quad \text {for } z< 0\\ 2^{s-1}(z^s), &{}\quad \text {for } 0 \le z \le \frac{1}{2}\\ 1-2^{s-1}(1-z)^s, &{}\quad \text {for } \frac{1}{2} < z \le 1 \\ 1, &{}\quad \text {for } z > 1\\ \end{array}\right\} , \end{aligned} \end{aligned}$$
(40)

is a smooth transition function from 0 to 1. For \(s=1\), \(\tau \) is a linear transition from 0 at \(c-\frac{1}{2}w\) to 1 at \(c+\frac{1}{2}w\); for \(s \rightarrow \infty \), \(\tau \) is a hard threshold at \(c\). \(\tau \) is flexible and expressive, and it has separate, continuous, and tunable parameters for the nonlinear smoothing of the transition, the crossover point, and the transition width. We observe that line structures benefit from a higher slope than disks. An example of the function \(\tau \), for a particular choice of the parameters, is given in Fig. 2. When the deterministic method for SDT computation is used, to reduce the potentially high computational cost of considering points with low CPs, we use a threshold \(p_\text {min}\), such that CPs below \(p_\text {min}\) are truncated to 0.
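
A direct implementation of Eqs. (39) and (40), vectorized over a CPF image (a sketch; names are ours):

```python
import numpy as np

def smooth_step(z, s):
    """Eq. (40): smooth transition from 0 (z <= 0) to 1 (z >= 1)."""
    z = np.clip(z, 0.0, 1.0)
    return np.where(z <= 0.5,
                    2.0 ** (s - 1.0) * z ** s,
                    1.0 - 2.0 ** (s - 1.0) * (1.0 - z) ** s)

def soft_threshold(p, c, w, s):
    """Eq. (39): soft thresholding of a CPF with crossover point c,
    transition width w, and steepness s."""
    return smooth_step((p - c) / (2.0 * w) + 0.5, s)

# E.g., with the parameters tuned for the disk dataset in Sect. 6.2.3:
# p_hat = soft_threshold(p, c=0.315, w=0.13, s=2.15)
```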

Fig. 2 The soft threshold function \({\tau _{p}\left[ 0.3, 0.25, 2.0\right] }\) (crossover point 0.3, transition width 0.25, steepness 2)

Inserting Eq. (38) into Eq. (39), we obtain, for a given discrete kernel K, a DRS with CPF

$$\begin{aligned} p_{{\mathcal {Y}}_{{\hat{K}}\left[ {c, w, s}\right] }}(x) = \tau _{\, p_{{\mathcal {Y}}_{K}}}\left[ {c, w, s}\right] (x), \end{aligned}$$
(41)

which we shall also denote \(p_{{\mathcal {Y}}_{{\hat{K}}}}\) when we omit the parameters, with the corresponding DRS \({\mathcal {Y}}_{{\hat{K}}}\) (Fig. 3).

If a kernel satisfies (37), it can be inserted into (38) and further into (41). This yields the distance transform

$$\begin{aligned} \begin{aligned} {\texttt {SDT}}\left[ {\mathcal {Y}}_{\hat{K}}\right] (x) = {{\,\mathrm{{\mathbb {E}}}\,}}(\min \left[ d(x, {\mathcal {Y}}_{{\hat{K}}} ), d_{\texttt {MAX}}\right] ), \quad \text {for} \; \; x\in I. \end{aligned} \end{aligned}$$
(42)

This method is denoted \(\text {SDT}_K\) in the performance analysis section.

Table 2 contains a summary of all the parameters used in the KDE CP estimation method, with their supported intervals.

6 Performance Analysis and Applications

In this section, we evaluate the proposed SDT using KDE-based CPs, as well as the model with uniform CPs, w.r.t. the accuracy of the computed DT values. We also evaluate these methods on the task of segmentation of overlapping disks; we observe their performance when incorporated into the watershed segmentation framework, which is closely related to, and relies on, the DT. Since we observed similar performance for the Monte Carlo and the deterministic method [45], here we focus on the deterministic method, which has the advantage of not including randomness.

6.1 Alternative Methods for Computing Distance Transforms from Coverage Probability Functions

Here we describe two alternative methods which can be used together with the estimated CPF to compute DTs, based on hard thresholding and on distances to fuzzy sets. We present them here to enable comparison of the proposed methods with these alternative approaches.

If the parameter \(s\) in the function \(\tau \), Eq. (39), tends to \(\infty \), and the SDT is replaced with the standard DT, we obtain a KDE-based denoising-by-thresholding method, which we denote \(\text {T}_{K}\). This enables comparison between the SDT and KDE-based denoising.

Another class of tools which can utilize the known or estimated CPFs to compute distance transforms are distance measures from fuzzy set theory [37, 62]. The CPFs can be directly translated into membership values, which are defined for all elements and belong to the interval \(\left[ 0, 1\right] \), keeping in mind the difference in interpretation between probabilities and fuzzy membership values. The distance measure defined by Eq. (8) can thus be used to compute the distance from each point to the fuzzy set corresponding to \(p_{{\mathcal {Y}}_{K}}\) by assuming unit heights for all points of the domain, and integrating over the \(\alpha \)-cuts,

$$\begin{aligned} d^{\alpha }(x, p_{{\mathcal {Y}}_{{\hat{K}}\left[ c, w, s\right] }}) = \int _{0}^{1}{\min \left[ d(x, {}^{\alpha }p_{{\mathcal {Y}}_{{\hat{K}}\left[ c, w, s\right] }}), \; d_{\texttt {MAX}}\right] \, \mathrm{d}\alpha }. \end{aligned}$$
(43)

We denote the fuzzy point-to-set method \(d^{\alpha }_{K}\).
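
A minimal sketch of Eq. (43) with a quantized set of \(\alpha \)-levels (midpoint quadrature; names are ours):

```python
import numpy as np
from scipy import ndimage

def fuzzy_alpha_distance(p, d_max, n_levels=15):
    """Quantized Eq. (43): saturated DTs to crisp alpha-cuts of the CPF
    image p, averaged over n_levels alpha values."""
    acc = np.zeros(p.shape, dtype=float)
    for a in (np.arange(n_levels) + 0.5) / n_levels:
        cut = p >= a                    # crisp alpha-cut of the CPF
        dt = (ndimage.distance_transform_edt(~cut)
              if cut.any() else np.full(p.shape, np.inf))
        acc += np.minimum(dt, d_max)    # saturate at d_MAX
    return acc / n_levels
```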

Table 2 All the parameters of the proposed method with the intervals of their allowed values
Fig. 3 a, e Visualization of two distributions which were sampled to obtain b, f noisy realizations; c, g CPF estimated from b, f using the \(K_{\text {TW}}\) kernel with radii 9 and 8.5, respectively; d, h CPF estimations based on c, g and soft thresholding, \({\mathcal {Y}}_{{\hat{K}}{[0.315, 0.13, 2.15]}}\) and \({\mathcal {Y}}_{{\hat{K}}{[0.315, 0.24, 4.35]}}\), respectively

6.2 Distance Accuracy in the Presence of Noise

To assess the accuracy of the proposed DTs compared to the EDT and a few other reference methods, we evaluate them all on sets of synthetically generated images, with the aim of comparing the obtained DTs with a ground-truth DT.

6.2.1 Synthetic Datasets

Two synthetic datasets of binary images of size \(128 \times 128\) pixels were generated for the evaluation of the accuracy of the computed distance values: (i) a line segment image dataset, consisting of binary dilations of digital line segments drawn between random points, and (ii) a disk image dataset, consisting of a random number (between 1 and 20) of digital disks with random radii between 4 and 8. One version of each image is generated without noise, as a source of a reference distance transform, and one with between 1 and 20% additive noise and between 25 and 75% subtractive (pepper) noise, either increasing or decreasing as a function of the distance to the center-point of the image. For both datasets, we have created 100 (pairs of) training images, for parameter tuning, and 1000 (pairs of) test images. Figure 4 shows examples from both of the datasets, with and without corruption by noise.

Fig. 4 Example images from the line segment image dataset and the disk image dataset, showing the original images (a, c) and corrupted versions (b, d)

6.2.2 Experiment Setup

Let D be a dataset of N distance transforms generated by some method, and G a dataset of the corresponding N ground-truth distance transforms, on domain I. The quality measures used for the error of the distance transforms over datasets D and G are the mean average error (MAE) of the distance values, given by

$$\begin{aligned} \text {MAE}_{D, G} = \frac{1}{N}\frac{1}{\vert I\vert }\sum \limits _{i=1}^{N}{\sum \limits _{x \in I}{\left| D_i(x)-G_i(x)\right| }}, \end{aligned}$$
(44)

and the mean maximum error (MME) of the distance values, defined as

$$\begin{aligned} \text {MME}_{D, G} = \frac{1}{N}\sum \limits _{i=1}^{N}{\max \limits _{x \in I}{\left| D_i(x)-G_i(x)\right| }}, \end{aligned}$$
(45)

where \(D_i\) is the estimated distance transform for image i in the dataset D and \(G_i\) is the ground-truth distance transform for the noise-free image i in the dataset G. In addition to these mean error measures, the empirical standard deviations are also presented.
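
A sketch of Eqs. (44) and (45) over stacked DT images:

```python
import numpy as np

def mae_mme(D, G):
    """Eqs. (44)-(45): D, G are arrays of shape (N, H, W) holding the
    estimated and the ground-truth distance transforms, respectively."""
    abs_err = np.abs(D - G)
    mae = float(abs_err.mean())                   # mean over images, pixels
    mme = float(abs_err.max(axis=(1, 2)).mean())  # per-image max, then mean
    return mae, mme
```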

For both datasets and the deterministic SDT method, a probability mass of \(p_{\texttt {MAX}}= 0.95\) and a truncation probability threshold \(p_\text {min}= 0.05\) were used. The presented results use the lower bound estimates from Algorithm 1.

6.2.3 Parameter Tuning

The parameters, listed in Table 2 (with the exception of \(p_\text {min}\)), were tuned by 600 iterations of random search in the parameter space on the training sets (with 100 images) for both datasets, within the ranges given in Table 3. Along with the tuning of the SDT method, the hard-thresholding method and the fuzzy set distance method were also tuned, and the best parameters for each method were chosen. Normalization of the CPF, performed according to

$$\begin{aligned} {\tilde{p}}_{{\mathcal {Y}}}(x) = \frac{p_{{\mathcal {Y}}}(x) - \min \limits _{x \in I}{p_{{\mathcal {Y}}}}}{\max \limits _{x \in I}{p_{{\mathcal {Y}}}}-\min \limits _{x \in I}{p_{{\mathcal {Y}}}}}, \end{aligned}$$
(46)

for KDE-based DRS \({\mathcal {Y}}_K\), is applied before the thresholding step to investigate if it has beneficial effects on the accuracy of the DT. The parameters of each method, for which the lowest MAE was observed on the training set, were then used for the evaluation on the test sets.

For the line segment dataset, the parameters leading to the best performance are: \(r=8.1, c=0.15, w=0.29, s=4.65\). For the disk dataset, the optimal parameter values are: \(r=9, c=0.315, w=0.13, s=2.15\). We observe a slight difference in the transition width and steepness parameters, but otherwise very similar values are identified. Normalization was not beneficial on either of the datasets for the SDT method, but it did improve the accuracy of the hard thresholding.

Table 3 Parameters included in parameter tuning, and their ranges

6.2.4 Choice of Kernel

With the aim of choosing a kernel which maximizes the accuracy of the final SDT, a number of commonly used kernels were considered. To evaluate the kernels, each one was tuned (with 600 iterations of random search) on the combined disk and line segment datasets, containing 200 training images and evaluated on the 2000 test images in the combined set. We list the MAEs from this evaluation in Table 4. The Triweight kernel, \(K_{\text {TW}}\), exhibited the highest accuracy and was therefore chosen as the kernel candidate for the rest of the experiments. The same kernel, \(K_{\text {TW}}\), led to the best observed performance of \(\text {T}_{K}\) and \(d^{\alpha }_{K}\), as well.

Table 4 Results of the evaluation of different kernels in terms of mean (over different images) average (over different pixels within each image) error

6.2.5 Results

The results of the evaluation are presented in Table 5. The proposed method (\(\text {SDT}_K\)) substantially outperforms the reference methods (EDT, EDT+Opening, k-distance, \(\text {T}_K\) and \(d^{\alpha }_K\)) on both datasets and both considered quality measures.

Table 5 Accuracy of the considered methods on the synthetic datasets as measured in mean average error (MAE) and mean maximum error (MME)

6.3 Algorithm Convergence and Run-time Analysis

In this work, we propose two algorithms for computing the SDT. To assist in the choice of algorithm, the empirical rate of convergence to the exact SDT (which can be computed exactly with the deterministic algorithm if \(p_{\texttt {MAX}}=1\) and the upper bound is used) and run-time are measured and contrasted.

6.3.1 Experiment Setup

Starting from an image from the line segment dataset used in the synthetic tests of Sect. 6.2, we compute the exact SDT and then compute the distances and elapsed run-times (measured in seconds) for a sequence of \(p_{\texttt {MAX}}\)- and N-values, respectively. We then use the mean average error (MAE) between the exact SDT and each approximation to determine the accuracy.

The parameters chosen for this experiment are \(r = 4.0, c= 0.15, w=0.3, s=1.0, p_\text {min}= 0.05\). The kernel is chosen to be \(K_{\text {TW}}\). These parameters are not meant to be optimal in terms of accuracy for the test image but provide a wide range of different probability values in the resulting CPF in order to exercise the algorithms.

The research prototype implementation of the deterministic algorithm used for this experiment uses a kd-tree data structure provided in the SciPy software package [59], which does not take advantage of the regular grid structure of the data. This is a limitation of the part of this study related to measurement of run-time.

All experiments are performed on a workstation with a 6-core Intel i7-6800K, 3.4 GHz processor with 48GB of RAM and 15 MB cache. The operating system Ubuntu 18.04 LTS and Python version 3.6.9 were used.

6.3.2 Results

In Fig. 5, the MAE and run-time for each method are displayed as functions of the respective method’s key approximation parameter. For the deterministic method, the MAE decreases approximately linearly as a function of \(p_{\texttt {MAX}}\), while the run-time increases super-linearly. For the Monte Carlo algorithm, the MAE decreases approximately as predicted by Eq. (16), which was evaluated by observing a sequence \(N=\left\{ 100, \ldots , 250\right\} \) and computing the quotients of the MAE for data-point N and MAE for data-point 4N. The mean of these quotients measured 1.992 which is close to the expected 2.

Fig. 5 The MAE and run-time of the two proposed SDT algorithms as functions of their respective parameters. Trend lines added for illustration purposes

If the requirements of accuracy are not of highest importance and non-deterministic output is not a concern, the Monte Carlo algorithm with a low N is a good choice. If very high accuracy is required, or if non-deterministic distance values are not suitable for the application at hand, the deterministic algorithm is the clear choice.

6.4 Watershed Segmentation of Overlapping Objects

Watershed segmentation [5] is a prominent method for segmentation tasks where touching or overlapping objects are to be separated (assigned distinct labels). It is an important part of many pipelines for, e.g., cell segmentation and tracking [11]. More recent approaches combine convolutional neural networks with watershed partitioning [61]. One challenge with using the classical watershed method is that it is prone to over-segmentation, where parts of a single object are split into multiple objects, necessitating post-processing to fuse those objects back together. The over-segmentation can be caused by spurious local maxima (caused by irregularities and noise at the object boundaries) in the DT (to the background) used to compute the watershed. In this section, we investigate if the SDT, being smooth and robust to noise, can reduce the impact of noisy object boundaries. This evaluation is an extension to the performance analysis of [45].

6.4.1 Experiment Setup

We consider a set of distances \(\left\{ 3.0, 3.1, \ldots , 9.8, 9.9 \right\} \) at which we position two disks with radius in the range \(\left[ \pi , 2 \pi \right] \) in a digital grid, where the centroid of one of the disks is positioned at a random sub-pixel position and the centroid of the other disk is placed at the given distance from the first disk, displaced in a random direction. For each distance, 500 test cases are generated and segmented with the considered methods. The evaluation is performed both without any noise (in addition to the discretization artifacts) and with various amounts of salt and pepper noise applied at the boundaries of the objects, which serves to create random deformations (cavities and protrusions).

A simplified scenario with only two objects in the image is considered for two main reasons: (i) it simplifies the evaluation procedure, and (ii) even in fully realistic scenarios with overlapping circular objects, not all objects would overlap; the image would instead consist of a large number of connected components, each treated individually by the watershed segmentation.

For the standard EDT, we also consider a version where a morphological closing (dilation followed by erosion) with a \(3 \times 3\) square structuring element is applied before computing the DT, to eliminate some of the noise.
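A minimal sketch of this baseline, assuming standard SciPy morphology (`binary_closing` performs exactly the dilation followed by erosion described above):

```python
import numpy as np
from scipy import ndimage as ndi

def closed_dt(binary_image):
    # Fill small boundary cavities with a 3x3 closing, then take the EDT
    # to the background (the map from which the watershed is computed).
    closed = ndi.binary_closing(binary_image, structure=np.ones((3, 3), bool))
    return ndi.distance_transform_edt(closed)
```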

Success is defined as segmenting exactly two objects, such that the centroids of the two true objects are assigned distinct labels.
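Expressed as a (hypothetical) check over the label image:

```python
import numpy as np

def is_success(labels, centroid_a, centroid_b):
    # Labels of the segments containing the two true centroids.
    la = labels[tuple(np.round(centroid_a).astype(int))]
    lb = labels[tuple(np.round(centroid_b).astype(int))]
    n_segments = len(np.unique(labels[labels > 0]))  # ignore the background
    return n_segments == 2 and la != lb and la > 0 and lb > 0
```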

6.4.2 Parameter Tuning

The parameters for the different methods were tuned using 25 images for each distance. For the \(\text {SDT}_U\) method, a coverage probability value \(\rho =0.25\) was chosen, with \(p_{\texttt {MAX}}=0.75\). For the \(\text {SDT}_K\) method, we used transition size \(w= 0.5\), crossover point \(c= 0.35\), radius \(r=2.5\), smooth-step exponent \(s=0.5\), normalization disabled, and \(p_{\texttt {MAX}}=0.9\); for this method, masking out (assigning a CP of 0 to) all points not included in the original noisy image increased performance. For the \(\text {T}_K\) method, we used threshold \(c=0.1\), radius \(r=2.5\), and normalization disabled. For the \(d^{\alpha }_K\) method, all the relevant parameters had the same values as for \(\text {SDT}_K\), and 15 quantization levels were used. For the k-distance, \(k=6\).

6.4.3 Results

The results of the watershed segmentation experiments are presented in Fig. 6 and Table 6. We observe that the SDT-based methods are superior to the other compared methods for the task of interest, and that the uniform and the KDE-based SDTs exhibit similar performance, with a slight advantage to \(\text {SDT}_K\) in all cases.

Fig. 6

Example of the result of watershed segmentation of an image using the considered methods

Table 6 Success rate of the watershed segmentation of overlapping (circular) objects subject to random cavities and protrusions for a number of chosen methods

6.5 Distance Transforms of Images in Localization Microscopy

Stochastic Optical Reconstruction Microscopy (STORM) [4] is a family of imaging methods that enables imaging of nanometer-scale structures at very fine spatial resolution. By imaging only a small subset of the objects of interest in each image of a series, the locations of the objects can be computed by fitting a point-spread function (PSF) model over the modes of each image. Each image thus contributes a sparse set of points to a final point cloud constituting the detected object points.

6.5.1 Experimental Setup

We used the TestSTORM simulator [43, 53] to generate test image series for a known synthetic test pattern consisting of over 30,000 points in a point cloud forming a set of lines. This is a synthetic but realistic test scenario, since one major use-case of this type of microscopy is to image thin curve structures [53]. Using simulated data enables the computation of a distortion-free ground-truth image and a corresponding distance transform, enabling direct evaluation. We simulate 5 series of 800 different \(64 \times 64\) images with short exposure time (0.005 s) and pixel size 160 nanometers, and then compute reconstructed images from the simulated image series using the software RainSTORM [43]. We then create higher-resolution images of size \(256 \times 256\) from the resulting point clouds. All the considered methods are then used to compute DTs to these noisy images, and the results are compared with the ground-truth DT to obtain the mean average error (MAE) and mean maximum error (MME) in nanometers.
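As an illustration of the rasterization step, a point cloud of localizations (in nanometers) can be mapped onto the higher-resolution grid roughly as follows; the helper and its field-of-view default (64 px \(\times \) 160 nm) follow the setup above but are our own sketch, not the experiment code.

```python
import numpy as np

def rasterize(points_nm, field_nm=64 * 160.0, size=256):
    """Binary image of shape (size, size) from (x, y) localizations in nm."""
    ij = np.floor(points_nm / field_nm * size).astype(int)  # nm -> pixel index
    ij = ij[((ij >= 0) & (ij < size)).all(axis=1)]          # drop out-of-field points
    img = np.zeros((size, size), dtype=bool)
    img[ij[:, 1], ij[:, 0]] = True                          # (x, y) -> (row, col)
    return img
```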

6.5.2 Parameter Tuning

We generated a set of 5 training images using the same process as for the test images. For each DT method with parameters to set, we generated 1000 random parameter sets, applied the method with each set to all training images, and chose the parameter set yielding the lowest mean average error (MAE) for processing the test set.
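Schematically, the tuning loop looks as follows; `sample_params`, `apply_dt`, and the training pairs are hypothetical placeholders, not names from the paper.

```python
import numpy as np

def tune(training_pairs, sample_params, apply_dt, n_trials=1000, seed=0):
    """Random search: return the candidate parameter set with the lowest MAE."""
    rng = np.random.default_rng(seed)
    def mean_mae(params):
        return np.mean([np.abs(apply_dt(img, params) - gt_dt).mean()
                        for img, gt_dt in training_pairs])
    return min((sample_params(rng) for _ in range(n_trials)), key=mean_mae)
```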

6.5.3 Results

The quantitative results are presented in Table 7. We observe that the results for this (more realistic) set of images agree well, in terms of accuracy, with the results on the line segment image dataset; the \(\text {SDT}_K\) method has a MAE which is less than half that of the second-best evaluated method. An example of the spatial distribution and magnitude of the error over the image is shown in Fig. 7.

Table 7 Accuracy of the considered methods on the STORM microscopy simulated datasets as measured in mean average error (MAE) [nanometers] and mean maximum error (MME) [nanometers]
Fig. 7

Visualization of the error of different methods on DTs of point clouds yielded by STORM imaging. a Ground-truth image; b an acquired image, corrupted by noise resulting from imaging artifacts and spurious signal detections; c–i the errors of the considered methods, computed with b as input, compared to the DT of the ground-truth image. The color-maps are the same for c–i, with discretized colors representing different errors; red signifies higher error (Color figure online)

7 Conclusion and Future Work

In this work, we present the SDT, a DT which uses a DRS-based model of noisy images to achieve robustness in the presence of noise, and develop it for general DRSs (satisfying mutual independence of the inclusion of elements). The work extends the definition introduced in [45], where a preliminary study of the SDT, restricted to the special case of uniform CPs, is presented. We propose a density-based method for estimating the CPs which separates noise and objects well without relying on, and thus being sensitive to, exact local patterns, which can be a challenge for, e.g., morphological noise filtering. We show through empirical evaluation that the proposed SDT, relying on the KDE-based CP estimation method, attains excellent accuracy even in the presence of severe corruption of the image of interest (1–20% additive noise and 25–75% subtractive noise), substantially better than the other compared methods: hard thresholding of the KDE, a method based on fuzzy set theory, the SDT with a uniform CPF, morphological opening-based denoising followed by a standard DT, and the k-distance [18].

We also show that the method can be successfully combined with watershed segmentation to separate overlapping objects with increased robustness to both discretization artifacts and noise appearing on the object borders, which have a strong negative effect on segmentation relying on a classic (noise-sensitive) DT.

We further evaluate the method on simulated STORM localization microscopy images, where we observe very good results, consistent with the larger synthetic study.

For future work along this research track, we suggest: exploring learning-based methods for estimating CPFs, using, e.g., denoising auto-encoders, which can model more complex patterns than a single kernel; developing faster specialized kd-tree algorithms to accelerate the computation of the SDT on rectangular grids; and exploring the applicability and performance of the SDT in image registration.