1 Introduction

In many areas of science and engineering one is faced with the problem of having to compute a meaningful approximate solution, defined in an appropriate way, of linear systems of equations of the form

$$ A\mathbf{x}\approx\mathbf{b}, $$
(1)

where \(A\in {\mathbb R}^{m\times n}\) is a given matrix, \(\mathbf {x}\in {\mathbb R}^{n}\) is the desired solution, and \(\mathbf {b}\in {\mathbb R}^{m}\) is a data vector. The symbol ≈ indicates that we would like to determine a vector x such that Ax approximates b in a suitable way. We consider the case when the singular values of A decrease to zero quickly without a significant gap. Then the matrix A is severely ill-conditioned and, moreover, it is not meaningful to define the rank of A. Approximation problems (1) with a matrix of this kind are commonly referred to as discrete ill-posed problems; see, e.g., [17, 26, 27] for discussions.

In applications of interest to us, the data vector b is not available; instead a measured vector \(\mathbf {b}^{\delta }\in {\mathbb R}^{m}\), which is contaminated by noise \(\boldsymbol {\eta }\in {\mathbb R}^{m}\), is known, i.e.,

$$ \mathbf{b}^{\delta}=[b_{1}^{\delta},b_{2}^{\delta},\ldots,b_{m}^{\delta}]^{T}=\mathbf{b}+\boldsymbol{\eta} $$

replaces the vector b in (1). Here and throughout this paper the superscript T denotes transposition. We assume that η is made up of white Gaussian noise, impulse noise and/or salt-and-pepper noise. Impulse noise modifies only some entries of b and leaves the other entries unchanged. In detail, we have for 1 ≤ i ≤ m,

$$ b^{\delta}_{i}=\left\{\begin{array}{ll}d_{i}&\text{with probability } \sigma,\\ b_{i}&\text{with probability } 1-\sigma, \end{array}\right. $$

where the di are realizations of a random variable with uniform distribution in the interval \([d_{\min \limits },d_{\max \limits }]\), which is the dynamic range of bi. If \(d_{i}\in \{d_{\min \limits },d_{\max \limits }\}\), i.e., if the di only attain their maximal or minimal achievable values, then impulse noise is commonly referred to as salt-and-pepper noise. Impulse noise simulates the effect of broken sensors on a measuring device such as a CCD camera.
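For concreteness, the following sketch simulates the noise model just described. It is our own illustrative code, not part of the paper: the function name, the NumPy implementation, and the random number generator are our choices, and the dynamic range [dmin,dmax] is assumed to be supplied by the user.

```python
import numpy as np

def add_impulse_noise(b, sigma, d_min, d_max, salt_and_pepper=False, rng=None):
    """Corrupt each entry of b with probability sigma, as in the model above."""
    rng = np.random.default_rng() if rng is None else rng
    b_delta = b.copy()
    corrupted = rng.random(b.shape) < sigma              # entries hit by the noise
    if salt_and_pepper:
        # d_i attains only the extreme values d_min or d_max
        d = rng.choice([d_min, d_max], size=b.shape)
    else:
        # d_i is uniformly distributed in the dynamic range [d_min, d_max]
        d = rng.uniform(d_min, d_max, size=b.shape)
    b_delta[corrupted] = d[corrupted]
    return b_delta
```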

Since A is ill-conditioned and bδ is contaminated by noise, the least-squares solution A†bδ of minimal norm of (1) with b replaced by bδ is meaningless due to propagation and severe amplification of the error η in bδ into the solution; here A† denotes the Moore-Penrose pseudo-inverse of A. To reduce the propagated error in the computed solution, one typically modifies the problem to be solved. This modification is commonly referred to as regularization.

A regularization method that recently has received considerable attention is the ℓp-ℓq minimization method; see [14, 18, 30, 35]. This method solves the minimization problem

$$ \underset{\mathbf{x}\in{\mathbb R}^{n}}{\min}\left\{\frac{1}{p}\| A\mathbf{x}-\mathbf{b}^{\delta} \|_{p}^{p}+ \frac{\mu}{q}\| L\mathbf{x} \|^{q}_{q}\right\}, $$
(2)

where \(L\in \mathbb {R}^{r\times n}\) is a regularization matrix, μ > 0 a regularization parameter, and \(\| \mathbf {z} \|_{p}^{p}={\sum }_{i=1}^{k}|z_{i}|^{p}\) for \(\mathbf {z}=[z_{1},z_{2},\ldots ,z_{k}]^{T}\in {\mathbb R}^{k}\) with k ∈{m,n}. We are interested in parameters p and q in the interval (0,2]. Observe that z↦∥z∥p is a norm only if p ≥ 1; however, for notational convenience, we will refer to the quantity ∥z∥p as an ℓp-norm also for 0 < p < 1.

The regularization parameter μ > 0 determines the trade-off between the first and second terms in (2), and decides how sensitive the solution of (2) is to the error η in bδ. Moreover, the choice of μ affects how close the solution is to the desired vector xtrue = A†b. An imprudent choice of μ may result in the solution of (2) being a poor approximation of xtrue. It therefore is important to develop methods that are able to determine a suitable value of μ. This value will depend on the matrices A and L, the vectors bδ and η, as well as the parameters p and q. If p = q = 2, then the problem (2) reduces to Tikhonov regularization in general form; see, e.g., [17, 22, 27, 38] for discussions on Tikhonov regularization.

We now briefly discuss different choices for p and q, and first comment on the choice of q, which affects the second term in (2). This term is commonly referred to as the regularization term. In many situations the desired solution xtrue of (1) is sparse after some transformation. For instance, if we represent xtrue in terms of framelets or wavelets, then xtrue typically has a sparse representation, i.e., many coefficients in this representation vanish. Moreover, when L is a discretized gradient operator, the vector Lxtrue generally has many vanishing entries. To promote sparsity of the computed solution of (2), we may consider letting L = I and q = 0. Then the regularization term measures the size of the computed solution by the 0-norm. This “norm” counts the number of nonzero entries in the vector x in (2). Note that this norm is not convex. Similarly, to promote sparsity in the vector Lx, we may consider letting q = 0 in (2). However, the minimization problems so obtained are extremely difficult to solve. Therefore, it is common to approximate the 0-norm by the 1-norm. This approximation has the advantage that the 1-norm is convex, which makes it easier to solve (2) than when using the 0-norm. However, q-norms with 0 < q < 1 are better approximations of the 0-norm than the 1-norm. In particular, smaller values of q > 0 yield better approximations of the 0-norm than larger values; see Fig. 1 for an illustration. For 0 < q < 1 the resulting minimization problem (2) is not convex and its solution may be difficult to determine; see Lanza et al. [36] for a recent discussion on the choice of q in the context of image restoration.

Fig. 1 Comparison of different q-norms. The solid black graph represents the 0-norm, the dotted black graph shows the 1-norm, the dark gray solid graph displays the 0.5-norm, and the light gray solid graph depicts the 0.1-norm

The choice of p affects the first term in (2), known as the fidelity term, and should depend on the type of noise η in the data vector bδ. If the noise η is white Gaussian, then p = 2 is appropriate and a method for determining μ for this kind of noise, based on the discrepancy principle, is described in [7]. This method requires that a fairly accurate estimate of the norm of η be available and allows 0 < q < 1. However, for impulse noise, choosing p = 2 usually produces computed solutions of poor quality. It has been shown, see, e.g., [8, 30, 35], that the choice 0 < p < 1 leads to accurate restorations in the case of salt-and-pepper noise. In [8], the authors developed two strategies based on cross-validation for determining a suitable value of μ for any 0 < p ≤ 2 without any knowledge of the noise. These strategies are further commented on below.

Let \(\mathcal {N}(M)\) denote the null space of the matrix M. We will assume that the matrices A and L are such that

$$ \mathcal{N}(A)\cap\mathcal{N}(L)=\{\mathbf{0}\}. $$
(3)

Then the matrix ATA + μLTL is nonsingular for any μ > 0, and the minimization problem (2) for p = q = 2 has the unique solution

$$ \mathbf{x}_{\mu}=(A^{T}A+\mu L^{T}L)^{-1}A^{T}\mathbf{b}^{\delta}. $$
(4)
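As an illustration, for small dense matrices the solution (4) can be computed directly from the normal equations. The sketch below is our own code (the function name and the use of NumPy are our choices), not an implementation from the paper.

```python
import numpy as np

def tikhonov_general_form(A, L, b_delta, mu):
    """Return x_mu = (A^T A + mu L^T L)^{-1} A^T b_delta, cf. (4)."""
    M = A.T @ A + mu * (L.T @ L)
    return np.linalg.solve(M, A.T @ b_delta)
```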

We remark that the selection of both L and μ is important for the quality of the computed solution and has been widely discussed; see, e.g., [15, 19, 31,32,33, 37].

Generalized cross validation (GCV) introduced by Golub et al. [23] is a popular approach to choosing the regularization parameter for Tikhonov regularization; more recent discussions on this method can be found in [19, 20, 24, 29]. The GCV method is statistically based and chooses a regularization parameter that minimizes the GCV functional

$$ G(\mu)=\frac{\| \mathbf{b}^{\delta}-A\mathbf{x}_{\mu} \|_{2}^{2}}{\left( \text{\textsf{trace}}\left( I-A\left( A^{T}A+\mu L^{T}L\right)^{-1} A^{T}\right)\right)^{2}}. $$
(5)

Denote the minimizer of G(μ) by μGCV. In rare events, when the minimizer is not unique, we choose the largest one.

The minimization of G(μ) requires the computation, or at least estimation, of the trace of the matrix I − A(ATA + μLTL)−1AT, which is not explicitly known. This can make the computation of G(μ) expensive when the matrices A and L are large. Different ways to speed up the computations by application of a global block Lanczos method or Golub-Kahan bidiagonalization are described in [19, 20]. However, these works are limited to the situation when L = I.

In this paper, we combine ideas in [19] with the techniques described in [30] to derive an algorithm for computing the GCV parameter for (2) for general values of 0 < p,q ≤ 2. Specifically, we consider the AMM-GKS algorithm in [30] for the solution of (2) (see Section 3.1 for more details), and modify it so that at each iteration the regularization parameter is chosen adaptively by minimizing a reduced version of the function (5). The computation of the GCV parameter is fully automatic in the sense that no information about the noise η is required. The GCV method for determining the regularization parameter is a so-called heuristic method, see Kindermann [32], and therefore may fail for certain data vectors b. In our experience, failure is extremely rare. The GCV method is related to cross validation (CV), another heuristic method for determining a suitable regularization parameter; see [8, 39]. We will illustrate in Section 4 that GCV typically gives about the same value of the regularization parameter as CV for a much lower computational effort.

This paper is organized as follows: Section 2 reviews how the GCV parameter for Tikhonov regularization in general form can be computed fairly inexpensively by projection into a Krylov subspace, and in Section 3 we discuss how to determine the GCV parameter for large-scale minimization problems (2). A smoothed variant of GCV that can give better regularization parameter values also is presented. Section 4 contains numerical examples that illustrate the performance of the proposed approach. Finally, Section 5 contains some concluding remarks.

2 GCV for Tikhonov regularization in general form

This section discusses the computation of the GCV parameter for Tikhonov regularization in general form, i.e., when solving the minimization problem (2) with p = q = 2. We first describe an approach that requires the generalized singular value decomposition (GSVD) of the matrix pair {A,L}. This approach is computationally feasible only for small to medium-sized problems since the computation of the GSVD for a large matrix pair is very expensive. We then discuss how the computational burden for large-scale problems can be reduced by projecting the matrix pair {A,L} into a Krylov subspace of fairly small dimension.

2.1 GCV computation by using the GSVD

This subsection reviews material in [27], where further details can be found. Consider the Tikhonov regularization problem in general form with μ > 0,

$$ \mathbf{x}_{\mu}=\arg\underset{\mathbf{x}\in{\mathbb R}^{n}}{\min}\left\{\| A\mathbf{x}-\mathbf{b}^{\delta} \|_{2}^{2}+\mu\| L\mathbf{x} \|_{2}^{2}\right\}, $$
(6)

and introduce the GSVD of the matrix pair {A,L},

$$ \left\{\begin{array}{l} A=U{\varSigma}_{A}Y^{T},\\ L=V{\varSigma}_{L}Y^{T}, \end{array}\right. $$
(7)

where ΣA and ΣL are diagonal, possibly rectangular, matrices, the matrices U and V are orthogonal, and the matrix Y is square and non-singular. Further details of these matrices, including their sizes, can be found, e.g., in [16, 25, 27]. The evaluation of the GCV function G(μ), defined by (5), requires the computation of the quantities

$$ \begin{array}{@{}rcl@{}} r_{\mu}&=&\| A\mathbf{x}_{\mu}-\mathbf{b}^{\delta} \|_{2}^{2},\\ t_{\mu}&=&\text{\textsf{trace}}\left( I-A\left( A^{T}A+\mu L^{T}L\right)^{-1}A^{T}\right), \end{array} $$

for several values of μ > 0, where the vector xμ is given by (4). These quantities can be evaluated inexpensively as follows when the factorizations (7) are available. For notational simplicity, the formulas below assume that all matrices are square. This restriction can easily be removed; see, e.g., [16, 25, 27] for details.

The quantity rμ can be expressed as

$$ \begin{array}{@{}rcl@{}} r_{\mu}&=&\| \mathbf{b}^{\delta}-A\mathbf{x}_{\mu} \|_{2}^{2}\\ &\overset{(a)}{=}&\| \mathbf{b}^{\delta}-A(A^{T}A+\mu L^{T}L)^{-1}A^{T}\mathbf{b}^{\delta} \|_{2}^{2}\\ &\overset{(b)}{=}&\| \tilde{\mathbf{b}}^{\delta}- {\varSigma}_{A}({{\varSigma}_{A}^{T}}{\varSigma}_{A}+\mu {{\varSigma}_{L}^{T}}{\varSigma}_{L})^{-1}{{\varSigma}_{A}^{T}}\tilde{\mathbf{b}}^{\delta} \|_{2}^{2}, \end{array} $$

where (a) is obtained by using the definition (4) of xμ, and (b) is derived using the factorizations (7) with \(\tilde {\mathbf {b}}^{\delta }=U^{T}\mathbf {b}^{\delta }\). Analogously,

$$ \begin{array}{@{}rcl@{}} t_{\mu}&=&\text{\textsf{trace}}\left( I-A\left( A^{T}A+\mu L^{T}L\right)^{-1}A^{T}\right)\\ &=&\text{\textsf{trace}}\left( I-{\varSigma}_{A}({{\varSigma}_{A}^{T}}{\varSigma}_{A}+ \mu {{\varSigma}_{L}^{T}}{\varSigma}_{L})^{-1}{{\varSigma}_{A}^{T}}\right), \end{array} $$

where we have used the factorizations (7) and the fact that the trace is invariant under orthogonal similarity transformations. We have

$$ \mu_{\text{GCV}}=\arg\underset{\mu}{\min}\frac{r_{\mu}}{t_{\mu}^{2}}, $$

where, in the rare case of non-uniqueness, we let μGCV be the largest of the minimizers.

It is clear from the above expressions for rμ and tμ that the main computational cost for evaluating (5) is the computation of the GSVD (7) of the matrix pair {A,L}. Observe that, to determine μGCV we typically have to evaluate G(μ) for many values of μ. However, it suffices to compute the factorizations (7) only once, since they are independent of μ.
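To illustrate how cheap the evaluation of G(μ) becomes once the GSVD is available, the sketch below (our own code, not the authors') assumes that the diagonal entries of ΣA and ΣL and the transformed data vector b̃ = UTbδ have already been computed and are passed as NumPy arrays sA, sL, and bt; it treats the square case used in the formulas above.

```python
import numpy as np

def gcv_from_gsvd(mu, sA, sL, bt):
    """Evaluate G(mu) = r_mu / t_mu^2 from precomputed GSVD quantities."""
    f = sA**2 / (sA**2 + mu * sL**2)        # Tikhonov filter factors
    r_mu = np.sum(((1.0 - f) * bt)**2)      # residual norm squared
    t_mu = np.sum(1.0 - f)                  # trace term
    return r_mu / t_mu**2
```

Each evaluation costs only O(n) arithmetic operations, so scanning many values of μ is inexpensive once the GSVD has been computed.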

2.2 GCV computation in a Krylov subspace

When the matrices A and L are large, the computation of the GSVD (7) is unattractive due to its high cost; for instance, Paige’s algorithm generally requires at least 35.3n³ arithmetic floating point operations when \(A,L\in {\mathbb R}^{n\times n}\); see [1, Table 5.1] for details. We therefore now describe a more computationally attractive approach that is based on projecting the problem into a Krylov subspace of fairly small dimension. Our projection method extends the method used in [19] by allowing the matrix L to be different from the identity.

Let \(A\in {\mathbb R}^{m\times n}\) and \(L\in {\mathbb R}^{r\times n}\), and let the columns of \(V\in {\mathbb R}^{n\times d}\) form an orthonormal basis for the Krylov subspace \(\mathcal {V}\) of dimension \(1\leq d\ll \min \limits \{m,n,r\}\),

$$ \mathcal{V}=\mathcal{K}_{d}\left( A^{T}A,A^{T}\mathbf{b}^{\delta} \right)= \text{\textsf{span}}\left\{A^{T}\mathbf{b}^{\delta},A^{T}AA^{T}\mathbf{b}^{\delta},\ldots,(A^{T}A)^{d-1}A^{T}\mathbf{b}^{\delta} \right\}. $$

We assume here that d is small enough so that \(\dim (\mathcal {K}_{d}\left (A^{T}A,A^{T}\mathbf {b}^{\delta } \right ))=d\). This is the generic situation. It is convenient to compute the columns of V by applying d steps of Golub-Kahan bidiagonalization to A with initial vector bδ. This gives the decomposition

$$ AV=[U,u_{d+1}]B^{(A)}, $$
(8)

where the matrices \([U,\mathbf{u}_{d+1}]\in {\mathbb R}^{m\times (d+1)}\) and \(V\in {\mathbb R}^{n\times d}\) have orthonormal columns and \(B^{(A)}\in {\mathbb R}^{(d+1)\times d}\) is lower bidiagonal; see, e.g., [25]. We assume that d is small enough so that this decomposition exists.

We would like to compute an approximate solution of the Tikhonov minimization problem (6) in \(\mathcal {V}\). The restriction of the approximate solution to the subspace \(\mathcal {V}\) of low dimension d reduces the computational cost significantly. Thus, we would like to solve

$$ \mathbf{x}=\arg\underset{\mathbf{x}\in\mathcal{V}}{\min}\left\{\| A\mathbf{x}-\mathbf{b}^{\delta} \|^{2}_{2}+\mu\| L\mathbf{x} \|_{2}^{2}\right\}. $$

Substituting x = V y for some \(\mathbf {y}\in {\mathbb R}^{d}\) into the above equation gives

$$ \mathbf{y}=\arg\underset{\mathbf{y}\in{\mathbb R}^{d}}{\min}\left\{\| AV\mathbf{y}-\mathbf{b}^{\delta} \|_{2}^{2}+\mu\| LV\mathbf{y} \|_{2}^{2}\right\}. $$
(9)

Substituting the QR factorization

$$ LV=Q^{(L)}R^{(L)}, $$

where the matrix \(Q^{(L)}\in {\mathbb R}^{r\times d}\) has orthonormal columns and \(R^{(L)}\in {\mathbb R}^{d\times d}\) is upper triangular, and the decomposition (8) into (9) gives

$$ \mathbf{y}=\arg\underset{\mathbf{y}\in{\mathbb R}^{d}}{\min}\left\{\| B^{(A)}\mathbf{y}-\|\mathbf{b}^{\delta}\|_{2}\mathbf{e}_{1} \|_{2}^{2}+ \mu\| R^{(L)}\mathbf{y} \|_{2}^{2}\right\}, $$
(10)

with e1 = [1,0,…,0]T denoting the first axis vector. The projected problem (10) is quite inexpensive to compute and of fairly small dimension. In particular, it is feasible to compute the GSVD of the matrix pair {B(A),R(L)} and use it to calculate an approximation of the GCV function G(μ).
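The following sketch summarizes the computations of this subsection: d steps of Golub-Kahan bidiagonalization (8), the QR factorization of LV, and the solution of the projected problem (10) via its normal equations. It is a conceptual illustration in NumPy (our own function names, no reorthogonalization or breakdown checks), not the implementation used in the paper.

```python
import numpy as np

def golub_kahan(A, b, d):
    """d steps of Golub-Kahan bidiagonalization with initial vector b; cf. (8)."""
    m, n = A.shape
    U = np.zeros((m, d + 1)); V = np.zeros((n, d)); B = np.zeros((d + 1, d))
    beta = np.linalg.norm(b); U[:, 0] = b / beta
    for j in range(d):
        v = A.T @ U[:, j] - (B[j, j - 1] * V[:, j - 1] if j > 0 else 0.0)
        B[j, j] = np.linalg.norm(v); V[:, j] = v / B[j, j]
        u = A @ V[:, j] - B[j, j] * U[:, j]
        B[j + 1, j] = np.linalg.norm(u); U[:, j + 1] = u / B[j + 1, j]
    return U, B, V, beta                     # A @ V = U @ B and U.T @ b = beta * e_1

def projected_tikhonov(A, L, b_delta, d, mu):
    """Approximate solution of (6) in the Krylov subspace; cf. (9)-(10)."""
    U, B, V, beta = golub_kahan(A, b_delta, d)
    R = np.linalg.qr(L @ V, mode="r")        # the triangular factor R^(L)
    rhs = np.zeros(d + 1); rhs[0] = beta     # ||b_delta||_2 * e_1
    y = np.linalg.solve(B.T @ B + mu * (R.T @ R), B.T @ rhs)
    return V @ y
```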

3 GCV for ℓp-ℓq regularization

We describe how to combine the computations discussed in Section 2 for determining the regularization parameter by the GCV method with the algorithms presented in [30] for minimizing the expression (2).

3.1 Solution of the ℓp-ℓq minimization problem for fixed μ > 0

For convenience of the reader, we outline the AMM-GKS method for solving (2) proposed in [30]. Introduce a smoothed version of the q-norm \(\| \mathbf {x} \|_{q}^{q}\) for 0 < q ≤ 1 as follows. Let \({\varPhi }_{q}:{\mathbb R}\rightarrow {\mathbb R}\) be defined by

$$ {\varPhi}_{q}(t)=|t|^{q}, $$

and observe that, if 0 < q ≤ 1, then tΦq(t) is not differentiable at t = 0. We approximate Φq by the differentiable function

$$ {\varPhi}_{q,\varepsilon}(t)=\left( \sqrt{t^{2}+\varepsilon^{2}}\right)^{q}, $$
(11)

where ε > 0 is a (small) constant. Similarly, we define for \(\mathbf {x}=[x_{1},x_{2},\ldots ,x_{n}]^{T}\in {\mathbb R}^{n}\) the smoothed version of \(\| \mathbf {x} \|_{q}^{q}\) as

$$ \| \mathbf{x} \|_{q}^{q}\approx\sum\limits_{i=1}^{n} {\varPhi}_{q,\varepsilon}(x_{i}). $$

Finally, introduce the smoothed version of the functional that is minimized in (2),

$$ \mathcal{J}_{\varepsilon}(\mathbf{x}):=\frac{1}{p}\sum\limits_{i=1}^{m}{\varPhi}_{p,\varepsilon}((A\mathbf{x}-\mathbf{b}^{\delta})_{i})+ \frac{\mu}{q}\sum\limits_{i=1}^{r}{\varPhi}_{q,\varepsilon}((L\mathbf{x})_{i}). $$
(12)

This gives us the smoothed version

$$ \mathbf{x}^{*}:=\arg\min_{\mathbf{x}}\mathcal{J}_{\varepsilon}(\mathbf{x}) $$

of the minimization problem (2).
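The smoothed functional is straightforward to evaluate; the short sketch below (our own code, not the authors') implements Φq,ε from (11) and Jε from (12) with NumPy.

```python
import numpy as np

def phi(t, q, eps):
    """Smoothed modulus Phi_{q,eps}(t) of (11), applied entrywise."""
    return np.sqrt(t**2 + eps**2)**q

def J_eps(x, A, L, b_delta, mu, p, q, eps):
    """Smoothed functional (12)."""
    fid = np.sum(phi(A @ x - b_delta, p, eps)) / p
    reg = np.sum(phi(L @ x, q, eps)) / q
    return fid + mu * reg
```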

The AMM-GKS method described in [30] is a majorization-minimization method for computing a stationary point of the functional (12). This method determines a sequence of iterates x(k), k = 1,2,…, that converge to a stationary point of (12). Each iterate is computed in two steps: first the functional (12) is majorized by a quadratic tangent majorant \(\mathbf {x}\rightarrow \mathcal {Q}(\mathbf {x},\mathbf {x}^{(k)})\) for \(\mathcal {J}_{\varepsilon }\) at x(k). Then the unique minimizer of \(\mathbf {x}\rightarrow \mathcal {Q}(\mathbf {x},\mathbf {x}^{(k)})\) is calculated. This minimizer is the next iterate x(k+ 1). We outline this method in the remainder of this subsection.

Definition 1 ([30])

Consider the differentiable functional \(\mathcal {J}_{\varepsilon }(\mathbf {x}):{\mathbb R}^{n}\rightarrow {\mathbb R}\). We say that the functional \(\mathbf {x}\mapsto \mathcal {Q}(\mathbf {x},\mathbf {y}):{\mathbb R}^{n}\rightarrow {\mathbb R}\) is a quadratic tangent majorant for \(\mathcal {J}_{\varepsilon }(\mathbf {x})\) at \(\mathbf {y}\in {\mathbb R}^{n}\) if the following conditions are satisfied:

  • \(\mathcal {Q}(\mathbf {x},\mathbf {y})\) is quadratic,

  • \(\mathcal {Q}(\mathbf {x},\mathbf {y})\geq \mathcal {J}_{\varepsilon }(\mathbf {x})\) for all \(\mathbf {x}\in {\mathbb R}^{n}\),

  • \(\mathcal {Q}(\mathbf {y},\mathbf {y})=\mathcal {J}_{\varepsilon }(\mathbf {y})\) and \(\nabla _{\mathbf {x}}\mathcal {Q}(\mathbf {y},\mathbf {y})=\nabla _{\mathbf {x}}\mathcal {J}_{\varepsilon }(\mathbf {y})\),

where ∇x denotes the gradient with respect to the first argument of Q.

Majorization step.

The quadratic tangent majorant at the point x(k) constructed by the AMM-GKS method is determined as follows. Let

$$ \begin{array}{ll} \mathbf{v}^{(k)}&:=A\mathbf{x}^{(k)}-\mathbf{b}^{\delta},\\ \mathbf{u}^{(k)}&:=L\mathbf{x}^{(k)}, \end{array} $$

and compute the vectors

$$ \begin{array}{ll} \boldsymbol{\omega}_{\text{fid}}^{(k)}&:=((\mathbf{v}^{(k)})^{2}+\varepsilon^{2})^{p/2-1},\\ \boldsymbol{\omega}^{(k)}_{\text{reg}}&:=((\mathbf{u}^{(k)})^{2}+\varepsilon^{2})^{q/2-1}, \end{array} $$

where all the operations, including squaring, are meant element-wise. Define the diagonal matrices

$$ W^{(k)}_{\text{fid}}=\text{diag}(\boldsymbol{\omega}_{\text{fid}}^{(k)}),\quad W^{(k)}_{\text{reg}}=\text{diag}(\boldsymbol{\omega}_{\text{reg}}^{(k)}). $$

It is shown in [30] that the functional

$$ \mathcal{Q}(\mathbf{x},\mathbf{x}^{(k)})=\frac{1}{2}\| (W^{(k)}_{\text{fid}})^{1/2}(A\mathbf{x}-\mathbf{b}^{\delta}) \|_{2}^{2}+ \frac{\mu}{2}\| (W^{(k)}_{\text{reg}})^{1/2}L\mathbf{x} \|_{2}^{2}+c, $$
(13)

with c a suitable constant that is independent of x, is a quadratic tangent majorant for \(\mathcal {J}_{\varepsilon }\) at x(k); see [30] for details on the derivation of \(\boldsymbol {\omega }^{(k)}_{\text {fid}}\), \(\boldsymbol {\omega }^{(k)}_{\text {reg}}\), and c.
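In code, the majorization step amounts to a few element-wise operations. The sketch below (our own code and function name) computes the weight vectors ω(k)fid and ω(k)reg that define the diagonal matrices W(k)fid and W(k)reg.

```python
import numpy as np

def mm_weights(x_k, A, L, b_delta, p, q, eps):
    """Weights of the quadratic tangent majorant (13) at x_k."""
    v = A @ x_k - b_delta
    u = L @ x_k
    w_fid = (v**2 + eps**2)**(p / 2 - 1)     # omega_fid^(k), element-wise
    w_reg = (u**2 + eps**2)**(q / 2 - 1)     # omega_reg^(k), element-wise
    return w_fid, w_reg                       # W_fid = diag(w_fid), W_reg = diag(w_reg)
```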

Minimization step.

We describe the computation of the minimizer x(k+ 1) of (13). Since \(\mathcal {Q}\) is quadratic, x(k+ 1) can be determined as the zero of the gradient, i.e., by solving

$$ (A^{T}W^{(k)}_{\text{fid}}A+\mu L^{T}W^{(k)}_{\text{reg}}L)\mathbf{x}=A^{T}W^{(k)}_{\text{fid}}\mathbf{b}^{\delta}. $$
(14)

The system matrix is nonsingular and positive definite for all μ > 0 if and only if

$$ \mathcal{N}(A^{T}W^{(k)}_{\text{fid}}A)\cap\mathcal{N}(L^{T}W^{(k)}_{\text{reg}}L)=\{\mathbf{0}\}. $$

This condition typically is satisfied. Then the solution x(k+ 1) of (14) is the unique minimizer of \(\mathbf {x}\to \mathcal {Q}(\mathbf {x},\mathbf {x}^{(k)})\).

The computation of the solution of (14) can be demanding when the matrices A and L are large. An approximate solution can be computed efficiently by projecting the problem into a low-dimensional subspace \(\mathcal {V}_{d}\) of dimension \(1\leq d\ll \min \limits \{m,n,r\}\). Let the columns of the matrix \(V_{d}\in {\mathbb R}^{n\times d}\) form an orthonormal basis for \(\mathcal {V}_{d}\) and calculate the skinny QR factorizations

$$ \begin{array}{rclcll} (W^{(k)}_{\text{fid}})^{1/2}AV_{d}&=&Q_{A}R_{A}\quad&\text{with}\quad&Q_{A}\in{\mathbb R}^{m\times d},& ~~~R_{A}\in{\mathbb R}^{d\times d},\\ (W^{(k)}_{\text{reg}})^{1/2}LV_{d}&=&Q_{L}R_{L}\quad&\text{with}\quad&Q_{L}\in{\mathbb R}^{r\times d},& ~~~R_{L}\in{\mathbb R}^{d\times d}, \end{array} $$

where the matrices QA and QL have orthonormal columns and the matrices RA and RL are upper triangular. Using these factorizations, we obtain the reduced analogue of (14),

$$ ({R_{A}^{T}}R_{A}+\mu {R_{L}^{T}}R_{L})\mathbf{y}={R_{A}^{T}}{Q_{A}^{T}}(W^{(k)}_{\text{fid}})^{1/2}\mathbf{b}^{\delta}. $$
(15)

The solution y(k+ 1) of this equation gives, with a slight abuse of notation, the approximate solution

$$ \mathbf{x}^{(k+1)}=V_{d}\mathbf{y}^{(k+1)} $$

of (14). Substituting this expression into (14) gives the residual vector

$$ \mathbf{r}^{(k+1)}:=A^{T}W^{(k)}_{\text{fid}}(AV_{d}\mathbf{y}^{(k+1)}-\mathbf{b}^{\delta})+ \mu L^{T}W^{(k)}_{\text{reg}}LV_{d}\mathbf{y}^{(k+1)}. $$

Observe that this vector in general is nonvanishing, since we are solving (14) in the subspace \(\mathcal {V}_{d}\) of \({\mathbb R}^{n}\). At each step we enlarge the solution subspace by appending the scaled residual vector \(\mathbf {v}_{\text {new}}=\mathbf {r}^{(k+1)}/\|\mathbf {r}^{(k+1)}\|\). This vector is orthogonal to the columns of the matrix Vd. Define the matrix \(V_{d+1}=[V_{d},\mathbf {v}_{\text {new}}]\in {\mathbb R}^{n\times (d+1)}\). Its columns form an orthonormal basis for the new solution subspace \(\mathcal {V}_{d+1}\). The solution subspace determined in this way is referred to as a Generalized Krylov subspace (GKS); see, e.g., [34] for another application of this kind of subspace. The computations are initialized by choosing a d0-dimensional solution subspace of \({\mathbb R}^{n}\) and letting the columns of the matrix \(V_{d_{0}}\) form an orthonormal basis for this subspace. Typically, setting d0 = 1 and \(V_{d_{0}}=A^{T}\mathbf {b}^{\delta }/\| A^{T}\mathbf {b}^{\delta } \|_{2}\) is appropriate. An algorithm that carries out these computations and determines the regularization parameter μ by the GCV method is presented in the following subsection.
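A sketch of the enlargement step follows (our own code; the explicit reorthogonalization against Vd is a numerical safeguard that is not needed in exact arithmetic).

```python
import numpy as np

def enlarge_subspace(V, A, L, w_fid, w_reg, b_delta, y, mu):
    """Append the normalized residual of (14) to the basis V_d."""
    x = V @ y
    r = A.T @ (w_fid * (A @ x - b_delta)) + mu * (L.T @ (w_reg * (L @ x)))
    r -= V @ (V.T @ r)                        # reorthogonalize (safeguard)
    return np.column_stack([V, r / np.linalg.norm(r)])   # V_{d+1}
```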

3.2 GCV and MM-GKS

In the computations outlined in the previous subsection, the regularization parameter is kept fixed. We now describe how a value μk of the regularization parameter can be determined in a fairly simple manner before computing the next approximate solution x(k+ 1) of (2).

Equation (15) is the normal equation associated with the minimization problem

$$ \mathbf{y}^{(k+1)}=\arg\underset{\mathbf{y}\in{\mathbb R}^{d}}{\min}\left\{\| R_{A}\mathbf{y}-{Q_{A}^{T}}(W^{(k)}_{\text{fid}})^{1/2}\mathbf{b}^{\delta} \|_{2}^{2}+ \mu\| R_{L}\mathbf{y} \|_{2}^{2}\right\}. $$

This minimization problem is analogous to (10). Since it stems from projecting a large problem into a generalized Krylov subspace of modest dimension d, the matrices RA and RL are small enough to allow the computation of the GSVD of the matrix pair {RA,RL} at moderate expense. We therefore can use the technique described in Section 2.1 to determine a suitable value of the regularization parameter by GCV. In detail, let the GSVD of the matrix pair {RA,RL} be given by

$$ \begin{array}{l} R_{A}=U{\varSigma}_{A}Y^{T},\\ R_{L}=V{\varSigma}_{L}Y^{T}. \end{array} $$

As in Section 2.1, we define, for any μ > 0, the quantities

$$ \begin{array}{ll} r^{(k)}_{\mu}&=\| U^{T}{Q_{A}^{T}}(W^{(k)}_{\text{fid}})^{1/2}\mathbf{b}^{\delta}-{{\varSigma}_{A}^{2}}({{\varSigma}_{A}^{2}}+{\mu{\varSigma}_{L}^{2}})^{-1} U^{T}{Q_{A}^{T}}(W^{(k)}_{\text{fid}})^{1/2}\mathbf{b}^{\delta} \|_{2}^{2},\\ t^{(k)}_{\mu}&=\text{\textsf{trace}}\left( I-{{\varSigma}_{A}^{2}}({{\varSigma}_{A}^{2}}+{\mu{\varSigma}_{L}^{2}})^{-1}\right). \end{array} $$
(16)

This derivation uses that both matrices RA and RL are square. We compute the GCV parameter μ(k) as the solution of

$$ \mu^{(k)}=\arg\underset{\mu}{\min}\frac{r^{(k)}_{\mu}}{\big(t^{(k)}_{\mu}\big)^{2}}. $$

If the minimum is not unique, then we choose the largest one. Typically, the minimum is unique. This is assumed in Algorithm 1, which describes the computations. The parameter μ(k) is updated in each iteration.

Algorithm 1 (MM-GKS-GCV)

Algorithm 1 requires the computation of the GSVD of the matrix pair {RA,RL} in each iteration. Since the matrices RA and RL are not very large, this is not very expensive.
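To make the preceding description concrete, we include a schematic sketch of the MM-GKS-GCV iteration. It is our own reconstruction from the description above, not the authors' Algorithm 1: the reduced GCV function is evaluated directly by dense linear algebra instead of via the GSVD of {RA,RL} (affordable since these matrices are only d × d), and μ(k) is selected by a simple search over a logarithmic grid whose range is an arbitrary choice.

```python
import numpy as np

def reduced_gcv(mu, R_A, R_L, rhs):
    """Reduced GCV function built from (16); rhs = Q_A^T (W_fid)^{1/2} b_delta."""
    M = R_A.T @ R_A + mu * (R_L.T @ R_L)
    H = R_A @ np.linalg.solve(M, R_A.T)              # influence matrix
    r_mu = np.linalg.norm(rhs - H @ rhs)**2
    t_mu = R_A.shape[0] - np.trace(H)
    return r_mu / t_mu**2

def mm_gks_gcv(A, L, b_delta, p, q, eps=1.0, maxit=100, tol=1e-4):
    n = A.shape[1]
    v0 = A.T @ b_delta
    V = (v0 / np.linalg.norm(v0)).reshape(n, 1)      # initial subspace, d_0 = 1
    x = np.zeros(n)
    for k in range(maxit):
        # majorization step: weights at the current iterate
        w_fid = ((A @ x - b_delta)**2 + eps**2)**(p / 2 - 1)
        w_reg = ((L @ x)**2 + eps**2)**(q / 2 - 1)
        # projection onto the current generalized Krylov subspace
        Q_A, R_A = np.linalg.qr(np.sqrt(w_fid)[:, None] * (A @ V))
        Q_L, R_L = np.linalg.qr(np.sqrt(w_reg)[:, None] * (L @ V))
        rhs = Q_A.T @ (np.sqrt(w_fid) * b_delta)
        # GCV choice of the regularization parameter (grid search)
        mus = np.logspace(-10, 5, 60)
        mu = mus[np.argmin([reduced_gcv(m, R_A, R_L, rhs) for m in mus])]
        # minimization step: projected normal equations (15)
        y = np.linalg.solve(R_A.T @ R_A + mu * (R_L.T @ R_L), R_A.T @ rhs)
        x_new = V @ y
        # enlarge the solution subspace with the normalized residual of (14)
        r = A.T @ (w_fid * (A @ x_new - b_delta)) + mu * (L.T @ (w_reg * (L @ x_new)))
        r -= V @ (V.T @ r)
        V = np.column_stack([V, r / np.linalg.norm(r)])
        if k > 0 and np.linalg.norm(x_new - x) < tol * np.linalg.norm(x):
            return x_new, mu
        x = x_new
    return x, mu
```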

3.3 Smoothed GCV for impulse noise

We describe a modified version of the GCV method that, as we will see in Section 4, is well suited for impulse noise. The Euclidean norm of impulse noise is infinite, and therefore the numerator in G(μ), see (5), may be infinite; in computations in finite precision arithmetic it is extremely large. In either case it may be difficult to determine the minimum of G(μ). To remedy this issue, we smooth the data vector bδ by convolving it with a Gaussian filter with small variance. Let \(\mathbf {k}_{\nu ^{2}}\in {\mathbb R}^{m}\) represent a sampling of a Gaussian function with mean zero and variance ν2, and define the smoothed vector

$$ \mathbf{b}^{\delta}_{\text{smooth}}=\mathbf{k}_{\nu^{2}}*\mathbf{b}^{\delta}, $$

where ∗ denotes the convolution operator. It is well known that if we assume periodic boundary conditions, then convolution can be carried out easily with the aid of the Fourier transform \(\mathfrak {F}\). We have

$$ \mathfrak{F}(\mathbf{b}^{\delta}_{\text{smooth}})=\mathfrak{F}(\mathbf{k}_{\nu^{2}})\odot\mathfrak{F}(\mathbf{b}^{\delta}), $$

where ⊙ denotes element-wise multiplication. Moreover, if bδ represents the vectorization of a multi-dimensional array, e.g., an image, it is beneficial to let \(\mathbf {k}_{\nu ^{2}}\) be the vectorization of a discretized multi-dimensional Gaussian function.
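A sketch of the smoothing step for a one-dimensional data vector follows; it is our own code, and the construction of the periodic Gaussian kernel is an illustrative choice.

```python
import numpy as np

def smooth_periodic(b_delta, nu2=1.0):
    """Convolve b_delta with a Gaussian of variance nu2, periodic boundary conditions."""
    m = b_delta.size
    t = np.minimum(np.arange(m), m - np.arange(m))   # periodic distance to index 0
    k = np.exp(-t**2 / (2.0 * nu2))
    k /= k.sum()                                     # normalized kernel k_{nu^2}
    return np.real(np.fft.ifft(np.fft.fft(k) * np.fft.fft(b_delta)))
```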

Using \(\mathbf {b}^{\delta }_{\text {smooth}}\) we can define the smoothed version of the GCV functional by

$$ G_{\text{smooth}}(\mu)=\frac{\| \mathbf{b}^{\delta}_{\text{smooth}}-A\mathbf{x}_{\mu} \|_{2}^{2}} {\left( \text{\textsf{trace}}\left( I-A\left( A^{T}A+\mu L^{T}L\right)^{-1}A^{T}\right)\right)^{2}}. $$

This functional is the standard GCV functional (5) with the data vector bδ replaced by the smoothed data vector \(\mathbf {b}^{\delta }_{\text {smooth}}\). The MM-GKS-GCV-Smooth algorithm is obtained by minimizing Gsmooth at each iteration in Algorithm 1 instead of G.

4 Numerical examples

This section shows some numerical examples that illustrate the performance of the proposed methods. The examples are concerned with the restoration of images that have been contaminated by noise and space-invariant blur, as well as with computerized tomography; see [28] for details on image restoration and [9] for a description of computerized tomography.

We set q = 0.1 in all examples, while the value of p depends on the type of noise the data vector bδ is corrupted by. For white Gaussian noise, we set p = 2; for other kinds of noise, we let p = 0.8.

We would like to compute a sparse solution in the framelet domain and therefore use a two-level framelet analysis operator as regularization operator L. Framelets are extensions of wavelets. They are defined as follows:

Definition 2

Let \(\mathcal {W}\in {\mathbb R}^{r\times n}\) with \(n\leq r\). The set of the rows of \(\mathcal {W}\) is a framelet system for \(\mathbb {R}^{n}\) if \(\forall \mathbf {x}\in {\mathbb R}^{n}\) it holds

$$ \| \mathbf{x} \|_{2}^{2}=\sum\limits_{j=1}^{r}\left(\mathbf{y}_{j}^{T}\mathbf{x}\right)^{2}, $$

where \(\mathbf {y}_{j}\in \mathbb {R}^{n}\) is the j th row of \(\mathcal {W}\) (written as a column vector), i.e., \(\mathcal {W}=\left [\mathbf {y}_{1},\ldots ,\mathbf {y}_{r}\right ]^{T}\). The matrix \(\mathcal {W}\) is referred to as an analysis operator and \(\mathcal {W}^{T}\) as a synthesis operator. The matrix \(\mathcal {W}\) defines a tight frame if and only if \(\mathcal {W}^{T}\mathcal {W}=I\).

We remark that in general \(\mathcal {W}\mathcal {W}^{T}\ne I\), unless r = n and the framelets are orthonormal. Note that \({\mathcal N}({\mathcal W})=\{\mathbf {0}\}\). Therefore, property (3) holds. Tight frames have been used in many image restoration applications including inpainting and deblurring; see, e.g., [2,3,4,5,6, 10,11,12,13].

We use framelets determined by linear B-splines. In one space-dimension they are made up of a low-pass filter W0 and two high-pass filters W1 and W2, whose masks are

$$ \mathbf{w}^{(0)}=\frac{1}{4}\left[1, 2, 1\right], \mathbf{w}^{(1)}= \frac{\sqrt{2}}{4}\left[1, 0, -1\right], \mathbf{w}^{(2)}=\frac{1}{4}\left[-1, 2, -1\right], $$

respectively. We can derive the synthesis operator \(\mathcal {W}\) starting from these masks by imposing reflexive boundary conditions. These boundary conditions secure that \(\mathcal {W}^{T}\mathcal {W}= I\). Defining

$$ W_{0}=\frac{1}{4}\left[\begin{array}{ccccc} 3&1 &0 &\ldots&0\\ 1&2 &1 & & \\ &\ddots&\ddots&\ddots& \\ & & 1& 2&1\\ 0&\ldots& 0& 1&3 \end{array}\right],\quad W_{1}=\frac{\sqrt{2}}{4}\left[\begin{array}{ccccc} -1&1 &0 &\ldots& 0\\ -1&0 &1 & & \\ &\ddots&\ddots&\ddots& \\ & & -1& 0& 1\\ 0&\ldots& 0& -1& 1 \end{array}\right], $$

and

$$ W_{2}=\frac{1}{4}\left[\begin{array}{ccccc} 1&-1 &0 &\ldots& 0\\ -1&2 &-1 & & \\ &\ddots&\ddots&\ddots& \\ & & -1& 2&-1\\ 0&\ldots& 0& -1& 1 \end{array}\right]. $$

The analysis operator is obtained by stacking the three matrices above

$$ \mathcal{W}=\left[\begin{array}{c}W_{0}\\W_{1}\\W_{2} \end{array} \right]. $$

We can extend the operator above to two space-dimensions by means of tensor products

$$ W_{ij} = W_{i} \otimes W_{j}, \quad i, j = 0, 1, 2. $$

This yields the analysis operator

$$ \mathcal{W}=\left[\begin{array}{c}W_{00}\\W_{01}\\\vdots\\W_{22} \end{array} \right]. $$

The matrix W00 is a low-pass filter. All the other matrices Wij contain at least one high-pass filter in some direction. In our examples, we set \(L=\mathcal {W}\).
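The construction of the analysis operator is easy to reproduce. The sketch below (our own code, following the matrices displayed above) builds the one-dimensional operator for linear B-spline framelets with reflexive boundary conditions, extends it to two space-dimensions by Kronecker products, and verifies the tight-frame property WTW = I.

```python
import numpy as np

def framelet_1d(n):
    """Analysis operator [W_0; W_1; W_2] for linear B-spline framelets."""
    W0 = 2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    W0[0, 0] = W0[-1, -1] = 3.0
    W1 = np.eye(n, k=1) - np.eye(n, k=-1)
    W1[0, 0], W1[-1, -1] = -1.0, 1.0
    W2 = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    W2[0, 0] = W2[-1, -1] = 1.0
    return np.vstack([W0 / 4.0, np.sqrt(2.0) / 4.0 * W1, W2 / 4.0])

def framelet_2d(n):
    """Two-dimensional analysis operator built from the tensor products W_i x W_j."""
    Ws = np.vsplit(framelet_1d(n), 3)                # the blocks W_0, W_1, W_2
    return np.vstack([np.kron(Wi, Wj) for Wi in Ws for Wj in Ws])

W = framelet_1d(16)
print(np.allclose(W.T @ W, np.eye(16)))              # tight frame: prints True
```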

We compare generalized cross validation with two other approaches to selecting the regularization parameter in (2): cross validation as described in [8] and the discrepancy principle as discussed in [7]. We also compare the quality of the computed solutions determined by these methods with the solution obtained with the optimal choice of the regularization parameter. The latter choice of the regularization parameter is not practical, because it requires that the desired solution xtrue be known; the regularization parameter is then determined to yield a computed solution that minimizes the restoration error

$$ \text{RRE}(\mathbf{x})=\frac{\| \mathbf{x}-\mathbf{x}_{\text{true}} \|_{2}}{\| \mathbf{x}_{\text{true}} \|_{2}}, $$
(17)

where xtrue denotes the desired solution.

The method described in [7] determines a sequence of regularization parameters, in a similar fashion as in this work, by requiring that the computed approximate solution in each iteration satisfies the discrepancy principle (DP). To perform well, this method requires that a fairly accurate estimate of the norm of the error ∥η2 in the data vector bδ be known. This norm is not meaningful for impulse noise. Therefore, the discrepancy principle cannot be used when bδ is contaminated by impulse noise.

Two methods for determining the regularization parameter by cross-validation are described in [8]. The first method determines the regularization parameter so that missing data are well predicted. This classical approach to carrying out cross-validation is referred to as CV in the tables. The other method, which we refer to as modified cross-validation (MCV), seeks to determine the regularization parameter so that missing entries of the computed solution are well predicted. The MCV method typically gives approximations of the desired solution xtrue of somewhat higher quality than the CV method; see [8] for illustrations. This paper compares different parameter choice rules for the minimization problem (2). A comparison with other methods for solving the minimization problem (2), which require the regularization parameter to be hand-tuned, can be found in [30, 35].

The restoration error (17) and the structural similarity index (SSIM) are compared in our tests. The definition of the latter is somewhat involved; here we just note that the SSIM measures how well the overall structure of an image is recovered, and that the larger the index, the better the reconstruction. In particular, the highest value achievable is 1; see [40] for details.
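For completeness, the two quality measures can be computed as follows. This is our own sketch; the paper does not state which SSIM implementation was used, so the call to scikit-image below is an assumption.

```python
import numpy as np
from skimage.metrics import structural_similarity

def rre(x, x_true):
    """Relative restoration error (17)."""
    return np.linalg.norm(x - x_true) / np.linalg.norm(x_true)

def ssim(img, img_true):
    """Structural similarity index; 1 corresponds to a perfect reconstruction."""
    return structural_similarity(img, img_true,
                                 data_range=float(img_true.max() - img_true.min()))
```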

For all examples we set ε = 1 in (11); see [7] for a discussion on the choice of ε. We terminate the iterations either after 100 iterations or when

$$ \frac{\| \mathbf{x}^{(k+1)}-\mathbf{x}^{(k)} \|_{2}}{\| \mathbf{x}^{(k)} \|_{2}}<10^{-4}. $$

Finally, for the computations of \(\mathbf {b}^{\delta }_{\text {smooth}}\), we set ν2 = 1 in all examples.

All the computations were performed using Matlab R2018b running on a Windows 10 laptop computer with an Intel i7-8750H CPU @2.20 GHz and 16 GB of RAM.

Tomography

The first example considers a tomography problem. The example is constructed with the aid of the IR Tools Matlab package [21]. The phantom has 256 × 256 pixels, and we consider 90 equispaced angles in the interval [0,π) and 362 rays per angle. This results in a matrix \(A\in {\mathbb R}^{32580\times 65536}\). We added 1% of white Gaussian noise to the sinogram, i.e., ∥η∥2 = 0.01∥b∥2. Since the noise is Gaussian, we set p = 2 and, as in all considered examples, q = 0.1. We show the true phantom and the sinogram in Fig. 2.

Fig. 2 Tomography test case: a True phantom (256 × 256 pixels), b sinogram (90 equispaced angles in [0,π), 362 measurements per angle, and 1% white Gaussian noise)

Table 1 reports the RRE and SSIM for the computed solutions determined by all the parameter choice rules considered. All the rules can be seen to perform very similarly. Obviously, the optimal parameter provides the best reconstruction in terms of RRE. This parameter choice rule requires knowledge of the exact solution xtrue and therefore cannot be used in real applications. The DP provides the second best approximation of xtrue; however, it demands knowledge of the norm of the noise η. The CV, MCV, and GCV methods do not require any knowledge of the noise. We can observe that, although the MCV method slightly outperforms the GCV method in terms of the RRE (but not in terms of the SSIM), the GCV method demands much less CPU time than the CV and MCV methods. This is due to the fact that both the CV and MCV methods require several runs of the MM-GKS algorithm, while the GCV method only carries out a single run. Finally, we note that the GCV-Smooth method does not perform well. This is due to the fact that the noise in the data bδ is purely Gaussian. The smoothing of bδ leads to an underestimation of the amount of noise in bδ, which results in a too small value of the regularization parameter; the computed solution is under-regularized. These considerations are confirmed by the visual comparison of the reconstructions shown in Fig. 3.

Table 1 RRE, SSIM, and CPU time in seconds obtained with the different choice rules for the regularization parameter
Fig. 3 Tomography test case reconstructions: a Optimal, b DP, c GCV, d GCV-Smooth

Peppers

Our second example considers an image deblurring problem. We blur the image in Fig. 4a with the PSF shown in Fig. 4b, and add 25% of impulse noise. This yields the blurred and noisy image displayed in Fig. 4c. When constructing this example, we cut the boundaries of the blurred image to simulate realistic data; see [28] for more details. Since the image is generic, we impose reflexive boundary conditions. Moreover, since the blurred image is corrupted by impulse noise, we set p = 0.8.

Fig. 4 Peppers test case: a True image (246 × 246 pixels), b PSF (9 × 9 pixels), c Blurred and noisy image (25% of impulse noise)

We report the results obtained in Table 1. Since the noise is not Gaussian, we cannot apply the discrepancy principle to determine the regularization parameter. However, the CV, MCV, and GCV methods can be applied. We also report the RRE and SSIM for the optimal choice of the regularization parameter. We can see that the GCV and GCV-Smooth methods provide the best reconstructions. In particular, the reconstruction obtained with GCV is slightly better than the optimal one, and the one obtained with GCV-Smooth is significantly better. While this may appear strange, it is due to the construction of the generalized Krylov subspace. In particular, since the solution subspace depends on the approximate solutions generated during the computations, the solution subspaces determined by the various methods differ, and this may lead to different results. We remark that, while we can show convergence of the computed iterates for stationary methods (with μ fixed), we have no such result for the iterates determined by the CV and GCV methods. Finally, we observe that the use of the smoothed GCV function yields a very accurate reconstruction which, in particular, is more accurate than the one obtained with the standard GCV method. These observations are confirmed by visual inspection of the reconstructions displayed in Fig. 5.

Fig. 5 Peppers test case reconstructions: a Optimal, b MCV, c GCV, d GCV-Smooth

Cameraman

The last example describes another image deblurring problem. We consider the exact image in Fig. 6a and blur it with the PSF in Fig. 6b. Adding 20% of salt-and-pepper noise produces the blurred and noisy image in Fig. 6c. As in the previous example, we cut the boundary of the blurred image to simulate realistic data. Since the image is generic, we impose reflexive boundary conditions. Since the PSF is quadrantically symmetric, the matrix A obtained can be diagonalized by the discrete cosine transform. Because LTL = I, we could therefore avoid the application of generalized Krylov subspaces in the MM-GKS method. However, for consistency we use the same approach as in the previous examples. Since the noise is not Gaussian, we set p = 0.8, as in the example above.

Fig. 6 Cameraman test case: a True image (252 × 252 pixels), b PSF (12 × 12 pixels), c Blurred and noisy image (20% of salt-and-pepper noise)

Table 1 shows the RRE and SSIM values obtained with the different parameter choice rules. We observe that the GCV and GCV-Smooth methods provide more accurate approximations of xtrue in terms of RRE than both the CV and MCV methods, at a much lower computational cost. Moreover, the RRE obtained with the GCV method is close to the optimal one, and the RRE for the restoration determined by the GCV-Smooth method is significantly smaller. Figure 7 displays the obtained reconstructions. Visual inspection shows that the reconstruction obtained with the GCV-Smooth method is very detailed, but “grayish”. The latter is due to the presence of some pixels whose values are significantly larger than 255 (the value corresponding to white; 0 corresponds to black), which causes a rescaling before plotting. Projecting the reconstruction orthogonally onto the cube [0,255]n gives the image in Fig. 8. We can observe that the gray-scale level is now correct and the reconstruction is very close to the original image.

Fig. 7 Cameraman test case reconstructions: a Optimal, b MCV, c GCV, d GCV-Smooth

Fig. 8 Cameraman test case reconstruction obtained using the MM-GKS-GCV-Smooth method and projecting the approximation onto the cube [0,255]n

5 Conclusions

We propose the GCV method for choosing the regularization parameter in the ℓp-ℓq regularization method. This method is easy to implement and computationally fairly inexpensive, since the original minimization problem (2) is projected into a generalized Krylov subspace of small to moderate dimension. A comparison with some other available methods shows that the proposed approach is competitive and provides accurate approximations of the desired solution. Moreover, the proposed approach is completely automatic and does not require hand-tuning of any parameter. It therefore can be considered for real-world applications.