1 Introduction

We are concerned with the solution of linear least-squares problems

$$\begin{aligned} \min _{\varvec{\scriptstyle x}\in {\mathbb {R}}^n}\Vert A{\varvec{x}}-{\varvec{b}}\Vert , \end{aligned}$$
(1)

where \(A\in {{\mathbb {R}}}^{m\times n}\) is a large matrix, whose singular values “cluster” at the origin, and \({\varvec{b}}\in {{\mathbb {R}}}^m\). In particular, A has many “tiny” singular values of different orders of magnitude. Least-squares problems with a matrix of this kind are commonly referred to as linear discrete ill-posed problems. They arise, for instance, from the discretization of Fredholm integral equations of the first kind; see, e.g., [10, 19]. Applications of this kind of least-squares problems include image reconstruction and remote sensing. Throughout this paper \(\Vert \cdot \Vert \) denotes the Euclidean vector norm or the spectral matrix norm. Both the situations when \(m\ge n\) and when \(m<n\) will be considered.

The vector \(\varvec{b}\) in linear discrete ill-posed problems that arise in applications typically represents measured data and is contaminated by a measurement error \(\varvec{e}\in {\mathbb {R}}^m\). Thus,

$$\begin{aligned} {\varvec{b}} = {\varvec{b}}_\mathrm{exact} + {\varvec{e}}, \end{aligned}$$
(2)

where \({\varvec{b}}_\mathrm{exact}\) denotes the unknown error-free vector associated with \(\varvec{b}\). We are interested in determining the solution, \(\varvec{x}_\mathrm{exact}\), of minimal Euclidean norm of the unavailable least-squares problem

$$\begin{aligned} \min _{\varvec{\scriptstyle x}\in {\mathbb {R}}^n}\Vert A{\varvec{x}}-{\varvec{b}}_\mathrm{exact}\Vert . \end{aligned}$$

Due to the error \(\varvec{e}\) in \(\varvec{b}\) and the presence of small positive singular values of A, the solution of (1) of minimal Euclidean norm typically is not a useful approximation of \(\varvec{x}_\mathrm{exact}\). To determine a meaningful approximation of \(\varvec{x}_\mathrm{exact}\), one generally replaces the minimization problem (1) by a nearby problem, whose solution is less sensitive to the error \(\varvec{e}\). This replacement is known as regularization. One of the most popular regularization methods is due to Tikhonov. It replaces (1) by a penalized least-squares problem of the form

$$\begin{aligned} \min _{{\varvec{\scriptstyle x}}\in {{\mathbb {R}}}^n}\{\Vert A{\varvec{x}}-{\varvec{b}}\Vert ^2 + \mu \Vert L{\varvec{x}}\Vert ^2\}, \end{aligned}$$
(3)

where \(L\in {{\mathbb {R}}}^{p\times n}\) is a regularization matrix and \(\mu \ge 0\) a regularization parameter. We require that \({\mathcal N}(A)\cap {\mathcal N}(L)=\{\varvec{0}\}\), where \({\mathcal N}(M)\) denotes the null space of the matrix M. Then the penalized least-squares problem (3) has a unique solution

$$\begin{aligned} {\varvec{x}}_\mu =(A^TA+\mu L^TL)^{-1}A^T\varvec{b}\end{aligned}$$
(4)

for any \(\mu >0\); the superscript \(^T\) denotes transposition. Common choices of regularization matrices L include the identity matrix, denoted by I, and discrete approximations of differential operators; see, e.g., [5, 7, 19, 23]. The value of the regularization parameter \(\mu >0\) determines how sensitive the vector (4) is to the error in \(\varvec{b}\) and how close it is to the desired vector \(\varvec{x}_\mathrm{exact}\).
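For problems of moderate size, the solution (4) can be computed without explicitly forming the matrix \(A^TA+\mu L^TL\) by solving an equivalent stacked least-squares problem. The following MATLAB sketch illustrates this; it is not the implementation used in the experiments below, and the quantities A, L, b, and mu are assumed to be available.

```matlab
% Sketch: Tikhonov solution (4) for given A, L, b, and mu > 0, computed from
% the equivalent stacked least-squares problem
%     min_x || [A; sqrt(mu)*L]*x - [b; 0] ||,
% which avoids forming the normal equations A'*A + mu*L'*L explicitly.
p    = size(L, 1);
x_mu = [A; sqrt(mu)*L] \ [b; zeros(p, 1)];
```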

We will determine \(\mu \) with the aid of the discrepancy principle; see below. This requires that a bound \(\epsilon \) for the error \(\varvec{e}\) be known, i.e.,

$$\begin{aligned} \Vert \varvec{e}\Vert \le \epsilon , \end{aligned}$$
(5)

and that \({\varvec{b}}_\mathrm{exact}\in {\mathcal R}(A)\), where \({\mathcal R}(A)\) denotes the range of A. If these requirements are not satisfied, then other methods, including the L-curve criterion and generalized cross validation, can be used to determine a suitable value of \(\mu \); see, e.g., [11, 25, 26, 29] for discussions and illustrations.

The Tikhonov solution (4) is said to satisfy the discrepancy principle if

$$\begin{aligned} \Vert {\varvec{b}} - A{\varvec{x}}_\mu \Vert = \eta \epsilon , \end{aligned}$$
(6)

where \(\eta >1\) is a user-chosen parameter that is independent of \(\epsilon \). When \(\epsilon \) in (5) is known to be an accurate estimate of \(\Vert \varvec{e}\Vert \) and \(\varvec{e}\) represents white Gaussian noise, then generally \(\eta \) is chosen to be close to unity. Equation (6) has a unique solution \(\mu _\mathrm{discr}=\mu >0\) for many reasonable values of \(\eta \epsilon >0\); see, e.g., [2]. Several zero-finders for determining \(\mu _\mathrm{discr}\) are described in [30]. A proof in a Hilbert space setting that \(\varvec{x}_\mu \rightarrow \varvec{x}_\mathrm{exact}\) as \(\epsilon \searrow 0\) can be found in [10].
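For small problems with \(L=I\), the value \(\mu _\mathrm{discr}\) can be computed directly from the SVD of A with a general-purpose zero-finder. The sketch below is meant only to make Eq. (6) concrete; it is not the Newton-based zero-finder of [2], and it assumes that A, b, eta, and epsilon are given and that the chosen bracket contains the root.

```matlab
% Sketch: solve the discrepancy equation (6) for L = I via the SVD of A.
% A, b, eta, and epsilon are assumed given; small problems only.
[U, S, V] = svd(A, 'econ');
s  = diag(S);
d  = U' * b;
r0 = norm(b - U*d);                               % component of b outside range(U)
discr2 = @(mu) sum((mu ./ (s.^2 + mu)).^2 .* d.^2) + r0^2;   % ||b - A*x_mu||^2
phi    = @(t) discr2(exp(t)) - (eta*epsilon)^2;   % zero-finder in t = log(mu)
mu     = exp(fzero(phi, [-30, 30]));              % bracket assumed to contain the root
x_mu   = V * ((s .* d) ./ (s.^2 + mu));           % Tikhonov solution (4) with L = I
```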

It is the purpose of the present paper to compare a solution method for large-scale Tikhonov minimization problems (3) based on partial Golub–Kahan bidiagonalization of A to a randomized solution method. Partial Golub–Kahan bidiagonalization is the basis for possibly the most popular Krylov subspace methods for the solution of large-scale problems (3) with a nonsymmetric or symmetric indefinite matrix A; see, e.g., [2, 15, 18, 19, 22, 24] for discussions and illustrations of this solution approach. Iterative solution methods that are based on the Arnoldi process instead of Golub–Kahan bidiagonalization are competitive for certain problems, but may fail to determine accurate approximations of \(\varvec{x}_\mathrm{exact}\) for others; see [6, 14, 15, 27] for discussions and applications of the Arnoldi process to large-scale linear discrete ill-posed problems. We therefore focus on Golub–Kahan bidiagonalization in the present paper.

When solving (3) by application of \(\ell \) steps of Golub–Kahan bidiagonalization, the matrix A is replaced by an approximation of rank at most \(\ell \). Typically, \(1\le \ell \ll \max \{m,n\}\) in applications. Thus, Golub–Kahan bidiagonalization applied to the solution of (3) replaces A by a low-rank approximation of the matrix A, and then solves the low-rank problem instead of (3).

Randomized solution methods for the solution of large-scale problems have received considerable attention; see Halko et al. [17] for a survey. When applied to the solution of (3), these methods also determine a low-rank approximation of A. They compute an approximate solution of the original problem by replacing the given matrix by its low-rank approximation, and then solve the low-rank problem so obtained. Xiang and Zou [33, 34] describe applications of this approach to the solution of large-scale Tikhonov minimization problems (3). To the best of our knowledge, very few comparisons of randomized and Krylov subspace-based solution methods for linear discrete ill-posed problems are available in the literature. It is quite natural to compare these approaches to the solution of large-scale linear discrete ill-posed problems, because they both determine low-rank approximations of the large matrix A. Vatankhah et al. [32] show that randomized methods may be faster than Krylov subspace methods based on Golub–Kahan bidiagonalization for certain problems. We illustrate that, for some linear discrete ill-posed problems, methods based on Golub–Kahan bidiagonalization are competitive and we seek to shed light on for which kinds of linear discrete ill-posed problems Golub–Kahan bidiagonalization may be preferable.

This paper is organized as follows. Section 2 reviews methods based on partial Golub–Kahan bidiagonalization of the matrix A described in [2, 22] for the solution of large-scale Tikhonov regularization problems (3). Section 3 outlines the randomized method proposed by Xiang and Zou [33]. The randomized method discussed in [34] also is commented on. Section 4 presents computed results, and Sect. 5 contains concluding remarks.

2 Solution methods based on Golub–Kahan bidiagonalization

This section reviews solution methods for the Tikhonov minimization problem described in [2, 22]. They are based on reducing the matrix A to a small bidiagonal matrix by the application of \(1\le \ell \ll \min \{m,n\}\) steps of Golub–Kahan bidiagonalization to A. The number of steps is chosen as small as possible so that the computed solution can satisfy the discrepancy principle. Thus, application of \(\ell \) steps of Golub–Kahan bidiagonalization to A with initial vector \(\varvec{b}\) gives the decompositions

$$\begin{aligned} AV_\ell = U_{\ell +1}{\bar{C}}_\ell ,\qquad A^T U_\ell = V_\ell C_{\ell }^T, \end{aligned}$$
(7)

where the matrices \(U_{\ell +1}\in {{\mathbb {R}}}^{m\times (\ell +1)}\) and \(V_\ell \in {{\mathbb {R}}}^{n\times \ell }\) have orthonormal columns, \(U_\ell \in {{\mathbb {R}}}^{m\times \ell }\) consists of the first \(\ell \) columns of \(U_{\ell +1}\), and

$$\begin{aligned} U_{\ell +1}\varvec{e}_1=\varvec{b}/\Vert \varvec{b}\Vert . \end{aligned}$$
(8)

Here and throughout this paper \(\varvec{e}_1=[1,0,\ldots ,0]^T\) denotes the first axis vector of appropriate dimension. The range of \(V_\ell \) is the Krylov subspace

$$\begin{aligned} {\mathbb K}_\ell (A^TA,A^T\varvec{b})= \mathrm{span}\{A^T\varvec{b},(A^TA)A^T\varvec{b},\ldots ,(A^TA)^{\ell -1}A^T\varvec{b}\}. \end{aligned}$$
(9)

Further, the matrix

$$\begin{aligned} {\bar{C}}_{\ell }= \left[ \begin{array} {ccccc} \rho _1 &{} &{} &{} &{} {0} \\ \sigma _2&{} \rho _2 &{} &{} &{} \\ &{} \ddots &{} \ddots &{} &{} \\ &{} &{} \sigma _{\ell -1} &{} \rho _{\ell -1} &{} \\ &{} &{} &{} \sigma _{\ell } &{} \rho _{\ell } \\ {0} &{} &{} &{} &{} \sigma _{\ell +1} \\ \end{array} \right] \in {{\mathbb {R}}}^{(\ell +1)\times \ell } \end{aligned}$$

is lower bidiagonal with positive entries \(\sigma _k\) and \(\rho _k\), and \(C_{\ell }\in {{\mathbb {R}}}^{\ell \times \ell }\) is obtained by removing the last row of \({\bar{C}}_{\ell }\). We assume that \(\ell \) is small enough so that the decompositions (7) with the described properties exist. This is the generic situation. The dominating computational effort required to determine the decompositions (7) is the sequential evaluation of \(\ell \) matrix–vector products with each of the matrices A and \(A^T\); see, e.g., [16, Sect. 10.4.4] for an algorithm.
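The following MATLAB sketch of the recursions underlying (7) and (8) may be helpful; it assumes the generic case of no breakdown, omits reorthogonalization, and is not the implementation used for the experiments of Sect. 4.

```matlab
function [U, V, Cbar] = gkb(A, b, l)
% Sketch: l steps of Golub-Kahan bidiagonalization with initial vector b,
% producing the decompositions (7)-(8).  No reorthogonalization; the generic
% case of no breakdown (all rho_k, sigma_k > 0) is assumed.
[m, n]  = size(A);
U       = zeros(m, l+1);  V = zeros(n, l);  Cbar = zeros(l+1, l);
U(:, 1) = b / norm(b);                                % enforces (8)
for k = 1:l
    v = A' * U(:, k);
    if k > 1
        v = v - Cbar(k, k-1) * V(:, k-1);             % subtract sigma_k * v_{k-1}
    end
    Cbar(k, k) = norm(v);  V(:, k) = v / Cbar(k, k);  % rho_k and v_k
    u = A * V(:, k) - Cbar(k, k) * U(:, k);
    Cbar(k+1, k) = norm(u);  U(:, k+1) = u / Cbar(k+1, k);   % sigma_{k+1}, u_{k+1}
end
end
```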

Following [22], we compute an approximate solution of (3) by minimizing over the Krylov subspace (9) instead of over \({\mathbb {R}}^n\). Thus, we solve

$$\begin{aligned} \min _{\varvec{\scriptstyle x}\in {\mathbb K}_\ell (A^TA,A^T\varvec{\scriptstyle b})}\{\Vert A\varvec{x}-\varvec{b}\Vert ^2+\mu \Vert L\varvec{x}\Vert ^2\}, \end{aligned}$$
(10)

which, by using the representation \(\varvec{x}=V_\ell \varvec{y}\), and the relations (7) and (8), can be expressed as

$$\begin{aligned} \min _{\varvec{\scriptstyle y}\in {{\mathbb {R}}}^\ell }\{\Vert {\bar{C}}_\ell \varvec{y}-\varvec{e}_1\Vert \varvec{b}\Vert \,\Vert ^2+\mu \Vert LV_\ell \varvec{y}\Vert ^2\}. \end{aligned}$$
(11)

Denote the solution of (11) by \(\varvec{y}_{\mu ,\ell }\). Then \(\varvec{x}_{\mu ,\ell }=V_\ell \varvec{y}_{\mu ,\ell }\) is an approximate solution of (3). We point out that the problems (3) and (10) only differ in the spaces over which they are minimized. The randomized method of Sect. 3 gives a minimization problem that differs in several ways from the problem (3).

First consider the situation when \(L=I\). Then it is shown in [1, Theorem 5.1] that

$$\begin{aligned} \Vert A\varvec{x}_{\mu ,\ell }-\varvec{b}\Vert =\Vert {\bar{C}}_\ell \varvec{y}_{\mu ,\ell }-\varvec{e}_1\Vert \varvec{b}\Vert \,\Vert . \end{aligned}$$
(12)

It therefore suffices to choose \(\mu >0\) so that the reduced problem on the right-hand side satisfies the discrepancy principle; see [2] for details. It follows that it is quite inexpensive to determine a value of \(\mu >0\) such that the approximate solution \(\varvec{x}_{\mu ,\ell }=V_\ell \varvec{y}_{\mu ,\ell }\) of (3) with \(L=I\) satisfies (6). A discussion on how this can be done by using Newton’s method can be found in [2]; other zero-finders are discussed in [30].
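A minimal sketch of this computation for \(L=I\), assuming the matrices \(V_\ell \) and \({\bar{C}}_\ell \) of (7) have been computed (e.g., by the sketch above) and that \(\eta \) and \(\epsilon \) are given; MATLAB's fzero is used here in place of the Newton zero-finder of [2], and the bracket is assumed to contain the root.

```matlab
% Sketch: choose mu so that the reduced problem (11) with L = I satisfies the
% discrepancy principle, cf. (12).  Cbar and Vl are from (7), beta = norm(b).
l      = size(Cbar, 2);
rhs    = beta * eye(l+1, 1);                          % beta * e_1
ysolve = @(mu) [Cbar; sqrt(mu)*eye(l)] \ [rhs; zeros(l, 1)];
res    = @(mu) norm(Cbar*ysolve(mu) - rhs);           % right-hand side of (12)
mu     = exp(fzero(@(t) res(exp(t)) - eta*epsilon, [-30, 30]));
x      = Vl * ysolve(mu);                             % approximate solution of (3)
```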

The expressions (12) decrease as \(\ell \) increases. This follows from the fact that the dimension of the Krylov subspace in (10) increases with \(\ell \) and that the subspaces are nested, i.e., \({\mathbb K}_\ell (A^TA,A^T\varvec{b})\subset {\mathbb K}_{\ell +1}(A^TA,A^T\varvec{b})\) for \(\ell =1,2,\ldots ~\). We choose the number of bidiagonalization steps, \(\ell \), as small as possible to satisfy the discrepancy principle for some \(0<\mu <\infty \), i.e., we choose \(\ell \) so that

$$\begin{aligned} \Vert {\bar{C}}_\ell \varvec{y}_{\mu ,\ell }-\varvec{e}_1\Vert \varvec{b}\Vert \,\Vert <\eta \epsilon \le \Vert {\bar{C}}_{\ell -1}\varvec{y}_{\mu ,\ell -1}-\varvec{e}_1\Vert \varvec{b}\Vert \,\Vert . \end{aligned}$$

Further details on the choice of \(\ell \) are described in [2].

We turn to the case when \(L\ne I\). This situation can be handled by several approaches; see, e.g., [3, 9, 22]. In the numerical examples of Sect. 4, we will apply the method described in [22]. Let \(L\in {\mathbb {R}}^{p\times n}\) and assume that \(\ell \) in (7) satisfies \(1\le \ell \le \min \{p,n\}\). Compute the QR factorization \(Q_\ell R_\ell =LV_\ell \), where \(Q_\ell \in {\mathbb {R}}^{p\times \ell }\) has orthonormal columns and \(R_\ell \in {{\mathbb {R}}}^{\ell \times \ell }\) is upper triangular. Then (11) becomes

$$\begin{aligned} \min _{\varvec{\scriptstyle y}\in {{\mathbb {R}}}^\ell }\{\Vert {\bar{C}}_\ell \varvec{y}-\varvec{e}_1\Vert \varvec{b}\Vert \,\Vert ^2+\mu \Vert R_\ell \varvec{y}\Vert ^2\}. \end{aligned}$$

Typically, the matrix \(R_\ell \) is nonsingular and not very ill-conditioned. Then the change of variables \(\varvec{z}=R_\ell \varvec{y}\) results in the minimization problem

$$\begin{aligned} \min _{\varvec{\scriptstyle z}\in {{\mathbb {R}}}^\ell }\{\Vert {\bar{C}}_\ell R_\ell ^{-1} \varvec{z}-\varvec{e}_1\Vert \varvec{b}\Vert \,\Vert ^2+\mu \Vert \varvec{z}\Vert ^2\}. \end{aligned}$$

It can be shown that if \(\mu \) is determined so that the solution \(\varvec{z}_{\mu ,\ell }\) satisfies \(\Vert {\bar{C}}_\ell R_\ell ^{-1} \varvec{z}_{\mu ,\ell }-\varvec{e}_1\Vert \varvec{b}\Vert \,\Vert =\eta \epsilon \), then the associated approximate solution \(\varvec{x}_{\mu ,\ell }=V_\ell R_\ell ^{-1}\varvec{z}_{\mu ,\ell }\) satisfies the discrepancy principle (6); see [22] for details. Hence, it is quite cheap to determine an approximate solution of (3) that satisfies (6) also when \(L\ne I\).
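The transformation just described can be sketched as follows, for a given value of \(\mu \); the matrices \(V_\ell \) and \({\bar{C}}_\ell \) are from (7), and \(R_\ell \) is assumed to be nonsingular.

```matlab
% Sketch: reduced Tikhonov problem for general L, following [22].
% Vl and Cbar are from (7), beta = norm(b), and mu > 0 is given
% (e.g., determined by a zero-finder as in the case L = I).
[Q, R] = qr(L*Vl, 0);                          % economy-size QR; R is l x l
l      = size(Cbar, 2);
rhs    = beta * eye(l+1, 1);                   % beta * e_1
CR     = Cbar / R;                             % Cbar * inv(R)
z      = [CR; sqrt(mu)*eye(l)] \ [rhs; zeros(l, 1)];   % z_{mu,l}
x      = Vl * (R \ z);                         % approximate solution of (3)
```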

3 Randomized solution methods

The reduced singular value decomposition (SVD) of the matrix \(A\in {\mathbb {R}}^{m\times n}\), with \(m\ge n\), is of the form

$$\begin{aligned} A=U\Sigma V^T, \end{aligned}$$
(13)

where the matrices \(U\in {{\mathbb {R}}}^{m\times n}\) and \(V\in {{\mathbb {R}}}^{n \times n}\) have orthonormal columns, and

$$\begin{aligned} \Sigma =\mathrm{diag}[\sigma _1,\sigma _2,\ldots ,\sigma _n]\in {{\mathbb {R}}}^{n\times n}, \qquad \sigma _1\ge \sigma _2\ge \cdots \ge \sigma _n\ge 0. \end{aligned}$$

The diagonal entries \(\sigma _j\) are known as the singular values of A. An analogous decomposition is available when \(m<n\); see, e.g., [16, Sect. 2.4] for details on the SVD of a matrix.

The computation of the SVD (13) requires \({\mathcal O}(\max \{m,n\}\min \{m,n\}^2)\) flops and, therefore, is expensive when m and n are large. Randomized SVD methods determine an approximation of the factorization (13) and are less expensive; see, e.g., Halko et al. [17]. Xiang and Zou [33] describe how randomized SVD methods can be applied to the solution of large-scale Tikhonov minimization problems (3) when \(L=I\). We will outline their approaches and compare the performance of these randomized methods to the Golub–Kahan bidiagonalization method of Sect. 2 in Sect. 4.

Xiang and Zou [34] also describe several randomized approaches to the solution of Tikhonov regularization problems in general form (3). Some of these methods are based on first transforming (3) to an equivalent problem with \(L=I\), similarly as outlined at the end of Sect. 2, while others apply a randomized generalized SVD. We will not discuss the latter methods in the present paper.

We first describe the method proposed by Xiang and Zou [33] for the approximate solution of (3) when \(m\ge n\) and \(L=I\). Let the entries of the matrix \(\Omega _\ell \in {{\mathbb {R}}}^{n\times \ell }\), where \(1\le \ell \ll n\), be independent, identically normally distributed random numbers with zero mean, and compute the QR factorization

$$\begin{aligned} Q_\ell R_\ell =A\Omega _\ell , \end{aligned}$$

where \(Q_\ell \in {{\mathbb {R}}}^{m\times \ell }\) has orthonormal columns and \(R_\ell \in {{\mathbb {R}}}^{\ell \times \ell }\) is upper triangular. The matrix \(R_\ell \) is assumed to be nonsingular in [33] and we will, for now, assume the same. Then the columns of \(Q_\ell \) form an orthonormal basis for \({\mathcal R}(A\Omega _\ell )\). Let \(B=Q^T_\ell A\in {{\mathbb {R}}}^{\ell \times n}\) and compute the reduced SVD,

$$\begin{aligned} B = {\widehat{W}} {\widehat{\Sigma }} {\widehat{V}}^T, \end{aligned}$$
(14)

where the matrices \({\widehat{W}}\in {\mathbb {R}}^{\ell \times \ell }\) and \({\widehat{V}}\in {\mathbb {R}}^{n\times \ell }\) have orthonormal columns, and \({\widehat{\Sigma }}\in {\mathbb {R}}^{\ell \times \ell }\) is a diagonal matrix with nonnegative diagonal entries arranged in decreasing order. The right-hand side of

$$\begin{aligned} Q_\ell B = Q_\ell Q_\ell ^T A = Q_\ell {\widehat{W}} {\widehat{\Sigma }} {\widehat{V}}^T \end{aligned}$$
(15)

is an approximation of the SVD of A (13). The decomposition (15) is much cheaper to compute than (13) when m and n are large and \(1\le \ell \ll n\le m\). The following proposition provides bounds for the closeness of A and \(Q_\ell Q_\ell ^T A\).

Proposition 1

Suppose that \(A\in {\mathbb {R}}^{m\times n}\) has the singular values \(\sigma _1\ge \sigma _2\ge \cdots \ge \sigma _{\min \{m,n\}}\ge 0\). Let \(\Omega _\ell \in {\mathbb {R}}^{n\times \ell }\) be a Gaussian matrix with \(\ell :=k+p\le \min \{m,n\}\) and \(p\ge 4\). Let the columns of \(Q_\ell \) form an orthonormal basis for \({\mathcal {R}}(A\Omega _\ell )\). Then

$$\begin{aligned} \sigma _{\ell +1}\le \Vert A-Q_\ell Q_\ell ^T A\Vert \le \Big (1+6\sqrt{\ell p\log p}\Big )\sigma _{k+1}+ 3\Big (\ell \sum _{j>k}\sigma _j^2\Big )^{1/2} \end{aligned}$$
(16)

with probability not less than \(1-3p^{-p}\).

Proof

The left-hand side inequality is a consequence of the Eckart and Young theorem [8], and the right-hand side inequality is shown by Halko et al. [17, Corollary 10.9]. \(\square \)

We will comment below on the significance of the upper bound (16). In order for this bound to be small, we have to choose k large enough so that \(\sigma _{k+1}\) is small, and p large enough so that the right-hand side inequality (16) holds with high probability. Common choices of p are 5 or 10.

The computational cost for determining the matrices \(Q_\ell \), \({\widehat{W}}\), \({\widehat{\Sigma }}\), and \({\widehat{V}}\) in (15) consists of \({\mathcal O}(m\ell ^2)\) flops for determining \(Q_\ell \) from \(A\Omega _\ell \), \(\ell \) matrix–vector product evaluations with A (to form \(A\Omega _\ell \)) and \(\ell \) matrix–vector product evaluations with \(A^T\) (to form \(B=Q_\ell ^TA\)), as well as \({\mathcal O}(n\ell ^2)\) flops for the computation of the SVD (14) of B. Some of these computations can be implemented efficiently by using high-level BLAS; see, e.g., [12] for a discussion of implementation issues.
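A sketch of the computation of the decomposition (15) follows; it is a straightforward transcription of the steps above and is not meant as an optimized implementation.

```matlab
% Sketch: randomized approximation (15) of the SVD of A for m >= n, following
% [17, 33].  l = k + p with a small oversampling parameter p (e.g., 5 or 10).
n      = size(A, 2);
Omega  = randn(n, l);                 % Gaussian test matrix
[Q, ~] = qr(A*Omega, 0);              % orthonormal basis for range(A*Omega)
B      = Q' * A;                      % l x n
[What, Shat, Vhat] = svd(B, 'econ');  % reduced SVD (14)
% Q*What, Shat, and Vhat form the approximate SVD (15) of A.
```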

We will use the decomposition (15) to determine an approximate solution of (3). Replacing A by this decomposition in (3) gives

$$\begin{aligned} \min _{{\varvec{\scriptstyle x}}\in {\mathbb {R}}^n}\{\Vert Q_\ell Q_\ell ^T A{\varvec{x}}-Q_\ell Q_\ell ^T{\varvec{b}}\Vert ^2 +\mu \Vert L{\varvec{x}}\Vert ^2\}+\Vert (I-Q_\ell Q_\ell ^T)\varvec{b}\Vert ^2, \end{aligned}$$
(17)

which can be expressed as

$$\begin{aligned} \min _{{\varvec{\scriptstyle x}}\in {\mathbb {R}}^n}\{\Vert {\widehat{W}}{\widehat{\Sigma }}{\widehat{V}}^T{\varvec{x}}-Q_\ell ^T{\varvec{b}}\Vert ^2+ \mu \Vert L{\varvec{x}}\Vert ^2\}+\Vert (I-Q_\ell Q_\ell ^T)\varvec{b}\Vert ^2. \end{aligned}$$
(18)

Since we would like a solution \(\varvec{x}\) of minimal Euclidean norm, it is natural to require the solution to be of the form \(\varvec{x}={\widehat{V}}\varvec{y}\) for some \(\varvec{y}\in {\mathbb {R}}^\ell \). Substitution into (18) gives the minimization problem

$$\begin{aligned} \min _{{\varvec{\scriptstyle y}}\in {\mathbb {R}}^\ell } \{\Vert {\widehat{\Sigma }}\varvec{y}-{\widehat{W}}^TQ_\ell ^T\varvec{b}\Vert ^2+ \mu \Vert L{\widehat{V}}{\varvec{y}}\Vert ^2\}. \end{aligned}$$
(19)

This problem differs from (3) in several ways: i) A is replaced by \(Q_\ell Q_\ell ^TA\), ii) the term \(\Vert (I-Q_\ell Q_\ell ^T)\varvec{b}\Vert ^2\) in (18) is ignored, and iii) the space \({\mathbb {R}}^n\) is replaced by \({\mathcal R}({\widehat{V}})\). The differences between the minimization problems (3) and (19) are small if p and k, and therefore \(\ell =p+k\), in Proposition 1 are sufficiently large. In particular, the choice of k has to be large enough in relation to how quickly the singular values of A decay to zero with increasing index.
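When \(L=I\), the minimization problem (19) decouples into \(\ell \) scalar problems because \({\widehat{\Sigma }}\) is diagonal. A sketch, with the factors Q, What, Shat, and Vhat from (15) and a given value of \(\mu \):

```matlab
% Sketch: solution of (19) for L = I.  Since Shat is diagonal, the problem
% decouples; Q, What, Shat, Vhat are the factors in (15), mu > 0 is given.
sig = diag(Shat);
d   = What' * (Q' * b);
y   = (sig .* d) ./ (sig.^2 + mu);    % minimizer of (19) with L = I
x   = Vhat * y;                       % approximate solution of (3)
```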

Let \(\varvec{x}\) denote the solution of (17) with \(\mu >0\) chosen so that \(\varvec{x}\) satisfies the discrepancy principle (6). The discrepancy principle suggests that \(\ell \) be chosen large enough so that

$$\begin{aligned} \Vert (A-Q_\ell Q_\ell ^T A)\varvec{x}\Vert \le \Vert A-Q_\ell Q_\ell ^T A\Vert \Vert \varvec{x}\Vert \le \eta '\epsilon , \end{aligned}$$

where the parameter \(\epsilon \) is the same as in (6), and \(0<\eta '\le \eta \) is a user-chosen parameter with \(\eta \) the same as in (6). An accurate upper bound for \(\Vert A-Q_\ell Q_\ell ^T A\Vert \) can be determined with high probability by evaluating \(\Vert (A-Q_\ell Q_\ell ^T A)\varvec{w}\Vert \) for sufficiently many random vectors with normally distributed entries with zero mean; see [17, Eq. (4.3) and Lemma 4.1]. The evaluation of such a bound increases the computational effort required by the randomized method. Moreover, we would like \(\ell \) to be large enough so that

$$\begin{aligned} \Vert (I-Q_\ell Q_\ell ^T)\varvec{b}\Vert \le \eta '\epsilon . \end{aligned}$$
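The two checks above can be probed as sketched below. The probabilistic scaling factor of [17, Lemma 4.1] is omitted, so the quantity est should be regarded as a rough indicator of \(\Vert A-Q_\ell Q_\ell ^TA\Vert \) rather than a certified bound; the number of probe vectors, r, is our choice.

```matlab
% Sketch: randomized probes of the quantities used to decide whether l is
% large enough.  The scaling constant of [17, Lemma 4.1] is omitted.
r   = 10;                                       % number of probe vectors
est = 0;
for i = 1:r
    w   = randn(size(A, 2), 1);  w = w / norm(w);
    est = max(est, norm(A*w - Q*(Q'*(A*w))));   % probe of ||A - Q*Q'*A||
end
res_b = norm(b - Q*(Q'*b));                     % ||(I - Q*Q')*b||
% est*norm(x) and res_b are then compared with eta'*epsilon as above.
```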

We illustrate in Sect. 4 that for some problems the parameters p and k in (16) have to be chosen too large to make the randomized solution method of this section competitive with the Krylov subspace method of Sect. 2.

There is a small probability that, in a particular application of the randomized method, columns of the matrix \(Q_\ell \) are singular vectors associated with “tiny” singular values of A. These singular vectors typically “oscillate” a lot, i.e., the vector entries as a function of their index number can be thought of as the discretization of a highly oscillatory function. The presence of such vectors in the solution subspace \({\mathcal {R}}(Q_\ell )\) typically would result in a highly oscillatory, and therefore undesired, approximate solution of (3). This phenomenon may be considered an instability of the randomized method. However, we hasten to add that we have not observed this instability in any of the numerous computed examples that we have carried out. The occurrence of this instability, indeed, is rare.

When \(L=I\), we have \(\Vert L{\widehat{V}}{\varvec{y}}\Vert =\Vert \varvec{y}\Vert \). Otherwise, we compute the QR factorization \({\widehat{Q}}{\widehat{R}}=L{\widehat{V}}\), where \({\widehat{Q}}\in {\mathbb {R}}^{p\times \ell }\) has orthonormal columns and \({\widehat{R}}\in {\mathbb {R}}^{\ell \times \ell }\) is upper triangular. When the matrix \({\widehat{R}}\) is of full rank and fairly well-conditioned, we proceed similarly as described at the end of Sect. 2.

Now consider the case when, numerically, \(\mathrm{rank}(A\Omega _\ell )<\ell \). This situation may arise when the matrix A is of numerical rank less than \(\ell \). Then we compute the SVD

$$\begin{aligned} R_\ell =U_\ell \Sigma _\ell V_\ell ^T, \end{aligned}$$

where the matrices \(U_\ell =[\varvec{u}_{\ell ,1},\varvec{u}_{\ell ,2},\ldots ,\varvec{u}_{\ell ,\ell }]\in {\mathbb {R}}^{\ell \times \ell }\) and \(V_\ell \in {\mathbb {R}}^{\ell \times \ell }\) are orthogonal and \(\Sigma _\ell \in {\mathbb {R}}^{\ell \times \ell }= \mathrm{diag}[\sigma _{\ell ,1},\sigma _{\ell ,2},\ldots ,\sigma _{\ell ,\ell }]\) with \(\sigma _{\ell ,1}\ge \sigma _{\ell ,2}\ge \cdots \ge \sigma _{\ell ,\ell }\ge 0\). Let \(\sigma _{\ell ,j}\) be the smallest numerically nonvanishing diagonal entry. Then, numerically, the columns of \(Q_{\ell ,j}:=Q_\ell [\varvec{u}_{\ell ,1},\varvec{u}_{\ell ,2},\ldots ,\varvec{u}_{\ell ,j}]\in {\mathbb {R}}^{m\times j}\) form an orthonormal basis for \({\mathcal R}(A\Omega _\ell )\), and we replace the matrix \(Q_\ell \) in (15), (17), and (19) by \(Q_{\ell ,j}\).
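A sketch of this truncation follows; the threshold tol that defines “numerically nonvanishing” is a choice of ours.

```matlab
% Sketch: truncation of Q_l when A*Omega_l is numerically rank deficient.
% Q and R are the factors of the QR factorization Q*R = A*Omega_l;
% tol is a user-chosen relative threshold.
[Ul, Sl, ~] = svd(R);
j  = find(diag(Sl) > tol*Sl(1, 1), 1, 'last');   % index of smallest kept sigma
Qj = Q * Ul(:, 1:j);                             % replaces Q_l in (15), (17), (19)
```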

We turn to the situation when \(m<n\). Following Xiang and Zou [33], let \(\Omega _\ell \in {{\mathbb {R}}}^{\ell \times m}\), with \(1\le \ell \ll m\), be a random matrix with the same kind of entries as above, and let the columns of \(Q_\ell \in {{\mathbb {R}}}^{n\times \ell }\) form an orthonormal basis for a linear space that contains \({\mathcal R}((\Omega _\ell A)^T)\). We compute \(Q_\ell \) by evaluating the QR factorization of \((\Omega _\ell A)^T\). Then we calculate the SVD of \(AQ_\ell \),

$$\begin{aligned} AQ_\ell = {\widehat{U}}{\widehat{\Sigma }}{\widehat{W}}^T, \end{aligned}$$

where the matrices \({\widehat{U}}\in {\mathbb {R}}^{m\times \ell }\) and \({\widehat{W}}\in {\mathbb {R}}^{\ell \times \ell }\) have orthonormal columns, and \({\widehat{\Sigma }}\in {\mathbb {R}}^{\ell \times \ell }\) is a diagonal matrix with nonnegative diagonal entries arranged in decreasing order. The expression

$$\begin{aligned} AQ_\ell Q_\ell ^T = {\widehat{U}}{\widehat{\Sigma }}{\widehat{W}}^TQ_\ell ^T \end{aligned}$$

is an approximation of the SVD of A.
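For completeness, a sketch of this construction for \(m<n\); as before, it is a direct transcription of the steps above.

```matlab
% Sketch: randomized approximate SVD of A for m < n, following [33].
m      = size(A, 1);
Omega  = randn(l, m);                      % Gaussian test matrix
[Q, ~] = qr((Omega*A)', 0);                % Q is n x l
[Uhat, Shat, What] = svd(A*Q, 'econ');     % A*Q is m x l
% A*Q*Q' = Uhat*Shat*What'*Q' approximates the SVD of A.
```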

We determine an approximate solution of (3) by solving

$$\begin{aligned} \min _{{\varvec{\scriptstyle x}}\in {\mathcal R}(Q_\ell )}\{\Vert A Q_\ell Q_\ell ^T {\varvec{x}}- {\widehat{U}}{\widehat{U}}^T{\varvec{b}}\Vert ^2 + \mu \Vert L{\varvec{x}}\Vert ^2\}, \end{aligned}$$
(20)

which, with \(\varvec{x}=Q_\ell {\widehat{W}}\varvec{y}\), can be written as

$$\begin{aligned} \min _{{\varvec{\scriptstyle y}}\in {{\mathbb {R}}}^\ell } \{\Vert {\widehat{\Sigma }}{\varvec{y}}-{\widehat{U}}^T{\varvec{b}}\Vert ^2 + \mu \Vert LQ_\ell {\widehat{W}}{\varvec{y}}\Vert ^2\}. \end{aligned}$$
(21)

Finally, let \({\widetilde{R}}\in {\mathbb {R}}^{\ell \times \ell }\) be the upper triangular matrix in a QR factorization of \(LQ_\ell {\widehat{W}}\). Since, generally, \({\widetilde{R}}\) is nonsingular and not very ill-conditioned, we may transform the Tikhonov minimization problem (21) to standard form by the change of variables \(\varvec{z}={\widetilde{R}}\varvec{y}\).

We conclude this section with a discussion on the application of the discrepancy principle, and first consider the situation when \(m\ge n\). Then we solve (17) by computing the solution of (19). Assume that the error \(\varvec{e}\) in \(\varvec{b}\) is normally distributed with zero mean and variance \(\epsilon ^2\). Then, since \(Q_\ell Q_\ell ^T\) is an orthogonal projector, the variance of \(Q_\ell Q_\ell ^T\varvec{e}\) is \(\frac{\ell }{m}\epsilon ^2\). Therefore, when determining the regularization parameter \(\mu \) for the problem (17), we replace \(\epsilon \) by \(\sqrt{\frac{\ell }{m}}\epsilon \) in (6).

When \(m<n\), we solve (20) by computing the solution of (21). Since \({\widehat{U}}{\widehat{U}}^T\) is an orthogonal projector, the variance of \({\widehat{U}}{\widehat{U}}^T\varvec{e}\) is \(\frac{\ell }{m}\epsilon ^2\). Therefore, when determining the regularization parameter \(\mu \) for the minimization problem (21), we replace \(\epsilon \) by \(\sqrt{\frac{\ell }{m}}\epsilon \) in (6).

The value of \(\ell =p+k\) affects both the quality of the computed solution and the computing time required. This value has to be large enough so that \(Q_\ell Q_\ell ^TA\) (or \(AQ_\ell Q_\ell ^T\) when \(m<n\)) is a sufficiently accurate approximation of A. The computed examples of the following section illustrate this.

4 Computed examples

The examples reported in this section show the performance of the methods discussed in the previous sections. All computations were carried out on a Windows computer with an i7-8750H @2.2 GHz CPU and 16 GB of memory. The implementations were done in MATLAB R2018b.

The noise level is defined by

$$\begin{aligned} \delta = \frac{\Vert {\varvec{e}}\Vert }{\Vert {\varvec{b}}_\mathrm{exact}\Vert }. \end{aligned}$$
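In our experiments, the noise vector is scaled so that a prescribed noise level \(\delta \) is attained; a sketch of this construction, with b_exact and delta given:

```matlab
% Sketch: generate white Gaussian noise of prescribed noise level delta and
% form the contaminated data vector (2); epsilon is the bound (5).
e       = randn(size(b_exact));
e       = delta * norm(b_exact) * e / norm(e);   % ||e|| / ||b_exact|| = delta
b       = b_exact + e;
epsilon = norm(e);
```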

In all experiments, the regularization parameter was determined with the aid of the discrepancy principle and computed by Newton’s method as outlined above with initial iterate \(\mu _0=0\). More details on the application of Newton’s method can be found in [2]. This method also was used to determine the regularization parameter in the randomized SVD (RSVD) method in a similar way.

We consider several examples described in Regularization Tools by Hansen [20] and in IR Tools by Gazzola et al. [13]. Both Regularization Tools and IR Tools are program packages written in MATLAB. Problems in both one and two space-dimensions will be discussed. We compare the performance of the Krylov method of Sect. 2 and the randomized SVD method of Sect. 3. The computed examples show that rapid decay of the singular values to zero with increasing index number is essential for the success of the randomized method. We will refer to the Krylov subspace-based Tikhonov regularization method of Sect. 2 as “K-Tikhonov”, and to the Tikhonov regularization method based on the randomized SVD technique as “R-Tikhonov”. Throughout this section, \(\ell \) denotes the number of bidiagonalization steps carried out by the Golub–Kahan bidiagonalization method, as well as the number of columns of the random matrix \(\Omega _\ell \) when \(m\ge n\), or the number of rows of the random matrix when \(m<n\); see Sect. 3.

Before comparing the two methods, we would like to discuss the computational cost of the K-Tikhonov and R-Tikhonov methods. Let us assume that \(\ell \ll n\). Then for K-Tikhonov, the computational cost is dominated by the \(\ell \) matrix–vector product evaluations with A and the \(\ell \) matrix–vector product evaluations with \(A^T\). The R-Tikhonov method requires the evaluation of \(A\Omega _\ell \) and \(A^TQ_\ell \). The flop count for these evaluations in K-Tikhonov and R-Tikhonov is the same, and of order \({\mathcal O}(mn\ell )\), but the evaluations in R-Tikhonov can be implemented as matrix–matrix products, while this is not possible in K-Tikhonov, because in Golub–Kahan bidiagonalization the columns of the matrices \(V_\ell \) and \(U_{\ell +1}\) are determined sequentially one-by-one. As we will illustrate in the following, when A is stored as a matrix, the matrix–matrix product evaluations in R-Tikhonov are faster than the matrix–vector product evaluations in K-Tikhonov. However, when A is not explicitly stored and it therefore is not possible to evaluate the matrix–matrix products \(A\Omega _\ell \) and \(A^TQ_\ell \) efficiently, i.e., when we need to compute \(2\ell \) matrix–vector products (one for each column of \(\Omega _\ell \) and \(Q_\ell \)), the computing time required by K-Tikhonov and R-Tikhonov to evaluate the necessary matrix–vector products is almost identical.
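The effect of using matrix–matrix products instead of a sequence of matrix–vector products can be observed with a simple experiment such as the following sketch; the dimensions are chosen arbitrarily and the measured times are machine dependent.

```matlab
% Sketch: BLAS-3 (matrix-matrix) versus BLAS-2 (matrix-vector) evaluation of
% A*Omega for a dense matrix A.  Dimensions are arbitrary.
A     = randn(2048);  Omega = randn(2048, 30);
tic;  Y1 = A * Omega;  t_mm = toc;                % one matrix-matrix product
tic;
Y2 = zeros(2048, 30);
for j = 1:30
    Y2(:, j) = A * Omega(:, j);                   % 30 matrix-vector products
end
t_mv = toc;
fprintf('matrix-matrix: %.3e s, matrix-vector loop: %.3e s\n', t_mm, t_mv);
```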

Shaw. We first consider the Shaw test problem described in [31]. It is an integral equation of the first kind with a smooth kernel in one space-dimension. MATLAB code that gives a discretization of this integral equation is provided in [20]. This code gives the matrix \(A\in {\mathbb {R}}^{2048\times 2048}\) and a vector \(\varvec{x}_\mathrm{exact}\in {\mathbb {R}}^{2048}\), from which we compute \(\varvec{b}_\mathrm{exact}=A\varvec{x}_\mathrm{exact}\). We add a “noise vector” \(\varvec{e}\in {\mathbb {R}}^{2048}\) to \(\varvec{b}_\mathrm{exact}\) that models white Gaussian noise with noise level \(\delta =0.01\) to obtain the “available” noise-contaminated data vector \(\varvec{b}\); cf. (2). The moderate dimension of this problem allows us to compute the solution of the Tikhonov regularized problem also in the full space since it is possible to explicitly compute the SVD of the matrix A. We refer to the latter approach as “standard Tikhonov”. The regularization matrix L used in this example is the discrete Laplacian in one space-dimension.
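The setup of this example can be sketched as follows; shaw and get_l are functions from Regularization Tools [20], and the noise vector is generated as in the sketch at the beginning of this section.

```matlab
% Sketch of the test setup; shaw and get_l are from Regularization Tools [20].
n = 2048;
[A, b_exact, x_exact] = shaw(n);   % discretized Fredholm integral equation
L = get_l(n, 2);                   % second-difference (1D Laplacian) operator
delta = 0.01;                      % noise level; e and b as sketched above
```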

We apply the K-Tikhonov and R-Tikhonov methods for different dimensions \(\ell \) of the solution subspace and compare the results obtained with those obtained with standard Tikhonov, i.e., the solution of (3). In particular, we are interested in comparing the quality of the computed approximations of \(\varvec{x}_\mathrm{exact}\) determined by the different methods, as well as in the timings. Let \(\varvec{x}\) denote an approximate solution computed by one of the methods considered. We define the relative reconstruction error

$$\begin{aligned} \mathrm{RRE}(\varvec{x})=\frac{\Vert \varvec{x}-\varvec{x}_\mathrm{exact}\Vert }{\Vert \varvec{x}_\mathrm{exact}\Vert }. \end{aligned}$$

Figure 1 reports RREs and timings for the K-Tikhonov and R-Tikhonov methods. The horizontal axes in the subfigures show the dimension, \(\ell \), of the solution subspaces for the K-Tikhonov and R-Tikhonov methods. We can observe that the RRE for R-Tikhonov is slightly smaller than for K-Tikhonov for \(\ell >5\). Moreover, the CPU time required for the computation of the solution with R-Tikhonov is smaller than the CPU time needed for the computation of the solution with K-Tikhonov for solution subspaces of the same dimension. Finally, we note that the RRE obtained with K-Tikhonov rapidly converges to the RRE of the solution determined by standard Tikhonov, while the RRE for the solutions determined by R-Tikhonov typically is smaller. The difference in the quality of the computed solutions is made possible by the fact that the computed solutions are determined by the discrepancy principle and live in different solution subspaces.

Figure 2 displays the singular values \(\sigma _{\ell +1}\) of A and the quantities \(\Vert (I-Q_\ell Q_\ell ^T)A\Vert \), \(\Vert (I-Q_\ell Q_\ell ^T)A\Vert \Vert \varvec{x}_\mathrm{exact}\Vert \), and \(\Vert (I-Q_\ell Q_\ell ^T)\varvec{b}\Vert \) of interest for Proposition 1 as functions of \(\ell \). We can observe in Fig. 2a that, since the singular values of A decay to zero extremely fast with increasing index \(\ell \), the matrix \(Q_\ell Q_\ell ^TA\) approximates A well; the approximation error is very close to the optimal one, i.e., to \(\sigma _{\ell +1}\). Figure 2b shows that the discrepancy principle already can be satisfied in a subspace of fairly small dimension. We remark that, although the function \(\ell \rightarrow \Vert (I-Q_\ell Q_\ell ^T)\varvec{b}\Vert \) appears to be constant for \(\ell \) large enough, this function decreases very slowly as \(\ell \) increases and vanishes for \(\ell =n\).

Fig. 1
figure 1

Shaw test problem, comparison between K-Tikhonov and R-Tikhonov: a RRE for solutions determined by K-Tikhonov (red dashed curve), R-Tikhonov (blue solid curve), and standard Tikhonov (black dotted curve). The horizontal axes show the dimension, \(\ell \), of the solution subspaces for K-Tikhonov and R-Tikhonov. b CPU time in seconds for K-Tikhonov (red dashed curve) and R-Tikhonov (blue solid curve) for different values of \(\ell \) (color figure online)

Fig. 2
figure 2

Shaw test problem, bounds of Proposition 1: a Comparison of the approximation error \(\Vert (I-Q_\ell Q_\ell ^T)A\Vert \) (blue solid curve) and the optimal error, i.e., the singular values \(\sigma _{\ell +1}\) of the matrix A, (black dotted curve) versus \(\ell \). b Comparison of \(\Vert (I-Q_\ell Q_\ell ^T)A\Vert \Vert {\mathbf {x}}_\mathrm{exact}\Vert \) (blue solid curve), \(\Vert (I-Q_\ell Q_\ell ^T){\mathbf {b}}\Vert \) (red dashed curve), and \(\eta \epsilon \) (black dotted line) versus \(\ell \) (color figure online)

Heat. We consider the heat test problem in [20]. It is described in [4]. This problem models inverse heat conduction in one space-dimension. We use MATLAB code supplied in [20]. This code requires a parameter \(\kappa \), which is set to the default value 1. Discretization gives a problem (1) with a matrix \(A\in {\mathbb {R}}^{2048\times 2048}\) and a vector \(\varvec{x}_\mathrm{exact}\), from which we compute the exact data vector \(\varvec{b}_\mathrm{exact}=A\varvec{x}_\mathrm{exact}\). We add an error vector \(\varvec{e}\) that models white Gaussian noise and corresponds to a noise level of \(\delta =0.02\) to \(\varvec{b}_\mathrm{exact}\) to obtain the error contaminated vector \(\varvec{b}\) in (1). Similarly as in the previous example, we let L be the discrete Laplacian in one space-dimension, and we display the RRE and CPU times for K-Tikhonov and R-Tikhonov.

Figure 3 shows the RRE and CPU times for several \(\ell \)-values. Similarly as in the Shaw example, the computing times required for the computation of the R-Tikhonov solutions are much smaller than the times required for K-Tikhonov for solution subspaces of the same dimension. For the present example, the RRE-values obtained with R-Tikhonov are slightly larger than those obtained with K-Tikhonov for solution subspaces of the same dimension, at least for \(\ell <23\). However, the RREs are of the same order of magnitude for both methods for solution subspaces of the same dimension. Finally, we observe that, in contrast to the previous example, the RRE obtained with the K-Tikhonov method is not the same, for \(\ell \) large, as for standard Tikhonov. This is due to the fact that the discrepancy principle does not determine a unique solution; the computed solution depends on the solution subspace used.

As in the previous example, we report the singular values \(\sigma _{\ell +1}\) of A and the quantities \(\Vert (I-Q_\ell Q_\ell ^T)A\Vert \), \(\Vert (I-Q_\ell Q_\ell ^T)A\Vert \Vert \varvec{x}_\mathrm{exact}\Vert \), and \(\Vert (I-Q_\ell Q_\ell ^T)\varvec{b}\Vert \) in Fig. 4. In the present example, the singular values of A decrease to zero more slowly than in the previous example. Although the matrix \(Q_\ell Q_\ell ^TA\) is still a good approximation of A, the approximation error is visibly larger than the optimal one, given by \(\sigma _{\ell +1}\). This is illustrated by Fig. 4b. We observe that in order to satisfy the discrepancy principle, the parameter \(\ell \) has to be larger than in the previous example.

Fig. 3
figure 3

Heat test problem, comparison between K-Tikhonov and R-Tikhonov: a RRE for solutions determined by K-Tikhonov (red dashed curve), R-Tikhonov (blue solid curve), and standard Tikhonov (black dotted curve). The horizontal axes show the dimension, \(\ell \), of the solution subspaces for K-Tikhonov and R-Tikhonov. b CPU time in seconds for K-Tikhonov (red dashed curve) and R-Tikhonov (blue solid curve) for different values of \(\ell \) (color figure online)

Fig. 4
figure 4

Heat test problem, bounds of Proposition 1: a Comparison of the approximation error \(\Vert (I-Q_\ell Q_\ell ^T)A\Vert \) (blue solid curve) and the optimal error, i.e., the singular values \(\sigma _{\ell +1}\) of the matrix A (black dotted curve), as a function of \(\ell \). b Comparison of \(\Vert (I-Q_\ell Q_\ell ^T)A\Vert \Vert {\mathbf {x}}_\mathrm{exact}\Vert \) (blue solid curve), \(\Vert (I-Q_\ell Q_\ell ^T){\mathbf {b}}\Vert \) (red dashed curve), and \(\eta \epsilon \) (black dotted line) versus \(\ell \) (color figure online)

Phillips. Our last example in one space-dimension is the phillips test problem from [20], where MATLAB code is available. This code provides a discretization of a convolution. A background for this problem is given in [28]. We generate the matrix A and the noise-contaminated vector \(\varvec{b}\) in the same manner as in the previous examples. Thus, the noise is white Gaussian and corresponds to the noise level \(\delta =0.05\). The regularization matrix L is the discrete Laplacian in one space-dimension.

We report the results obtained with the K-Tikhonov and R-Tikhonov methods in Fig. 5. These results are similar to the ones obtained for the Shaw test problem. Thus, R-Tikhonov outperforms K-Tikhonov in terms of timings and RRE (for \(\ell \) sufficiently large). Similarly to the Shaw test problem, the solutions computed with K-Tikhonov give the same RRE as the solutions computed by standard Tikhonov already for small subspace dimensions \(\ell \), while the solutions computed by R-Tikhonov provide a better approximation of \(\varvec{x}_\mathrm{exact}\).

As above, we display the singular values \(\sigma _{\ell +1}\) of A and the norms \(\Vert (I-Q_\ell Q_\ell ^T)A\Vert \), \(\Vert (I-Q_\ell Q_\ell ^T)A\Vert \Vert \varvec{x}_\mathrm{exact}\Vert \), and \(\Vert (I-Q_\ell Q_\ell ^T)\varvec{b}\Vert \) as functions of \(\ell \) in Fig. 6. This example behaves like the Heat example and we therefore can draw the same conclusions. Nevertheless, let us observe that the decay of the singular values, even though it is slower than in the Shaw example, is still fast enough to yield a fairly accurate approximation of A using the randomized method.

Fig. 5
figure 5

Phillips test problem, comparison between K-Tikhonov and R-Tikhonov: a RRE for solutions determined by K-Tikhonov (red dashed curve), R-Tikhonov (blue solid curve), and standard Tikhonov (black dotted curve). The horizontal axes show the dimension, \(\ell \), of the solution subspace for K-Tikhonov and R-Tikhonov. b CPU time for K-Tikhonov (red dashed curve) and R-Tikhonov (blue solid curve) for different values of \(\ell \) (color figure online)

Fig. 6
figure 6

Phillips test problem, bounds of Proposition 1: a Comparison of the approximation error \(\Vert (I-Q_\ell Q_\ell ^T)A\Vert \) (blue solid curve) and the optimal error, i.e., the singular values \(\sigma _{\ell +1}\) of the matrix A (black dotted curve), versus \(\ell \). b Comparison of \(\Vert (I-Q_\ell Q_\ell ^T)A\Vert \Vert {\mathbf {x}}_\mathrm{exact}\Vert \) (blue solid curve), \(\Vert (I-Q_\ell Q_\ell ^T){\mathbf {b}}\Vert \) (red dashed curve), and \(\eta \epsilon \) (black dotted line) versus \(\ell \) (color figure online)

The R-Tikhonov method performs well in all the above examples, even though the matrices A in these examples have different properties. The matrices in the Shaw and Phillips examples are symmetric, while the matrix in the Heat example is very far from symmetric. The singular values of the matrix decay to zero quite quickly with increasing index in the Shaw and Heat test problems, while they do not for the Phillips problem.

The above examples are discretizations of problems in one space-dimension. We now turn to problems that are discretizations of ill-posed problems in two space-dimensions. The relative performance of the methods in our comparison will be seen to be different for this kind of problems.

Blur. We determine the matrix A with the MATLAB function blur(45,8,1) from [20]. This function call generates a symmetric block Toeplitz matrix with Toeplitz blocks (BTTB) \(A\in {\mathbb {R}}^{2025\times 2025}\), which models a Gaussian point spread function in two space-dimensions. The parameter value 8 is the half-bandwidth of the Toeplitz blocks. Thus, the matrix A is very sparse. It is stored in sparse matrix format. The regularization matrix L is the discrete Laplacian in two space-dimensions.

Let the entries of \(\varvec{x}_\mathrm{exact}\) be pixel values of a \(45\times 45\)-pixel image, with the pixels ordered column-wise. Then \(\varvec{b}_\mathrm{exact}=A\varvec{x}_\mathrm{exact}\) represents a blurred image associated with \(\varvec{x}_\mathrm{exact}\). We add a vector \(\varvec{e}\) that represents white Gaussian noise with noise level \(\delta =0.03\) to \(\varvec{b}_\mathrm{exact}\) to obtain the contaminated data vector \(\varvec{b}\); cf. (2).

We compare RREs and CPU times for different values of \(\ell \) similarly as in the previous examples. The results are reported in Fig. 7. The figure shows that the R-Tikhonov method does not perform well in terms of the RRE. In fact, the RRE is very large and does not decrease significantly as the dimension of the solution subspace, \(\ell \), increases. Moreover, since the matrix A is very sparse, the matrix–vector products required by the K-Tikhonov method are not computationally demanding. Therefore, the computational cost of the two methods is about the same.

Fig. 7
figure 7

Blur test problem, comparison between K-Tikhonov and R-Tikhonov: a RRE for solutions determined by K-Tikhonov (red dashed curve), R-Tikhonov (blue solid curve), and standard Tikhonov (black dotted curve). The horizontal axes show the dimension, \(\ell \), of the solution subspaces for K-Tikhonov and R-Tikhonov. b CPU time in seconds for K-Tikhonov (red dashed curve) and R-Tikhonov (blue solid curve) for different values of \(\ell \) (color figure online)

The relatively poor performance of the R-Tikhonov method in this example is due to the fact that the singular values of the matrix A decrease fairly slowly to zero with increasing index and are not approximated well by the singular values of the reduced matrix \(AQ_{30}Q_{30}^T\) used to compute the R-Tikhonov solution. To see this in more detail, we plot in Fig. 8 the singular values of the matrices \(AQ_{30}Q_{30}^T\), \({\bar{C}}_{30}\), and A for all the considered examples. We can observe that in all examples in one space-dimension, the largest singular values of the matrix A are approximated well by the singular values of \(AQ_{30}Q_{30}^T\) and \({\bar{C}}_{30}\). However, the singular values of \(AQ_{30}Q_{30}^T\) for the present example are very poor approximations of the largest singular values of A. On the other hand, the singular values of \({\bar{C}}_{30}\) match the largest singular values of A very well. This suggests that the R-Tikhonov method may not be effective for the solution of linear discrete ill-posed problems with a matrix A whose singular values decay fairly slowly with their index number. To illustrate this, we choose the dimension of the solution subspace for the R-Tikhonov method to be \(\ell =1000\), and compare the error in the computed solution with the errors in the solutions determined by the K-Tikhonov method with \(\ell =30\) and by standard Tikhonov. This comparison is reported in Table 1. We can see that even when \(\ell =1000\), the R-Tikhonov method provides less accurate results than K-Tikhonov with \(\ell =30\), and requires much more execution time (about 45 times as much).

Fig. 8
figure 8

The largest singular values of the matrices \(AQ_{30}Q_{30}^T\) (blue solid curve), \({\bar{C}}_{30}\) (red dashed curve), and A (black dotted curve) as a function of the index for all the considered examples: a Shaw, b Heat, c Phillips, d Blur (color figure online)

Table 1 Blur test problem: RRE and CPU time in seconds for standard Tikhonov, K-Tikhonov, and R-Tikhonov for selected values of k and \(\ell \)

These observations are corroborated by Fig. 9, which shows the singular values \(\sigma _\ell \) of A and compares them to \(\Vert (I-Q_\ell Q_\ell ^T)A\Vert \) as functions of \(\ell \). The figure also displays the norms \(\Vert (I-Q_\ell Q_\ell ^T)A\Vert \Vert \varvec{x}_\mathrm{exact}\Vert \) and \(\Vert (I-Q_\ell Q_\ell ^T)\varvec{b}\Vert \). We can observe in Fig. 9a that since the singular values of A do not decay fast enough to zero with increasing index \(\ell \), the approximation error \(\Vert (I-Q_\ell Q_\ell ^T)A\Vert \) is extremely large even for \(\ell =100\). Moreover, we can see that the discrepancy principle cannot be satisfied for \(\ell \le 100\). By visual inspection of Fig. 9b, we can deduce that a very large value of \(\ell \) may be required to satisfy the discrepancy principle.

Fig. 9
figure 9

Blur test problem, bounds of Proposition 1: a Comparison of the approximation error \(\Vert (I-Q_\ell Q_\ell ^T)A\Vert \) (blue solid curve) and the optimal error, i.e., the singular values \(\sigma _{\ell }\) of the matrix A (black dotted curve), versus \(\ell \). b Comparison of \(\Vert (I-Q_\ell Q_\ell ^T)A\Vert \Vert {\mathbf {x}}_\mathrm{exact}\Vert \) (blue solid curve), \(\Vert (I-Q_\ell Q_\ell ^T){\mathbf {b}}\Vert \) (red dashed curve), and \(\eta \epsilon \) (black dotted line) against \(\ell \) (color figure online)

Hubble. We turn to a deblurring problem from [13]. Specifically, we consider the deblurring problem obtained when the available image is blurred by a medium speckle PSF and, in addition, is contaminated by \(5\%\) white Gaussian noise. The size of the image is \(512 \times 512\) pixels; see Fig. 10. We impose periodic boundary conditions. Then the blurring matrix \(A\in {\mathbb {R}}^{n\times n}\), with \(n=512^2\), is block circulant with circulant blocks (BCCB). Thus, A can be diagonalized by the bidimensional Fourier matrix. We can compute the eigenvalues of A in \({\mathcal O}(n\log n)\) flops with the aid of the fast Fourier transform (FFT) algorithm; see, e.g., [21] for a discussion on image deblurring and boundary conditions. By choosing L as the discretization of the bidimensional Laplacian with periodic boundary conditions, we can solve (3) in \({\mathcal O}(n\log n)\) flops for each value of the regularization parameter \(\mu \). This allows us to compute the solution by standard Tikhonov regularization and to compare results obtained in this manner with those determined by the K-Tikhonov and R-Tikhonov methods.
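A sketch of this FFT-based solve follows. It assumes periodic boundary conditions, that the N×N array psf holds the point spread function with its center at pixel (c1, c2), that B is the observed image, and that mu is a given value of the regularization parameter; these names are ours.

```matlab
% Sketch: Tikhonov solution of (3) in O(n log n) flops when A is BCCB
% (periodic boundary conditions).  psf is the N x N point spread function
% with center at pixel (c1, c2); B is the blurred and noisy N x N image.
N    = size(B, 1);
eigA = fft2(circshift(psf, 1 - [c1, c2]));       % eigenvalues of A
lap  = zeros(N);                                 % periodic 2D Laplacian stencil
lap(1, 1) = 4;  lap(2, 1) = -1;  lap(N, 1) = -1;
lap(1, 2) = -1; lap(1, N) = -1;
eigL = fft2(lap);                                % eigenvalues of L
Xhat = (conj(eigA) .* fft2(B)) ./ (abs(eigA).^2 + mu*abs(eigL).^2);
X    = real(ifft2(Xhat));                        % restored image
```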

Similarly as above, we apply both the R-Tikhonov and K-Tikhonov methods for different values of \(\ell \), and compare results in terms of CPU time and accuracy. The matrix A is not explicitly formed; instead we evaluate matrix–vector products with A and \(A^T\) by using the FFT algorithm. Hence, in R-Tikhonov the matrices \(A\Omega _\ell \) and \(A^TQ_\ell \) are computed by evaluating \(\ell \) matrix–vector products with A and \(\ell \) matrix–vector products with \(A^T\). We therefore expect the computing time for R-Tikhonov and K-Tikhonov to be about the same. This is confirmed by the graphs of Fig. 11b. On the other hand, we can see from Fig. 11a and d that the R-Tikhonov method fails to accurately determine the largest singular values of A, and that the restored image determined by R-Tikhonov is of very poor quality; see Fig. 12b. This is due to the fact that, as we can see in Fig. 11c, the singular values of A do not decrease very fast to zero.

Figure 12 displays the reconstructed images obtained with standard Tikhonov, R-Tikhonov, and K-Tikhonov. Visual inspection shows that K-Tikhonov is able to provide a reconstruction of similar quality as standard Tikhonov, while R-Tikhonov fails to determine an accurate approximation of \({\mathbf {x}}_\mathrm{exact}\).

Fig. 10
figure 10

Hubble test case: a true image (\(512\times 512\) pixels), b PSF (\(512\times 512\) pixels), c blurred and noisy image with \(5\%\) of white Gaussian noise (\(512\times 512\) pixels)

Fig. 11
figure 11

Hubble test problem: Comparison between K-Tikhonov and R-Tikhonov: a RRE for solutions determined by K-Tikhonov (red dashed curve), R-Tikhonov (blue solid curve), and standard Tikhonov (black dotted curve). The horizontal axis shows \(\ell \) for K-Tikhonov and k for R-Tikhonov, b CPU times for K-Tikhonov (red dashed curve) and R-Tikhonov (blue solid curve) for different values of \(\ell \) and k, c singular values of the blurring matrix A, d the largest singular values of the matrices \(AQ_{50}Q_{50}^T\) (blue solid curve), \({\bar{C}}_{50}\) (red dashed curve), and A (black dotted curve) as a function of their index (color figure online)

Fig. 12
figure 12

Hubble test case reconstructions: a Tikhonov, b R-Tikhonov (\(k=50\)), c K-Tikhonov (\(\ell =50\))

5 Conclusion

The application of randomized algorithms to the solution of large-scale problems has received considerable attention. This paper compares their performance with that of a Krylov subspace method when applied to the solution of linear discrete ill-posed problems by Tikhonov regularization. The singular values of linear discrete ill-posed problems “cluster” at the origin; however, their rate of decay towards zero with increasing index is problem dependent. The randomized method is found to be competitive for the solution of linear discrete ill-posed problems in one space-dimension, for which the singular values decay to zero fast enough with increasing index. However, when the singular values do not decrease quickly enough, the Krylov method considered outperforms the randomized method. This is because Krylov methods determine more appropriate solution subspaces of low dimension for linear discrete ill-posed problems than the randomized method when the singular values do not decay to zero sufficiently rapidly.

We only consider one Krylov subspace method, Golub–Kahan bidiagonalization, in this paper. However, our conclusions carry over to other Krylov subspace solution methods, such as methods based on the Arnoldi process, at least when the matrix A is not too far from symmetric. When A is far from symmetric, solution methods for discrete ill-posed problems based on the Arnoldi process are known not to provide satisfactory results; see, e.g., [6, 14] for an illustration.