Abstract
Low-rank approximation by QR decomposition with pivoting (pivoted QR) is known to be less accurate than the singular value decomposition (SVD); however, its computational cost is lower than that of SVD. In this study, the least upper bound of the ratio of the truncation error, defined by \(\Vert A-BC\Vert _2\), of pivoted QR to that of SVD is proved to be \(\sqrt{\frac{4^k-1}{3}(n-k)+1}\) for \(A\in {\mathbb {R}}^{m\times n}\) \((m\ge n)\) approximated as a product of \(B\in {\mathbb {R}}^{m\times k}\) and \(C\in {\mathbb {R}}^{k\times n}\).
1 Introduction
1.1 Low-rank approximation
Low-rank matrix approximation approximates a matrix by another matrix whose rank is less than that of the original matrix. Let \(A\in {\mathbb {R}}^{m\times n}\); then, a rank-k approximation of A is given by \(A \approx BC,\)
where \(B\in {\mathbb {R}}^{m\times k}\) and \(C\in {\mathbb {R}}^{k\times n}\). Low-rank matrix approximation appears in many applications such as data mining [5] and machine learning [14]. It also plays an important role in tensor decompositions [12].
This paper discusses truncation errors of low-rank matrix approximation using QR decomposition with pivoting, or pivoted QR. In this study, rounding errors are not considered, and the norm used is basically 2-norm. \(A\in {\mathbb {R}}^{m\times n}\) (without loss of generality, we assume that \(m\ge n\)) is approximated by a product of \(B\in {\mathbb {R}}^{m\times k}\) and \(C\in {\mathbb {R}}^{k\times n}\), and the truncation error is defined by \(\Vert A-BC\Vert _2\).
It is well-known that for any matrix \(A\in {\mathbb {R}}^{m\times n}\) (\(m\ge n\)), there are orthogonal matrices \(U \in {\mathbb {R}}^{m\times m}\) and \(V\in {\mathbb {R}}^{n\times n}\) and a diagonal matrix \(\varSigma \in {\mathbb {R}}^{n\times n}\) with nonnegative diagonal elements that satisfy \(A = U\varSigma V^T.\)
This is a singular value decomposition (SVD) of A. We define \(\sigma _i(A)\) for \(i = 1\), 2, ..., n as the diagonal elements of \(\varSigma\), i.e., \(\varSigma = {\mathrm{diag}}(\sigma _1(A),\sigma _2(A),\dots ,\sigma _n(A)),\)
and assume that \(\sigma _1(A) \ge \sigma _2(A) \ge \dots \ge \sigma _n(A) \ge 0\) without loss of generality. The \(\sigma _i\) values are the singular values of A. A has rank k if and only if \(\sigma _k(A) > 0 = \sigma _{k+1}(A)\). Let \(A_k = U\varSigma _kV^T\), where \(\varSigma _k = {\mathrm{diag}}(\sigma _1(A),\dots ,\sigma _k(A),0,\dots ,0)\in {\mathbb {R}}^{n\times n}\).
Then, \(\Vert A - A_k\Vert _2 = \sigma _{k+1}(A) = \min _{{\mathrm{rank}}(X)\le k}\Vert A-X\Vert _2\) holds [8]. Therefore, \(A_k\) is a rank-k approximation of A whose 2-norm truncation error is the smallest. We define the truncation error of low-rank approximation by SVD as \(SVD_k(A) = \sigma _{k+1}(A).\)
The amount of computation required to calculate SVD is \(O(nm \min (n,m))\).
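The quantity \(SVD_k(A)\) can be illustrated with a short numpy sketch (illustrative only; the function name `svd_error` is ours). It forms the best rank-k approximation from a truncated SVD and returns its 2-norm error together with \(\sigma _{k+1}(A)\), which coincide by the optimality result above.

```python
import numpy as np

def svd_error(A, k):
    """Return the 2-norm truncation error of the best rank-k
    approximation of A together with sigma_{k+1}(A)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # truncated SVD A_k
    return np.linalg.norm(A - A_k, 2), s[k]

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
err, sigma_k1 = svd_error(A, 2)  # the two values agree (Eckart-Young)
```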
Pivoted QR was proposed by Golub in 1965 [7]. Because the amount of computation required to calculate the low-rank approximation by pivoted QR is O(nmk), it is cheaper than SVD and hence useful in many applications such as solving rank-deficient least squares problems [2]. It consists of QR decomposition and pivoting. For any matrix A, there exist \(Q\in {\mathbb {R}}^{m\times n}\) and an upper triangular matrix \(R\in {\mathbb {R}}^{n\times n}\) that satisfy \(A=QR\) and \(Q^TQ=I_n\). This is a QR decomposition of A. We use pivoting to determine the permutation matrix \(\varPi _{grd}\) and apply the QR decomposition algorithm to \(A\varPi _{grd}\). The subscript grd signifies the greedy method, explained below. Hereafter, we redefine QR as a QR decomposition of \(A\varPi _{grd}=QR\). Let Q and R be partitioned as \(Q = \begin{pmatrix} Q_{1k}&Q_{2k} \end{pmatrix}\) and \(R = \begin{pmatrix} R_{1k} &{} R_{2k}\\ O &{} R_{3k} \end{pmatrix},\)
where \(Q_{1k}\in {\mathbb {R}}^{m\times k}\) and \(R_{1k}\in {\mathbb {R}}^{k\times k}\). Then, we can approximate A by \(Q_{1k}\begin{pmatrix} R_{1k}&R_{2k} \end{pmatrix}\varPi _{grd}^T\), and \(\Vert A - Q_{1k}\begin{pmatrix} R_{1k}&R_{2k} \end{pmatrix}\varPi _{grd}^T\Vert _2 = \Vert R_{3k}\Vert _2\)
holds. We define the truncation error of low-rank approximation by pivoted QR as \(pivotQR_k(A) = \Vert R_{3k}\Vert _2.\)
In this study, the greedy method is used in pivoting to make \(\Vert R_{3k}\Vert _2\) small. Pivoting is performed such that the elements of \(R = (r_{ij})\) satisfy the following inequalities [1, p. 103]: \(r_{ll}^2 \ge \sum _{i=l}^{j} r_{ij}^2, \quad l=1, 2, \dots , n-1,\; j=l+1, l+2, \dots , n. \qquad (1)\)
Condition (1) for \(l=k+1\), \(k+2\), ..., \(n-1\) is not used in the error analysis.
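A corresponding sketch for \(pivotQR_k(A)\) (again illustrative, with names of our choosing; scipy's `qr` with `pivoting=True` implements the greedy column pivoting described above): the 2-norm error of the rank-k approximation built from the pivoted QR factors equals \(\Vert R_{3k}\Vert _2\).

```python
import numpy as np
from scipy.linalg import qr

def pivoted_qr_error(A, k):
    """Return ||A - BC||_2 for the rank-k approximation from pivoted QR,
    together with ||R_3k||_2, the norm of the trailing block of R."""
    Q, R, piv = qr(A, pivoting=True, mode='economic')  # A[:, piv] = Q @ R
    B = Q[:, :k]                    # m x k factor
    C = np.empty_like(R[:k, :])     # k x n factor: C = R[:k, :] Pi^T
    C[:, piv] = R[:k, :]
    return np.linalg.norm(A - B @ C, 2), np.linalg.norm(R[k:, k:], 2)

rng = np.random.default_rng(0)
A = rng.standard_normal((7, 5))
err, r3norm = pivoted_qr_error(A, 2)  # the two values agree
```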
The greedy method of pivoting is not always optimal. QR decompositions \(A\varPi _{RR} = Q_{RR}R_{RR}\), in which the permutation \(\varPi _{RR}\) is chosen such that \(R_{RR}\) has a small lower right block, are called rank-revealing QR (RRQR) decompositions. The following theorem was shown by Hong and Pan in 1992 [9].
Theorem 1
Let \(m\ge n > k\), and \(A\in {\mathbb {R}}^{m\times n}\). Then, there exists a permutation matrix \(\varPi \in {\mathbb {R}}^{n\times n}\) such that the diagonal blocks of \(R = \begin{pmatrix}R_1 &{}\quad R_2\\ O &{}\quad R_3\end{pmatrix}\), the upper triangular factor of the QR decomposition of \(A\varPi\) with \(R_1\in {\mathbb {R}}^{k\times k}\), satisfy the following inequality:
Finding the optimal permutation matrix is not practical from the viewpoint of computational complexity.
1.2 Truncation error of pivoted QR
Pivoted QR sometimes results in a large truncation error. A well-known example is the Kahan matrix [10]. In 1968, Faddeev et al. [6] showed that
Furthermore,
holds [3].
However, a survey in 2017 stated that "very little is known in theory about its behaviour" [13, p. 2218] with regard to pivoted QR; thus, there is still room for further research on pivoted QR.
Our previous work showed that the least upper bound of the ratio of the truncation error of pivoted QR to that of SVD is \(\sqrt{\frac{4^{n-1}+2}{3}}\) when an \(m\times n\) (\(m\ge n\)) matrix is approximated by a matrix of rank \(n-1\), i.e., for \(k = n - 1\) [11]. The tight upper bound for all k is proved in the rest of this paper.
We assume that all matrices and vectors in this paper are real; however, the discussion extends easily to the complex case, with the same results.
2 Preliminaries
In this section, we define the notation and examine the basic properties needed to analyze the truncation errors. First, we introduce the generalized inverse and the concept \({\mathrm{resi}}\).
Proposition 1
[1, p. 16] For \(A\in {\mathbb {R}}^{m\times n}\), there exists \(X\in {\mathbb {R}}^{n\times m}\) that satisfies \(AXA = A\), \(XAX = X\), \((AX)^T = AX\), and \((XA)^T = XA\), and X is uniquely determined by these four conditions.
Definition 1
For \(A\in {\mathbb {R}}^{m\times n}\) (\(m\ge n\)), the generalized inverse of A is defined by \(X\in {\mathbb {R}}^{n\times m}\) that satisfies the four conditions in Proposition 1 and is denoted by \(A^{\dagger }\).
The following notation is closely related to the truncation error of pivoted QR.
Definition 2
Let \(A\in {\mathbb {R}}^{m\times n}\) (\(m\ge n\)) and \(B\in {\mathbb {R}}^{m \times l}\). We define \({\mathrm{resi}}(A,B)\) as \({\mathrm{resi}}(A,B) = B - AA^{\dagger }B.\)
We denote the inner product of two vectors \(\varvec{x}\) and \(\varvec{y}\) as \((\varvec{x},\varvec{y})\).
Example 1
For \(\varvec{x}\in {\mathbb {R}}^{n}\) and \(\varvec{y}\in {\mathbb {R}}^{n}\), if \(\varvec{x} \ne \varvec{0}\), then the following holds: \({\mathrm{resi}}(\varvec{x},\varvec{y}) = \varvec{y} - \frac{(\varvec{x},\varvec{y})}{(\varvec{x},\varvec{x})}\varvec{x}.\)
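Under Definition 2, \({\mathrm{resi}}(A,B)\) is the residual of the least-squares fit of B by the columns of A, which a short numpy sketch can compute with `lstsq` (the helper name `resi` is ours). For a single nonzero column x, the residual of y is y minus its orthogonal projection onto x, as in Example 1.

```python
import numpy as np

def resi(A, B):
    """resi(A, B) = B - A A^+ B: the residual of the
    least-squares problem min_X ||B - A X||."""
    X, *_ = np.linalg.lstsq(A, B, rcond=None)
    return B - A @ X

# Example 1 for vectors: residual of y against a single nonzero column x
rng = np.random.default_rng(1)
x = rng.standard_normal((5, 1))
y = rng.standard_normal((5, 1))
projection_residual = y - (x.T @ y) / (x.T @ x) * x
```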
The following lemma will be used to identify \({\mathrm{resi}}\).
Lemma 1
Let \(A\in {\mathbb {R}}^{m\times n}\) \((m\ge n)\) and \(B\in {\mathbb {R}}^{m\times l}\). For \(X\in {\mathbb {R}}^{n\times l}\), \({\mathrm{resi}}(A,B) = B-AX\) holds if and only if \(A^TAX = A^TB\) holds.
Proof
If \({\mathrm{resi}}(A,B) = B-AX\) holds, then
holds. If \(A^TAX-A^TB = O\) holds, then
holds. \(\square\)
Lemma 2
[1, p. 5] Let \(A\in {\mathbb {R}}^{m\times n}\) \((m\ge n)\), \(\varvec{b}\in {\mathbb {R}}^{m}\), and \(\varvec{x}\in {\mathbb {R}}^{n}\). \(\Vert \varvec{b}-A\varvec{x}\Vert \le \Vert \varvec{b}-A\varvec{y}\Vert\) holds for any \(\varvec{y}\in {\mathbb {R}}^n\) if and only if \(A^T(A\varvec{x}-\varvec{b})=\varvec{0}\) holds.
Using Lemmas 1 and 2, we can obtain the following lemma.
Lemma 3
Let \(A\in {\mathbb {R}}^{m\times n}\) \((m\ge n)\), \(\varvec{b}\in {\mathbb {R}}^{m}\), and \(\varvec{x}\in {\mathbb {R}}^{n}\). \(\Vert \varvec{b}-A\varvec{x}\Vert \le \Vert \varvec{b}-A\varvec{y}\Vert\) holds for any \(\varvec{y}\in {\mathbb {R}}^n\) if and only if \({\mathrm{resi}}(A,\varvec{b}) = \varvec{b} - A\varvec{x}\) holds.
Lemma 4
Let \(m\ge n > k\), \(A\in {\mathbb {R}}^{m\times n}\), and \(B\in {\mathbb {R}}^{m\times l}\). Let A be partitioned as \(A = \begin{pmatrix} A_{1k}&A_{2k} \end{pmatrix},\)
where \(A_{1k}\in {\mathbb {R}}^{m\times k}\). Then, \({\mathrm{resi}}(A,B) = {\mathrm{resi}}\left( {\mathrm{resi}}(A_{1k},A_{2k}),{\mathrm{resi}}(A_{1k},B)\right)\) holds.
Proof
From the definition of \({\mathrm{resi}}\), we can see that
and
hold where \(X = A_{1k}^{\dagger }A_{2k}\), \(Y = A_{1k}^{\dagger }B\) and \(Z = {\mathrm{resi}}(A_{1k},A_{2k})^{\dagger }{\mathrm{resi}}(A_{1k},B)\). Thus,
holds from (2), (3), and (4). Lemma 1 proves
from (2),
from (3), and
from (4). We can see that
from (5), (6), and (7). We can see that
from (2), (4), (8), and (9). Then, (9) and (10) can be combined as
Next, (5) can be rewritten as
From this and (11), we have
Application of Lemma 1 to this proves the lemma. \(\square\)
QR decomposition and \({\mathrm{resi}}\) have the following relation. Note that QR in this lemma is without pivoting.
Lemma 5
Let \(m\ge n > l\), \(A\in {\mathbb {R}}^{m\times n}\), and \(A=QR\) be a QR decomposition partitioned as \(A = \begin{pmatrix} A_{1l}&A_{2l} \end{pmatrix}\), \(Q = \begin{pmatrix} Q_{1l}&Q_{2l} \end{pmatrix}\), \(R = \begin{pmatrix} R_{1l} &{} R_{2l}\\ O &{} R_{3l} \end{pmatrix},\)
where \(A_{1l}\in {\mathbb {R}}^{m\times l},Q_{1l}\in {\mathbb {R}}^{m\times l},R_{1l}\in {\mathbb {R}}^{l\times l}\). If \({\mathrm{rank}}(A_{1l}) = l\) holds, then \({\mathrm{resi}}(A_{1l},A_{2l}) = Q_{2l}R_{3l}\) holds.
Proof
We have
Let
Then, we have
Furthermore,
holds. Application of Lemma 1 to this proves the lemma. \(\square\)
Then, we return to pivoted QR. Let \(A\varPi _{grd} = \begin{pmatrix} A_{1k}&A_{2k} \end{pmatrix},\)
where \(A_{1k}\in {\mathbb {R}}^{m\times k}\). From Lemma 5, we can see that
for \(l=1\), 2, ..., k and \(j=l+1\), \(l+2\), ..., n, and \(pivotQR_k(A) = \Vert {\mathrm{resi}}(A_{1k},A_{2k})\Vert _2\) if \({\mathrm{rank}}(A_{1k}) = k\) holds. The last equation shows that, as long as \({\mathrm{rank}}(A_{1k}) = k\) holds, the value of \(pivotQR_k(A)\) is determined only by \(A_{1k}\) and \(A_{2k}\), or equivalently by \(\varPi _{grd}\), and is independent of how the QR decomposition is computed.
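The identity \(pivotQR_k(A) = \Vert {\mathrm{resi}}(A_{1k},A_{2k})\Vert _2\) can be checked numerically, as in the sketch below (our variable names; it assumes the generic case \({\mathrm{rank}}(A_{1k}) = k\), which holds almost surely for a random matrix).

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(2)
m, n, k = 8, 5, 3
A = rng.standard_normal((m, n))

# pivotQR_k(A): norm of the trailing block of the pivoted QR factor
Q, R, piv = qr(A, pivoting=True, mode='economic')
r3norm = np.linalg.norm(R[k:, k:], 2)

# The same value from resi(A_1k, A_2k) of the permuted matrix A Pi_grd
A1, A2 = A[:, piv[:k]], A[:, piv[k:]]
X, *_ = np.linalg.lstsq(A1, A2, rcond=None)
resi_norm = np.linalg.norm(A2 - A1 @ X, 2)
```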
3 Evaluation from above
We bound \(\frac{pivotQR_k(A)}{SVD_k(A)}\) from above in this section. Since \(pivotQR_k(A) = SVD_k(A) = 0\) holds if \({\mathrm{rank}}(A) \le k\) holds, we only consider the case \({\mathrm{rank}}(A) > k\). Let \(A=U\varSigma V^T\) be one SVD. Since \(A\varPi _{grd} = U\varSigma (\varPi _{grd}^TV)^T\) and \((\varPi _{grd}^TV)^T(\varPi _{grd}^TV)=I_n\) hold, \(U\varSigma (\varPi _{grd}^TV)^T\) is one SVD of \(A\varPi _{grd}\). Then, we can see that
Hereafter, A denotes what was previously \(A\varPi _{grd}\). Let \(A\in {\mathbb {R}}^{m\times n}\) that satisfies
be partitioned as
where \(A_{1k} \in {\mathbb {R}}^{m\times k}\) and \({{\mathrm{rank}}}(A_{1k}) = k\). We should compare \(\sigma _{k+1}(A) = SVD_k(A)\) and \(\Vert {\mathrm{resi}}(A_{1k},A_{2k})\Vert _2=pivotQR_k(A)\).
Lemma 6
Let \(m\ge n\), \(A\in {\mathbb {R}}^{m\times n}\), and \(B\in {\mathbb {R}}^{m\times l}\). For any \(\varvec{v}\in {\mathbb {R}}^{l}\),
holds.
Proof
From the definition of \({\mathrm{resi}}\),
holds. Thus,
holds. \(\square\)
We can see that
from the definition of 2-norm and Lemma 6. Now, we introduce an essential theorem of this paper.
Theorem 2
Let \(m\ge n > 1\), \(A\in {\mathbb {R}}^{m\times n}\), \({\mathrm{rank}}(A)=n\), and A be partitioned into columns as \(A = \begin{pmatrix} \varvec{a}_1&\varvec{a}_2&\dots&\varvec{a}_n \end{pmatrix}.\)
We define \(\hat{A_i}\) as \(\hat{A_i} = \begin{pmatrix} \varvec{a}_1&\dots&\varvec{a}_{i-1}&\varvec{a}_{i+1}&\dots&\varvec{a}_n \end{pmatrix}\) for \(i=1\), 2, ..., n, and \(\varvec{d}_i\) as \(\varvec{d}_i = {\mathrm{resi}}(\hat{A_i},\varvec{a}_i)\) for \(i=1\), 2, ..., n. Then, \(\varvec{d}_i \ne \varvec{0}\) for \(i=1\), 2, ..., n and
hold.
Proof
Since \({\mathrm{rank}}(A) = n\), \(\{\varvec{a}_1,\varvec{a}_2,\dots ,\varvec{a}_n\}\) is linearly independent. Because \(\varvec{d}_i\) is a linear combination of \(\{\varvec{a}_1,\varvec{a}_2,\dots ,\varvec{a}_n\}\) with the coefficient of \(\varvec{a}_i\) being 1, \(\varvec{d}_i \ne \varvec{0}\) holds for \(i=1\), 2, ..., n. From the definition of \({\mathrm{resi}}\),
holds, where \(\varvec{x}_1 = \hat{A_1}^{\dagger }\varvec{a}_1\). Let \(\varvec{x}_1 = \begin{pmatrix} x_{12}&x_{13}&\dots&x_{1n} \end{pmatrix}^T\). Let i be one of 2, 3, ..., n. We can see that
holds if \(x_{1i} \ne 0\) from Lemma 3. Thus,
holds. This (13) also holds if \(x_{1i} = 0\). We define \(\varvec{y}\in {\mathbb {R}}^{m}\) as
Since \(\{\varvec{a}_1,\varvec{a}_2,\dots ,\varvec{a}_{n-1}\}\) is linearly independent, \(\varvec{y}\ne \varvec{0}\) holds. As Lemma 1 gives \(\hat{A_1}^T\varvec{d}_1=\varvec{0}\), we have \((\varvec{a}_n,\varvec{d}_1) = 0\). Thus,
holds. We can see that
holds from Lemma 3 because \(\varvec{y}\) is a linear combination of \(\varvec{a}_i\) \((i=1, 2, \dots , n-1)\). Since
and \(\Vert \varvec{d}_n\Vert > 0\) hold,
holds. Furthermore, since
holds from (13),
holds, and the theorem has been proved. \(\square\)
We refer to an essential theorem by Hong and Pan.
Theorem 3
[9, p. 218] Let \(m\ge n > l\), \(A\in {\mathbb {R}}^{m\times n}\) and \(A = QR = U\varSigma V^T\) be a QR decomposition and an SVD, respectively. Let R and V be partitioned as
where \(R_{1l}\in {\mathbb {R}}^{l\times l}\) and \(V_{1l}\in {\mathbb {R}}^{l\times l}\).
holds.
In the present study, this theorem is only used for \(l = n-1\). The following lemma provides an inequality between \({\mathrm{resi}}\) and the singular value.
Lemma 7
Under the same assumptions as Theorem 2,
holds.
Proof
Let \(A=U\varSigma V^T\) be an SVD partitioned as
where \(V_1\in {\mathbb {R}}^{n\times (n-1)}\). Let \(\varvec{e}_i\) be the ith column of \(I_n\) for \(i=1\), 2, ..., n. Define a permutation matrix \(\varPi _i\) as
for \(i=1\), 2, ..., n. Since
and \((\varPi _i^TV)^T(\varPi _i^TV) = I_n\), \(U\varSigma (\varPi _i^TV)^T\) is one SVD of \(\begin{pmatrix} \hat{A_i}&\varvec{a}_i \end{pmatrix}\). Let \(A\varPi _i = Q_iR_i\) be a QR decomposition partitioned as
where \(Q_{i1}\in {\mathbb {R}}^{m\times (n-1)},R_{i1}\in {\mathbb {R}}^{(n-1)\times (n-1)}\). Using Theorem 3,
holds. We can see that
holds from Lemma 5. Thus,
holds. Then,
holds. \(\square\)
Proposition 2
Let \(m\ge n > k\) and \(A \in {\mathbb {R}}^{m\times n}\) satisfy (12) and be partitioned as \(A = \begin{pmatrix} A_{1k}&A_{2k} \end{pmatrix},\)
where \(A_{1k} \in {\mathbb {R}}^{m\times k}\). Let A satisfy \({\mathrm{rank}}(A_{1k})=k\). Then, for all \(\varvec{z}\in {\mathbb {R}}^{n-k}\) with \(\Vert \varvec{z}\Vert =1\),
holds.
Proof
From (12) and Lemma 6, the following holds for \(i=1\), 2, ..., k:
Define \(A'\) as
If \({\mathrm{rank}}(A') \ne k+1\), then \(\{ \varvec{a}_1, \varvec{a}_2, \dots , \varvec{a}_k, A_{2k}\varvec{z}\}\) is linearly dependent. Since
\({\mathrm{rank}}(A_{1k}) = k\), \(\{\varvec{a}_1 , \varvec{a}_2 , \dots , \varvec{a}_k\}\) is linearly independent, and \(A_{2k}\varvec{z}\) can be expressed as a linear combination of \(\{\varvec{a}_1, \varvec{a}_2, \dots , \varvec{a}_k\}\). Then, we have
\({\mathrm{resi}}(A_{1k},A_{2k}\varvec{z}) = \varvec{0}\) from Lemma 3, and the conclusion holds. Therefore, we only consider the case \({\mathrm{rank}}(A') = k+1\) in the remainder of this proof. We define \(\varvec{d}'_i\) as
From Lemma 4, we can see that
holds for \(i=1\), 2, ..., k and \(j=i\), \(i+1\), ..., k, where
\(A_{ijk}' = \begin{pmatrix} \varvec{a}_{i}&\dots&\varvec{a}_{j-1}&\varvec{a}_{j+1}&\dots&\varvec{a}_k&A_{2k}\varvec{z} \end{pmatrix}\), and
holds for \(i=1\), 2, ..., k. Using Theorem 2 on \({\mathrm{resi}}\left( \begin{pmatrix} \varvec{a}_1&\varvec{a}_2&\dots&\varvec{a}_{i-1} \end{pmatrix},\begin{pmatrix} \varvec{a}_{i}&\varvec{a}_{i+1}&\dots&\varvec{a}_k&A_{2k}\varvec{z} \end{pmatrix}\right)\), we can see that
holds. Thus,
holds for \(i = 1\), 2, ..., k from (12) and (14). Thus,
holds. We want to show that
and prove this using induction in the order of \(i=k\), \(k-1\), ..., 1. Applying (15) for \(i=k\) gives
Thus, (16) is shown in case \(i=k\). Then, we prove that (16) holds for \(i=l\), assuming that (16) holds for \(i=l+1\), \(l+2\), ..., k. We can see that
holds from (15) and the assumption of induction. Thus, (16) has been shown in case \(i=1\), 2, ..., k. Using Lemma 7 on \(A'\),
holds. Thus,
holds. Now, if we can show that
then the proof is complete. Considering the fact that
we want a subspace \(\varTheta\) that satisfies
Let
Then, we have \({\mathrm{dim}}(\varTheta ') = k+1\) since \(\left\{ \varvec{e}_1 , \varvec{e}_2 , \dots , \varvec{e}_k , \begin{pmatrix} \varvec{0}\\ \varvec{z} \end{pmatrix} \right\}\) is linearly independent. Let \(\varvec{y} = (y_i)\in {\mathbb {R}}^{k+1}\). Since \(\begin{pmatrix} \varvec{e}_1 &{} \varvec{e}_2 &{} \dots &{} \varvec{e}_k &{} \begin{pmatrix} \varvec{0}\\ \varvec{z} \end{pmatrix} \end{pmatrix}^T\begin{pmatrix} \varvec{e}_1 &{} \varvec{e}_2 &{} \dots &{} \varvec{e}_k &{} \begin{pmatrix} \varvec{0}\\ \varvec{z} \end{pmatrix} \end{pmatrix}=I_{k+1}\) holds,
holds. For all \(\varvec{y}\in {\mathbb {R}}^{k+1}\) that satisfies the right-hand side of (17),
holds. Then,
holds. \(\square\)
Thus, we have proved that
4 Evaluation from below
In this section, we show that the inequality proved in the previous section is tight. An example of matrix \(R_h\) with real-valued parameter h that satisfies
is shown. \(R_h\) is as follows:
The Kahan matrix is [10]
Therefore, \(R_h\) coincides with the Kahan matrix in the case \(m = n = k+1\) and is an extension of the Kahan matrix otherwise.
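For reference, the classic Kahan matrix can be written as \(K_n(\theta ) = {\mathrm{diag}}(1,s,\dots ,s^{n-1})(I - cN)\), where \(s=\sin \theta\), \(c=\cos \theta\), and N is the strictly upper triangular matrix of ones — a standard construction, sketched below. The check only asserts the trivial lower bound \(pivotQR_{n-1}(K) \ge \sigma _n(K)\), since floating-point pivoting may reorder the columns of a Kahan matrix.

```python
import numpy as np
from scipy.linalg import qr

def kahan(n, theta):
    """Classic Kahan matrix: diag(1, s, ..., s^{n-1}) times a unit upper
    triangular matrix with -c above the diagonal, where s^2 + c^2 = 1."""
    s, c = np.sin(theta), np.cos(theta)
    T = np.eye(n) - c * np.triu(np.ones((n, n)), 1)
    return np.diag(s ** np.arange(n)) @ T

n = 10
K = kahan(n, 1.2)
_, R, _ = qr(K, pivoting=True, mode='economic')
pivot_err = abs(R[-1, -1])                        # pivotQR_{n-1}(K)
svd_err = np.linalg.svd(K, compute_uv=False)[-1]  # sigma_n(K)
ratio = pivot_err / svd_err                       # at least 1 in theory
```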
Proposition 3
Let \(m\ge n > k\). Define \(\varSigma _h \in {\mathbb {R}}^{m\times n}\), \((w_{hij}) = W_h \in {\mathbb {R}}^{n\times n}\), and \(R_h\in {\mathbb {R}}^{m\times n}\) as follows:
and \(R_h=\varSigma _h W_h\) where \(0< h < 1\). Then,
holds.
Proof
Let \(Q = \begin{pmatrix} I_n\\ O \end{pmatrix} \in {\mathbb {R}}^{m \times n}\) and \(R = {\mathrm{diag}}(1,h,\dots ,h^{k},0,0,\dots ,0)W_h \in {\mathbb {R}}^{n \times n}\). Since R is an upper triangular matrix and \(Q^TQ=I_n\) holds, \(R_h = QR\) is one QR decomposition. We check (1) for this R. Since
(1) holds for \(l=1\), 2, ..., \(k+1\), \(j=l+1\), \(l+2\), ..., n. Obviously (1) also holds for \(l=k+2\), \(k+3\), ..., \(n-1\), \(j=l+1\), \(l+2\), ..., n. As in Sect. 2, let R be partitioned as
where \(R_{1k} \in {\mathbb {R}}^{k\times k}\). Then,
holds. Define \(V\in {\mathbb {R}}^{(n-k)\times (n-k)}\) and \(\varvec{v}_1\in {\mathbb {R}}^{n-k}\) as follows:
where \(\varvec{v}_2\), \(\varvec{v}_3\), ..., \(\varvec{v}_{n-k}\) are chosen such that \(V^TV = I_{n-k}\) holds. We can choose them freely as long as this is satisfied. Since
holds, \(\Vert R_{3k}\Vert _2 = h^{k}\sqrt{n-k}\) holds. We consider the value of \(SVD_k(R_h) = \sigma _{k+1}(R_h)\). Considering the fact that
we want a subspace \(\varTheta\) whose \(\max _{\varvec{x}\in \varTheta ,\Vert \varvec{x}\Vert =1}\Vert R_h\varvec{x}\Vert\) is small. Since \(\varvec{v}_1^T\varvec{v}_i=0\) holds for \(i=2\), 3, ..., \(n-k\),
holds for \(i=2\), 3, ..., \(n-k\). We define \(y_j = 1\) for \(j=k+1\), ..., n and define \(y_j\) from \(j=k\) down to \(j=1\) as \(y_j = \sqrt{1-h^2}\sum _{i=j+1}^n y_i\). We define \(\varvec{y}\in {\mathbb {R}}^n\) as \(\begin{pmatrix} y_1&y_2&\dots&y_n \end{pmatrix}^T\). Then,
holds. Since
holds,
holds. Let
Then, we have \({\mathrm{dim}}(\varTheta ') = n-k\). Since
holds, we have
for \(z_1\), \(z_2\), ..., \(z_{n-k}\in {\mathbb {R}}\). Thus, if the right-hand side of (18) holds, we have
Then, we have
Thus,
holds, and the proposition has been proved using the results of the previous section. \(\square\)
5 Numerical experiments
5.1 Experiments
Because \(SVD_{k}(R_h)\) cannot be calculated numerically when h is very small, we prepare another matrix for the experiments. Here, we present the matrix in the case \(k=n-1\).
Proposition 4
[11] Let \(m\ge n\) and \(\epsilon _i\in {\mathbb {R}}\) satisfy \(0<\epsilon _i < 1\) \((i=1, 2, \dots , n)\). Let \(\{\varvec{v}_i\}_{i=1}^n\in {\mathbb {R}}^n\) be
where \(v_{j,i}\) is the j-th element of \(\varvec{v}_i\). Let \(\{\varvec{w}_i\}_{i=1}^n\) be the output of performing the Gram-Schmidt orthonormalization on \(\{\varvec{v}_i\}_{i=1}^n\). Define \(\varSigma \in {\mathbb {R}}^{m\times n}\) and \(W \in {\mathbb {R}}^{n\times n}\) as follows:
where \(\sigma _1 \ge \sigma _2 \ge \dots \ge \sigma _n > 0\). Let \(A=\varSigma W^T\). Then,
holds.
This proposition provides matrices whose error ratio converges to the least upper bound only in the case \(k = n-1\). In this paper, matrices whose ratio is conjectured to converge to the least upper bound in the case \(k < n-1\) are defined analogously as follows.
Conjecture 1
Let \(m\ge n > k\) and \(\epsilon _i\in {\mathbb {R}}\) satisfy \(0< \epsilon _i < 1\) \((i=1, 2, \dots , n)\). Let \(\{\varvec{v}_i\}_{i=1}^n\in {\mathbb {R}}^n\) be
where \(v_{j,i}\) is the j-th element of \(\varvec{v}_i\). Let \(\{\varvec{w}_i\}_{i=1}^n\) be the output of performing the Gram-Schmidt orthonormalization on \(\{\varvec{v}_i\}_{i=1}^n\). Define \(\varSigma \in {\mathbb {R}}^{m\times n}\) and \(W \in {\mathbb {R}}^{n\times n}\) as follows:
where \(\sigma _1 \ge \sigma _2 \ge \dots \ge \sigma _{k+1} > 0\). Let \(A=\varSigma W^T\). Then,
holds. \(\square\)
\(\{\varvec{v}_i\}_{i=1}^n\) is as follows:
Because the limits cannot be taken in numerical experiments, we assign \(s^{n-i}\) to \(\sigma _i\) for \(i=1\), 2, ..., \(k+1\) and \(s^{-1}\) to \(\epsilon _i\) for \(i=1\), 2, ..., n. Numerical experiments were performed for \(n=2\), 3, ..., 25 and \(k=1\), 2, ..., \(n-1\) with \(m = n\). We calculate u as follows:
Because we want \(\frac{pivotQR_{k}(A)}{SVD_{k}(A)}\rightarrow \sqrt{\frac{4^k-1}{3}(n-k)+1}\) as \(s \rightarrow \infty\), we want \(u \rightarrow 0\) as \(s \rightarrow \infty\).
5.2 Environment
Python 3.5.2, scipy 1.4.1, and numpy 1.13.3 were used. The data type used in these experiments was double precision. We used the SVD of numpy and the pivoted QR of scipy. Gram-Schmidt orthonormalization was performed by applying the scipy QR decomposition twice. Let \(V = \begin{pmatrix} \varvec{v}_1&\dots&\varvec{v}_n \end{pmatrix}\), and let \(V = QR\) and \(Q = WR'\) be QR decompositions. The i-th column of W is assigned to \(\varvec{w}_i\) for \(i=1\), 2, ..., n.
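The orthonormalization step can be sketched as follows (our reconstruction of the procedure described above; applying QR twice is the usual "twice is enough" reorthogonalization):

```python
import numpy as np
from scipy.linalg import qr

def orthonormalize_twice(V):
    """Orthonormalize the columns of V by applying the QR decomposition
    twice; column i of the result spans the same nested subspaces as V."""
    Q, _ = qr(V, mode='economic')   # V = Q R
    W, _ = qr(Q, mode='economic')   # Q = W R'
    return W

rng = np.random.default_rng(3)
V = rng.standard_normal((6, 4))
W = orthonormalize_twice(V)         # w_i is the i-th column of W
```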
5.3 Results
The maximum value of u is \(2.3482077194\cdot 10^{-5}\), and \(u \ge 0\) holds in all cases, for \(s = 10^9\). The results for \(m = n = 2\), 3, ..., 25, \(k = n-1\), \(\lfloor \frac{n}{2}\rfloor\), \(\lfloor \log _2 n\rfloor\), and \(s = 10^9\) are shown in Fig. 1. The ratio comes close to the least upper bound, and u increases monotonically with n and k. The results for \(m = n = 25\), \(k = n-1\), \(\lfloor \frac{n}{2}\rfloor\), \(\lfloor \log _2 n\rfloor\), and \(s=10\), \(10^2\), ..., \(10^9\) are shown in Fig. 2. We can see that u decreases monotonically as s increases. From these results, the numerical solutions are likely to converge to the least upper bound.
6 Conclusion
We compared the 2-norm of the truncation error of pivoted QR with that of SVD and obtained the following theorem:
Theorem 4
Let \(m\ge n > k\). For any \(A\in {\mathbb {R}}^{m\times n}\), \(pivotQR_k(A) \le \sqrt{\frac{4^k-1}{3}(n-k)+1}\; SVD_k(A)\) holds. Furthermore, for any \(t\in {\mathbb {R}}\) that satisfies \(t < \sqrt{\frac{4^k-1}{3}(n-k)+1}\), there exists \(A\in {\mathbb {R}}^{m\times n}\) that satisfies \(pivotQR_k(A) > t\; SVD_k(A).\)
\(\square\)
This theorem states that the least upper bound of the ratio of the truncation error of pivoted QR to that of SVD is \(\sqrt{\frac{4^k-1}{3}(n-k)+1}\) when an \(m\times n\) \((m\ge n)\) matrix is approximated by a matrix of rank k. Furthermore, an example in which the ratio converges to the least upper bound was found. We also found, through numerical experiments, examples in which the ratio is close to the least upper bound when n is small.
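Theorem 4 can be sanity-checked numerically on random matrices, as in the sketch below (the helper `error_ratio` is ours; the check only verifies that the ratio lies between 1 and the proven bound, not that the bound is attained):

```python
import numpy as np
from scipy.linalg import qr

def error_ratio(A, k):
    """pivotQR_k(A) / SVD_k(A): ratio of 2-norm truncation errors."""
    _, R, _ = qr(A, pivoting=True, mode='economic')
    pivot_err = np.linalg.norm(R[k:, k:], 2)         # ||R_3k||_2
    svd_err = np.linalg.svd(A, compute_uv=False)[k]  # sigma_{k+1}(A)
    return pivot_err / svd_err

def bound(n, k):
    """Least upper bound of the ratio proved in Theorem 4."""
    return np.sqrt((4.0 ** k - 1.0) / 3.0 * (n - k) + 1.0)

rng = np.random.default_rng(4)
m, n = 12, 8
ratios = [error_ratio(rng.standard_normal((m, n)), k) for k in range(1, n)]
```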
7 Future work
We have seen that the least upper bound of the ratio of the truncation error of pivoted QR to that of SVD is \(O(2^k\sqrt{n})\). However, this upper bound is attained only by contrived examples [4]. We expect that the upper bound may become significantly smaller under additional restrictions on the matrices, and we intend to identify such restrictions.
References
Björck, Å.: Numerical Methods for Least Squares Problems. SIAM, Philadelphia (1996)
Businger, P., Golub, G.H.: Linear least squares solutions by Householder transformations. Numer. Math. 7, 269–276 (1965)
Chandrasekaran, S., Ipsen, I.C.F.: On rank-revealing factorisations. SIAM J. Matrix Anal. Appl. 15(2), 592–622 (1994)
Drmač, Z., Gugercin, S.: A new selection operator for the discrete empirical interpolation method-improved a priori error bound and extensions. SIAM J. Sci. Comput. 38(2), A631–A648 (2016)
Eldén, L.: Numerical linear algebra in data mining. Acta Numer. 15, 327–384 (2006)
Faddeev, D.K., Kublanovskaya, V.N., Faddeeva, V.N.: Solution of linear algebraic systems with rectangular matrices. Trudy Mat. Inst. Steklov. 96, 76–92 (1968)
Golub, G.H.: Numerical methods for solving linear least squares problems. Numer. Math. 7, 206–216 (1965)
Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore (1996)
Hong, Y.P., Pan, C.-T.: Rank-revealing QR factorizations and the singular value decomposition. Math. Comput. 58, 213–232 (1992)
Kahan, W.: Numerical linear algebra. Can. Math. Bull. 9, 757–801 (1966)
Kawamura, H.: Analysis of truncation error of matrix low rank approximation algorithm using QR decomposition with pivot selection. Trans. Jpn. Soc. Ind. Appl. Math. 30(2), 163–176 (2020)
Khoromskij, B.N.: Tensors-structured numerical methods in scientific computing: survey on recent advances. Chemom. Intell. Lab. Syst. 110, 1–19 (2012)
Kumar, N.K., Schneider, J.: Literature survey on low rank approximation of matrices. Linear Multilinear Algebra 65(11), 2212–2244 (2017)
Ye, J.: Generalized low rank approximations of matrices. Mach. Learn. 61, 167–191 (2005)
Kawamura, H., Suda, R. Least upper bound of truncation error of low-rank matrix approximation algorithm using QR decomposition with pivoting. Japan J. Indust. Appl. Math. 38, 757–779 (2021). https://doi.org/10.1007/s13160-021-00459-x