Abstract
The randomized Kaczmarz (RK) method is an iterative method for approximating the least-squares solution of large linear systems of equations. The standard RK method uses sequential updates, making parallel computation difficult. Here, we study a parallel version of RK where a weighted average of independent updates is used. We analyze the convergence of RK with averaging and demonstrate its performance empirically. We show that as the number of threads increases, the rate of convergence improves and the convergence horizon for inconsistent systems decreases.
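To make the method concrete, the following is a minimal sketch in Python, assuming rows are sampled with probabilities \(p_i = \Vert \mathbf{A}_i\Vert ^2 / \Vert \mathbf{A}\Vert _F^2\) and a uniform relaxation weight \(\alpha \); the function name and defaults are illustrative rather than the authors' implementation.

    import numpy as np

    def rk_averaging(A, b, q=8, alpha=1.0, iters=5000, seed=0):
        """Randomized Kaczmarz with averaging: each iteration draws q row
        indices i.i.d. and averages their weighted Kaczmarz updates."""
        rng = np.random.default_rng(seed)
        m, n = A.shape
        row_norms_sq = np.einsum('ij,ij->i', A, A)
        p = row_norms_sq / row_norms_sq.sum()   # P(i) = ||A_i||^2 / ||A||_F^2
        x = np.zeros(n)
        for _ in range(iters):
            rows = rng.choice(m, size=q, p=p)   # one independent draw per thread
            step = np.zeros(n)
            for i in rows:                      # independent updates; in practice
                r_i = b[i] - A[i] @ x           # these run on q parallel threads
                step += alpha * r_i / row_norms_sq[i] * A[i]
            x += step / q                       # average of the q updates
        return x

For a consistent system the iterates converge to the solution; for an inconsistent system they approach the least-squares solution up to a convergence horizon that shrinks as \(q\) grows, matching the behavior analyzed below.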
Additional information
Communicated by Rosemary Anne Renaut.
Molitor, Moorman, and Needell were partially supported by NSF CAREER DMS #1348721 and NSF BIGDATA #1740325. Moorman was additionally supported by NSF Grant DGE-1829071. Tu was supported by DARPA under agreement number FA8750-18-2-0066.
Appendices
A Proof of Lemma 1
Let \({\mathbb {E}}_{i}\left[ \,\cdot \,\right] \) denote \({\mathbb {E}}_{i \sim \mathcal {D}}\left[ \,\cdot \,\right] \). Expanding the definition of the weighted sampling matrix \(\mathbf{M}_k\) as a weighted average of the i.i.d. sampling matrices \(\frac{\mathbf{I}_i^\top \mathbf{I}_i}{\Vert \mathbf{A}_i\Vert ^2}\), we see that the first expectation reduces, by linearity, to a single-index expectation over \(i \sim \mathcal {D}\).
Likewise, we can compute the second expectation by separating the cases where \(i=j\) from those where \(i \ne j\) and utilizing the independence of the indices sampled in \(\tau _k\).
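Concretely, with the sampling matrix written as \(\mathbf{M}_k = \frac{1}{q} \sum _{i \in \tau _k} w_i \frac{\mathbf{I}_i^\top \mathbf{I}_i}{\Vert \mathbf{A}_i\Vert ^2}\), where the weights \(w_i\) are the diagonal entries of \(\mathbf{W}\) (a sketch assuming this form of the weighted average described above), the first computation is
\[ {\mathbb {E}}\left[ \mathbf{M}_k \right] = {\mathbb {E}}_{i}\left[ w_i \frac{\mathbf{I}_i^\top \mathbf{I}_i}{\Vert \mathbf{A}_i\Vert ^2} \right] = \sum _{i=1}^m p_i w_i \frac{\mathbf{I}_i^\top \mathbf{I}_i}{\Vert \mathbf{A}_i\Vert ^2}, \]
while the second, for a fixed matrix \(\mathbf{X}\), splits into the \(q\) diagonal pairs \(i = j\) and the \(q(q-1)\) independent pairs \(i \ne j\):
\[ {\mathbb {E}}\left[ \mathbf{M}_k \mathbf{X} \mathbf{M}_k \right] = \frac{1}{q}\, {\mathbb {E}}_{i}\left[ w_i^2\, \frac{\mathbf{I}_i^\top \mathbf{I}_i\, \mathbf{X}\, \mathbf{I}_i^\top \mathbf{I}_i}{\Vert \mathbf{A}_i\Vert ^4} \right] + \frac{q-1}{q}\, {\mathbb {E}}\left[ \mathbf{M}_k \right] \mathbf{X}\, {\mathbb {E}}\left[ \mathbf{M}_k \right] . \]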
B Proof of Theorem 1
We now prove Theorem 1 starting from the error update in Eq. (6), expanding the squared error norm of the update.
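Assuming the error update in Eq. (6) takes the standard averaged-RK form \(e^{k+1} = \left( \mathbf{I} - \mathbf{A}^\top \mathbf{M}_k \mathbf{A} \right) e^k + \mathbf{A}^\top \mathbf{M}_k r^\star \) with \(r^\star = b - \mathbf{A} x^\star \), the expansion reads
\[ \Vert e^{k+1} \Vert ^2 = \Vert \left( \mathbf{I} - \mathbf{A}^\top \mathbf{M}_k \mathbf{A} \right) e^k \Vert ^2 + 2 \left\langle \left( \mathbf{I} - \mathbf{A}^\top \mathbf{M}_k \mathbf{A} \right) e^k ,\ \mathbf{A}^\top \mathbf{M}_k r^\star \right\rangle + \Vert \mathbf{A}^\top \mathbf{M}_k r^\star \Vert ^2, \]
whose three terms are handled separately below.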
Under Assumption 1, the expectations in Lemma 1 simplify accordingly.
Upon taking expectations on both sides of Eq. (11), the middle term simplifies, yielding Eq. (12).
Making use of Lemma 1, we take the expectation of the first term in Eq. (12), and treat the second and last terms of Eq. (12) in the same way.
Combining these in Eq. (12) yields the claimed bound.
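In outline (a summary of the shape of the bound rather than its precise constants), the combination yields a recursion of the form \({\mathbb {E}}\Vert e^{k+1}\Vert ^2 \le \rho \, \Vert e^k \Vert ^2 + c\), with rate \(\rho < 1\) and a noise term \(c\) driven by \(r^\star \). Iterating gives
\[ {\mathbb {E}}\Vert e^{k} \Vert ^2 \le \rho ^k \Vert e^0 \Vert ^2 + \frac{c}{1 - \rho }, \]
i.e., linear convergence up to a horizon of squared radius \(c/(1-\rho )\); increasing the number of threads \(q\) improves \(\rho \) and shrinks the horizon, consistent with the behavior described in the abstract.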
C Proof of Theorem 2
Proof
We seek to optimize, with respect to \(\alpha \), the convergence rate constant from Corollary 1 when using uniform weights \(\mathbf{W} = \alpha \mathbf{I}\). To do this, we first simplify from a matrix polynomial to a maximum over scalar polynomials in \(\alpha \) whose coefficients depend on the singular values of \(\mathbf{A}\). We then show that the maximum occurs at either the minimum or the maximum singular value of \(\mathbf{A}\). Finally, we derive a condition determining which singular value attains the maximum, and determine the optimal \(\alpha \) that minimizes this maximum.
Defining \(\mathbf{Q}^\top \varvec{\Sigma } \mathbf{Q} = \frac{\mathbf{A}^\top \mathbf{A}}{\Vert \mathbf{A}\Vert _F^2}\) as the eigendecomposition, and the polynomial
\[ p(\sigma ) \overset{\text {def}}{=} 1 - 2\alpha \sigma + \alpha ^2\left( 1 - \frac{1}{q}\right) \sigma ^2 + \frac{\alpha ^2}{q}\,\sigma , \]
the convergence rate constant from Eq. (13) can be written as \(\sigma _{\max }\left( p\left( \frac{\mathbf{A}^\top \mathbf{A}}{\Vert \mathbf{A}\Vert _F^2}\right) \right) \). Since \(p\left( \frac{\mathbf{A}^\top \mathbf{A}}{\Vert \mathbf{A}\Vert _F^2}\right) \) is a polynomial of a symmetric matrix, its singular vectors are the same as those of its argument, while its corresponding singular values are the polynomial p applied to the singular values of the original matrix. That is,
\[ p\left( \frac{\mathbf{A}^\top \mathbf{A}}{\Vert \mathbf{A}\Vert _F^2}\right) = \mathbf{Q}^\top p\left( \varvec{\Sigma }\right) \mathbf{Q}. \]
Thus, the convergence rate constant can be written as
\[ \sigma _{\max }\left( p\left( \frac{\mathbf{A}^\top \mathbf{A}}{\Vert \mathbf{A}\Vert _F^2}\right) \right) = \max _{\sigma \in \mathrm {spec}(\varvec{\Sigma })} p(\sigma ). \]
Moreover, we can bound this extremal singular value by the maximum of the polynomial p over an interval containing the spectrum of \(\varvec{\Sigma }\):
\[ \max _{\sigma \in \mathrm {spec}(\varvec{\Sigma })} p(\sigma ) \le \max _{\sigma \in \left[ s_{\min }, s_{\max }\right] } p(\sigma ). \]
Here, the singular values of \(\varvec{\Sigma }\) are bounded from below by \(s_{\min }\overset{\text {def}}{=}\frac{\sigma ^2_{\min }(\mathbf{A})}{\Vert \mathbf{A}\Vert _F^2}\) and from above by \(s_{\max }\overset{\text {def}}{=}\frac{\sigma ^2_{\max }(\mathbf{A})}{\Vert \mathbf{A}\Vert _F^2}\), since \(\varvec{\Sigma }\) is the diagonal matrix of singular values of \(\frac{\mathbf{A}^\top \mathbf{A}}{\Vert \mathbf{A}\Vert _F^2}\). Note that the polynomial can be factored as \(p(\sigma ) = (1-\sigma \alpha )^2+\frac{\sigma \alpha ^2}{q}\left( 1-\sigma \right) \), which is positive for \(\sigma \in \left[ 0, 1\right] \), an interval containing \(\left[ s_{\min }, s_{\max }\right] \). Also, since the coefficient of the \(\sigma ^2\) term of the polynomial p is \(\alpha ^2 \left( 1 - \frac{1}{q} \right) \), which is nonnegative, the polynomial is convex in \(\sigma \) on the interval \(\left[ s_{\min }, s_{\max }\right] \). Thus, the maximum of p on the interval \(\left[ s_{\min }, s_{\max }\right] \) is attained at one of the two endpoints and we have the bound
\[ \max _{\sigma \in \left[ s_{\min }, s_{\max }\right] } p(\sigma ) = \max \left( p(s_{\min }),\, p(s_{\max }) \right) . \]
To optimize this bound with respect to \(\alpha \), we first find conditions on \(\alpha \) such that \(p(s_{\min }) < p(s_{\max })\). If \(s_{\max }=s_{\min }\), this obviously never holds; otherwise, \(s_{\max }>s_{\min }\) and
\[ (1 - s_{\min }\alpha )^2 + \frac{s_{\min }\alpha ^2}{q}\left( 1 - s_{\min }\right) < (1 - s_{\max }\alpha )^2 + \frac{s_{\max }\alpha ^2}{q}\left( 1 - s_{\max }\right) . \]
Grouping like terms and cancelling, we get
\[ 2\alpha \left( s_{\max } - s_{\min }\right) < \alpha ^2\left( s_{\max }^2 - s_{\min }^2\right) + \frac{\alpha ^2}{q}\left( \left( s_{\max } - s_{\min }\right) - \left( s_{\max }^2 - s_{\min }^2\right) \right) . \]
Since \(\frac{\alpha }{q}>0\), we can divide it from both sides, and since \(s_{\max } > s_{\min }\), we can divide both sides by \(s_{\max }-s_{\min }\), leaving
\[ 2q < \alpha \left( 1 + \left( q-1\right) \left( s_{\min } + s_{\max }\right) \right) , \]
and since the number of threads \(q\ge 1\), we can divide both sides by \(1 + \left( q-1\right) \left( s_{\min } + s_{\max }\right) > 0\) to get
\[ \alpha > \frac{2q}{1 + \left( q-1\right) \left( s_{\min } + s_{\max }\right) } \overset{\text {def}}{=} \widehat{\alpha }. \]
Thus,
\[ \min _{\alpha > 0}\, \max \left( p(s_{\min }),\, p(s_{\max }) \right) = \min \left( \min _{\alpha \ge \widehat{\alpha }} p(s_{\max }),\ \min _{0 < \alpha \le \widehat{\alpha }} p(s_{\min }) \right) . \]
For the first term, the derivative satisfies
\[ \frac{\partial }{\partial \alpha }\, p(s_{\max }) = 2 s_{\max } \left( \alpha \left( s_{\max } + \frac{1 - s_{\max }}{q} \right) - 1 \right) > 0 \]
since \(\alpha > \widehat{\alpha }\) and the coefficient is positive. Indeed, substituting for \(\widehat{\alpha }\), we get
\[ \widehat{\alpha } \left( s_{\max } + \frac{1 - s_{\max }}{q} \right) = \frac{2\left( 1 + (q-1) s_{\max } \right) }{1 + (q-1)\left( s_{\min } + s_{\max } \right) } > 1, \]
since all terms in both numerator and denominator are positive and the numerator exceeds the denominator by \(1 + (q-1)\left( s_{\max } - s_{\min }\right) > 0\). Thus, the function is monotonically increasing on \(\alpha \in [\widehat{\alpha },\infty )\), and the minimum is attained at the lower endpoint, i.e., \(\alpha ^\star = \widehat{\alpha }\).
Similarly, for the second term, \(\alpha \le \widehat{\alpha }\) and
\[ \frac{\partial }{\partial \alpha }\, p(s_{\min }) = 2 s_{\min } \left( \alpha \left( s_{\min } + \frac{1 - s_{\min }}{q} \right) - 1 \right) . \]
If
\[ (q-1)\left( s_{\max } - s_{\min } \right) \ge 1, \]
this derivative is nonpositive on \(\alpha \in (-\infty ,\alpha ^\star ]\), so the function is monotonically decreasing there, and the minimum is attained at the upper endpoint, i.e., \(\alpha = \alpha ^\star \). Otherwise, since \(p(s_{\min })\) is quadratic in \(\alpha \) with positive leading coefficient, the minimum occurs at the critical point, so we set the derivative to zero and solve for \(\alpha ^\star \):
\[ \alpha ^\star = \frac{q}{1 + (q-1) s_{\min }}. \]
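As a sanity check, the two cases above can be implemented directly; the helper below is a sketch (the name optimal_alpha is ours), with the case condition \((q-1)(s_{\max }-s_{\min }) \ge 1\) taken from the derivation above.

    import numpy as np

    def optimal_alpha(A, q):
        """Optimal uniform step size alpha* from the two cases of Appendix C."""
        s = np.linalg.svd(A, compute_uv=False)**2 / np.linalg.norm(A, 'fro')**2
        smin, smax = s.min(), s.max()
        if (q - 1) * (smax - smin) >= 1:
            # p(s_min) is still decreasing at alpha_hat: endpoint optimum
            return 2 * q / (1 + (q - 1) * (smin + smax))
        # otherwise the interior critical point of p(s_min) is optimal
        return q / (1 + (q - 1) * smin)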
D Corollary Proofs
We provide proofs for the corollaries of Sect. 2, which follow from Theorem 1.
D.1 Proof of Corollary 2
Suppose \(p_i = \frac{\Vert \mathbf{A}_i\Vert ^2}{\Vert \mathbf{A}\Vert _F^2}\) and \(\mathbf{W} = \alpha \mathbf{I}\), and start from the bound in the proof of Theorem 1. In this case, since \(\mathbf{A}^\top r^\star = 0\), we have \(\langle \mathbf{A} e^k , r^\star \rangle = 0\); combining the inner products then gives the claimed bound.
D.2 Proof of Corollary 3
Suppose \(q = 1\), \(\mathbf{W} = \mathbf{I}\), and \(p_i = \frac{\Vert \mathbf{A}_i\Vert ^2}{\Vert \mathbf{A}\Vert _F^2}\). Starting from the bound in the proof of Theorem 1, we decompose \(r^k\) and combine the inner products to obtain the claimed bound.