
Randomized Kaczmarz with averaging

Abstract

The randomized Kaczmarz (RK) method is an iterative method for approximating the least-squares solution of large linear systems of equations. The standard RK method uses sequential updates, making parallel computation difficult. Here, we study a parallel version of RK where a weighted average of independent updates is used. We analyze the convergence of RK with averaging and demonstrate its performance empirically. We show that as the number of threads increases, the rate of convergence improves and the convergence horizon for inconsistent systems decreases.
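
As a concrete illustration (not the authors' code), the averaged update studied here can be sketched in a few lines of Python, assuming rows are sampled i.i.d. with probabilities proportional to their squared norms and a uniform relaxation weight \(\alpha \); the function name and defaults are ours:

```python
import numpy as np

def rk_averaging(A, b, q=8, alpha=1.0, iters=5000, rng=None):
    """Sketch of randomized Kaczmarz with averaging.

    Each iteration draws q rows i.i.d. (probabilities proportional to
    squared row norms) and applies the weighted average of the q
    single-row Kaczmarz updates: x <- x - (alpha/q) * sum of updates.
    """
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    row_norms2 = np.einsum('ij,ij->i', A, A)
    p = row_norms2 / row_norms2.sum()
    x = np.zeros(n)
    for _ in range(iters):
        rows = rng.choice(m, size=q, p=p)  # q independent row samples
        step = np.zeros(n)
        for i in rows:                     # one update per "thread"
            step += (A[i] @ x - b[i]) / row_norms2[i] * A[i]
        x -= alpha * step / q              # averaged, relaxed update
    return x
```

In a truly parallel implementation, the inner loop would run concurrently with one sampled row per thread, and only the averaged step would be applied to the shared iterate.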


Author information

Corresponding author

Correspondence to Jacob D. Moorman.

Additional information

Communicated by Rosemary Anne Renaut.


Molitor, Moorman, and Needell were partially supported by NSF CAREER DMS #1348721 and NSF BIGDATA #1740325. Moorman was additionally supported by NSF Grant DGE-1829071. Tu was supported by DARPA under agreement number FA8750-18-2-0066.

Appendices

A Proof of Lemma 1

Let \({ \mathbb {E}}_{i}\left[ \,\cdot \,\right] \) denote \({ \mathbb {E}}_{i \sim \mathcal{D}}\left[ \,\cdot \,\right] \). Expanding the definition of the weighted sampling matrix \(\mathbf{M }_k\) as a weighted average of the i.i.d. sampling matrices \(\frac{\mathbf{I }_i^\top \mathbf{I }_i}{\Vert \mathbf{A }_i\Vert ^2}\), we see that

$$\begin{aligned} \mathbb {E}_{k}\left[ \mathbf{M }_k\right] = \mathbb {E}_{k}\left[ \frac{1}{q}\sum _{i\in \tau _k} w_i \frac{\mathbf{I }_i^\top \mathbf{I }_i}{\Vert \mathbf{A }_i\Vert ^2}\right] = { \mathbb {E}}_{i}\left[ w_i \frac{\mathbf{I }_i^\top \mathbf{I }_i}{\Vert \mathbf{A }_i\Vert ^2}\right] = \sum _{i=1}^m p_i w_i \frac{\mathbf{I }_i^\top \mathbf{I }_i}{\Vert \mathbf{A }_i\Vert ^2} = \mathbf{P }\mathbf{W }\mathbf{D }^{-2}. \end{aligned}$$

Likewise, we can compute

$$\begin{aligned}&\mathbb {E}_{k}\left[ \mathbf{M }_k^\top \mathbf{A }\mathbf{A }^\top \mathbf{M }_k\right] \\&= \mathbb {E}_{k}\left[ \left( \frac{1}{q}\sum _{i\in \tau _k} w_i \frac{\mathbf{I }_i^\top \mathbf{A }_i}{\Vert \mathbf{A }_i\Vert ^2}\right) \left( \frac{1}{q}\sum _{j\in \tau _k} w_j \frac{\mathbf{A }_j^\top \mathbf{I }_j}{\Vert \mathbf{A }_j\Vert ^2}\right) \right] \\&= \frac{1}{q} { \mathbb {E}}_{i}\left[ \left( w_i \frac{\mathbf{I }_i^\top \mathbf{A }_i}{\Vert \mathbf{A }_i\Vert ^2}\right) \left( w_i \frac{\mathbf{A }_i^\top \mathbf{I }_i}{\Vert \mathbf{A }_i\Vert ^2}\right) \right] + (1 - \frac{1}{q}) { \mathbb {E}}_{i}\left[ w_i \frac{\mathbf{I }_i^\top \mathbf{A }_i}{\Vert \mathbf{A }_i\Vert ^2}\right] { \mathbb {E}}_{i}\left[ w_i \frac{\mathbf{A }_i^\top \mathbf{I }_i}{\Vert \mathbf{A }_i\Vert ^2}\right] \\&= \frac{1}{q} { \mathbb {E}}_{i}\left[ \left( w_i \frac{\mathbf{I }_i^\top \mathbf{A }_i}{\Vert \mathbf{A }_i\Vert ^2}\right) \left( w_i \frac{\mathbf{A }_i^\top \mathbf{I }_i}{\Vert \mathbf{A }_i\Vert ^2}\right) \right] + (1 - \frac{1}{q}) { \mathbb {E}}_{i}\left[ w_i \frac{\mathbf{I }_i^\top \mathbf{I }_i}{\Vert \mathbf{A }_i\Vert ^2}\right] \mathbf{A }\mathbf{A }^\top { \mathbb {E}}_{i}\left[ w_i \frac{\mathbf{I }_i^\top \mathbf{I }_i}{\Vert \mathbf{A }_i\Vert ^2}\right] \\&= \frac{1}{q} { \mathbb {E}}_{i}\left[ w_i^2 \frac{\mathbf{I }_i^\top \mathbf{I }_i}{\Vert \mathbf{A }_i\Vert ^2}\right] + \left( 1 - \frac{1}{q}\right) \mathbf{P }\mathbf{W }\mathbf{D }^{-2} \mathbf{A }\mathbf{A }^\top \mathbf{P }\mathbf{W }\mathbf{D }^{-2}\\&= \frac{1}{q} \mathbf{P }\mathbf{W }^2 \mathbf{D }^{-2}+ \left( 1 - \frac{1}{q}\right) \mathbf{P }\mathbf{W }\mathbf{D }^{-2} \mathbf{A }\mathbf{A }^\top \mathbf{P }\mathbf{W }\mathbf{D }^{-2} \end{aligned}$$

by separating the cases where \(i=j\) from those where \(i \ne j\) and utilizing the independence of the indices sampled in \(\tau _k\).
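
Both identities can be sanity-checked numerically. The following Python sketch (test problem, weights, and sampling distribution all arbitrary and ours) compares Monte Carlo estimates of the two expectations against the closed forms of Lemma 1:

```python
import numpy as np

rng = np.random.default_rng(0)
m, q, trials = 6, 3, 200_000
A = rng.standard_normal((m, 4))
norms2 = np.einsum('ij,ij->i', A, A)
p = rng.dirichlet(np.ones(m))          # arbitrary sampling distribution
w = rng.uniform(0.5, 1.5, size=m)      # arbitrary weights w_i
D2inv, P, W = np.diag(1 / norms2), np.diag(p), np.diag(w)

M_avg = np.zeros((m, m))
MAAM_avg = np.zeros((m, m))
AAT = A @ A.T
for _ in range(trials):
    tau = rng.choice(m, size=q, p=p)   # q i.i.d. indices
    Md = np.zeros(m)                   # diagonal of M_k
    for i in tau:
        Md[i] += w[i] / (q * norms2[i])
    M_avg += np.diag(Md) / trials
    MAAM_avg += (Md[:, None] * AAT * Md[None, :]) / trials

# closed forms from Lemma 1
EM = P @ W @ D2inv
EMAAM = P @ W**2 @ D2inv / q + (1 - 1 / q) * EM @ AAT @ EM
print(np.abs(M_avg - EM).max(), np.abs(MAAM_avg - EMAAM).max())
```

Both printed deviations should sit at the Monte Carlo noise level (roughly \(10^{-3}\) for \(2\times 10^{5}\) samples).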

B Proof of Theorem 1

We now prove Theorem 1 starting from the error update in Eq. (6). Expanding the squared error norm,

$$\begin{aligned} \Vert e^{k+1}\Vert ^2&= \Vert (\mathbf{I }- \mathbf{A }^\top \mathbf{M }_k \mathbf{A })e^{k} + \mathbf{A }^\top \mathbf{M }_k r^\star \Vert ^2\nonumber \\&= \Vert (\mathbf{I }- \mathbf{A }^\top \mathbf{M }_k \mathbf{A })e^{k}\Vert ^2 + 2\langle (\mathbf{I }- \mathbf{A }^\top \mathbf{M }_k \mathbf{A })e^{k} , \mathbf{A }^\top \mathbf{M }_k r^\star \rangle + \Vert \mathbf{A }^\top \mathbf{M }_k r^\star \Vert ^2. \end{aligned}$$
(11)

Under Assumption 1, the expectations in Lemma 1 simplify to

$$\begin{aligned} \mathbb {E}_{k}\left[ \mathbf{M }_k\right] = \frac{\alpha }{\Vert \mathbf{A }\Vert _F^2} \mathbf{I }\end{aligned}$$

and

$$\begin{aligned} \mathbb {E}_{k}\left[ \mathbf{M }_k^\top \mathbf{A }\mathbf{A }^\top \mathbf{M }_k\right] = \frac{\alpha }{q}\frac{\mathbf{W }}{\Vert \mathbf{A }\Vert _F^2} + \alpha ^2 \left( 1 - \frac{1}{q}\right) \frac{\mathbf{A }\mathbf{A }^\top }{\Vert \mathbf{A }\Vert _F^4}. \end{aligned}$$

Upon taking expectations on both sides of Eq. (11), the middle term simplifies since

$$\begin{aligned} \mathbb {E}_{k}\left[ \langle e^{k} , \mathbf{A }^\top \mathbf{M }_k r^\star \rangle \right] = \langle e^{k} , \mathbf{A }^\top \mathbb {E}_{k}\left[ \mathbf{M }_k\right] r^\star \rangle = \langle e^{k} , \frac{\alpha }{\Vert \mathbf{A }\Vert _F^2}\mathbf{A }^\top r^\star \rangle =0. \end{aligned}$$

Thus,

$$\begin{aligned} \begin{aligned}&\mathbb {E}_{k}\left[ \Vert e^{k+1}\Vert ^2\right] \\&\quad = \underbrace{\mathbb {E}_{k}\left[ \Vert (\mathbf {I }- \mathbf {A }^\top \mathbf {M }_k \mathbf {A })e^{k}\Vert ^2\right] }_{\textcircled {1}} - \underbrace{2\mathbb {E}_{k}\left[ \langle \mathbf {A }^\top \mathbf {M }_k \mathbf {A }e^{k} , \mathbf {A }^\top \mathbf {M }_k r^\star \rangle \right] }_{\textcircled {2}} + \underbrace{\mathbb {E}_{k}\left[ \Vert \mathbf {A }^\top \mathbf {M }_k r^\star \Vert ^2\right] }_{\textcircled {3}}. \end{aligned} \end{aligned}$$
(12)

Making use of Lemma 1 to take the expectation in the first term in Eq. (12),

$$\begin{aligned}&\textcircled {1} =\mathbb {E}_{k}\left[ \Vert (\mathbf {I }- \mathbf {A }^\top \mathbf {M }_k \mathbf {A })e^{k}\Vert ^2 \right] \\ {}&\quad = \mathbb {E}_{k}\left[ \bigg \langle e^k, (\mathbf {I }- \mathbf {A }^\top \mathbf {M }_k \mathbf {A })^\top (\mathbf {I }- \mathbf {A }^\top \mathbf {M }_k \mathbf {A })e^{k} \bigg \rangle \right] \\ {}&\quad = \bigg \langle e^k, (\mathbf {I }- 2\mathbf {A }^\top \mathbb {E}_{k}\left[ \mathbf {M }_k\right] \mathbf {A }+ \mathbf {A }^\top \mathbb {E}_{k}\left[ \mathbf {M }_k^\top \mathbf {A }\mathbf {A }^\top \mathbf {M }_k\right] \mathbf {A })e^{k} \bigg \rangle \\ {}&\quad \overset{lem.1}{=} \bigg \langle e^k, \left( \mathbf {I }- 2\alpha \frac{\mathbf {A }^\top \mathbf {A }}{\Vert \mathbf {A }\Vert _F^2} + \frac{\alpha }{q}\frac{\mathbf {A }^\top \mathbf {W }\mathbf {A }}{\Vert \mathbf {A }\Vert _F^2} + \alpha ^2\left( 1 - \frac{1}{q}\right) \left( \frac{\mathbf {A }^\top \mathbf {A }}{\Vert \mathbf {A }\Vert _F^2}\right) ^2 \right) e^{k}\bigg \rangle \\ {}&\quad = \bigg \langle e^k, \left( \left( \mathbf {I }- \alpha \frac{\mathbf {A }^\top \mathbf {A }}{\Vert \mathbf {A }\Vert _F^2}\right) ^2 + \frac{\mathbf {A }^\top }{\Vert \mathbf {A }\Vert _F} \left( \frac{\alpha }{q} \mathbf {W }- \frac{\alpha ^2}{q} \frac{\mathbf {A }\mathbf {A }^\top }{\Vert \mathbf {A }\Vert _F^2}\right) \frac{\mathbf {A }}{\Vert \mathbf {A }\Vert _F} \right) e^{k}\bigg \rangle . \end{aligned}$$

For the second term in Eq. (12),

$$\begin{aligned}&{\textcircled {2}}=2\mathbb {E}_{k}\left[ \langle \mathbf {A }^\top \mathbf {M }_k \mathbf {A }e^{k} , \mathbf {A }^\top \mathbf {M }_k r^\star \rangle \right] \\ {}&\quad = 2\langle \mathbf {A }e^{k} , \mathbb {E}_{k}\left[ \mathbf {M }_k^\top \mathbf {A }\mathbf {A }^\top \mathbf {M }_k \right] r^\star \rangle \\ {}&\quad \overset{lem.1}{=} 2\langle \mathbf {A }e^{k} , \left( \frac{\alpha }{q}\frac{\mathbf {W }}{\Vert \mathbf {A }\Vert _F^2} + \alpha ^2 \left( 1 - \frac{1}{q}\right) \frac{\mathbf {A }\mathbf {A }^\top }{\Vert \mathbf {A }\Vert _F^4} \right) r^\star \rangle \\ {}&\quad \overset{\mathbf {A }^\top r^\star = 0}{=} 2\frac{\alpha }{q\Vert \mathbf {A }\Vert _F^2}\langle \mathbf {A }e^k , \mathbf {W }r^\star \rangle .\\ \end{aligned}$$

Similarly, for the last term in Eq. (12), the second term from Lemma 1 vanishes since \(\mathbf {A }^\top r^\star = 0\):

$$\begin{aligned} \textcircled {3}= \mathbb {E}_{k}\left[ \Vert \mathbf {A }^\top \mathbf {M }_k r^\star \Vert ^2\right] = \langle r^\star , \mathbb {E}_{k}\left[ \mathbf {M }_k^\top \mathbf {A }\mathbf {A }^\top \mathbf {M }_k\right] r^\star \rangle = \frac{\alpha }{q}\frac{\Vert r^\star \Vert _\mathbf {W }^2}{\Vert \mathbf {A }\Vert _F^2} + \alpha ^2\left( 1 - \frac{1}{q}\right) \frac{\Vert \mathbf {A }^\top r^\star \Vert ^2}{\Vert \mathbf {A }\Vert _F^4} = \frac{\alpha }{q}\frac{\Vert r^\star \Vert _\mathbf {W }^2}{\Vert \mathbf {A }\Vert _F^2}. \end{aligned}$$

Combining these in Eq. (12),

$$\begin{aligned} {\mathbb {E}}\left[ \Vert e^{k+1}\Vert ^2\right]&= \bigg \langle e^k , \left( \mathbf{I }- \alpha \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) ^2e^{k} \bigg \rangle \\&\quad +\bigg \langle e^k , \frac{\mathbf{A }^\top }{\Vert \mathbf{A }\Vert _F^2} \left( \frac{\alpha }{q} \mathbf{W }- \frac{\alpha ^2}{q} \frac{\mathbf{A }\mathbf{A }^\top }{\Vert \mathbf{A }\Vert _F^2}\right) \mathbf{A }e^{k} \bigg \rangle -2\frac{\alpha }{q} \frac{\langle \mathbf{A }e^k , \mathbf{W }r^\star \rangle }{\Vert \mathbf{A }\Vert _F^2} + \frac{\alpha }{q}\frac{\Vert r^\star \Vert _\mathbf{W }^2}{\Vert \mathbf{A }\Vert _F^2}\\&= \bigg \langle e^k , \left( \left( \mathbf{I }- \alpha \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) ^2 - \frac{\alpha ^2}{q} \left( \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) ^2\right) e^{k} \bigg \rangle + \frac{\alpha }{q}\frac{\Vert r^k\Vert _\mathbf{W }^2}{\Vert \mathbf{A }\Vert _F^2} \\&\le \sigma _{\max }\left( \left( \mathbf{I }- \alpha \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) ^2 - \frac{\alpha ^2}{q} \left( \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) ^2\right) \Vert e^{k}\Vert ^2 + \frac{\alpha }{q}\frac{ \Vert r^k\Vert _\mathbf{W }^2}{\Vert \mathbf{A }\Vert _F^2}. \end{aligned}$$
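
As a numerical sanity check on this recursion (our illustration, not part of the proof), one can simulate a single iteration in the setting of Corollary 2, where \(p_i = \Vert \mathbf{A }_i\Vert ^2/\Vert \mathbf{A }\Vert _F^2\) and \(\mathbf{W }= \alpha \mathbf{I }\) satisfy Assumption 1, and compare the empirical mean of \(\Vert e^{k+1}\Vert ^2\) against the exact expression in the second equality above:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, q, alpha, trials = 8, 3, 4, 1.5, 200_000
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
r_star = b - A @ x_star                 # least-squares residual, A.T @ r_star == 0
norms2 = np.einsum('ij,ij->i', A, A)
fro2 = norms2.sum()
p = norms2 / fro2                       # Assumption 1 holds with W = alpha*I

e = rng.standard_normal(n)              # current error e^k
r = r_star - A @ e                      # residual r^k = r_star - A e^k

emp = 0.0
for _ in range(trials):
    tau = rng.choice(m, size=q, p=p)
    Md = np.zeros(m)                    # diagonal of M_k
    for i in tau:
        Md[i] += alpha / (q * norms2[i])
    e_next = e - A.T @ (Md * (A @ e)) + A.T @ (Md * r_star)
    emp += e_next @ e_next / trials

B = A.T @ A / fro2
C = (np.eye(n) - alpha * B) @ (np.eye(n) - alpha * B) - (alpha**2 / q) * B @ B
exact = e @ C @ e + (alpha / q) * alpha * (r @ r) / fro2  # ||r^k||_W^2 = alpha*||r^k||^2
print(emp, exact)                       # should agree to Monte Carlo accuracy
```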

C Proof of Theorem 2

Proof

We seek to optimize the convergence rate constant from Corollary 1 when using uniform weights \(\mathbf{W }= \alpha \mathbf{I }\),

$$\begin{aligned} \sigma _{\max }\left( \left( \mathbf{I }- \alpha \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) ^2 + \frac{\alpha ^2 }{q}\left( \mathbf{I }- \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) \end{aligned}$$
(13)

with respect to \(\alpha \). To do this, we first simplify from a matrix polynomial to a maximum over scalar polynomials in \(\alpha \) with coefficients based on each singular value of \(\mathbf{A }\). We then show that the maximum occurs when either the minimum or maximum singular value of \(\mathbf{A }\) is used. Finally, we derive a condition for which singular value to use, and determine the optimal \(\alpha \) that minimizes the maximum singular value.

Defining \(\mathbf{Q }^\top \varvec{\Sigma } \mathbf{Q }= \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\) as the eigendecomposition, and the polynomial

$$\begin{aligned} p(\sigma ) \overset{\text {def}}{=}1-2\alpha \sigma +\alpha ^2\left( \frac{\sigma }{q} +\left( 1-\frac{1}{q}\right) \sigma ^2\right) , \end{aligned}$$

the convergence rate constant from Eq. (13) can be written as \(\sigma _{\max }\left( p\left( \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) \right) \). Since \(p\left( \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) \) is a polynomial of a symmetric matrix, its singular vectors are the same as those of its argument, while its corresponding singular values are the polynomial p applied to the singular values of the original matrix. That is,

$$\begin{aligned} p\left( \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) = p\left( \mathbf{Q }^\top \varvec{\Sigma } \mathbf{Q }\right) = \mathbf{Q }^\top p\left( \varvec{\Sigma } \right) \mathbf{Q }. \end{aligned}$$

Thus, the convergence rate constant can be written as

$$\begin{aligned} \sigma _{\max }\left( p\left( \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) \right) = \sigma _{\max }\left( p(\varvec{\Sigma })\right) . \end{aligned}$$

Moreover, we can bound this extremal singular value by the maximum of the polynomial p over an interval containing the spectrum of \(\varvec{\Sigma }\)

$$\begin{aligned} \sigma _{\max }\left( p\left( \varvec{\Sigma }\right) \right) \le \max \left| p\left( \sigma \right) \right| \quad \text {subject to} \quad \sigma \in \left[ s_{\min }, s_{\max }\right] . \end{aligned}$$

Here, the singular values of \(\varvec{\Sigma }\) are bounded from below by \(s_{\min }\overset{\text {def}}{=}\frac{\sigma ^2_{\min }(\mathbf{A })}{\Vert \mathbf{A }\Vert _F^2}\) and above by \(s_{\max }\overset{\text {def}}{=}\frac{\sigma ^2_{\max }(\mathbf{A })}{\Vert \mathbf{A }\Vert _F^2}\) since \(\varvec{\Sigma }\) is the diagonal matrix of singular values of \(\frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\). Note that the polynomial can be factored as \(p(\sigma ) = (1-\sigma \alpha )^2+\frac{\sigma \alpha ^2}{q}\left( 1-\sigma \right) \), and is positive for \(\sigma \in \left[ 0, 1\right] \), which contains \(\left[ s_{\min }, s_{\max }\right] \). Also, since the coefficient of the \(\sigma ^2\) term of the polynomial p is \(\alpha ^2 \left( 1 - \frac{1}{q} \right) \) which is greater than or equal to zero, the polynomial is convex in \(\sigma \) on the interval \(\left[ s_{\min }, s_{\max }\right] \). Thus, the maximum of p on the interval \(\left[ s_{\min }, s_{\max }\right] \) is attained at one of the two endpoints \(s_{\min }, s_{\max }\) and we have the bound

$$\begin{aligned} \sigma _{\max }\left( p\left( \varvec{\Sigma }\right) \right) = \max \left( p\left( s_{\min }\right) , p\left( s_{\max }\right) \right) . \end{aligned}$$

To optimize this bound with respect to \(\alpha \), we first find conditions on \(\alpha \) such that \(p(s_{\min }) < p(s_{\max })\). If \(s_{\max }=s_{\min }\), this obviously never holds; otherwise, \(s_{\max }>s_{\min }\) and

$$\begin{aligned} p(s_{\min })&< p(s_{\max })\\ 1-2\alpha s_{\min }+\alpha ^2\left[ \frac{s_{\min }}{q} +\left( 1-\frac{1}{q}\right) s_{\min }^2\right]&< 1-2\alpha s_{\max }+\alpha ^2\left[ \frac{s_{\max }}{q} +\left( 1-\frac{1}{q}\right) s_{\max }^2\right] \end{aligned}$$

Grouping like terms and cancelling, we get

$$\begin{aligned} \alpha \left( 2 - \frac{\alpha }{q} \right) \left( s_{\max } - s_{\min }\right)&< \alpha ^2\left( 1 - \frac{1}{q}\right) \left( s_{\max }^2 - s_{\min }^2\right) \end{aligned}$$

Since \(\frac{\alpha }{q}>0\), we can divide both sides by it.

$$\begin{aligned} \left( 2q- \alpha \right) \left( s_{\max } - s_{\min }\right)&< \alpha \left( q- 1\right) \left( s_{\max }^2 - s_{\min }^2\right) \end{aligned}$$

Since \(s_{\max } > s_{\min }\), we can divide both sides by \(s_{\max }-s_{\min }\).

$$\begin{aligned} 2 q- \alpha&< \alpha \left( q-1\right) \left( s_{\max }+s_{\min }\right) \\ 2q&< \alpha \left( 1+\left( q-1\right) \left( s_{\max }+s_{\min }\right) \right) \end{aligned}$$

and since the number of threads \(q\ge 1\), we can divide both sides by \(1 + \left( q-1\right) \left( s_{\min } + s_{\max }\right) \) to get

$$\begin{aligned} \alpha > \frac{2 q}{1 + \left( q-1\right) \left( s_{\min } + s_{\max }\right) } \overset{\text {def}}{=}\widehat{\alpha }. \end{aligned}$$

Thus,

$$\begin{aligned} \sigma _{\max }\left( p\left( \varvec{\Sigma }\right) \right) = {\left\{ \begin{array}{ll} p\left( s_{\max }\right) , &{}\quad \alpha > \widehat{\alpha } \\ p\left( s_{\min }\right) , &{}\quad \alpha \le \widehat{\alpha } \end{array}\right. } \end{aligned}$$

For the first term,

$$\begin{aligned} \frac{\partial }{\partial \alpha } p(s_{\max })&= -2s_{\max } + 2 \left( \frac{s_{\max }}{q} + \left( 1-\frac{1}{q}\right) s_{\max }^2\right) \alpha \\&> -2s_{\max } + 2\left( \frac{s_{\max }}{q} + \left( 1-\frac{1}{q}\right) s_{\max }^2\right) \widehat{\alpha } \end{aligned}$$

since \(\alpha > \widehat{\alpha }\) and the coefficient is positive. Factoring \(\frac{2 s_{\max }}{q}\) from the second term and substituting for \(\widehat{\alpha }\), we get

$$\begin{aligned}&= -2s_{\max } + \frac{2s_{\max }}{q}\left( 1+ \left( q-1\right) s_{\max }\right) \widehat{\alpha } \\&= -2s_{\max } + \frac{2s_{\max }}{q}\left( 1+ \left( q-1\right) s_{\max }\right) \frac{2 q}{1 + \left( q - 1\right) \left( s_{\min } + s_{\max }\right) } \\&= -2 s_{\max } + 2 s_{\max }\frac{2 \left( 1+(q-1)s_{\max }\right) }{1+(q-1)(s_{\max }+s_{\min })} \\&= 2 s_{\max } \left[ -1 + \frac{2 \left( 1+(q-1)s_{\max }\right) }{1+(q-1)(s_{\max }+s_{\min })}\right] \\&= 2 s_{\max } \left[ \frac{ 1+(q-1)(s_{\max }-s_{\min })}{1+(q-1)(s_{\max }+s_{\min })}\right] \\&> 0 \end{aligned}$$

since all terms in both the numerator and the denominator are positive. Thus, the function is monotonically increasing on \(\alpha \in [\widehat{\alpha },\infty )\), and the minimum is attained at the lower endpoint, i.e., \(\alpha ^\star = \widehat{\alpha }\).

Similarly, for the second term, \(\alpha \le \widehat{\alpha }\) and

$$\begin{aligned} \frac{\partial }{\partial \alpha } p(s_{\min })&= -2s_{\min } + 2 \left( \frac{s_{\min }}{q} + \left( 1-\frac{1}{q}\right) s_{\min }^2\right) \alpha \\&\le -2s_{\min } + 2\left( \frac{s_{\min }}{q} + \left( 1-\frac{1}{q}\right) s_{\min }^2\right) \widehat{\alpha } \\&= 2 s_{\min } \left[ \frac{ 1-(q-1)(s_{\max }-s_{\min })}{1+(q-1)(s_{\max }+s_{\min })}\right] \end{aligned}$$

If

$$\begin{aligned} 1-(q-1)(s_{\max }-s_{\min })<0, \end{aligned}$$
(14)

this function is monotonically decreasing on \(\alpha \in (-\infty ,\widehat{\alpha }]\), and the minimum is attained at the upper endpoint, i.e., \(\alpha ^\star = \widehat{\alpha }\). Otherwise, since \(p(s_{\min })\) is quadratic in \(\alpha \) with positive leading coefficient, the minimum occurs at the critical point, so we set the derivative to 0 and solve for \(\alpha ^\star \):

$$\begin{aligned} \frac{\partial }{\partial \alpha } p(s_{\min })&= -2s_{\min } + 2 \left( \frac{s_{\min }}{q} + \left( 1-\frac{1}{q}\right) s_{\min }^2\right) \alpha ^\star \\&= -2s_{\min } + \frac{2s_{\min }}{q}\left( 1+ \left( q-1\right) s_{\min }\right) \alpha ^\star \\&= 0 \\ \frac{2s_{\min }}{q}\left( 1 + \left( q-1\right) s_{\min }\right) \alpha ^\star&= 2s_{\min } \\ \alpha ^\star&= \frac{q}{1+(q-1)s_{\min }} \end{aligned}$$
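
The resulting recipe is easy to state in code; the sketch below (function name ours) returns the minimizing \(\alpha ^\star \) from the two cases just derived, given \(s_{\min }\), \(s_{\max }\), and the number of threads \(q\):

```python
def optimal_alpha(s_min, s_max, q):
    """alpha* minimizing the uniform-weight rate constant (cases above).

    s_min = sigma_min(A)^2 / ||A||_F^2, s_max = sigma_max(A)^2 / ||A||_F^2.
    """
    if 1 - (q - 1) * (s_max - s_min) < 0:
        # p(s_min) is still decreasing at the breakpoint: alpha* = alpha_hat
        return 2 * q / (1 + (q - 1) * (s_min + s_max))
    # otherwise the critical point of p(s_min) minimizes the bound
    return q / (1 + (q - 1) * s_min)
```

Note that for \(q = 1\) the second branch always applies and returns \(\alpha ^\star = 1\), recovering the standard RK step size.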

D Corollary Proofs

We provide proofs for the corollaries of Sect. 2, which follow from Theorem 1.

D.1 Proof of Corollary 2

Suppose \(p_i = \frac{\Vert \mathbf{A }_i\Vert ^2}{\Vert \mathbf{A }\Vert _F^2}\) and \(\mathbf{W }= \alpha \mathbf{I }\). From the proof of Theorem 1,

$$\begin{aligned} \mathbb {E}_{k}\left[ \Vert e^{k+1}\Vert ^2\right]&= \bigg \langle e^k , \left( \left( \mathbf{I }- \alpha \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) ^2 - \frac{\alpha ^2}{q} \left( \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) ^2\right) e^{k} \bigg \rangle + \frac{\alpha }{q}\frac{\Vert r^k\Vert _\mathbf{W }^2}{\Vert \mathbf{A }\Vert _F^2}. \end{aligned}$$

In this case, since \(\mathbf{A }^\top r^\star = 0\), \(\langle \mathbf{A }e^k , r^\star \rangle = 0\) and

$$\begin{aligned} \Vert r^k\Vert _\mathbf{W }^2&= \alpha \Vert \mathbf{A }e^k\Vert ^2 - 2 \alpha \langle \mathbf{A }e^k , r^\star \rangle + \alpha \Vert r^\star \Vert ^2 \\&= \alpha \langle e^k , \mathbf{A }^\top \mathbf{A }e^k \rangle + \alpha \Vert r^\star \Vert ^2. \end{aligned}$$

Combining the inner products,

$$\begin{aligned} \mathbb {E}_{k}\left[ \Vert e^{k+1}\Vert ^2\right]&= \bigg \langle e^k,\left( \left( \mathbf{I }- \alpha \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) ^2 + \frac{\alpha ^2}{q} \left( \mathbf{I }- \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) e^{k}\bigg \rangle + \frac{\alpha ^2\Vert r^\star \Vert ^2}{q\Vert \mathbf{A }\Vert _F^2}\\&\le \sigma _{\max }\left( \left( \mathbf{I }- \alpha \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) ^2 + \frac{\alpha ^2 }{q}\left( \mathbf{I }- \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) \Vert e^{k}\Vert ^2 + \frac{\alpha ^2\Vert r^\star \Vert ^2}{q\Vert \mathbf{A }\Vert _F^2} . \end{aligned}$$

D.2 Proof of Corollary 3

Suppose \(q= 1\), \(\mathbf{W }=\mathbf{I }\), and \(p_i = \frac{\Vert \mathbf{A }_i\Vert ^2}{\Vert \mathbf{A }\Vert _F^2}\). We claim that

$$\begin{aligned} \mathbb {E}_{k}\left[ \Vert e^{k+1}\Vert ^2\right]&\le \sigma _{\max }\left( \mathbf{I }- \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) \Vert e^{k}\Vert ^2 + \frac{ \Vert r^\star \Vert ^2}{\Vert \mathbf{A }\Vert _F^2} \\&= \left( 1 - \frac{\sigma ^2_{\min }(\mathbf{A })}{\Vert \mathbf{A }\Vert _F^2}\right) \Vert e^{k}\Vert ^2 + \frac{ \Vert r^\star \Vert ^2}{\Vert \mathbf{A }\Vert _F^2}. \end{aligned}$$

From the proof of Theorem 1,

$$\begin{aligned} \mathbb {E}_{k}\left[ \Vert e^{k+1}\Vert ^2\right]&= \bigg \langle e^k , \left( \left( \mathbf{I }- \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) ^2 - \left( \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) ^2\right) e^{k} \bigg \rangle + \frac{\Vert r^k\Vert ^2}{\Vert \mathbf{A }\Vert _F^2}. \end{aligned}$$

Decomposing \(r^k\) and using \(\mathbf{A }^\top r^\star = 0\),

$$\begin{aligned} \Vert r^k\Vert ^2&= \Vert \mathbf{A }e^k\Vert ^2 + \Vert r^\star \Vert ^2 \\&=\langle e^k , \mathbf{A }^\top \mathbf{A }e^k \rangle + \Vert r^\star \Vert ^2 . \end{aligned}$$

Combining the inner products,

$$\begin{aligned} \mathbb {E}_{k}\left[ \Vert e^{k+1}\Vert ^2\right]&= \bigg \langle e^k, \left( \left( \mathbf{I }- \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) ^2 -\left( \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2} \right) ^2+ \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) e^{k}\bigg \rangle + \frac{\Vert r^\star \Vert ^2}{\Vert \mathbf{A }\Vert _F^2} \\&= \bigg \langle e^k , \left( \mathbf{I }- \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) e^{k} \bigg \rangle + \frac{\Vert r^\star \Vert ^2}{\Vert \mathbf{A }\Vert _F^2}\\&\le \sigma _{\max }\left( \mathbf{I }- \frac{\mathbf{A }^\top \mathbf{A }}{\Vert \mathbf{A }\Vert _F^2}\right) \Vert e^{k}\Vert ^2 + \frac{ \Vert r^\star \Vert ^2}{\Vert \mathbf{A }\Vert _F^2}\\&= \left( 1 - \frac{\sigma ^2_{\min }(\mathbf{A })}{\Vert \mathbf{A }\Vert _F^2}\right) \Vert e^{k}\Vert ^2 + \frac{ \Vert r^\star \Vert ^2}{\Vert \mathbf{A }\Vert _F^2}. \end{aligned}$$
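
For illustration, taking full expectations, unrolling this recursion over \(k\) iterations, and bounding the geometric series recovers the familiar convergence horizon of RK for inconsistent systems:

$$\begin{aligned} { \mathbb {E}}\left[ \Vert e^{k}\Vert ^2\right] \le \left( 1 - \frac{\sigma ^2_{\min }(\mathbf{A })}{\Vert \mathbf{A }\Vert _F^2}\right) ^{k} \Vert e^{0}\Vert ^2 + \sum _{j=0}^{k-1}\left( 1 - \frac{\sigma ^2_{\min }(\mathbf{A })}{\Vert \mathbf{A }\Vert _F^2}\right) ^{j} \frac{\Vert r^\star \Vert ^2}{\Vert \mathbf{A }\Vert _F^2} \le \left( 1 - \frac{\sigma ^2_{\min }(\mathbf{A })}{\Vert \mathbf{A }\Vert _F^2}\right) ^{k} \Vert e^{0}\Vert ^2 + \frac{\Vert r^\star \Vert ^2}{\sigma ^2_{\min }(\mathbf{A })}, \end{aligned}$$

so the iterates converge linearly to a ball of squared radius \(\Vert r^\star \Vert ^2/\sigma ^2_{\min }(\mathbf{A })\) around the least-squares solution; the averaging analyzed above shrinks this horizon as \(q\) grows.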


Cite this article

Moorman, J.D., Tu, T.K., Molitor, D. et al. Randomized Kaczmarz with averaging. BIT Numer. Math. 61, 337–359 (2021). https://doi.org/10.1007/s10543-020-00824-1
