Randomized Kaczmarz for tensor linear systems

Abstract

Solving linear systems of equations is a fundamental problem in mathematics. When the linear system is so large that it cannot be loaded into memory at once, iterative methods such as the randomized Kaczmarz method excel. Here, we extend the randomized Kaczmarz method to solve multi-linear (tensor) systems under the tensor–tensor t-product. We present convergence guarantees for tensor randomized Kaczmarz in two ways: using the classical matrix randomized Kaczmarz analysis and taking advantage of the tensor–tensor t-product structure. We demonstrate experimentally that the tensor randomized Kaczmarz method converges faster than traditional randomized Kaczmarz applied to a naively matricized version of the linear system. In addition, we draw connections between the proposed algorithm and a previously known extension of the randomized Kaczmarz algorithm for matrix linear systems.
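
To make the iteration concrete, the following is a minimal NumPy sketch of the tensor randomized Kaczmarz update for a third-order system \({{\mathscr {A}}}*{{\mathscr {X}}}={{\mathscr {B}}}\) under the t-product. It assumes uniform row sampling and row slices with invertible \({{\mathscr {A}}}_{i::}{{\mathscr {A}}}_{i::}^*\); the helper names (t_product, conj_transpose, trk) are ours, not from the paper.

```python
import numpy as np

def t_product(A, B):
    """t-product of A (m x l x n) and B (l x p x n): FFT along the third mode,
    facewise matrix products, then inverse FFT."""
    Ah, Bh = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    return np.fft.ifft(np.einsum('ijk,jlk->ilk', Ah, Bh), axis=2)

def conj_transpose(A):
    """Tensor conjugate transpose: conjugate-transpose each frontal slice,
    then reverse the order of slices 2 through n."""
    At = np.conj(np.transpose(A, (1, 0, 2)))
    return np.concatenate([At[:, :, :1], At[:, :, -1:0:-1]], axis=2)

def trk(A, B, X0, iters=1000, seed=None):
    """Tensor randomized Kaczmarz sketch: repeatedly project the iterate onto
    the solution set of one randomly chosen row slice A[i] * X = B[i]."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    X = X0.astype(complex)
    for _ in range(iters):
        i = rng.integers(m)                        # uniform row sampling
        Ai = A[i:i + 1]                            # row slice, 1 x l x n
        R = B[i:i + 1] - t_product(Ai, X)          # residual, 1 x p x n
        # apply the tube inverse (A_i * A_i^*)^{-1} facewise in the Fourier domain
        tube = np.fft.fft(t_product(Ai, conj_transpose(Ai)), axis=2)  # 1 x 1 x n
        scaled = np.fft.ifft(np.fft.fft(R, axis=2) / tube, axis=2)
        X = X + t_product(conj_transpose(Ai), scaled)
    return X
```

On a consistent system generated as B = t_product(A, X_true), the Frobenius error of these iterates decays linearly in expectation, at the rate quantified in Theorem 4.1 below.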


Notes

  1. While the randomized Kaczmarz literature typically abbreviates randomized Kaczmarz as RK, throughout this work, MRK is used to distinguish the matrix and tensor versions of randomized Kaczmarz.

  2. These linear systems are typically written as linear systems with block circulant matrices, which are equivalent to the t-product as discussed in Definition 2.2; this equivalence is checked numerically in the sketch below.
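
As a concrete illustration of this equivalence, here is a small NumPy sketch (the function names bcirc and unfold are ours; t_product is the helper from the sketch above) checking that the t-product coincides with multiplication by the block circulant matrix acting on the unfolded tensor.

```python
import numpy as np

def bcirc(A):
    """mn x ln block circulant matrix of A: block (r, c) is frontal slice (r - c) mod n."""
    m, l, n = A.shape
    M = np.zeros((m * n, l * n), dtype=complex)
    for r in range(n):
        for c in range(n):
            M[r * m:(r + 1) * m, c * l:(c + 1) * l] = A[:, :, (r - c) % n]
    return M

def unfold(B):
    """Stack the frontal slices of B (l x p x n) vertically into an ln x p matrix."""
    l, p, n = B.shape
    return B.transpose(2, 0, 1).reshape(n * l, p)

rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 4, 5)), rng.standard_normal((4, 2, 5))
assert np.allclose(bcirc(A) @ unfold(B), unfold(t_product(A, B)))
```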

References

  1. Agmon, S.: The relaxation method for linear inequalities. Can. J. Math. 6, 382–392 (1954)

  2. Ahn, C.H., Jeong, B.S., Lee, S.Y.: Efficient hybrid finite element-boundary element method for 3-dimensional open-boundary field problems. IEEE Trans. Magn. 27, 4069–4072 (1991)

  3. Bone and joint CT-scan data. https://isbweb.org/data/vsj/

  4. Censor, Y.: Row-action methods for huge and sparse systems and their applications. SIAM Rev. 23(4), 444–466 (1981)

  5. Czuprynski, K.D., Fahnline, J.B., Shontz, S.M.: Parallel boundary element solutions of block circulant linear systems for acoustic radiation problems with rotationally symmetric boundary surfaces. In: INTER-NOISE and NOISE-CON Congress and Conference Proceedings, vol. 2012, pp. 2812–2823. Institute of Noise Control Engineering (2012)

  6. De Loera, J.A., Haddock, J., Needell, D.: A sampling Kaczmarz–Motzkin algorithm for linear feasibility. SIAM J. Sci. Comput. 39(5), S66–S87 (2017)

  7. Drineas, P., Mahoney, M.W.: RandNLA: randomized numerical linear algebra. Commun. ACM 59(6), 80–90 (2016)

  8. Finding His Voice. Western Electric Company (1929). https://archive.org/details/FindingH1929

  9. Elfving, T.: Block-iterative methods for consistent and inconsistent linear equations. Numer. Math. 35(1), 1–12 (1980)

  10. Gower, R.M., Richtárik, P.: Randomized iterative methods for linear systems. SIAM J. Matrix Anal. Appl. 36(4), 1660–1690 (2015)

  11. Haddock, J., Needell, D.: On Motzkin’s method for inconsistent linear systems. BIT 59(2), 387–401 (2019)

  12. Hao, N., Kilmer, M.E., Braman, K., Hoover, R.C.: Facial recognition using tensor–tensor decompositions. SIAM J. Imaging Sci. 6(1), 437–463 (2013)

  13. Kaczmarz, M.S.: Angenäherte Auflösung von Systemen linearer Gleichungen [Approximate solution of systems of linear equations]. Bull. Acad. Polonaise Sci. Lett. 35, 355–357 (1937)

  14. Kernfeld, E., Kilmer, M., Aeron, S.: Tensor–tensor products with invertible linear transforms. Linear Algebra Appl. 485, 545–570 (2015)

  15. Kilmer, M.E., Braman, K., Hao, N., Hoover, R.C.: Third-order tensors as operators on matrices: a theoretical and computational framework with applications in imaging. SIAM J. Matrix Anal. Appl. 34(1), 148–172 (2013)

  16. Kilmer, M.E., Martin, C.D.: Factorization strategies for third-order tensors. Linear Algebra Appl. 435(3), 641–658 (2011)

  17. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30–37 (2009)

  18. Liu, Z., Zhao, H.V., Elezzabi, A.Y.: Block-based adaptive compressed sensing for video. In: IEEE Image Proc., pp. 1649–1652. IEEE (2010)

  19. Lund, K.: The tensor t-function: a definition for functions of third-order tensors. Numer. Linear Algebra Appl. 27(3), e2288 (2020)

  20. Ma, A., Needell, D., Ramdas, A.: Convergence properties of the randomized extended Gauss–Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl. 36(4), 1590–1604 (2015)

  21. Majumdar, A., Ward, R.K.: Face recognition from video: an MMV recovery approach. In: Int. Conf. Acoust. Speech Signal Process. (ICASSP), pp. 2221–2224. IEEE (2012)

  22. Miao, Y., Qi, L., Wei, Y.: Generalized tensor function via the tensor singular value decomposition based on the t-product. Linear Algebra Appl. 590, 258–303 (2020)

  23. Motzkin, T.S., Schoenberg, I.J.: The relaxation method for linear inequalities. Can. J. Math. 6, 393–404 (1954)

  24. Needell, D.: Randomized Kaczmarz solver for noisy linear systems. BIT 50(2), 395–403 (2010)

  25. Needell, D., Tropp, J.A.: Paved with good intentions: analysis of a randomized block Kaczmarz method. Linear Algebra Appl. 441, 199–221 (2014)

  26. Needell, D., Ward, R., Srebro, N.: Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. Adv. Neural Inf. Process. Syst. 27, 1017–1025 (2014)

  27. Newman, E., Horesh, L., Avron, H., Kilmer, M.: Stable tensor neural networks for rapid deep learning. arXiv preprint arXiv:1811.06569 (2018)

  28. Nutini, J., Sepehry, B., Virani, A., Laradji, I., Schmidt, M., Koepke, H.: Convergence rates for greedy Kaczmarz algorithms. In: UAI (2016)

  29. Petra, S., Popa, C.: Single projection Kaczmarz extended algorithms. Numer. Algorithms 73(3), 791–806 (2016)

  30. Richtárik, P., Takác, M.: Stochastic reformulations of linear systems: algorithms and convergence theory. SIAM J. Matrix Anal. Appl. 41(2), 487–524 (2020)

  31. Semerci, O., Hao, N., Kilmer, M.E., Miller, E.L.: Tensor-based formulation and nuclear norm regularization for multienergy computed tomography. IEEE Trans. Image Process. 23(4), 1678–1693 (2014)

  32. Soltani, S., Kilmer, M.E., Hansen, P.C.: A tensor-based dictionary learning approach to tomographic image reconstruction. BIT 56(4), 1425–1454 (2016)

  33. Song, G., Ng, M.K., Zhang, X.: Robust tensor completion using transformed tensor singular value decomposition. Numer. Linear Algebra Appl. 27(3), e2299 (2020). https://doi.org/10.1002/nla.2299

  34. Strohmer, T., Vershynin, R.: A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl. 15(2), 262 (2009)

  35. Vescovo, R.: Electromagnetic scattering from cylindrical arrays of infinitely long thin wires. Electron. Lett. 31(19), 1646–1647 (1995)

  36. Wang, X., Che, M., Wei, Y.: Tensor neural network models for tensor singular value decompositions. Comput. Optim. Appl. 75(3), 753–777 (2020)

  37. Zhang, Z., Aeron, S.: Denoising and completion of 3d data via multidimensional dictionary learning. In: Int. Joint Conf. Artif. Intell. (IJCAI), pp. 2371–2377 (2016)

  38. Zhang, Z., Aeron, S.: Exact tensor completion using t-SVD. IEEE Trans. Signal Process. 65(6), 1511–1526 (2017)

  39. Zhang, Z., Ely, G., Aeron, S., Hao, N., Kilmer, M.: Novel methods for multilinear data completion and de-noising based on tensor-SVD. In: CVPR, pp. 3842–3849. IEEE (2014)

  40. Zhou, P., Lu, C., Lin, Z., Zhang, C.: Tensor factorization for low-rank tensor completion. IEEE Trans. Image Process. 27(3), 1152–1163 (2018)

  41. Zouzias, A., Freris, N.M.: Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl. 34(2), 773–793 (2013)

Author information

Corresponding author

Correspondence to Anna Ma.

Additional information

Communicated by Daniel Kressner.

This work began at the 2019 workshop for Women in Science of Data and Math (WISDM) held at the Institute for Computational and Experimental Research in Mathematics (ICERM). The workshop was partially supported by an NSF ADVANCE grant (award #1500481) to the Association for Women in Mathematics (AWM). Ma was partially supported by U.S. Air Force Award FA9550-18-1-0031 led by Roman Vershynin. Molitor is grateful for partial support from NSF CAREER DMS #1348721 and NSF BIGDATA DMS #1740325 led by Deanna Needell. The authors would also like to thank Misha Kilmer for her advising during the WISDM workshop and valuable feedback that improved earlier versions of this manuscript.

Appendices

Proof of Fact 1

The following properties of block circulant matrices will be useful in proving Fact 1.

Fact 2

(Lemma 1.iii [19]) For tensors \({{\mathscr {A}}}\) and \({{\mathscr {B}}}\), \(\text {bcirc}\left( {{\mathscr {A}}}{{\mathscr {B}}}\right) = \text {bcirc}\left( {{\mathscr {A}}}\right) \text {bcirc}\left( {{\mathscr {B}}}\right) \).

Fact 3

(Theorem 6.ii [19]) The block circulant operator \(\text {bcirc}\left( \cdot \right) \) commutes with the conjugate transpose,

$$\begin{aligned} \text {bcirc}\left( {{\mathscr {M}}}^*\right) = \text {bcirc}\left( {{\mathscr {M}}}\right) ^*. \end{aligned}$$
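
Identity (4.1), which both facts feed into, can also be verified numerically. Below is a sketch assuming \(\mathbf{F}_n\) is the unitary DFT matrix (the convention behind (4.1)) and reusing bcirc from the earlier sketch: conjugating \(\text {bcirc}\left( {{\mathscr {A}}}\right) \) by the Fourier factors yields a block diagonal matrix whose diagonal blocks are the frontal slices of \({\widehat{{{\mathscr {A}}}}}\).

```python
import numpy as np

m, l, n = 3, 4, 5
rng = np.random.default_rng(1)
A = rng.standard_normal((m, l, n))

F = np.fft.fft(np.eye(n)) / np.sqrt(n)                 # unitary DFT matrix F_n
D = np.kron(F, np.eye(m)) @ bcirc(A) @ np.kron(F.conj().T, np.eye(l))
Ah = np.fft.fft(A, axis=2)                             # A-hat: facewise DFT of A

for k in range(n):                                     # kth diagonal block = kth face of A-hat
    assert np.allclose(D[k * m:(k + 1) * m, k * l:(k + 1) * l], Ah[:, :, k])
mask = np.kron(np.eye(n), np.ones((m, l)))             # zero off the block diagonal
assert np.allclose(D * (1 - mask), 0)
```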

1. Part 1 of Fact 1 states

$$\begin{aligned} \text {bdiag}\left( \widehat{{{\mathscr {A}}}{{\mathscr {B}}}}\right) = \text {bdiag}\left( {\widehat{{{\mathscr {A}}}}}\right) \text {bdiag}\left( {\widehat{{{\mathscr {B}}}}}\right) . \end{aligned}$$

Proof

Let \({{\mathscr {A}}}\in {\mathbb {C}}^{m\times \ell \times n}\) and \({{\mathscr {B}}}\in {\mathbb {C}}^{\ell \times p\times n}\), then

$$\begin{aligned} \text {bdiag}\left( \widehat{{{\mathscr {A}}}{{\mathscr {B}}}}\right) \overset{(4.1)}{=}&\left( \mathbf{F}_n\otimes \mathbf{I}_m\right) \text {bcirc}\left( {{{\mathscr {A}}}}{{\mathscr {B}}}\right) \left( \mathbf{F}_n^*\otimes \mathbf{I}_p\right) \\ \overset{Fact 2}{=}&\left( \mathbf{F}_n\otimes \mathbf{I}_m\right) \text {bcirc}\left( {{{\mathscr {A}}}}\right) \text {bcirc}\left( {{\mathscr {B}}}\right) \left( \mathbf{F}_n^*\otimes \mathbf{I}_p\right) \\ =&\left( \mathbf{F}_n\otimes \mathbf{I}_m\right) \text {bcirc}\left( {{{\mathscr {A}}}}\right) \left( \mathbf{F}_{n}^* \otimes \mathbf{I}_{\ell }\right) \left( \mathbf{F}_{n} \otimes \mathbf{I}_{\ell }\right) \text {bcirc}\left( {{\mathscr {B}}}\right) \left( \mathbf{F}_n^*\otimes \mathbf{I}_p\right) \\ \overset{(4.1)}{=}&\text {bdiag}\left( \widehat{{{\mathscr {A}}}}\right) \text {bdiag}\left( \widehat{{{\mathscr {B}}}}\right) . \end{aligned}$$

\(\square \)

2. Part 2 of Fact 1 states

$$\begin{aligned} \widehat{{{\mathscr {A}}}+ {{\mathscr {B}}}} = {\widehat{{{\mathscr {A}}}}} + {\widehat{{{\mathscr {B}}}}}. \end{aligned}$$

Proof

Let \({{\mathscr {A}}}\in {\mathbb {C}}^{m\times \ell \times n}\) and \({{\mathscr {B}}}\in {\mathbb {C}}^{m \times \ell \times n}\), then

$$\begin{aligned} \text {bdiag}\left( \widehat{{{\mathscr {A}}}+ {{\mathscr {B}}}}\right) \overset{(4.1)}{=}&\left( \mathbf{F}_n\otimes \mathbf{I}_m\right) \text {bcirc}\left( {{{\mathscr {A}}}} + {{\mathscr {B}}}\right) \left( \mathbf{F}_n^*\otimes \mathbf{I}_{\ell }\right) \\ =&\left( \mathbf{F}_n\otimes \mathbf{I}_m\right) \left( \text {bcirc}\left( {{\mathscr {A}}}\right) + \text {bcirc}\left( {{\mathscr {B}}}\right) \right) \left( \mathbf{F}_n^*\otimes \mathbf{I}_{\ell }\right) \\ =&\left( \mathbf{F}_n\otimes \mathbf{I}_m\right) \text {bcirc}\left( {{{\mathscr {A}}}}\right) \left( \mathbf{F}_{n}^* \otimes \mathbf{I}_{\ell }\right) + \left( \mathbf{F}_{n} \otimes \mathbf{I}_{m}\right) \text {bcirc}\left( {{\mathscr {B}}}\right) \left( \mathbf{F}_n^*\otimes \mathbf{I}_{\ell }\right) \\ \overset{(4.1)}{=}&\text {bdiag}\left( {\widehat{{{\mathscr {A}}}}}\right) + \text {bdiag}\left( {\widehat{{{\mathscr {B}}}}}\right) . \end{aligned}$$

\(\square \)

3. Part 3 of Fact 1 states that

$$\begin{aligned} \text {bdiag}\left( \widehat{{{\mathscr {M}}}^*}\right) = \text {bdiag}\left( {\widehat{{{\mathscr {M}}}}}\right) ^*. \end{aligned}$$

Additionally, it states that if \(\text {bcirc}\left( {{\mathscr {M}}}\right) \) is symmetric, \(\text {bdiag}\left( {\widehat{{{\mathscr {M}}}}}\right) \) is also symmetric.

Proof

Let \({{\mathscr {M}}}\in {\mathbb {C}}^{m \times \ell \times n}\). Then

$$\begin{aligned} \text {bdiag}\left( \widehat{{{\mathscr {M}}}^*}\right) \overset{(4.1)}{=}&\left( \mathbf{F}_n \otimes \mathbf{I}_{\ell } \right) \text {bcirc}\left( {{\mathscr {M}}}^*\right) \left( \mathbf{F}_n^* \otimes \mathbf{I}_{m} \right) \\ \overset{Fact 3}{=}&\left( \mathbf{F}_n \otimes \mathbf{I}_{\ell } \right) \text {bcirc}\left( {{\mathscr {M}}}\right) ^* \left( \mathbf{F}_n^* \otimes \mathbf{I}_{m} \right) \\ =&\left[ \left( \mathbf{F}_n \otimes \mathbf{I}_{m} \right) \text {bcirc}\left( {{\mathscr {M}}}\right) \left( \mathbf{F}_n^* \otimes \mathbf{I}_{\ell } \right) \right] ^*\\ \overset{(4.1)}{=}&\text {bdiag}\left( {\widehat{{{\mathscr {M}}}}}\right) ^*. \end{aligned}$$

To see that \(\text {bdiag}\left( {\widehat{{{\mathscr {M}}}}}\right) \) is also symmetric when \(\text {bcirc}\left( {{\mathscr {M}}}\right) \) is symmetric, note that

$$\begin{aligned} \text {bdiag}\left( {\widehat{{{\mathscr {M}}}}}\right) ^* \overset{(4.1)}{=} \left[ \left( \mathbf{F}_n \otimes \mathbf{I}_m \right) \text {bcirc}\left( {{\mathscr {M}}}\right) \left( \mathbf{F}_n^* \otimes \mathbf{I}_m \right) \right] ^* = \left( \mathbf{F}_n \otimes \mathbf{I}_m \right) \text {bcirc}\left( {{\mathscr {M}}}\right) ^* \left( \mathbf{F}_n^* \otimes \mathbf{I}_m \right) = \text {bdiag}\left( {\widehat{{{\mathscr {M}}}}}\right) , \end{aligned}$$

where the final equality uses \(\text {bcirc}\left( {{\mathscr {M}}}\right) ^* = \text {bcirc}\left( {{\mathscr {M}}}\right) \).

\(\square \)

4. Finally, part 4 of Fact 1 states

$$\begin{aligned} \text {bdiag}\left( \widehat{{{\mathscr {M}}}^{-1}}\right) =\text {bdiag}\left( {\widehat{{{\mathscr {M}}}}}\right) ^{-1}. \end{aligned}$$

Proof

Let \({{\mathscr {M}}}\in {\mathbb {C}}^{m\times m \times n}\). Note that \(\text {bcirc}\left( {{\mathscr {I}}}_m\right) = \mathbf{I}_{mn}\). Using Fact 2 and Eq. (4.1),

$$\begin{aligned} \text {bdiag}\left( \widehat{{{\mathscr {M}}}^{-1}}\right)&\text {bdiag}\left( {\widehat{{{\mathscr {M}}}}}\right) \\ \overset{(4.1)}{=}&\left( \mathbf{F}_n \otimes \mathbf{I}_m \right) \text {bcirc}\left( {{\mathscr {M}}}^{-1}\right) \left( \mathbf{F}_n^* \otimes \mathbf{I}_m \right) \left( \mathbf{F}_n \otimes \mathbf{I}_m \right) \text {bcirc}\left( {{\mathscr {M}}}\right) \left( \mathbf{F}_n^* \otimes \mathbf{I}_m \right) \\ =&\left( \mathbf{F}_n \otimes \mathbf{I}_m \right) \text {bcirc}\left( {{\mathscr {M}}}^{-1}\right) \text {bcirc}\left( {{\mathscr {M}}}\right) \left( \mathbf{F}_n^* \otimes \mathbf{I}_m \right) \\ \overset{Fact 2}{=}&\left( \mathbf{F}_n \otimes \mathbf{I}_m \right) \text {bcirc}\left( {{\mathscr {I}}}_m\right) \left( \mathbf{F}_n^* \otimes \mathbf{I}_m \right) \\ =&\mathbf{I}_{mn}. \end{aligned}$$

Analogously, one can show \(\text {bdiag}\left( {\widehat{{{\mathscr {M}}}}}\right) \text {bdiag}\left( \widehat{{{\mathscr {M}}}^{-1}}\right) = \mathbf{I}_{mn}\). \(\square \)
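
Parts 3 and 4 admit a quick numerical sanity check as well; the sketch below reuses conj_transpose and t_product from the earlier sketches and assumes the Fourier faces of the random test tensor are invertible (generically true).

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4, 5))
Mh = np.fft.fft(M, axis=2)

# Part 3: the hat of the conjugate transpose = facewise conjugate transposes.
assert np.allclose(np.fft.fft(conj_transpose(M), axis=2),
                   np.conj(np.transpose(Mh, (1, 0, 2))))

# Part 4: the t-inverse is obtained by inverting each Fourier face.
Minv_h = np.stack([np.linalg.inv(Mh[:, :, k]) for k in range(5)], axis=2)
Minv = np.fft.ifft(Minv_h, axis=2)
Id = np.zeros((4, 4, 5), dtype=complex)
Id[:, :, 0] = np.eye(4)                     # the identity tensor
assert np.allclose(t_product(M, Minv), Id)
```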

Proof of Theorem 4.1

We now prove Theorem 4.1.

Proof

Let \(\widehat{{{\mathscr {P}}}_i}\) be the tensor formed by applying FFTs to each tube fiber of \({{\mathscr {P}}}_i = {{\mathscr {A}}}_{i::}^* \left( {{\mathscr {A}}}_{i::}{{\mathscr {A}}}_{i::}^*\right) ^{-1}{{\mathscr {A}}}_{i::}\), and let \({{\mathscr {E}}}^t = {{\mathscr {X}}}^t - {{\mathscr {X}}}^*\). By Eq. (4.1), we have that

$$\begin{aligned} \text {bdiag}\left( \widehat{{{\mathscr {P}}}_i}\right) = \left( \mathbf{F}_{n} \otimes \mathbf{I}_{\ell }\right) {\text {bcirc}\left( {{\mathscr {P}}}_i\right) }\left( \mathbf{F}_{n}^* \otimes \mathbf{I}_{\ell }\right) , \end{aligned}$$

is a block diagonal matrix with blocks \(\left( \widehat{\mathbf{P}_i}\right) _k\), where \(\left( \widehat{\mathbf{P}_i}\right) _k\) is the kth frontal slice of the tensor \(\widehat{{{\mathscr {P}}}_i}\). We note that the projected error can be rewritten as

$$\begin{aligned}&{\mathbb {E}}\left[ \left\Vert {{\mathscr {P}}}_i{{\mathscr {E}}}^t\right\Vert _F^2\right] = \sum _{j=1}^{p}\langle {\mathbb {E}}\left[ \text {bcirc}\left( {{\mathscr {P}}}_i\right) \right] \text {unfold}\left( {{\mathscr {E}}}^t\right) _{:j},\text {unfold}\left( {{\mathscr {E}}}^t\right) _{:j}\rangle \\&\quad = \sum _{j=1}^{p}{\mathbb {E}}\left[ \langle \left( \mathbf{F}_{n} \otimes \mathbf{I}_{\ell }\right) {\text {bcirc}\left( {{\mathscr {P}}}_i\right) }\left( \mathbf{F}_{n}^* \otimes \mathbf{I}_{\ell }\right) \left( \mathbf{F}_{n} \otimes \mathbf{I}_{\ell }\right) \text {unfold}\left( {{\mathscr {E}}}^t\right) _{:j}, \left( \mathbf{F}_{n} \otimes \mathbf{I}_{\ell }\right) \text {unfold}\left( {{\mathscr {E}}}^t\right) _{:j}\rangle \right] \\&\quad = \sum _{j=1}^{p}{\mathbb {E}}\left[ \langle \text {bdiag}\left( \widehat{{{\mathscr {P}}}_i}\right) \left( \mathbf{F}_{n} \otimes \mathbf{I}_{\ell }\right) \text {unfold}\left( {{\mathscr {E}}}^t\right) _{:j}, \left( \mathbf{F}_{n} \otimes \mathbf{I}_{\ell }\right) \text {unfold}\left( {{\mathscr {E}}}^t\right) _{:j}\rangle \right] . \end{aligned}$$

Note that the rows of \(\text {bcirc}\left( {{\mathscr {A}}}_{i::}\right) \) are also rows of \(\text {bcirc}\left( {{\mathscr {A}}}\right) \). Thus, \({{\mathscr {X}}}^{t+1} \in \text {rowsp}\left( {{\mathscr {A}}}\right) \). Since \({{\mathscr {X}}}^*\) is the tensor of least Frobenius norm, \({{\mathscr {X}}}^*\in \text {rowsp}\left( {{\mathscr {A}}}\right) \). Therefore \({{\mathscr {E}}}^t = {{\mathscr {X}}}^t - {{\mathscr {X}}}^* \in \text {rowsp}\left( {{\mathscr {A}}}\right) \) as long as \({{\mathscr {X}}}^0\in \text {rowsp}\left( {{\mathscr {A}}}\right) \).

Now, since \({\mathbb {E}}\left[ \text {bdiag}\left( \widehat{{{\mathscr {P}}}_i}\right) \right] \) is symmetric and \({{\mathscr {E}}}^t\in \text {rowsp}\left( {{\mathscr {A}}}\right) \), by Fact 1,

$$\begin{aligned} {\mathbb {E}}\left[ \left\Vert {{\mathscr {P}}}_i{{\mathscr {E}}}^t\right\Vert _F^2\right] \ge \sigma _{\min }^+\left( {\mathbb {E}}\left[ \text {bdiag}\left( \widehat{{{\mathscr {P}}}_i}\right) \right] \right) \left\Vert \left( \mathbf{F}_{n} \otimes \mathbf{I}_{\ell }\right) \text {unfold}\left( {{\mathscr {E}}}^t\right) \right\Vert _F^2. \end{aligned}$$
(B.1)

Note that,

$$\begin{aligned} \left\Vert \left( \mathbf{F}_{n} \otimes \mathbf{I}_{\ell }\right) \text {unfold}\left( {{\mathscr {E}}}^t\right) \right\Vert _F^2&= \sum _{j=1}^{p} \langle \left( \mathbf{F}_{n} \otimes \mathbf{I}_{\ell }\right) \text {unfold}\left( {{\mathscr {E}}}^t\right) _{:j},\left( \mathbf{F}_{n} \otimes \mathbf{I}_{\ell }\right) \text {unfold}\left( {{\mathscr {E}}}^t\right) _{:j}\rangle \\&= \sum _{j=1}^{p} \langle \text {unfold}\left( {{\mathscr {E}}}^t\right) _{:j},\text {unfold}\left( {{\mathscr {E}}}^t\right) _{:j}\rangle \\&= \left\Vert \text {unfold}\left( {{\mathscr {E}}}^t\right) \right\Vert _F^2\\&= \left\Vert {{\mathscr {E}}}^t\right\Vert _F^2. \end{aligned}$$

Since \(\text {bdiag}\left( \widehat{{{\mathscr {P}}}_i}\right) \) is block diagonal,

$$\begin{aligned} \sigma _{\min }^+\left( {\mathbb {E}}\left[ \text {bdiag}\left( \widehat{{{\mathscr {P}}}_i}\right) \right] \right) = \min _{k \in [n-1]} \sigma _{\min }^+\left( {\mathbb {E}}\left[ \left( \widehat{\mathbf{P}_i}\right) _k\right] \right) . \end{aligned}$$

Factoring \(\text {bdiag}\left( \widehat{{{\mathscr {P}}}_i}\right) \) and using Fact 1,

$$\begin{aligned} \text {bdiag}\left( \widehat{{{\mathscr {P}}}_i}\right) = \text {bdiag}\left( \widehat{{{\mathscr {A}}}_{i::}^*}\right) \left( \text {bdiag}\left( \widehat{{{\mathscr {A}}}_{i::}}\right) \text {bdiag}\left( \widehat{{{\mathscr {A}}}_{i::}^*}\right) \right) ^{-1}\text {bdiag}\left( \widehat{{{\mathscr {A}}}_{i::}}\right) . \end{aligned}$$

Noting that \(\text {bdiag}\left( \widehat{{{{\mathscr {A}}}}_{i::}}\right) \text {bdiag}\left( \widehat{{{{\mathscr {A}}}}_{i::}^*}\right) \) is a diagonal matrix, one can see that \(\left( \widehat{\mathbf{P}_i}\right) _k\) is the projection onto \(\left( \widehat{{{{\mathscr {A}}}}_{i::}}\right) _k\) by rewriting the kth frontal face of \(\widehat{{{\mathscr {P}}}_i}\) as

$$\begin{aligned} \left( \widehat{\mathbf{P}_i}\right) _k = \frac{\left( \widehat{{{{\mathscr {A}}}}_{i::}}\right) ^*_k\left( \widehat{ {{{\mathscr {A}}}}_{i::}}\right) _k}{\left( \widehat{{{{\mathscr {A}}}}_{i::}}\widehat{{{{\mathscr {A}}}}_{i::}^*}\right) _k}. \end{aligned}$$

We can thus rewrite Eq. (B.1) as

$$\begin{aligned} {\mathbb {E}}\left[ \left\Vert {{\mathscr {P}}}_i{{\mathscr {E}}}^t\right\Vert _F^2\right] \ge \min _{k\in [n-1]} \sigma _{\min }^+\left( {\mathbb {E}}\left[ \frac{\left( \widehat{{{{\mathscr {A}}}}_{i::}}\right) ^*_k\left( \widehat{ {{{\mathscr {A}}}}_{i::}}\right) _k}{\left( \widehat{{{{\mathscr {A}}}}_{i::}}\widehat{{{{\mathscr {A}}}}_{i::}^*}\right) _k}\right] \right) \left\Vert {{\mathscr {E}}}^t\right\Vert _F^2. \end{aligned}$$
(B.2)

The expectation in Eq. (B.1) can now be calculated explicitly. For simplicity, we assume that the row indices i are sampled uniformly; as in the MRK literature and its extensions, many other sampling distributions could be used.

To derive a lower bound for the smallest singular value in Eq. (B.2), define

$$\begin{aligned} \left\Vert {\widehat{\mathbf{A}}}_k\right\Vert _{\infty ,2}^2 {:=} \max _{i}\left[ \left( \widehat{{{{\mathscr {A}}}}_{i::}}\widehat{{{{\mathscr {A}}}}_{i::}^*}\right) _k\right] . \end{aligned}$$
(B.3)

The values \(\left( \widehat{{{{\mathscr {A}}}}_{i::}}\widehat{{{{\mathscr {A}}}}_{i::}^*}\right) _k\) are necessarily positive for all \(k \in [n-1]\) when \({{\mathscr {A}}}_{i::}{{\mathscr {A}}}_{i::}^*\) is invertible for all \(i \in [m-1]\), as

$$\begin{aligned} \left( \widehat{{{{\mathscr {A}}}}_{i::}}\widehat{{{{\mathscr {A}}}}_{i::}^*}\right) _k&= \text {bdiag}\left( \widehat{{{\mathscr {A}}}_{i::}}\widehat{{{\mathscr {A}}}_{i::}^*}\right) _{kk} \\&= \text {bdiag}\left( \widehat{{{\mathscr {A}}}_{i::}}\right) _k \text {bdiag}\left( \widehat{{{\mathscr {A}}}_{i::}^*}\right) _k\\&= \left( \mathbf{F}_n\right) _{k:} \text {bcirc}\left( {{\mathscr {A}}}_{i::}\right) \text {bcirc}\left( {{\mathscr {A}}}_{i::}^*\right) \left( \mathbf{F}_n\right) _{k:}^* \\&= \left( \mathbf{F}_n\right) _{k:} \text {bcirc}\left( {{\mathscr {A}}}_{i::}\right) \text {bcirc}\left( {{\mathscr {A}}}_{i::}\right) ^* \left( \mathbf{F}_n\right) _{k:}^*\\&= \left\Vert \text {bcirc}\left( {{\mathscr {A}}}_{i::}\right) ^* \left( \mathbf{F}_n\right) _{k:}^*\right\Vert _2^2. \end{aligned}$$

Now, it can be easily verified that

$$\begin{aligned} \sigma _{\min }^+\left( {\mathbb {E}}\left[ \frac{\left( \widehat{{{{\mathscr {A}}}}_{i::}}\right) _k^*\left( \widehat{ {{{\mathscr {A}}}}_{i::}}\right) _k}{\left( \widehat{{{{\mathscr {A}}}}_{i::}}\widehat{{{{\mathscr {A}}}}_{i::}^*}\right) _k}\right] \right)&\ge \sigma _{\min }^+\left( \frac{1}{m} \sum _{i=0}^{m-1} \frac{\left( \widehat{{{{\mathscr {A}}}}_{i::}}\right) _k^*\left( \widehat{ {{{\mathscr {A}}}}_{i::}}\right) _k}{\left\Vert {\widehat{\mathbf{A}}}_k\right\Vert _{\infty ,2}^2}\right) \\&= \frac{\left[ \sigma ^+_{\min }\left( {\widehat{\mathbf{A}}}_k \right) \right] ^2}{m\left\Vert {\widehat{\mathbf{A}}}_k\right\Vert _{\infty ,2}^2} . \end{aligned}$$

The projected error of Eq. (B.2) then becomes

$$\begin{aligned} {\mathbb {E}}\left[ \left\Vert {{\mathscr {P}}}_i{{\mathscr {E}}}^t\right\Vert _F^2\right] \ge \min _{k\in [n-1]} \frac{\left[ \sigma ^+_{\min }\left( {\widehat{\mathbf{A}}}_k \right) \right] ^2}{m\left\Vert {\widehat{\mathbf{A}}}_k\right\Vert _{\infty ,2}^2} \left\Vert {{\mathscr {E}}}^t\right\Vert _F^2. \end{aligned}$$

We can thus rewrite the guarantee in Theorem 4.1 for uniform random sampling of the row indices i as

$$\begin{aligned} {\mathbb {E}}\left[ \left\Vert {{\mathscr {X}}}^{t+1}-{{\mathscr {X}}}^*\right\Vert _F^2\bigg | {{\mathscr {X}}}^{0}\right] \le \left( 1-\min _{k\in [n-1]} \frac{\left[ \sigma ^+_{\min }\left( {\widehat{\mathbf{A}}}_k \right) \right] ^2}{m\left\Vert {\widehat{\mathbf{A}}}_k\right\Vert _{\infty ,2}^2}\right) ^{t+1} \left\Vert {{\mathscr {X}}}^{0}-{{\mathscr {X}}}^*\right\Vert _F^2. \end{aligned}$$

\(\square \)
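
For completeness, the contraction factor in this bound is easy to evaluate from the Fourier faces of \({{\mathscr {A}}}\). The sketch below (the function name trk_rate_bound is ours) computes \(1-\min _{k}\left[ \sigma ^+_{\min }\left( {\widehat{\mathbf{A}}}_k\right) \right] ^2/\left( m\left\Vert {\widehat{\mathbf{A}}}_k\right\Vert _{\infty ,2}^2\right) \) under uniform row sampling.

```python
import numpy as np

def trk_rate_bound(A, tol=1e-12):
    """Per-iteration contraction factor in the Theorem 4.1 bound (uniform sampling)."""
    m = A.shape[0]
    Ah = np.fft.fft(A, axis=2)                      # Fourier faces A-hat_k
    factors = []
    for k in range(Ah.shape[2]):
        Ak = Ah[:, :, k]
        s = np.linalg.svd(Ak, compute_uv=False)
        smin = s[s > tol].min()                     # smallest nonzero singular value
        row_norms_sq = np.sum(np.abs(Ak) ** 2, axis=1)
        factors.append(smin ** 2 / (m * row_norms_sq.max()))  # ||A-hat_k||_{inf,2}^2 in the denominator
    return 1.0 - min(factors)

# After t+1 steps, the expected squared Frobenius error is at most
# trk_rate_bound(A) ** (t + 1) times the initial squared error, matching the display above.
```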

Cite this article

Ma, A., Molitor, D.: Randomized Kaczmarz for tensor linear systems. BIT Numer. Math. 62, 171–194 (2022). https://doi.org/10.1007/s10543-021-00877-w
