Abstract
Solving linear systems of equations is a fundamental problem in mathematics. When the linear system is so large that it cannot be loaded into memory at once, iterative methods such as the randomized Kaczmarz method excel. Here, we extend the randomized Kaczmarz method to solve multi-linear (tensor) systems under the tensor–tensor t-product. We present convergence guarantees for tensor randomized Kaczmarz in two ways: using the classical matrix randomized Kaczmarz analysis and taking advantage of the tensor–tensor t-product structure. We demonstrate experimentally that the tensor randomized Kaczmarz method converges faster than traditional randomized Kaczmarz applied to a naively matricized version of the linear system. In addition, we draw connections between the proposed algorithm and a previously known extension of the randomized Kaczmarz algorithm for matrix linear systems.
Notes
While the randomized Kaczmarz literature typically abbreviates the method as RK, throughout this work MRK is used to distinguish the matrix version of randomized Kaczmarz from the tensor version.
These linear systems are typically written as linear systems with block circulant matrices which are equivalent to the t-product as discussed in Definition 2.2.
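This equivalence is easy to check numerically. Below is a minimal sketch in NumPy, with `bcirc`, `unfold`, and `t_product` implemented from the standard definitions; the helper names are ours, not notation from the paper.

```python
import numpy as np

def t_product(A, B):
    # t-product A * B: FFT along tubes, multiply frontal slices, invert the FFT
    Ah, Bh = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    return np.fft.ifft(np.einsum('ilk,lpk->ipk', Ah, Bh), axis=2).real

def bcirc(A):
    # block circulant matrix: block (i, j) is frontal slice (i - j) mod n
    n = A.shape[2]
    return np.block([[A[:, :, (i - j) % n] for j in range(n)] for i in range(n)])

def unfold(B):
    # stack the frontal slices of an l x p x n tensor into an (l n) x p matrix
    return np.vstack([B[:, :, k] for k in range(B.shape[2])])

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3, 5))
B = rng.standard_normal((3, 2, 5))
# multiplying by bcirc(A) is the matrix form of the t-product A * B
assert np.allclose(bcirc(A) @ unfold(B), unfold(t_product(A, B)))
```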
References
Agmon, S.: The relaxation method for linear inequalities. Can. J. Math. 6, 382–392 (1954)
Ahn, C.H., Jeong, B.S., Lee, S.Y.: Efficient hybrid finite element-boundary element method for 3-dimensional open-boundary field problems. IEEE Trans. Magn. 27, 4069–4072 (1991)
Bone and joint CT-scan data. https://isbweb.org/data/vsj/
Censor, Y.: Row-action methods for huge and sparse systems and their applications. SIAM Rev. 23(4), 444–466 (1981)
Czuprynski, K.D., Fahnline, J.B., Shontz, S.M.: Parallel boundary element solutions of block circulant linear systems for acoustic radiation problems with rotationally symmetric boundary surfaces. In: INTER-NOISE and NOISE-CON Congress and Conference Proceedings, vol. 2012, pp. 2812–2823. Institute of Noise Control Engineering (2012)
De Loera, J.A., Haddock, J., Needell, D.: A sampling Kaczmarz–Motzkin algorithm for linear feasibility. SIAM J. Sci. Comput. 39(5), S66–S87 (2017)
Drineas, P., Mahoney, M.W.: RandNLA: randomized numerical linear algebra. Commun. ACM 59(6), 80–90 (2016)
Finding His Voice. Western Electric Company (1929). https://archive.org/details/FindingH1929
Elfving, T.: Block-iterative methods for consistent and inconsistent linear equations. Numer. Math. 35(1), 1–12 (1980)
Gower, R.M., Richtárik, P.: Randomized iterative methods for linear systems. SIAM J. Matrix Anal. Appl. 36(4), 1660–1690 (2015)
Haddock, J., Needell, D.: On Motzkin’s method for inconsistent linear systems. BIT 59(2), 387–401 (2019)
Hao, N., Kilmer, M.E., Braman, K., Hoover, R.C.: Facial recognition using tensor–tensor decompositions. SIAM J. Imaging Sci. 6(1), 437–463 (2013)
Kaczmarz, M.S.: Angenäherte auflösung von systemen linearer gleichungen. Bull. Acad. Polonaise Sci. Lett. 35, 355–357 (1937)
Kernfeld, E., Kilmer, M., Aeron, S.: Tensor–tensor products with invertible linear transforms. Linear Algebra Appl. 485, 545–570 (2015)
Kilmer, M.E., Braman, K., Hao, N., Hoover, R.C.: Third-order tensors as operators on matrices: a theoretical and computational framework with applications in imaging. SIAM J. Matrix Anal. Appl. 34(1), 148–172 (2013)
Kilmer, M.E., Martin, C.D.: Factorization strategies for third-order tensors. Linear Algebra Appl. 435(3), 641–658 (2011)
Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30–37 (2009)
Liu, Z., Zhao, H.V., Elezzabi, A.Y.: Block-based adaptive compressed sensing for video. In: Proc. IEEE Int. Conf. Image Process., pp. 1649–1652. IEEE (2010)
Lund, K.: The tensor t-function: a definition for functions of third-order tensors. Numer. Linear Algebra Appl. 27(3), e2288 (2020)
Ma, A., Needell, D., Ramdas, A.: Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl. 36(4), 1590–1604 (2015)
Majumdar, A., Ward, R.K.: Face recognition from video: an MMV recovery approach. In: Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), pp. 2221–2224. IEEE (2012)
Miao, Y., Qi, L., Wei, Y.: Generalized tensor function via the tensor singular value decomposition based on the t-product. Linear Algebra Appl. 590, 258–303 (2020)
Motzkin, T.S., Schoenberg, I.J.: The relaxation method for linear inequalities. Can. J. Math. 6, 393–404 (1954)
Needell, D.: Randomized Kaczmarz solver for noisy linear systems. BIT 50(2), 395–403 (2010)
Needell, D., Tropp, J.A.: Paved with good intentions: analysis of a randomized block Kaczmarz method. Linear Algebra Appl. 441, 199–221 (2014)
Needell, D., Ward, R., Srebro, N.: Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. Adv. Neural Inf. Process. Syst. 27, 1017–1025 (2014)
Newman, E., Horesh, L., Avron, H., Kilmer, M.: Stable tensor neural networks for rapid deep learning. arXiv preprint arXiv:1811.06569 (2018)
Nutini, J., Sepehry, B., Virani, A., Laradji, I., Schmidt, M., Koepke, H.: Convergence rates for greedy Kaczmarz algorithms. In: UAI (2016)
Petra, S., Popa, C.: Single projection Kaczmarz extended algorithms. Numer. Algorithms 73(3), 791–806 (2016)
Richtárik, P., Takáč, M.: Stochastic reformulations of linear systems: algorithms and convergence theory. SIAM J. Matrix Anal. Appl. 41(2), 487–524 (2020)
Semerci, O., Hao, N., Kilmer, M.E., Miller, E.L.: Tensor-based formulation and nuclear norm regularization for multienergy computed tomography. IEEE Trans. Image Process. 23(4), 1678–1693 (2014)
Soltani, S., Kilmer, M.E., Hansen, P.C.: A tensor-based dictionary learning approach to tomographic image reconstruction. BIT 56(4), 1425–1454 (2016)
Song, G., Ng, M.K., Zhang, X.: Robust tensor completion using transformed tensor singular value decomposition. Numer. Linear Algebra Appl. 27(3), e2299 (2020). https://doi.org/10.1002/nla.2299
Strohmer, T., Vershynin, R.: A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl. 15(2), 262–278 (2009)
Vescovo, R.: Electromagnetic scattering from cylindrical arrays of infinitely long thin wires. Electron. Lett. 31(19), 1646–1647 (1995)
Wang, X., Che, M., Wei, Y.: Tensor neural network models for tensor singular value decompositions. Comput. Optim. Appl. 75(3), 753–777 (2020)
Zhang, Z., Aeron, S.: Denoising and completion of 3D data via multidimensional dictionary learning. In: Proc. Int. Joint Conf. Artificial Intelligence (IJCAI), pp. 2371–2377 (2016)
Zhang, Z., Aeron, S.: Exact tensor completion using t-SVD. IEEE Trans. Signal Process. 65(6), 1511–1526 (2017)
Zhang, Z., Ely, G., Aeron, S., Hao, N., Kilmer, M.: Novel methods for multilinear data completion and de-noising based on tensor-SVD. In: CVPR, pp. 3842–3849. IEEE (2014)
Zhou, P., Lu, C., Lin, Z., Zhang, C.: Tensor factorization for low-rank tensor completion. IEEE Trans. Image Process. 27(3), 1152–1163 (2018)
Zouzias, A., Freris, N.M.: Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl. 34(2), 773–793 (2013)
Communicated by Daniel Kressner.
This work began at the 2019 workshop for Women in Science of Data and Math (WISDM) held at the Institute for Computational and Experimental Research in Mathematics (ICERM). The workshop was partially supported by an NSF ADVANCE grant (award #1500481) to the Association for Women in Mathematics (AWM). Ma was partially supported by U.S. Air Force Award FA9550-18-1-0031 led by Roman Vershynin. Molitor was partially supported by NSF CAREER DMS #1348721 and NSF BIGDATA DMS #1740325 led by Deanna Needell. The authors would also like to thank Misha Kilmer for her advising during the WISDM workshop and valuable feedback that improved earlier versions of this manuscript.
Appendices
Proof of Fact 1
The following properties of block circulant matrices will be useful in proving Fact 1.
Fact 2
(Lemma 1.iii [19]) For tensors \({{\mathscr {A}}}\) and \({{\mathscr {B}}}\), \(\text {bcirc}\left( {{\mathscr {A}}}{{\mathscr {B}}}\right) = \text {bcirc}\left( {{\mathscr {A}}}\right) \text {bcirc}\left( {{\mathscr {B}}}\right) \).
Fact 3
(Theorem 6.ii [19]) The block circulant operator \(\text {bcirc}\left( \cdot \right) \) commutes with the conjugate transpose: \(\text {bcirc}\left( {{\mathscr {A}}}^*\right) = \text {bcirc}\left( {{\mathscr {A}}}\right) ^*\).
1. Part 1 of Fact 1 states
Proof
Let \({{\mathscr {A}}}\in {\mathbb {C}}^{m\times \ell \times n}\) and \({{\mathscr {B}}}\in {\mathbb {C}}^{\ell \times p\times n}\), then
\(\square \)
2. Part 2 of Fact 1 states
Proof
Let \({{\mathscr {A}}}\in {\mathbb {C}}^{m\times \ell \times n}\) and \({{\mathscr {B}}}\in {\mathbb {C}}^{m \times \ell \times n}\), then
\(\square \)
3. Part 3 of Fact 1 states that
Additionally, it states that if \(\text {bcirc}\left( {{\mathscr {M}}}\right) \) is symmetric, \(\text {bdiag}\left( {\widehat{{{\mathscr {M}}}}}\right) \) is also symmetric.
Proof
Let \({{\mathscr {M}}}\in {\mathbb {C}}^{m \times \ell \times n}\). Then
To see that \(\text {bdiag}\left( {\widehat{{{\mathscr {M}}}}}\right) \) is also symmetric when \(\text {bcirc}\left( {{\mathscr {M}}}\right) \) is symmetric, note that
\(\square \)
4. Finally, part 4 of Fact 1 states
Proof
Let \({{\mathscr {M}}}\in {\mathbb {C}}^{m\times m \times n}\). Note that \(\text {bcirc}\left( {{\mathscr {I}}}_m\right) = \mathbf{I}_{mn}\). Using Fact 2 and Eq. (4.1),
Analogously, one can show \(\text {bdiag}\left( {\widehat{{{\mathscr {M}}}}}\right) \text {bdiag}\left( \widehat{{{\mathscr {M}}}^{-1}}\right) = \mathbf{I}_{mn}\). \(\square \)
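The block circulant identities used throughout this appendix (Facts 2 and 3) are straightforward to verify numerically. The sketch below assumes NumPy; `bcirc`, `t_product`, and `t_transpose` are our illustrative implementations of the standard definitions.

```python
import numpy as np

def t_product(A, B):
    # t-product: frontal-slice products in the Fourier domain
    Ah, Bh = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    return np.fft.ifft(np.einsum('ilk,lpk->ipk', Ah, Bh), axis=2).real

def t_transpose(A):
    # transpose each frontal slice, then reverse the order of slices 2..n
    At = np.transpose(A, (1, 0, 2)).conj()
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

def bcirc(A):
    # block circulant matrix: block (i, j) is frontal slice (i - j) mod n
    n = A.shape[2]
    return np.block([[A[:, :, (i - j) % n] for j in range(n)] for i in range(n)])

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3, 5))
B = rng.standard_normal((3, 2, 5))
assert np.allclose(bcirc(t_product(A, B)), bcirc(A) @ bcirc(B))  # Fact 2
assert np.allclose(bcirc(t_transpose(A)), bcirc(A).T)            # Fact 3
```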
Proof of Theorem 4.1
We now prove Theorem 4.1.
Proof
Let \(\widehat{{{\mathscr {P}}}_i}\) be the tensor formed by applying FFTs to each tube fiber of \({{\mathscr {P}}}_i = {{\mathscr {A}}}_{i::}^* \left( {{\mathscr {A}}}_{i::}{{\mathscr {A}}}_{i::}^*\right) ^{-1}{{\mathscr {A}}}_{i::}\), and let \({{\mathscr {E}}}^t = {{\mathscr {X}}}^t - {{\mathscr {X}}}^*\). By Eq. (4.1), \(\text {bdiag}\left( \widehat{{{\mathscr {P}}}_i}\right) \) is a block diagonal matrix with blocks \(\left( \widehat{\mathbf{P}_i}\right) _k\), where \(\left( \widehat{\mathbf{P}_i}\right) _k\) is the kth frontal slice of the tensor \(\widehat{{{\mathscr {P}}}_i}\). We note that the projected error can be rewritten as
Note that the rows of \(\text {bcirc}\left( {{\mathscr {A}}}_{i::}\right) \) are also rows of \(\text {bcirc}\left( {{\mathscr {A}}}\right) \). Thus, \({{\mathscr {X}}}^{t+1} \in \text {rowsp}\left( {{\mathscr {A}}}\right) \). Since \({{\mathscr {X}}}^*\) is the tensor of least Frobenius norm, \({{\mathscr {X}}}^*\in \text {rowsp}\left( {{\mathscr {A}}}\right) \). Therefore \({{\mathscr {E}}}^t = {{\mathscr {X}}}^t - {{\mathscr {X}}}^* \in \text {rowsp}\left( {{\mathscr {A}}}\right) \) as long as \({{\mathscr {X}}}^0\in \text {rowsp}\left( {{\mathscr {A}}}\right) \).
Now, since \({\mathbb {E}}\left[ \text {bdiag}\left( \widehat{{{\mathscr {P}}}_i}\right) \right] \) is symmetric and \({{\mathscr {E}}}^t\in \text {rowsp}\left( {{\mathscr {A}}}\right) \), by Fact 1,
Note that,
Since \(\text {bdiag}\left( \widehat{{{\mathscr {P}}}_i}\right) \) is block diagonal,
Factoring \(\text {bdiag}\left( \widehat{{{\mathscr {P}}}_i}\right) \) and using Fact 1,
Noting that \(\text {bdiag}\left( \widehat{{{{\mathscr {A}}}}_{i::}}\right) \text {bdiag}\left( \widehat{{{{\mathscr {A}}}}_{i::}^*}\right) \) is a diagonal matrix, one can see that \(\left( \widehat{\mathbf{P}_i}\right) _k\) is the projection onto \(\left( \widehat{{{{\mathscr {A}}}}_{i::}}\right) _k\) by rewriting the kth frontal face of \(\widehat{{{\mathscr {P}}}_i}\) as
We can thus rewrite Eq. (B.1) as
The expectation of Eq. (B.1) can now be calculated explicitly. For simplicity, we assume that the row indices i are sampled uniformly; as in the MRK literature, many other sampling distributions could be used.
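The key structural point of the proof, that each frontal slice \(\left( \widehat{\mathbf{P}_i}\right) _k\) is the orthogonal projection onto the kth Fourier-domain row \(\left( \widehat{{{{\mathscr {A}}}}_{i::}}\right) _k\), can be checked directly; below is a minimal NumPy sketch with illustrative variable names.

```python
import numpy as np

rng = np.random.default_rng(2)
l, n = 5, 4
Ai = rng.standard_normal((1, l, n))   # a row slice A_{i::}
Ah = np.fft.fft(Ai, axis=2)           # its frontal slices in the Fourier domain
for k in range(n):
    a = Ah[:, :, k]                            # 1 x l complex row (A-hat_{i::})_k
    P = a.conj().T @ a / (a @ a.conj().T)      # a* (a a*)^{-1} a, a 1x1 inverse
    assert np.allclose(P @ P, P)               # idempotent
    assert np.allclose(P.conj().T, P)          # Hermitian
    assert np.allclose(P @ a.conj().T, a.conj().T)  # fixes the row itself
```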
To derive a lower bound for the smallest singular value in Eq. (B.2), define
The values \(\left( \widehat{{{{\mathscr {A}}}}_{i::}}\widehat{{{{\mathscr {A}}}}_{i::}^*}\right) _k\) are necessarily positive for all \(k \in [n-1]\) when \({{\mathscr {A}}}_{i::}{{\mathscr {A}}}_{i::}^*\) is invertible for all \(i \in [m-1]\).
Now, it can be easily verified that
The projected error of Eq. (B.2) then becomes
We can thus rewrite the guarantee in Theorem 4.1 for uniform random sampling of the row indices i as
\(\square \)
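For concreteness, the TRK iteration analyzed above can be sketched as follows, assuming NumPy and uniform row-slice sampling. The helpers `t_product`, `t_transpose`, and `t_inverse_1x1` are our illustrative implementations of the t-product algebra, and the problem sizes are arbitrary.

```python
import numpy as np

def t_product(A, B):
    # t-product via frontal-slice products in the Fourier domain
    Ah, Bh = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    return np.fft.ifft(np.einsum('ilk,lpk->ipk', Ah, Bh), axis=2).real

def t_transpose(A):
    # transpose each frontal slice, then reverse the order of slices 2..n
    At = np.transpose(A, (1, 0, 2)).conj()
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

def t_inverse_1x1(T):
    # inverse of a 1 x 1 x n tensor: invert each Fourier coefficient
    return np.fft.ifft(1.0 / np.fft.fft(T, axis=2), axis=2).real

rng = np.random.default_rng(1)
m, l, p, n = 20, 5, 3, 4
A = rng.standard_normal((m, l, n))
X_star = rng.standard_normal((l, p, n))
B = t_product(A, X_star)              # consistent system A * X = B

X = np.zeros((l, p, n))               # X^0 = 0 lies in rowsp(A)
for t in range(3000):
    i = rng.integers(m)               # uniform row-slice sampling
    Ai = A[i:i+1, :, :]               # 1 x l x n row slice A_{i::}
    R = B[i:i+1, :, :] - t_product(Ai, X)
    Minv = t_inverse_1x1(t_product(Ai, t_transpose(Ai)))
    X = X + t_product(t_transpose(Ai), t_product(Minv, R))

assert np.linalg.norm(X - X_star) < 1e-8   # linear convergence to X^*
```

Each update projects the iterate onto the solution set of one row slice; in the Fourier domain this decouples into n independent scalar-row Kaczmarz projections, which is exactly the structure exploited in the proof above.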
Ma, A., Molitor, D.: Randomized Kaczmarz for tensor linear systems. BIT Numer. Math. 62, 171–194 (2022). https://doi.org/10.1007/s10543-021-00877-w