Abstract
Recent years have witnessed intense development of randomized methods for low-rank approximation. These methods target principal component analysis (PCA) and the calculation of truncated singular value decompositions (SVDs). The present article presents an essentially black-box, foolproof implementation for MathWorks’ MATLAB, a popular software platform for numerical computation. As illustrated via several tests, the randomized algorithms for low-rank approximation outperform, or at least match, the classical deterministic techniques (such as Lanczos iterations run to convergence) in essentially all respects: accuracy, computational efficiency (both speed and memory usage), ease of use, parallelizability, and reliability. However, the classical procedures remain the methods of choice for estimating spectral norms and are far superior for calculating the least singular values and the corresponding singular vectors (or singular subspaces).
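The abstract refers to randomized algorithms for low-rank approximation of the kind popularized by Halko, Martinsson, and Tropp. The following is a minimal NumPy sketch of that general approach (random range sampling with a few power iterations, followed by a small dense SVD); it is an illustrative analogue, not the article’s MATLAB implementation, and the function name, oversampling parameter, and iteration count are choices made here for exposition.

```python
import numpy as np

def randomized_svd(A, k, n_iter=2, oversample=10, rng=None):
    """Sketch of a randomized truncated SVD in the Halko-Martinsson-Tropp style.

    Returns approximate leading k singular triplets (U, s, Vt) of A.
    """
    rng = np.random.default_rng(rng)
    m, n = A.shape
    l = min(n, k + oversample)
    # Sample the range of A with a random Gaussian test matrix.
    Q = A @ rng.standard_normal((n, l))
    # A few power iterations (with re-orthonormalization for stability)
    # sharpen the captured subspace when singular values decay slowly.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(Q)
        Q, _ = np.linalg.qr(A.T @ Q)
        Q = A @ Q
    Q, _ = np.linalg.qr(Q)
    # Project A onto the captured subspace and take a small dense SVD.
    B = Q.T @ A
    Uh, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Uh
    return U[:, :k], s[:k], Vt[:k, :]
```

For a matrix whose numerical rank is at most k, the returned rank-k factorization reconstructs the matrix to roughly machine precision; for general matrices, the oversampling and power-iteration parameters trade computation for accuracy.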
Supplemental Material
Available for Download
Software for An Implementation of a Randomized Algorithm for Principal Component Analysis
Algorithm 971: An Implementation of a Randomized Algorithm for Principal Component Analysis