Abstract
Recent years have witnessed intense development of randomized methods for low-rank approximation. These methods target principal component analysis (PCA) and the calculation of truncated singular value decompositions (SVDs). The present article presents an essentially black-box, foolproof implementation for MathWorks’ MATLAB, a popular software platform for numerical computation. As illustrated via several tests, the randomized algorithms for low-rank approximation outperform, or at least match, the classical deterministic techniques (such as Lanczos iterations run to convergence) in essentially all respects: accuracy, computational efficiency (both speed and memory usage), ease of use, parallelizability, and reliability. However, the classical procedures remain the methods of choice for estimating spectral norms and are far superior for calculating the least singular values and the corresponding singular vectors (or singular subspaces).
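The abstract refers to randomized algorithms for low-rank approximation of the kind popularized by Halko, Martinsson, and Tropp. The following is a minimal NumPy sketch of that general approach (random range sampling with a few power iterations, followed by a small dense SVD); it is an illustrative analogue, not the article’s MATLAB implementation, and the function name, oversampling parameter, and iteration count are choices made here for exposition.

```python
import numpy as np

def randomized_svd(A, k, n_iter=2, oversample=10, rng=None):
    """Sketch of a randomized truncated SVD in the Halko-Martinsson-Tropp style.

    Returns approximate leading k singular triplets (U, s, Vt) of A.
    """
    rng = np.random.default_rng(rng)
    m, n = A.shape
    l = min(n, k + oversample)
    # Sample the range of A with a random Gaussian test matrix.
    Q = A @ rng.standard_normal((n, l))
    # A few power iterations (with re-orthonormalization for stability)
    # sharpen the captured subspace when singular values decay slowly.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(Q)
        Q, _ = np.linalg.qr(A.T @ Q)
        Q = A @ Q
    Q, _ = np.linalg.qr(Q)
    # Project A onto the captured subspace and take a small dense SVD.
    B = Q.T @ A
    Uh, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Uh
    return U[:, :k], s[:k], Vt[:k, :]
```

For a matrix whose numerical rank is at most k, the returned rank-k factorization reconstructs the matrix to roughly machine precision; for general matrices, the oversampling and power-iteration parameters trade computation for accuracy.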
Supplemental Material
Available for Download
Software for An Implementation of a Randomized Algorithm for Principal Component Analysis
Algorithm 971: An Implementation of a Randomized Algorithm for Principal Component Analysis