Eigenvectors from Eigenvalues Sparse Principal Component Analysis, Journal of Computational and Graphical Statistics


Eigenvectors from Eigenvalues Sparse Principal Component Analysis
Journal of Computational and Graphical Statistics (IF 1.4), Pub Date: 2021-11-12, DOI: 10.1080/10618600.2021.1987254
H. Robert Frost


Abstract

We present a novel technique for sparse principal component analysis. This method, named eigenvectors from eigenvalues sparse principal component analysis (EESPCA), is based on the formula for computing squared eigenvector loadings of a Hermitian matrix from the eigenvalues of the full matrix and associated sub-matrices. We explore two versions of the EESPCA method: a version that uses a fixed threshold for inducing sparsity and a version that selects the threshold via cross-validation. Relative to the state-of-the-art sparse PCA methods of Witten et al., Yuan and Zhang, and Tan et al., the fixed threshold EESPCA technique offers an order-of-magnitude improvement in computational speed, does not require estimation of tuning parameters via cross-validation, and can more accurately identify true zero principal component loadings across a range of data matrix sizes and covariance structures. Importantly, the EESPCA method achieves these benefits while maintaining out-of-sample reconstruction error and PC estimation error close to the lowest error generated by all evaluated approaches. EESPCA is a practical and effective technique for sparse PCA with particular relevance to computationally demanding statistical problems such as the analysis of high-dimensional datasets or application of statistical techniques like resampling that involve the repeated calculation of sparse PCs. Supplementary materials for this article are available online.
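The formula the abstract refers to is commonly stated as the eigenvector-eigenvalue identity: for an n×n Hermitian matrix A with eigenvalues λ_i(A) and unit eigenvectors v_i, and with M_j denoting A with row and column j removed, |v_{i,j}|² · ∏_{k≠i} (λ_i(A) − λ_k(A)) = ∏_{k=1}^{n−1} (λ_i(A) − λ_k(M_j)). The following is a minimal NumPy sketch that checks this identity numerically on a sample covariance matrix. It is an illustration of the identity only, not the EESPCA algorithm from the paper; the function name `squared_loadings_from_eigenvalues` and the simulated data are assumptions made for this example, and the sketch assumes the eigenvalues of A are distinct.

```python
import numpy as np


def squared_loadings_from_eigenvalues(A, i):
    """Squared components of the i-th unit eigenvector of a real symmetric A.

    Eigenvalues are indexed in ascending order, as returned by
    np.linalg.eigvalsh. Assumes the eigenvalues of A are distinct,
    otherwise the denominator below is zero.
    """
    n = A.shape[0]
    lam = np.linalg.eigvalsh(A)
    # Product over k != i of (lambda_i(A) - lambda_k(A)).
    denom = np.prod(np.delete(lam[i] - lam, i))
    out = np.empty(n)
    for j in range(n):
        # Sub-matrix M_j: A with row j and column j removed.
        minor = np.delete(np.delete(A, j, axis=0), j, axis=1)
        mu = np.linalg.eigvalsh(minor)
        # Product over all n-1 eigenvalues of M_j of (lambda_i(A) - lambda_k(M_j)).
        out[j] = np.prod(lam[i] - mu) / denom
    return out


# Hypothetical data: 200 observations of 5 variables with different scales,
# so that the sample covariance matrix has well-separated eigenvalues.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5)) * np.array([1.0, 2.0, 3.0, 4.0, 5.0])
S = np.cov(X, rowvar=False)

# Squared loadings of the first principal component (largest eigenvalue),
# computed from eigenvalues alone and, for comparison, from eigh.
from_eigenvalues = squared_loadings_from_eigenvalues(S, S.shape[0] - 1)
_, vectors = np.linalg.eigh(S)
direct = vectors[:, -1] ** 2
print(np.max(np.abs(from_eigenvalues - direct)))
```

Note that the identity recovers only squared loadings, so the signs of the eigenvector components are not determined by it. It also needs one eigenvalue decomposition per sub-matrix, which makes this direct form costly for large matrices.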


Updated: 2021-11-12
