Automatic sparse principal component analysis
The Canadian Journal of Statistics (IF 0.6), Pub Date: 2020-12-20, DOI: 10.1002/cjs.11579
Heewon Park¹, Rui Yamaguchi²˒³, Seiya Imoto⁴, Satoru Miyano¹

The wide availability of computers enables us to accumulate huge amounts of data, so effective tools for extracting information from such data have become critical. Principal component analysis (PCA) is a useful and traditional tool for reducing the dimensionality of massive high-dimensional datasets. Recently, sparse principal component (PC) loading estimation based on L1-type regularization has attracted considerable attention. Although sparse PCA makes the results easier to interpret and performs dimension reduction without disturbance from noisy features, existing studies on sparse PCA use an arbitrary number of PCs without any statistical justification. We propose a novel method, called automatic sparse PCA, which performs PC selection and sparse PC loading estimation simultaneously. For PC selection, we first develop a sparse singular value decomposition (sparse SVD) and then incorporate sparsity into PC loading estimation. The proposed method thus performs dimension reduction and PC loading estimation at the same time, without disturbance from noisy features. Monte Carlo experiments show that the proposed automatic sparse PCA outperforms existing methods both in identifying the sparse structure and in reconstructing the data from a low-dimensional projection. The proposed method is also applied to several real datasets, where it proves effective in terms of both estimation accuracy and the interpretability of the PCA results.
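The abstract does not give the authors' estimator, so the following is only a generic illustration of how an L1 penalty produces sparse PC loadings. It is a minimal pure-Python sketch, assuming a single component computed by alternating power-iteration steps with elementwise soft-thresholding of the loading vector (a standard penalized rank-one SVD scheme, not the automatic sparse PCA procedure itself). The function names `center_columns`, `soft_threshold`, `sparse_rank_one`, the penalty `lam`, the uniform starting vector, and the toy data are all hypothetical choices made for this example.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def center_columns(X: Matrix) -> Matrix:
    """Subtract each column's mean, as is usual before PCA."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def soft_threshold(z: float, lam: float) -> float:
    """L1 proximal operator: shrink z toward zero by lam, returning exactly 0.0 if |z| <= lam."""
    if abs(z) <= lam:
        return 0.0
    return z - lam if z > 0 else z + lam


def norm(x: List[float]) -> float:
    return math.sqrt(sum(xi * xi for xi in x))


def sparse_rank_one(X: Matrix, lam: float, iters: int = 100, tol: float = 1e-10
                    ) -> Tuple[List[float], List[float], float]:
    """Approximate X by d * u v^T with a sparse loading vector v (hypothetical helper).

    Alternates u = Xv / ||Xv|| and v = S(X^T u) / ||S(X^T u)||,
    where S is elementwise soft-thresholding with penalty lam.
    """
    n, p = len(X), len(X[0])
    v = [1.0 / math.sqrt(p)] * p  # uniform starting loadings (an assumption)
    u = [0.0] * n
    for _ in range(iters):
        Xv = [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]
        nx = norm(Xv)
        if nx == 0.0:
            break
        u = [x / nx for x in Xv]
        z = [soft_threshold(sum(X[i][j] * u[i] for i in range(n)), lam) for j in range(p)]
        nz = norm(z)
        if nz == 0.0:  # lam is so large that every loading is zeroed out
            return u, [0.0] * p, 0.0
        v_new = [zj / nz for zj in z]
        converged = max(abs(a - b) for a, b in zip(v_new, v)) < tol
        v = v_new
        if converged:
            break
    # d = u^T X v plays the role of the (sparse) leading singular value
    d = sum(u[i] * sum(X[i][j] * v[j] for j in range(p)) for i in range(n))
    return u, v, d


# Toy data: columns 0 and 1 share one latent signal t; columns 2 and 3 are small noise.
t = [2.0, -1.0, 1.5, -2.5, 0.5, -0.5]
n2 = [0.1, -0.05, 0.02, 0.08, -0.1, 0.03]
n3 = [-0.03, 0.06, -0.07, 0.01, 0.04, -0.09]
X = center_columns([[ti, ti, a, b] for ti, a, b in zip(t, n2, n3)])
u, v, d = sparse_rank_one(X, lam=0.5)
print([round(x, 3) for x in v])  # [0.707, 0.707, 0.0, 0.0]
```

With this penalty the noisy columns receive loadings of exactly zero, while the two informative columns share the loading equally. A larger `lam` zeroes out more loadings; choosing the penalty and, above all, the number of PCs, which is the paper's main contribution, is not reproduced by this sketch.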

Updated: 2020-12-20