Principal Component Analysis Based on Graph Laplacian and Double Sparse Constraints for Feature Selection and Sample Clustering on Multi-View Data.
Human Heredity ( IF 1.8 ) Pub Date : 2019-08-29 , DOI: 10.1159/000501653
Ming-Juan Wu 1 , Ying-Lian Gao 2 , Jin-Xing Liu 1 , Rong Zhu 1 , Juan Wang 1

Principal component analysis (PCA) is a widely used method for evaluating low-dimensional data. Some variants of PCA have been proposed to improve the interpretation of the principal components (PCs). One of the most common is sparse PCA, which seeks a sparse basis that is easier to interpret than the dense basis of PCA. However, the performance of these improved methods is still far from satisfactory because the data still contain redundant PCs. In this paper, a novel method called PCA based on graph Laplacian and double sparse constraints (GDSPCA) is proposed to improve the interpretation of the PCs and to take the internal geometry of the data into account. In detail, GDSPCA uses L2,1-norm and L1-norm regularization terms simultaneously to enforce sparsity of the matrix by filtering out redundant and irrelevant PCs: the L2,1-norm regularization term produces row sparsity, while the L1-norm regularization term enforces element sparsity. In this way, the new PCs in the low-dimensional subspace can be interpreted better. Meanwhile, GDSPCA integrates the graph Laplacian into PCA to explore the geometric structure hidden in the data. A simple and effective optimization solution is provided. Extensive experiments on multi-view biological data demonstrate the feasibility and effectiveness of the proposed approach.
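
The three ingredients named in the abstract (the L2,1-norm, the L1-norm, and the graph Laplacian) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation and does not reproduce the GDSPCA objective or its optimization; the function names, the binary k-nearest-neighbour graph, and the toy matrices are assumptions made only for illustration.

```python
import numpy as np

def l21_norm(M):
    """L2,1-norm: sum of the Euclidean norms of the rows of M."""
    return float(np.sum(np.linalg.norm(M, axis=1)))

def l1_norm(M):
    """L1-norm (entrywise): sum of the absolute values of all entries of M."""
    return float(np.sum(np.abs(M)))

def knn_graph_laplacian(X, k=2):
    """Unnormalized Laplacian L = D - W of a symmetrized k-NN graph.

    X holds one sample per row. Binary edge weights are an assumption
    of this sketch, not a detail taken from the abstract.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances.
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(dist, np.inf)           # exclude self-neighbours
    idx = np.argsort(dist, axis=1)[:, :k]    # k nearest neighbours per sample
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)                   # symmetrize the adjacency matrix
    D = np.diag(W.sum(axis=1))               # degree matrix
    return D - W

# Two toy matrices with the same entries, arranged differently.
A = np.array([[3.0, 4.0], [0.0, 0.0], [0.0, 0.0]])  # non-zeros in a single row
B = np.array([[3.0, 0.0], [0.0, 4.0], [0.0, 0.0]])  # non-zeros spread over rows

print(l1_norm(A), l1_norm(B))    # the L1-norm does not distinguish A from B
print(l21_norm(A), l21_norm(B))  # the L2,1-norm is smaller for the row-sparse A

# Four samples on a line forming two well-separated pairs.
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.0, 0.0]])
L = knn_graph_laplacian(X, k=1)
```

In this toy case the L1-norm is the same for both matrices, while the L2,1-norm is lower when the non-zero entries are concentrated in one row, which is why an L2,1 penalty tends to zero out entire rows and an L1 penalty tends to zero out individual entries. The Laplacian `L` encodes which samples are neighbours, so a term built from it can favour low-dimensional representations that keep neighbouring samples close.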

Updated: 2019-11-01