Abstract
In recent years, clustering analysis of cancer genomics data has gained widespread attention. However, traditional methods are limited by the dimensionality of the matrix representation and cannot fully mine the underlying geometric structure of the data. In addition, noise and outliers inevitably exist in the data. To address these two problems, we propose a new method that represents cancer omics data as a tensor and uses a hypergraph to preserve the geometric structure of the original data. The model is called hypergraph regularized tensor robust principal component analysis (HTRPCA). HTRPCA decomposes the data into two parts: a low-rank component that contains the clean underlying structure among samples, and a sparse component that captures interference points. The low-rank component is then used for clustering. Because of the hypergraph regularization, the model can retain complex geometric relationships among multiple sample points. Clustering experiments on TCGA datasets demonstrate the effectiveness of HTRPCA and show that it outperforms other advanced methods.
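To make the hypergraph idea concrete, the following is a minimal sketch of one common way to build a hypergraph Laplacian from sample data. It is an illustration under stated assumptions, not the construction used in HTRPCA: the function name `knn_hypergraph_laplacian`, the choice of one hyperedge per sample (the sample plus its `k` nearest neighbours), the unit hyperedge weights, and the unnormalized form of the Laplacian are all assumptions made here.

```python
import numpy as np


def knn_hypergraph_laplacian(X, k=3):
    """Unnormalized hypergraph Laplacian for samples in the rows of X.

    Assumption (not from the paper): hyperedge j groups sample j with its
    k nearest neighbours, and every hyperedge has weight 1.
    """
    n = X.shape[0]

    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    dist = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(dist, 0.0)

    # Incidence matrix H: rows are vertices (samples), columns are hyperedges.
    H = np.zeros((n, n))
    for j in range(n):
        members = np.argsort(dist[j])[: k + 1]  # sample j and its k neighbours
        H[members, j] = 1.0

    w = np.ones(n)          # hyperedge weights (assumed uniform)
    d_e = H.sum(axis=0)     # hyperedge degrees: vertices per hyperedge
    d_v = H @ w             # vertex degrees: weighted hyperedges per vertex

    # L = D_v - H W D_e^{-1} H^T
    return np.diag(d_v) - H @ np.diag(w / d_e) @ H.T
```

Unlike an ordinary graph edge, each hyperedge here connects `k + 1` samples at once, which is what lets a hypergraph regularizer encode relationships among more than two sample points. The resulting matrix is symmetric with rows summing to zero, so a quadratic penalty built from it vanishes on constant vectors.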
Graphic Abstract
This paper proposes a new method that uses tensors to represent cancer omics data and introduces a hypergraph term to preserve the geometric structure of the original data. The model decomposes the original tensor into a low-rank tensor and a sparse tensor. The low-rank tensor is then used to cluster cancer samples to verify the effectiveness of the method.
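The low-rank plus sparse decomposition can be illustrated on an ordinary matrix with classical robust PCA. The sketch below is the matrix analogue only, solved with a fixed-penalty ADMM scheme; it is not the HTRPCA algorithm, which works on tensors and adds a hypergraph term. The function names, the default values of `lam` and `mu`, and the stopping rule are assumptions chosen for illustration.

```python
import numpy as np


def soft_threshold(M, tau):
    """Elementwise shrinkage: proximal operator of tau * (l1 norm)."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)


def svd_threshold(M, tau):
    """Singular value shrinkage: proximal operator of tau * (nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt


def rpca(X, lam=None, mu=None, n_iter=1000, tol=1e-7):
    """Split X into low-rank L and sparse S by minimizing ||L||_* + lam * ||S||_1
    subject to X = L + S (matrix robust PCA, fixed-penalty ADMM)."""
    m, n = X.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam    # common default
    mu = m * n / (4.0 * np.abs(X).sum()) if mu is None else mu  # assumed heuristic

    L = np.zeros_like(X)
    S = np.zeros_like(X)
    Y = np.zeros_like(X)  # Lagrange multiplier for the constraint X = L + S
    for _ in range(n_iter):
        L = svd_threshold(X - S + Y / mu, 1.0 / mu)
        S = soft_threshold(X - L + Y / mu, lam / mu)
        residual = X - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(X):
            break
    return L, S
```

In a clustering pipeline of the kind described above, the recovered low-rank part (here `L`) would be passed to a standard clustering routine, while the sparse part `S` absorbs outliers and is discarded.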
Funding
This work was supported in part by the National Natural Science Foundation of China under Grant No. 61872220.
Author information
Contributions
YYZ and CNJ proposed the HTRPCA method, performed the experiments, and drafted the manuscript. MLW and JXL contributed to the design of the study and the manuscript. JW and CHZ contributed to the data analysis. JXL contributed to improving the writing of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Zhao, YY., Jiao, CN., Wang, ML. et al. HTRPCA: Hypergraph Regularized Tensor Robust Principal Component Analysis for Sample Clustering in Tumor Omics Data. Interdiscip Sci Comput Life Sci 14, 22–33 (2022). https://doi.org/10.1007/s12539-021-00441-8
DOI: https://doi.org/10.1007/s12539-021-00441-8