Abstract
In this paper, we propose a new method for disjoint principal component analysis based on an intelligent search. The method is a principal component analysis with constraints, which allows us to determine components that are linear combinations of disjoint subsets of the original variables. The proposed method helps to solve one of the crucial problems of multivariate analysis, namely, the interpretation of the vector subspaces obtained in dimensionality reduction. The method selects, in a clear and direct way, the variables that contribute the most to each principal component. Numerical results are provided to confirm the quality of the solutions attained by the proposed method. The method avoids local optima and attains a high success rate in reaching the best solution, which occurs in all the cases of our simulation study. An illustration with real environmental data shows the good performance of the method and its potential applications.
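To make the disjointness constraint concrete, the following is a minimal sketch, not the method proposed in the paper. It assumes the assignment of variables to components is already fixed and, for each component, takes the leading eigenvector of the covariance block of its own variables. The search over assignments, which is what the proposed method addresses, is not shown. The function name `disjoint_pca_loadings`, the synthetic data, and the example assignment are hypothetical and chosen only for illustration.

```python
import numpy as np


def disjoint_pca_loadings(X, assignment, n_components):
    """Loadings for a FIXED assignment of variables to components.

    Hypothetical helper for illustration only: assignment[j] = k means
    variable j may load only on component k, so each row of the returned
    loading matrix has at most one nonzero entry.
    """
    X = np.asarray(X, dtype=float)
    assignment = np.asarray(assignment)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    p = X.shape[1]
    loadings = np.zeros((p, n_components))
    explained = np.zeros(n_components)
    for k in range(n_components):
        idx = np.flatnonzero(assignment == k)
        if idx.size == 0:
            continue  # an empty group leaves a zero column
        block = cov[np.ix_(idx, idx)]
        # eigh returns eigenvalues in ascending order; take the largest.
        eigvals, eigvecs = np.linalg.eigh(block)
        loadings[idx, k] = eigvecs[:, -1]
        explained[k] = eigvals[-1]
    return loadings, explained


# Synthetic data and an arbitrary assignment (both are assumptions).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
assignment = np.array([0, 0, 1, 1, 2])
loadings, explained = disjoint_pca_loadings(X, assignment, n_components=3)
```

Because the supports of the columns do not overlap and each column has unit norm, the loading matrix is orthonormal by construction, and each variable can be read off as belonging to exactly one component.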
Acknowledgements
The authors thank the editors and reviewers for their constructive comments on an earlier version of this manuscript. The research was partially supported by ESPOL Polytechnic University, Escuela Superior Politécnica del Litoral, ESPOL, Guayaquil, Ecuador (J.A. Ramirez-Figueroa and C. Martin-Barreiro), and by FONDECYT (grant 1200525) from the National Agency for Research and Development (ANID) of the Chilean government (V. Leiva).
About this article
Cite this article
Ramirez-Figueroa, J.A., Martin-Barreiro, C., Nieto-Librero, A. et al. A new principal component analysis by particle swarm optimization with an environmental application for data science. Stoch Environ Res Risk Assess 35, 1969–1984 (2021). https://doi.org/10.1007/s00477-020-01961-3
DOI: https://doi.org/10.1007/s00477-020-01961-3