
A new principal component analysis by particle swarm optimization with an environmental application for data science

  • Original Paper
  • Stochastic Environmental Research and Risk Assessment

Abstract

In this paper, we propose a new method for disjoint principal component analysis based on an intelligent search using particle swarm optimization. The method is a principal component analysis with constraints that yields components that are linear combinations of disjoint subsets of the original variables. It thereby addresses one of the crucial problems of multivariate analysis: interpreting the vector subspaces obtained when reducing dimensionality. The method selects, in a clear and direct way, the variables that contribute the most to each principal component. Numerical results confirm the quality of the solutions attained by the proposed method: it avoids being trapped in local optima and achieves a high success rate in reaching the best solution in every case of our simulation study. An illustration with real environmental data shows the good performance of the method and its potential applications.
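The constrained problem described in the abstract can be illustrated with a small sketch. It assumes the usual disjoint-PCA formulation: each variable loads on exactly one of k components, and loading vectors have unit norm. Under these assumptions, minimizing the reconstruction error ||X - X V V^T||^2 for centered data X is equivalent to maximizing the sum, over the k blocks of variables, of the largest eigenvalue of each block's covariance (or correlation) submatrix. The Python below searches over variable-to-component assignments with a basic global-best particle swarm. The random-key encoding (argmax of k continuous keys per variable), the parameter values, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import itertools
import random


def leading_eigenvalue(S, idx, max_iter=1000, tol=1e-12):
    """Largest eigenvalue of the principal submatrix S[idx][idx], by power iteration.

    S is assumed symmetric positive semidefinite (covariance or correlation),
    so the dominant eigenvalue in magnitude is also the largest one.
    """
    m = len(idx)
    if m == 0:
        return 0.0  # an empty block explains no variance
    sub = [[S[r][c] for c in idx] for r in idx]
    # Non-constant start vector, so it is unlikely to be orthogonal to the top eigenvector
    v = [1.0 + 0.1 * i for i in range(m)]
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(max_iter):
        w = [sum(row[c] * v[c] for c in range(m)) for row in sub]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient v' S v for the unit vector v
    return sum(v[r] * sum(sub[r][c] * v[c] for c in range(m)) for r in range(m))


def explained_variance(S, labels, k):
    """Objective: sum over the k disjoint blocks of each block's largest eigenvalue."""
    blocks = [[j for j, g in enumerate(labels) if g == b] for b in range(k)]
    return sum(leading_eigenvalue(S, blk) for blk in blocks)


def brute_force(S, k):
    """Exhaustive search over all k**p assignments (only feasible for tiny p)."""
    best = max(itertools.product(range(k), repeat=len(S)),
               key=lambda lab: explained_variance(S, lab, k))
    return best, explained_variance(S, best, k)


def pso_disjoint_pca(S, k, n_particles=40, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over assignments (hypothetical random-key encoding).

    Each particle holds p*k continuous keys; variable j goes to the block
    whose key is largest. Returns (labels, objective value).
    """
    rng = random.Random(seed)
    p = len(S)
    dim = p * k

    def decode(x):
        return tuple(max(range(k), key=lambda b: x[j * k + b]) for j in range(p))

    cache = {}  # distinct assignments are few on toy inputs, so memoize the objective

    def fitness(x):
        labels = decode(x)
        if labels not in cache:
            cache[labels] = explained_variance(S, labels, k)
        return cache[labels]

    pos = [[rng.uniform(0.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in pos]
    pbest_f = [fitness(x) for x in pos]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return decode(gbest), gbest_f


if __name__ == "__main__":
    # Toy correlation matrix: variables 0-2 and 3-5 form two uncorrelated blocks (r = 0.8)
    R = [[1.0 if i == j else (0.8 if (i < 3) == (j < 3) else 0.0) for j in range(6)]
         for i in range(6)]
    print(pso_disjoint_pca(R, k=2))
    print(brute_force(R, k=2))
```

On the toy matrix, each three-variable block with within-block correlation 0.8 has largest eigenvalue 1 + 2(0.8) = 2.6, so the best disjoint split gives 5.2. For realistic numbers of variables, the k^p possible assignments rule out exhaustive search, which is the setting a swarm-based search targets. The `brute_force` helper is included only to check the sketch on tiny inputs.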

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Acknowledgements

The authors thank the editors and reviewers for their constructive comments on an earlier version of this manuscript. The research was partially supported by ESPOL Polytechnic University (Escuela Superior Politécnica del Litoral), Guayaquil, Ecuador (J.A. Ramirez-Figueroa and C. Martin-Barreiro), and by FONDECYT grant 1200525 from the National Agency for Research and Development (ANID) of the Chilean government (V. Leiva).

Author information

Corresponding author

Correspondence to Victor Leiva.

About this article

Cite this article

Ramirez-Figueroa, J.A., Martin-Barreiro, C., Nieto-Librero, A. et al. A new principal component analysis by particle swarm optimization with an environmental application for data science. Stoch Environ Res Risk Assess 35, 1969–1984 (2021). https://doi.org/10.1007/s00477-020-01961-3

  • DOI: https://doi.org/10.1007/s00477-020-01961-3
