Relevant feature selection and ensemble classifier design using bi-objective genetic algorithm

  • Regular Paper
  • Published in Knowledge and Information Systems

Abstract

In the era of the digital boom, no single classifier performs well across all datasets. Ensemble classifiers aim to bridge this performance gap by combining multiple classifiers with diverse characteristics to achieve better generalization. However, classifier selection depends heavily on the dataset, and classification efficiency degrades sharply in the presence of irrelevant features. Feature selection improves classifier performance by removing such irrelevant features. First, we propose a bi-objective genetic algorithm-based feature selection method (FSBOGA), in which nonlinear, uniform, hybrid cellular automata generate the initial population. The objective functions are defined using the lower approximation from rough set theory and the Kullback–Leibler divergence from information theory, so that unambiguous and informative features are selected. The replacement strategy for creating the next-generation population is based on Pareto optimality with respect to both objective functions. Next, a novel bi-objective genetic algorithm-based ensemble classification method (CCBOGA) is devised to combine the individual classifiers trained on the reduced datasets. The constructed ensemble classifier is observed to outperform the individual classifiers. The performance of the proposed FSBOGA and CCBOGA is investigated on several popular datasets and compared with state-of-the-art algorithms to demonstrate their effectiveness.
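For orientation, the two standard quantities behind the objective functions named above can be written as follows; how FSBOGA normalizes and combines them is specified in the paper itself, not in these generic forms:

    % Lower approximation of a concept X under feature subset B:
    % the objects whose B-indiscernibility class lies entirely inside X,
    % i.e. the objects that certainly belong to X given the features in B.
    \underline{B}X = \{\, x \in U \mid [x]_B \subseteq X \,\}

    % Kullback–Leibler divergence between distributions P and Q,
    % a measure of how informative P is relative to the reference Q.
    D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i} P(i)\, \log \frac{P(i)}{Q(i)}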
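The search loop the abstract describes (binary feature masks, two objectives, Pareto-based replacement) can be sketched as below. This is a minimal illustrative sketch, not the authors' FSBOGA implementation: the objective callbacks f1 and f2, the crossover and mutation operators, and all parameters are placeholder assumptions, and the cellular-automata population initialization is replaced by plain random bit strings.

    # Illustrative bi-objective GA over binary feature masks with
    # Pareto-dominance replacement (a sketch, not the paper's FSBOGA).
    import random

    def dominates(a, b):
        """True if objective vector a Pareto-dominates b (maximization)."""
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    def pareto_front(scored):
        """Keep the masks whose objective vectors are non-dominated."""
        return [m for s, m in scored if not any(dominates(t, s) for t, _ in scored)]

    def evolve(f1, f2, n_features, pop_size=40, generations=50, p_mut=0.05):
        """Evolve feature masks; f1 and f2 map a mask to a score to maximize."""
        rng = random.Random(0)
        pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
        for _ in range(generations):
            scored = [((f1(m), f2(m)), m) for m in pop]
            front = pareto_front(scored)  # survivors for the next generation
            children = []
            while len(front) + len(children) < pop_size:
                pa, pb = (rng.sample(front, 2) if len(front) > 1 else (front[0], front[0]))
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = pa[:cut] + pb[cut:]
                children.append([b ^ (rng.random() < p_mut) for b in child])  # bit-flip mutation
            pop = front + children
        return pareto_front([((f1(m), f2(m)), m) for m in pop])

In the abstract's terms, f1 would be a rough-set consistency score computed from the lower approximation and f2 a KL-divergence-based informativeness score, so that the returned front contains feature subsets that are Pareto-optimal with respect to both.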




Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments.

Author information


Corresponding author

Correspondence to Asit Kumar Das.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest regarding this paper.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Das, A.K., Pati, S.K. & Ghosh, A. Relevant feature selection and ensemble classifier design using bi-objective genetic algorithm. Knowl Inf Syst 62, 423–455 (2020). https://doi.org/10.1007/s10115-019-01341-6

