An efficient Henry gas solubility optimization for feature selection

https://doi.org/10.1016/j.eswa.2020.113364

Highlights

  • Henry gas solubility optimization is used for the first time for feature selection.

  • The results reveal that HGSO achieves high efficiency across the 12 datasets.

  • The proposed method is compared with six well-known optimization algorithms.

  • HGSO delivers high-quality solutions, combining high accuracy with a smaller number of selected features.

Abstract

In classification, regression, and other data mining applications, feature selection (FS) is an important pre-processing step that helps avoid the adverse effects of noisy, misleading, and inconsistent features on model performance. By formulating FS as a global combinatorial optimization problem, researchers have employed metaheuristic algorithms to select the prominent features, simplifying and enhancing the quality of high-dimensional datasets in order to devise efficient knowledge extraction systems. However, when employed on datasets with extensively large feature sizes, these methods often suffer from the local optimality problem due to the considerably large solution space. In this study, we propose a novel approach to dimensionality reduction that uses the Henry gas solubility optimization (HGSO) algorithm to select significant features and thereby enhance classification accuracy. Using several datasets with a wide range of feature sizes, from small to massive, the proposed method is evaluated against well-known metaheuristic algorithms including the grasshopper optimization algorithm (GOA), whale optimization algorithm (WOA), dragonfly algorithm (DA), grey wolf optimizer (GWO), salp swarm algorithm (SSA), and others from the recent relevant literature. We used the k-nearest neighbor (k-NN) and support vector machine (SVM) classifiers as expert systems to evaluate the selected feature sets. Wilcoxon's rank-sum non-parametric statistical test was carried out at a 5% significance level to judge whether the results of the proposed algorithm differ from those of the other compared algorithms in a statistically significant way. Overall, the empirical analysis suggests that the proposed approach is significantly effective on low- as well as considerably high-dimensional datasets, producing 100% accuracy on classification problems with more than 11,000 features.

Introduction

Technological development and its pervasive use in fields such as biomedicine, bioinformatics, government, finance, engineering, and social media, to name a few, has generated an exponentially growing amount of ubiquitous data. The collected data typically forms datasets with thousands of features containing diverse information, making it difficult for traditional machine learning algorithms to perform efficiently, mainly due to the curse of dimensionality (Manbari, AkhlaghianTab, & Salavati, 2019; Wang, Wang, & Chang, 2016). These high-dimensional datasets, as often observed, comprise redundant, noisy, irrelevant, and unimportant information that hurts the approximation accuracy of the designed model (Zawbaa, Emary, Grosan, & Snasel, 2018). Filtering out some of these features may yield several benefits, including reduced computational cost, improved accuracy, and enhanced generalization ability (Abdel-Basset, El-Shahat, El-henawy, de Albuquerque, & Mirjalili, 2019). However, this is easier said than done, as the existing literature has demonstrated to a large extent (Xuan, Huang, Wu, Yinsong, & Ye, 2019). Selecting a subset of informative features from high-dimensional data therefore poses several challenges that demand efficient feature selection approaches to effectively reduce the original data into a low-dimensional space (Gangeh, Zarkoob, & Ghodsi, 2017; Maldonado & López, 2018).

Feature selection (FS) is also referred to as variable or attribute selection. Today, FS techniques have been successfully employed by expert systems to solve classification problems in image processing, text mining, speech recognition, biological gene data classification, etc. (Maldonado, Weber, & Famili, 2014). These methods can be classified into wrapper, filter, and embedded approaches. In the wrapper approach (Kohavi & John, 1997), the selection of features is based on the resulting performance of a learning algorithm (e.g., classification accuracy for a specific classifier). In filter approaches (Hancer, Xue, & Zhang, 2018), correlations between the features are considered in the evaluation process and no external evaluators are involved. In embedded methods, the classification model is trained on the available attributes of a dataset and the obtained results are used to evaluate the relevance of each attribute. Despite significant research on FS, it remains a real challenge, since it seeks a trade-off between higher classification performance and a lower-dimensional space (Liu & Yu, 2005). Wrapper methods outperform filter methods in terms of classification accuracy (Maldonado & Weber, 2009) and are hence adopted in this study. In fact, to achieve high classification accuracy, this approach does not necessarily depend on a large number of selected features for many classification problems. Moreover, the approach can employ numerous knowledge extraction and machine learning methods such as discriminant analysis (Hastie, Tibshirani, & Friedman, 2002), the k-nearest neighbor (k-NN) classifier (Dasarathy, 1991), artificial neural networks (ANNs) (Verikas & Bacauskiene, 2002), and support vector machines (SVMs) (Vapnik & Vapnik, 1998). In the FS literature, k-NN and SVM are the most commonly used classification systems (Maldonado, Weber, & Famili, 2014; Manbari, AkhlaghianTab, & Salavati, 2019; Xue, Zhang, Browne, & Yao, 2016).
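To make the wrapper idea concrete, the following minimal sketch scores a candidate feature subset by the cross-validated accuracy of a classifier trained only on those features. The dataset, the k-NN classifier with k = 5, and the cross-validation setup are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def wrapper_score(X, y, mask, k=5, folds=5):
    """Score a candidate feature subset (boolean mask) by the
    cross-validated accuracy of a k-NN classifier trained on it."""
    if not mask.any():                # an empty subset carries no information
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(clf, X[:, mask], y, cv=folds)
    return scores.mean()

# Illustrative usage on random data (a stand-in for a real dataset):
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = rng.integers(0, 2, size=100)
mask = rng.random(20) < 0.5           # one candidate subset of features
print(wrapper_score(X, y, mask))
```

Because the classifier is retrained for every candidate subset, wrapper methods are more expensive than filters, which is exactly why efficient search over the space of subsets matters.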

In the related research, traditional exhaustive methods such as breadth-first search, depth-first search, and others are deemed infeasible for finding the best subset of features due to their limitations, especially on large datasets (Abdel-Basset et al., 2019). Alternatively, FS is considered an NP-hard optimization problem in which an optimal subset of features, out of thousands of others, is sought so that the feature size is minimized while the classification accuracy is maximized. In this connection, metaheuristic algorithms have been successfully employed: their dynamic searching behaviors and global search ability enable them to solve hard optimization problems effectively. Some recent critical reviews (Ramanujam & David, 2019; Xue, Zhang, Browne, & Yao, 2016) have duly highlighted the importance and potential of these methods in solving FS problems. In fact, several important gaps identified in these studies encourage further effective work in this particular line of research.
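The two competing objectives are commonly collapsed into a single fitness function in the wrapper-based metaheuristic FS literature; the weighting scheme below is a widespread convention, not necessarily the exact one used in this paper:

```latex
% A common single-objective FS fitness (weights are conventional
% choices, not necessarily the exact values used in this paper):
\[
  \mathrm{fitness} \;=\; \alpha \,\gamma_R(D) \;+\; \beta \,\frac{|R|}{|N|},
  \qquad \alpha \in [0,1],\; \beta = 1-\alpha,
\]
% where $\gamma_R(D)$ is the classification error rate of the chosen
% classifier on the selected subset $R$, $|R|$ is the number of
% selected features, and $|N|$ is the total number of features.
```

Choosing $\alpha$ close to 1 prioritizes accuracy while still breaking ties in favor of smaller feature subsets.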

Recently, as mentioned earlier, several metaheuristic algorithms have been applied in an attempt to solve the FS problem; some of these algorithms are used alone, while others are combined to compose hybrid approaches (Neggaz, Ewees, Elaziz, & Mafarja, 2020). Besides, some of the algorithms have been adapted into variants such as binary and chaotic ones (Thaher, Heidari, Mafarja, Dong, & Mirjalili, 2020; Zhang, Xu, Yu, Heidari, Li, Chen, & Li, 2020). Among classic approaches, particle swarm optimization (PSO) (Taradeh et al., 2019), harmony search (HS) (Soungsill, Hyoseok, & Park, 2019), artificial bee colony (ABC) (Gherardo et al., 2016), and ant colony optimization (ACO) (Ghimatgar, Kazemi, Helfroush, & Aarabi, 2018) are some of the prominent works in this domain, while some of the more recent and promising methods are the grey wolf optimizer (GWO) (Abdel-Basset, El-Shahat, El-henawy, de Albuquerque, & Mirjalili, 2019; Zakeri & Hokmabadi, 2019), grasshopper optimization algorithm (GOA) (Mafarja, Jaber, Ahmed, & Thaher, 2019b; Xue, Zhang, & Browne, 2014), antlion optimization (Antlion) (Aljarah et al., 2018), cuttlefish optimization algorithm (COA) (Moayedikia, Ong, Boo, Yeoh, & Jensen, 2017), whale optimization algorithm (WOA) (Hashim, Houssein, Mabrouk, Al-Atabany, & Mirjalili, 2019; Wolpert & Macready, 1997), salp swarm algorithm (SSA) (Li, Xu, Liu, & Lu, 2017), firefly algorithm (FA) (Chen & Hao, 2017), sine-cosine algorithm (SCA) (Tharwat, Hassanien, & Elnaghi, 2017), and gravitational search algorithm (GSA) (Tharwat, 2019).

In the binary algorithms category, several metaheuristic variants have been applied to the FS problem, such as binary GWO (Emary, Zawbaa, & Hassanien, 2016b), binary butterfly optimization (Arora & Anand, 2019), binary GOA (Mafarja et al., 2019a), binary antlion (Emary, Zawbaa, & Hassanien, 2016a), binary PSO (Chuang, Tsai, & Yang, 2011a), binary SSA (Faris et al., 2018), binary WOA (Hussien, Hassanien, Houssein, Bhattacharyya, & Amin, 2019; Hussien, Houssein, & Hassanien, 2017), and binary dragonfly optimization (Mafarja et al., 2018). On the other hand, some chaotic metaheuristic approaches have been applied to these problems, such as chaotic SSA (Sayed, Khoriba, & Haggag, 2018), chaotic PSO (Chuang, Yang, & Li, 2011b), the chaotic crow search algorithm (CSA) (Sayed, Hassanien, & Azar, 2019), and chaotic bird swarm optimization (BSO) (Ismail, Houssein, & Hassanien, 2018). Several hybrid approaches have also been proposed for the FS problem, such as hybrid ACO (Kabir, Shahjahan, & Murase, 2012), hybrid PSO with a spiral-shaped mechanism (SSM) (Chen, Zhou, & Yuan, 2019), hybrid WOA with simulated annealing (SA) (Mafarja & Mirjalili, 2017), ant and bee colony optimization (AC-ABC) (Shunmugapriya & Kanmani, 2017), differential evolution (DE) with ABC (Zorarpaci & Özel, 2016), the bat algorithm (BA) with PSO (Tawhid & Dsouza, 2018), opposition-based MFO improved by DE (Elaziz, Ewees, Ibrahim, & Lu, 2019), and harmony search with PSO (Ouyang, Gao, Kong, Li, & Zou, 2016).

As discussed earlier, metaheuristic algorithms have shown encouraging results when employed on FS problems over the last decades. However, despite growing research in this direction, a fundamental question still arises: do we need more optimization techniques to find further improved results? The recently introduced metaheuristic algorithms, derived from biological evolution, swarm behavior, physical principles, and mathematical rules, have been increasingly investigated in this regard, yet researchers contend that these methods often perform ineffectively when the complexity and dimensionality of the problem surge significantly. There are two main motivations for this work: a) the No-Free-Lunch (NFL) theorem (Wolpert & Macready, 1997), which states that no single optimization technique can solve all optimization problems; hence, the superior performance of an optimizer on a particular group of problems does not assure equally effective performance on another group. This has motivated many researchers in this field to adapt the latest techniques to new groups of problems, and it is the foundation and motivation for this work as well, in which we propose the novel Henry gas solubility optimization (HGSO) (Hashim et al., 2019) for solving the FS problem with greater dimensionality. b) To the best of the authors' knowledge, this work is the first to use HGSO for FS. The major contributions of this paper are as follows:

  • Introducing a novel algorithm for the FS problem based on HGSO, which imitates Henry's law of physics.

  • Comparing the performance of HGSO with established swarm intelligence algorithms such as GOA, DA, WOA, SSA, and GWO. Furthermore, a fair comparison is carried out with related works from the literature in terms of accuracy and the number of selected features.

  • The HGSO approach is evaluated on 18 datasets, 9 of which have significantly high dimensionality, exceeding 15,000 features, with relatively few instances. It is noteworthy that applying metaheuristic algorithms to FS problems of such high dimensionality is rare in the literature.

  • The impact of the classifier type on HGSO is investigated using k-nearest neighbors (k-NN) and support vector machines (SVM).

The rest of the paper is organized as follows. The subsequent section provides a description of the HGSO algorithm, the k-nearest neighbors (k-NN) classifier, and support vector machines (SVM). Section 3 explains the implementation of HGSO for the FS problem. The results and discussion are presented in Section 4. Finally, conclusions and future directions are drawn in Section 5.


Henry gas solubility optimization (HGSO)

Recently, a novel metaheuristic algorithm called Henry gas solubility optimization (HGSO) (Hashim et al., 2019) has been proposed, inspired by the well-known physical law known as Henry's law. The law explains the solubility of a gas in a liquid under a given pressure; interestingly, Henry's law appears in daily life in the form of cans of carbonated beverages. Hashim et al. (2019) formulated HGSO based on the huddling behaviour of gas particles when immersed…
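For reference, Henry's law states that, at constant temperature, the amount of a gas dissolved in a liquid is proportional to the partial pressure of that gas above the liquid. The relations below sketch this law and the general form of the HGSO update rules as presented in Hashim et al. (2019); the exact constants and operators should be taken from the original paper:

```latex
% Henry's law: gas solubility S_g is proportional to the partial
% pressure P_g via the temperature-dependent Henry constant H:
\[
  S_g \;=\; H \times P_g .
\]
% In HGSO (Hashim et al., 2019), each gas i belongs to a cluster j;
% Henry's constant and solubility are updated over iterations t in
% (approximately) the form
\[
  H_j(t+1) \;=\; H_j(t)\,\exp\!\Big(-C_j \big(\tfrac{1}{T(t)} - \tfrac{1}{T^{\theta}}\big)\Big),
  \qquad T(t) \;=\; \exp\!\big(-t/\mathrm{iter}\big),
\]
\[
  S_{i,j}(t) \;=\; K \times H_j(t+1) \times P_{i,j}(t),
\]
% where T^theta is a reference temperature, C_j and K are constants,
% and P_{i,j} is the partial pressure of gas i in cluster j. The
% solubility then drives the position update of each gas toward the
% best gas in its cluster and the global best.
```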

HGSO for feature selection

In this section, we discuss the implementation of HGSO for the FS problem. The proposed FS solution is a three-phase process, each phase described in the following subsections; a generic sketch of the overall wrapper pipeline is given below.
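Since the subsections are truncated in this snippet, the following sketch only illustrates the generic shape such a pipeline takes in this literature: continuous agent positions are mapped to binary feature masks through a transfer function and scored with a weighted fitness. The sigmoid transfer function and the weight α = 0.99 are common conventions in wrapper-based metaheuristic FS, not necessarily the paper's exact choices.

```python
import numpy as np

def binarize(position, rng):
    """Map a continuous position vector to a binary feature mask
    via a sigmoid transfer function (a common convention)."""
    prob = 1.0 / (1.0 + np.exp(-position))
    return rng.random(position.shape) < prob

def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted sum of classification error and relative subset size."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (n_selected / n_total)

# Illustrative use with a stubbed classifier error:
rng = np.random.default_rng(42)
position = rng.normal(size=10)   # one agent's continuous position
mask = binarize(position, rng)   # the candidate feature subset
err = 0.12                       # stand-in for 1 - wrapper accuracy
print(mask, fitness(err, mask.sum(), mask.size))
```

In a full run, the optimizer's update rules move the continuous positions, and the fitness of each agent's binarized mask guides the search toward small, accurate feature subsets.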

Experimental evaluation and discussion

In order to validate the efficiency of the proposed HGSO algorithm, three experiments are carried out over eighteen datasets taken from the UCI repository (Frank, 2010), using MATLAB 2017b on a computing environment with an Intel® Core™ i7 (3.40 GHz) CPU, 32 GB of RAM, and Microsoft Windows 10. The first experiment focuses on the impact of the number of clusters in HGSO with the k-NN classifier (k = 5) over different datasets and different metrics, such as the mean fitness, the mean accuracy, the…
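Regarding the statistical comparison mentioned in the abstract, a Wilcoxon rank-sum test at the 5% significance level can be run as follows; the two arrays of per-run accuracies are illustrative stand-ins for the results of HGSO and one competitor, not values from the paper:

```python
from scipy.stats import ranksums

# Illustrative per-run accuracies for two algorithms (stand-in values):
hgso_acc = [0.97, 0.98, 0.96, 0.99, 0.97, 0.98, 0.97, 0.99, 0.98, 0.96]
goa_acc  = [0.93, 0.94, 0.92, 0.95, 0.93, 0.94, 0.92, 0.93, 0.94, 0.93]

stat, p_value = ranksums(hgso_acc, goa_acc)
# Reject the null hypothesis (equal distributions) at the 5% level
# if the p-value falls below 0.05:
print(f"statistic={stat:.3f}, p={p_value:.4f}, significant={p_value < 0.05}")
```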

Conclusion

The goal of this paper was to propose a new feature selection approach based on Henry gas solubility optimization (HGSO). To the best of the authors' knowledge, this is the first study to apply HGSO to the feature selection problem. In order to investigate the performance of the proposed HGSO approach, experiments were carried out on eighteen benchmark datasets from the UCI repository, and five evaluation criteria were used to assess different aspects of the compared algorithms' performance.

Declaration of Competing Interest

The authors have declared that there is no conflict of interest.

References

  • H. Ghimatgar et al. (2018). An improved feature selection algorithm based on graph clustering and ant colony optimization. Knowledge-Based Systems.

  • E. Hancer et al. (2018). Differential evolution for filter feature selection based on information theory and feature ranking. Knowledge-Based Systems.

  • M.M. Kabir et al. (2012). A new hybrid ant colony optimization algorithm for feature selection. Expert Systems with Applications.

  • R. Kohavi et al. (1997). Wrappers for feature subset selection. Artificial Intelligence.

  • M. Mafarja et al. (2019). Binary grasshopper optimisation algorithm approaches for feature selection problems. Expert Systems with Applications.

  • M. Mafarja et al. (2018). Binary dragonfly optimization for feature selection using time-varying transfer functions. Knowledge-Based Systems.

  • M.M. Mafarja et al. (2017). Hybrid whale optimization algorithm with simulated annealing for feature selection. Neurocomputing.

  • S. Maldonado et al. (2018). Dealing with high-dimensional class-imbalanced datasets: Embedded feature selection for SVM classification. Applied Soft Computing.

  • S. Maldonado et al. (2009). A wrapper method for feature selection using support vector machines. Information Sciences.

  • S. Maldonado et al. (2014). Feature selection for high-dimensional class-imbalanced data sets using support vector machines. Information Sciences.

  • Z. Manbari et al. (2019). Hybrid fast unsupervised feature selection for high-dimensional data. Expert Systems with Applications.

  • A. Moayedikia et al. (2017). Feature selection for high dimensional imbalanced class data using harmony search. Engineering Applications of Artificial Intelligence.

  • N. Neggaz et al. (2020). Boosting salp swarm algorithm by sine cosine algorithm and disrupt operator for feature selection. Expert Systems with Applications.

  • H.-b. Ouyang et al. (2016). Hybrid harmony search particle swarm optimization with global dimension selection. Information Sciences.

  • P. Shunmugapriya et al. (2017). A hybrid algorithm using ant and bee colony optimization for feature selection and classification (AC-ABC hybrid). Swarm and Evolutionary Computation.

  • M. Taradeh et al. (2019). An evolutionary gravitational search-based feature selection. Information Sciences.

  • A. Tharwat et al. (2017). A BA-based algorithm for parameter optimization of support vector machine. Pattern Recognition Letters.

  • A. Verikas et al. (2002). Feature selection with neural networks. Pattern Recognition Letters.

  • L. Wang et al. (2016). Feature selection methods for big data bioinformatics: A survey from the search perspective. Methods.

  • Y. Wu et al. (2002). Improved k-nearest neighbor classification. Pattern Recognition.

  • B. Xue et al. (2014). Particle swarm optimisation for feature selection in classification: Novel initialisation and updating mechanisms. Applied Soft Computing.