Abstract
Feature selection (FS) approaches have attracted growing attention because they have been applied in several fields, primarily to handle high-dimensional data. Increasing the dimensionality of the data can degrade the accuracy of a machine learning method. Several FS methods based on meta-heuristic (MH) techniques have therefore been developed to tackle the FS problem and avoid the limitations of traditional FS approaches. However, these MH methods still suffer from drawbacks that affect the quality of the final output, so they need improvement. This paper therefore proposes a modified Henry Gas Solubility Optimization (HGSO) that uses an enhanced Harris hawks optimization (HHO) based on heavy-tailed distributions (HTDs). A dynamic exchange among five HTDs is used to boost HHO, which in turn modifies the exploitation phase of HGSO. The result is a dynamic modified HGSO based on enhanced HHO (DHGHHD). To assess the efficiency of DHGHHD, a set of eighteen UCI datasets is used. The method is also applied to improve prediction on two real-world datasets from the drug design and discovery field. DHGHHD is compared with eight well-known MH methods. The comparison results show the high quality of DHGHHD in terms of accuracy, fitness value, and number of selected features.
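The abstract evaluates candidate feature subsets by accuracy, fitness value, and number of selected features. As a rough illustration of how these quantities can be tied together, the sketch below uses a weighted-sum fitness that is common in wrapper-based FS. It is an assumption, not the formula used in the paper: the function name `fs_fitness`, the weight `alpha`, and its default value are hypothetical.

```python
from typing import Sequence


def fs_fitness(error_rate: float, mask: Sequence[int], alpha: float = 0.99) -> float:
    """Hypothetical wrapper-FS fitness (lower is better).

    Assumed form, not taken from the paper:
        alpha * error_rate + (1 - alpha) * (selected features / total features)

    error_rate: classification error of a model trained on the selected features.
    mask:       binary vector, 1 = feature kept, 0 = feature dropped.
    alpha:      trade-off weight between error and subset size (assumed value).
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    n_total = len(mask)
    n_selected = sum(1 for bit in mask if bit)
    # An empty subset cannot train a classifier, so give it the worst fitness.
    if n_total == 0 or n_selected == 0:
        return float("inf")
    return alpha * error_rate + (1.0 - alpha) * n_selected / n_total


if __name__ == "__main__":
    # Two subsets with the same error: the smaller subset gets the lower fitness.
    small = fs_fitness(0.10, [1, 0, 0, 0])
    large = fs_fitness(0.10, [1, 1, 1, 1])
    print(small < large)
```

Under this assumed form, a meta-heuristic minimizing the fitness prefers smaller subsets only when they do not cost much accuracy, since `alpha` close to 1 weights the error term far more heavily.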
Acknowledgements
This project was supported financially by the Academy of Scientific Research and Technology (ASRT), Egypt, Grant 6619.
Funding
No Funding.
Ethics declarations
Conflict of interest
None.
Human or animal rights
No human or animal parts were used in this study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Abd Elaziz, M., Yousri, D. Automatic selection of heavy-tailed distributions-based synergy Henry gas solubility and Harris hawk optimizer for feature selection: case study drug design and discovery. Artif Intell Rev 54, 4685–4730 (2021). https://doi.org/10.1007/s10462-021-10009-z
DOI: https://doi.org/10.1007/s10462-021-10009-z