Automatic selection of heavy-tailed distributions-based synergy Henry gas solubility and Harris hawk optimizer for feature selection: case study drug design and discovery

Artificial Intelligence Review

Abstract

Feature selection (FS) approaches have attracted growing attention because they have been applied in several fields, primarily to deal with high-dimensional data. An increase in the dimension of the data can degrade the accuracy of a machine learning method. Several FS methods based on meta-heuristic (MH) techniques have therefore been developed to tackle the FS problem and avoid the limitations of traditional FS approaches. However, those MH methods still need improvement, since they suffer from drawbacks that affect the quality of the final output. This paper therefore proposes a modified Henry Gas Solubility Optimization (HGSO) that uses an enhanced Harris hawks optimization (HHO) based on heavy-tailed distributions (HTDs). In this study, a dynamic exchange between five HTDs is used to boost the HHO, which in turn modifies the exploitation phase of HGSO. The result is a dynamic modified HGSO based on enhanced HHO (DHGHHD). To assess the efficiency of the proposed DHGHHD, a set of eighteen UCI datasets is used. Furthermore, it is applied to improve prediction on two real-world datasets from the drug design and discovery field. DHGHHD is compared with eight well-known MH methods. The comparison results illustrate the high quality of DHGHHD in terms of accuracy, fitness value, and the number of selected features.
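The abstract names the ingredients (a fitness value tied to accuracy and the number of selected features, and a dynamic exchange between five HTDs) but not their exact form. The following Python sketch is only an illustration under stated assumptions: the weighted fitness with `alpha` is a form commonly used in wrapper FS, the five samplers in `HTD_POOL` are a hypothetical pool (the abstract does not say which HTDs are used), and the score-based `pick_htd`/`reward` rule is one possible reading of a "dynamic exchange", not the authors' scheme.

```python
import math
import random


def fs_fitness(mask, error_rate, alpha=0.99):
    """Weighted wrapper-FS fitness, lower is better.

    Assumed form: alpha * error + (1 - alpha) * (selected / total).
    `alpha` is a hypothetical weight, not a value taken from the paper.
    """
    if not mask:
        raise ValueError("mask must contain at least one feature flag")
    selected_ratio = sum(1 for bit in mask if bit) / len(mask)
    return alpha * error_rate + (1.0 - alpha) * selected_ratio


def _levy(rng):
    # If Z ~ N(0, 1), then 1 / Z^2 follows a standard Levy distribution.
    z = rng.gauss(0.0, 1.0)
    while z == 0.0:
        z = rng.gauss(0.0, 1.0)
    return 1.0 / (z * z)


# Hypothetical pool of heavy-tailed samplers (assumption, not from the source).
HTD_POOL = {
    "cauchy": lambda rng: math.tan(math.pi * (rng.random() - 0.5)),
    "levy": _levy,
    "pareto": lambda rng: rng.paretovariate(1.5),        # shape 1.5
    "weibull": lambda rng: rng.weibullvariate(1.0, 0.5),  # shape < 1
    "lognormal": lambda rng: rng.lognormvariate(0.0, 1.0),
}


def pick_htd(scores, rng):
    """Choose a distribution name with probability proportional to its score."""
    names = list(scores)
    weights = [scores[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def reward(scores, name, improved, bonus=1.0):
    """Raise the score of a distribution whose step improved the fitness."""
    if improved:
        scores[name] += bonus


# Usage: draw one heavy-tailed step and credit the distribution that produced it.
rng = random.Random(0)
scores = {name: 1.0 for name in HTD_POOL}
chosen = pick_htd(scores, rng)
step = HTD_POOL[chosen](rng)
reward(scores, chosen, improved=True)
```

In a full optimizer the `improved` flag would come from comparing `fs_fitness` before and after applying `step` to a candidate solution, so distributions that keep producing useful moves are sampled more often.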

Notes

  1. https://archive.ics.uci.edu/ml/datasets/QSAR+biodegradation

  2. https://archive.ics.uci.edu/ml/datasets/QSAR+Bioconcentration+classes+dataset

Acknowledgements

This project was supported financially by the Academy of Scientific Research and Technology (ASRT), Egypt, Grant 6619.

Funding

No Funding.

Author information

Corresponding author

Correspondence to Mohamed Abd Elaziz.

Ethics declarations

Conflict of interest

None.

Human or animal rights

No human or animal parts were used in this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Abd Elaziz, M., Yousri, D. Automatic selection of heavy-tailed distributions-based synergy Henry gas solubility and Harris hawk optimizer for feature selection: case study drug design and discovery. Artif Intell Rev 54, 4685–4730 (2021). https://doi.org/10.1007/s10462-021-10009-z

  • DOI: https://doi.org/10.1007/s10462-021-10009-z
