Skip to main content

Advertisement

Log in

WOA + BRNN: An imbalanced big data classification framework using Whale optimization and deep neural network

  • Focus
  • Published:
Soft Computing Aims and scope Submit manuscript

Abstract

Nowadays, big data plays a substantial part in information knowledge analysis, manipulation, and forecasting. Analyzing and extracting knowledge from such big datasets are a very challenging task due to the imbalance of data distribution, which could lead to a biased classification results and wrong decisions. The standard classifiers are not capable of handling such datasets. Hence, a new technique for dealing with such datasets is required. This paper proposes a novel classification framework for big data that consists of three developed phases. The first phase is the feature selection phase, which uses the Whale optimization algorithm (WOA) for finding the best set of features. The second phase is the preprocessing phase, which uses the SMOTE algorithm and the LSH-SMOTE algorithm for solving the class imbalance problem. Lastly, the third phase is WOA + BRNN algorithm, which is using the Whale optimization algorithm for training a deep learning approach called bidirectional recurrent neural network for the first time. Our proposed algorithm WOA-BRNN has been tested against nine highly imbalanced datasets one of them is big dataset in terms of area under curve (AUC) against four of the most common use machine learning algorithms (Naïve Bayes, AdaBoostM1, decision table, random tree), in addition to GWO-MLP (training multilayer perceptron using Gray Wolf Optimizer), then we test our algorithm over four well-known datasets against GWO-MLP and particle swarm optimization (PSO-MLP), genetic algorithm (GA-MLP), ant colony optimization (ACO-MLP), evolution strategy (ES-MLP), and population-based incremental learning (PBIL-MLP) in terms of classification accuracy. Experimental results proved that our proposed algorithm WOA + BRNN has achieved promising accuracy and high local optima avoidance, and outperformed four of the most common use machine learning algorithms, and GWO-MLP in terms of AUC.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

References

  • Ahmed E et al (2017) The role of big data analytics in Internet of Things. Comput Netw 129:459–471

    Article  Google Scholar 

  • Al-Smadi M et al (2018) Deep recurrent neural network vs. support vector machine for aspect-based sentiment analysis of Arabic hotels’ reviews. J Comput Sci 27:386–393

    Article  Google Scholar 

  • Ballabio D, Grisoni F, Todeschini R (2018) Multivariate comparison of classification performance measures. Chemometr Intell Lab Syst 174:33–44

    Article  Google Scholar 

  • Barrow D, Kourentzes N (2018) The impact of special days in call arrivals forecasting: a neural network approach to modelling special days. Eur J Oper Res 264(3):967–977

    Article  MathSciNet  Google Scholar 

  • Bennin KE et al (2018) Mahakil: diversity based oversampling approach to alleviate the class imbalance issue in software defect prediction. IEEE Trans Software Eng 44(6):534–550

    Article  Google Scholar 

  • Chaudhary P, Gupta BB (2017) A novel framework to alleviate dissemination of XSS worms in online social network (OSN) using view segregation. Neural Netw World 27(1):5

    Article  Google Scholar 

  • Chaudhary P, Gupta S, Gupta BB (2016) Auditing defense against XSS worms in online social network-based web applications. In: Gupta B, Agrawal DP, Yamaguchi S (eds) Handbook of research on modern cryptographic solutions for computer and cyber security. IGI Global, Pennsylvania, pp 216–245

    Chapter  Google Scholar 

  • Chawla NV et al (2012) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357

    Article  Google Scholar 

  • Din S et al (2018) Service orchestration of optimizing continuous features in industrial surveillance using big data based fog-enabled internet of things. IEEE Access 6:21582–21591

    Article  Google Scholar 

  • Faris H, Aljarah I, Mirjalili S (2016) Training feedforward neural networks using multi-verse optimizer for binary classification problems. Appl Intell 45(2):322–332

    Article  Google Scholar 

  • Goodfellow I et al (2016) Deep learning, vol 1. MIT Press, Cambridge

    MATH  Google Scholar 

  • Grover V et al (2018) Creating strategic business value from big data analytics: a research framework. J Manag Inf Syst 35(2):388–423

    Article  Google Scholar 

  • Guan Y et al (2017) FPGA-based accelerator for long short-term memory recurrent neural networks. In: Design automation conference (ASP-DAC), 2017 22nd Asia and South Pacific. IEEE

  • Gupta BB (ed) (2018) Computer and cyber security: principles, algorithm, applications, and perspectives. CRC Press, New York

    Google Scholar 

  • Haixiang G et al (2017) Learning from class-imbalanced data: review of methods and applications. Expert Syst Appl 73:220–239

    Article  Google Scholar 

  • Hassib EM et al (2018) LSH-SMOTE: a modified SMOTE algorithm for imbalanced data-sets. Ciência e Técnica Vitivinícola 33:50–65

    Google Scholar 

  • Huang W et al (2015) Scalable Gaussian process regression using deep neural networks. In: IJCAI

  • Huang J et al (2017) Speed/accuracy trade-offs for modern convolutional object detectors. In: IEEE CVPR, vol 4

  • Kim JS, Jung S (2015) Implementation of the RBF neural chip with the back-propagation algorithm for on-line learning. Appl Soft Comput 29:233–244

    Article  Google Scholar 

  • Li J et al (2017) Rare event prediction using similarity majority under-sampling technique. In: International conference on soft computing in data science. Springer, Singapore

  • Linggard R, Myers DJ, Nightingale C (eds) (2012) Neural networks for vision, speech and natural language, vol 1. Springer, Berlin

    MATH  Google Scholar 

  • Liu W et al (2017) A survey of deep neural network architectures and their applications. Neurocomputing 234:11–26

    Article  Google Scholar 

  • Manogaran G, Thota C, Lopez D (2018) Human–computer interaction with big data analytics. In: Lopez D, Durai MA (eds) HCI challenges and privacy preservation in big data security. IGI Global, Pennsylvania, pp 1–22

    Google Scholar 

  • Mirjalili S (2015) How effective is the Grey Wolf optimizer in training multi-layer perceptrons. Appl Intell 43(1):150–161

    Article  Google Scholar 

  • Mirjalili S (2016) Dragonfly algorithm: a new meta-heuristic optimization technique for solving single-objective, discrete, and multi-objective problems. Neural Comput Appl 27(4):1053–1073

    Article  Google Scholar 

  • Mirjalili S, Lewis A (2016) The whale optimization algorithm. Adv Eng Softw 95:51–67

    Article  Google Scholar 

  • Mirjalili S, Mirjalili SM, Lewis A (2014) Grey wolf optimizer. Adv Eng Softw 69:46–61

    Article  Google Scholar 

  • Pascanu R, Montufar G, Bengio Y (2013) On the number of response regions of deep feed forward networks with piece-wise linear activations. arXiv preprint arXiv:1312.6098

  • Piri S, Delen D, Liu T (2018) A synthetic informative minority over-sampling (SIMO) algorithm leveraging support vector machine to enhance learning from imbalanced datasets. Decis Support Syst 106:15–29

    Article  Google Scholar 

  • Plageras AP et al (2017) Efficient large-scale medical data (ehealth big data) analytics in internet of things. In: 2017 IEEE 19th conference on business informatics (CBI), vol 2. IEEE

  • Plageras AP et al (2018) Efficient IoT-based sensor BIG Data collection—processing and analysis in smart buildings. Future Gener Comput Syst 82:349–357

    Article  Google Scholar 

  • Pour SG, Girosi F (2016) Joint prediction of chronic conditions onset: comparing multivariate probits with multiclass support vector machines. In: Symposium on conformal and probabilistic prediction with applications. Springer, Cham

  • Qin P, Xu W, Guo J (2017) Designing an adaptive attention mechanism for relation classification. In: 2017 International joint conference on neural networks (IJCNN). IEEE

  • Rennie JD et al (2003) Tackling the poor assumptions of Naive Bayes text classifiers. In: Proceedings of the 20th international conference on machine learning (icml-03)

  • Rezaeianzadeh M et al (2014) Flood flow forecasting using ANN, ANFIS and regression models. Neural Comput Appl 25(1):25–37

    Article  Google Scholar 

  • Sahoo RR, Ray M (2018) Metaheuristic techniques for test case generation: a review. J Inf Technol Res 11(1):158–171

    Article  Google Scholar 

  • Salehinejad H et al (2017) Recent advances in recurrent neural networks. arXiv preprint arXiv:1801.01078

  • Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117

    Article  Google Scholar 

  • Schuster M, Paliwal KK, Hannun A, Case C, Casper J, Catanzaro B, Diamos G, Ryan EE (1997) Bidirectional recurrent neural networks. IEEE Trans Signal Process 45(11):2673–2681

    Article  Google Scholar 

  • Sivakumar S, Sivakumar S (2017) Marginally stable triangular recurrent neural network architecture for time series prediction. IEEE Trans Cybern 48(10):2836–2850

    Article  Google Scholar 

  • Sivarajah U et al (2017) Critical analysis of Big Data challenges and analytical methods. J Bus Res 70:263–286

    Article  Google Scholar 

  • Song Q, Guo Y, Shepperd M (2018) A comprehensive investigation of the role of imbalanced learning for software defect prediction. IEEE Trans Software Eng. https://doi.org/10.1109/TSE.2018.2836442

    Article  Google Scholar 

  • Storey VC, Song I-Y (2017) Big data technologies and management: what conceptual modeling can do. Data Knowl Eng 108:50–67

    Article  Google Scholar 

  • Voyant C et al (2017) Machine learning methods for solar radiation forecasting: a review. Renewable Energy 105:569–582

    Article  Google Scholar 

  • Wang L, Zeng Y, Chen T (2015) Back propagation neural network with adaptive differential evolution algorithm for time series forecasting. Expert Syst Appl 42(2):855–863

    Article  Google Scholar 

  • Wang Y, Kung LA, Byrd TA (2018) Big data analytics: understanding its capabilities and potential benefits for healthcare organizations. Technol Forecast Soc Chang 126:3–13

    Article  Google Scholar 

  • Warde-Farley D (2018) Feedforward deep architectures for classification and synthesis

  • Zalesky A et al (2016) Connectome sensitivity or specificity: which is more important? Neuroimage 142:407–420

    Article  Google Scholar 

  • Zhou L et al (2017) Machine learning on big data: opportunities and challenges. Neurocomputing 237:350–361

    Article  Google Scholar 

Download references

Acknowledgements

The authors would like to extend their sincere thanks and appreciation to the anonymous reviewers for their valuable comments and feedback, which were extremely helpful in improving the quality of the paper.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Eslam. M. Hassib.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by B. B. Gupta.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Hassib, E.M., El-Desouky, A.I., Labib, L.M. et al. WOA + BRNN: An imbalanced big data classification framework using Whale optimization and deep neural network. Soft Comput 24, 5573–5592 (2020). https://doi.org/10.1007/s00500-019-03901-y

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00500-019-03901-y

Keywords

Navigation