
Dynamic clustering method for imbalanced learning based on AdaBoost


Abstract

This paper addresses learning from imbalanced data with ensemble learning. At present, the main solutions combine under-sampling, over-sampling, or cost-sensitive learning with ensemble learning. However, these feature-space-based methods fail to reflect changes in the data distribution and are usually accompanied by high computational complexity and a risk of overfitting. In this paper, we propose a dynamic clustering algorithm based on the coefficient of variation (or entropy), which learns the local spatial distribution of the data and hierarchically clusters the majority class. The algorithm has low complexity and can dynamically adjust the clusters across AdaBoost iterations, adapting to the changes induced by updated sample weights. We then design an index to measure the importance of each cluster and, based on this index, propose a dynamic sampling algorithm based on maximum weight. The effectiveness of the sampling algorithm is demonstrated through visualization experiments. Finally, we propose a cost-sensitive algorithm based on Bagging and combine it with the dynamic sampling algorithm to obtain a multi-fusion imbalanced ensemble learning algorithm. In our experimental study, the algorithms are validated on three artificial datasets, 22 KEEL datasets, and two gene-expression cancer datasets, and achieve performance comparable to or better than state-of-the-art methods in terms of AUC, indicating that they are not only effective imbalanced-learning algorithms but also hold potential for building reliable biological cyber-physical systems.
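To make the overall pipeline concrete, the following is a minimal sketch of the idea summarized above: at each AdaBoost round the majority class is re-clustered, one maximum-weight sample is drawn from each cluster, and a weak learner is trained on the resulting balanced subset. This is not the authors' published implementation; the use of scikit-learn's AgglomerativeClustering (rather than the paper's coefficient-of-variation/entropy criterion), the depth-1 decision tree, the default cluster count, and the weight-update details are illustrative assumptions.

```python
# Sketch of AdaBoost with per-round clustering of the majority class and
# max-weight undersampling. Clustering method, cluster count, and weak learner
# are assumptions for illustration, not the paper's exact design.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.tree import DecisionTreeClassifier


def cluster_undersample_boost(X, y, n_rounds=10, n_clusters=None):
    """Labels y must be {0, 1} with 1 as the minority class."""
    n = len(y)
    w = np.full(n, 1.0 / n)                      # AdaBoost sample weights
    maj_idx = np.where(y == 0)[0]
    min_idx = np.where(y == 1)[0]
    k = n_clusters or len(min_idx)               # assumption: one cluster per minority sample
    learners, alphas = [], []

    for _ in range(n_rounds):
        # Re-partition the majority class every round so the clustering tracks
        # the current weight distribution (plain hierarchical clustering here).
        labels = AgglomerativeClustering(n_clusters=k).fit_predict(X[maj_idx])
        # "Maximum-weight" sampling: keep the heaviest majority sample of each cluster.
        keep = np.array([maj_idx[labels == c][np.argmax(w[maj_idx[labels == c]])]
                         for c in range(k)])
        sel = np.concatenate([keep, min_idx])

        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X[sel], y[sel], sample_weight=w[sel])
        pred = stump.predict(X)

        err = np.dot(w, pred != y) / w.sum()     # weighted training error
        if err >= 0.5:
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        learners.append(stump)
        alphas.append(alpha)
        w *= np.exp(alpha * (pred != y))         # up-weight misclassified samples
        w /= w.sum()

    def predict(X_new):
        votes = np.zeros(len(X_new))
        for a, h in zip(alphas, learners):
            votes += a * (2 * h.predict(X_new) - 1)   # map {0,1} votes to {-1,+1}
        return (votes > 0).astype(int)

    return predict
```

Under these assumptions, each boosting round trains on roughly balanced data while the weighted voting rule is the standard AdaBoost combination; the paper's contribution lies in how the clusters and the sampling index are updated dynamically across rounds.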



Notes

  1. http://www.keel.es/dataset.php.


Acknowledgements

This work was supported by the Fundamental Research Funds for the Central Universities of Central South University under Grant Nos. 2019zzts588 and XCX20190701588, and by the National Natural Science Foundation of China under Grant Nos. 61772553 and 61379058.

Author information

Correspondence to Xiaoheng Deng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Deng, X., Xu, Y., Chen, L. et al. Dynamic clustering method for imbalanced learning based on AdaBoost. J Supercomput 76, 9716–9738 (2020). https://doi.org/10.1007/s11227-020-03211-3

