
Dynamic clustering method for imbalanced learning based on AdaBoost


Abstract

This paper addresses learning from imbalanced data with ensemble learning. At present, the main solutions combine under-sampling, over-sampling, or cost-sensitive learning with ensemble learning. However, these feature-space-based methods fail to reflect changes in the data distribution and are usually accompanied by high computational complexity and a risk of overfitting. In this paper, we propose a dynamic clustering algorithm based on the coefficient of variation (or entropy), which learns the local spatial distribution of the data and hierarchically clusters the majority class. The algorithm has low complexity and can dynamically adjust the clusters across AdaBoost iterations, adapting to the changes induced by updated sample weights. We then design an index to measure the importance of each cluster and, based on this index, propose a dynamic sampling algorithm based on maximum weight. The effectiveness of the sampling algorithm is demonstrated through visualization experiments. Finally, we propose a cost-sensitive algorithm based on Bagging and combine it with the dynamic sampling algorithm to obtain a multi-fusion imbalanced ensemble learning algorithm. In our experimental study, the algorithms are validated on three artificial datasets, 22 KEEL datasets, and two gene-expression cancer datasets, and achieve performance comparable to or better than state-of-the-art methods in terms of AUC, indicating that they are not only effective imbalanced-learning algorithms but also hold potential for building reliable biological cyber-physical systems.
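To make the overall pipeline concrete, the following is a minimal sketch of the idea summarized above: at each AdaBoost round the majority class is re-clustered, one maximum-weight sample is drawn from each cluster, and a weak learner is trained on the resulting balanced subset. This is not the authors' published implementation; the use of scikit-learn's AgglomerativeClustering (rather than the paper's coefficient-of-variation/entropy criterion), the depth-1 decision tree, the default cluster count, and the weight-update details are illustrative assumptions.

```python
# Sketch of AdaBoost with per-round clustering of the majority class and
# max-weight undersampling. Clustering method, cluster count, and weak learner
# are assumptions for illustration, not the paper's exact design.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.tree import DecisionTreeClassifier


def cluster_undersample_boost(X, y, n_rounds=10, n_clusters=None):
    """Labels y must be {0, 1} with 1 as the minority class."""
    n = len(y)
    w = np.full(n, 1.0 / n)                      # AdaBoost sample weights
    maj_idx = np.where(y == 0)[0]
    min_idx = np.where(y == 1)[0]
    k = n_clusters or len(min_idx)               # assumption: one cluster per minority sample
    learners, alphas = [], []

    for _ in range(n_rounds):
        # Re-partition the majority class every round so the clustering tracks
        # the current weight distribution (plain hierarchical clustering here).
        labels = AgglomerativeClustering(n_clusters=k).fit_predict(X[maj_idx])
        # "Maximum-weight" sampling: keep the heaviest majority sample of each cluster.
        keep = np.array([maj_idx[labels == c][np.argmax(w[maj_idx[labels == c]])]
                         for c in range(k)])
        sel = np.concatenate([keep, min_idx])

        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X[sel], y[sel], sample_weight=w[sel])
        pred = stump.predict(X)

        err = np.dot(w, pred != y) / w.sum()     # weighted training error
        if err >= 0.5:
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        learners.append(stump)
        alphas.append(alpha)
        w *= np.exp(alpha * (pred != y))         # up-weight misclassified samples
        w /= w.sum()

    def predict(X_new):
        votes = np.zeros(len(X_new))
        for a, h in zip(alphas, learners):
            votes += a * (2 * h.predict(X_new) - 1)   # map {0,1} votes to {-1,+1}
        return (votes > 0).astype(int)

    return predict
```

Under these assumptions, each boosting round trains on roughly balanced data while the weighted voting rule is the standard AdaBoost combination; the paper's contribution lies in how the clusters and the sampling index are updated dynamically across rounds.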



Notes

  1. http://www.keel.es/dataset.php.


Acknowledgements

This work was supported by the Fundamental Research Funds for the Central Universities of Central South University under Grant Nos. 2019zzts588 and XCX20190701588, and by the National Natural Science Foundation of China under Grant Nos. 61772553 and 61379058.

Author information

Correspondence to Xiaoheng Deng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Deng, X., Xu, Y., Chen, L. et al. Dynamic clustering method for imbalanced learning based on AdaBoost. J Supercomput 76, 9716–9738 (2020). https://doi.org/10.1007/s11227-020-03211-3

