Abstract
Few-shot imbalanced classification tasks are common in real-world applications because of skewed data distributions and the scarcity of samples from rare classes. Traditional machine learning algorithms are known to perform poorly on imbalanced classification: to achieve good overall accuracy, they tend to ignore the few samples in the minority class. To address this few-shot problem, this study proposes a novel data augmentation method, called H-SMOTE, that rebalances the original imbalanced data in a stable and reasonable way. Extensive experiments were carried out on 12 open datasets whose imbalance ratios range from 3.8 to 16.4. Two typical classifiers, SVM and Random Forest, were used to test the performance and generalization of the proposed H-SMOTE, and the classic oversampling algorithm SMOTE served as the baseline for comparison. Averaged over the experiments, H-SMOTE outperforms SMOTE in accuracy (2.58%), recall (0.67%), F-measure (2.33%), G-mean (2.58%), and AUC (2.5%). In addition, the dataset augmented by H-SMOTE has a more uniform and stable distribution. This work thus provides a useful data augmentation method for few-shot imbalanced classification, one that can also be generalized to many areas of multimedia systems.
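The abstract does not describe how H-SMOTE generates its samples, so the sketch below covers only the standard pieces it names: the imbalance ratio (assumed here to mean the largest class count divided by the smallest), the classic SMOTE interpolation used as the baseline, and the F-measure and G-mean computed with the minority class as the positive class. It is a minimal NumPy illustration under those assumptions, not the authors' code, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names for illustration; this is the SMOTE baseline, not H-SMOTE.


def imbalance_ratio(y):
    """Assumed definition: largest class count divided by smallest class count."""
    _, counts = np.unique(y, return_counts=True)
    return counts.max() / counts.min()


def smote(X_min, n_new, k=5, rng=None):
    """Classic SMOTE: each synthetic point lies on the segment between a
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise squared Euclidean distances among minority samples only.
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)                  # seed samples
    nb = neighbours[base, rng.integers(0, k, size=n_new)]  # random neighbour of each seed
    gap = rng.random((n_new, 1))                           # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


def f_measure_and_g_mean(y_true, y_pred, positive=1):
    """F-measure and G-mean = sqrt(recall * specificity), minority class as positive."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f, np.sqrt(recall * specificity)


if __name__ == "__main__":
    # Toy two-class data with 76 majority and 20 minority samples.
    rng = np.random.default_rng(0)
    X_min = rng.normal(3.0, 1.0, size=(20, 2))
    y = np.array([0] * 76 + [1] * 20)
    print(imbalance_ratio(y))  # 3.8
    X_syn = smote(X_min, n_new=56, k=5, rng=1)  # 20 + 56 = 76, i.e. balanced
    print(X_syn.shape)  # (56, 2)
```

Because every synthetic point is a convex combination of two real minority samples, SMOTE never leaves the region spanned by the minority class. For real experiments, a maintained implementation such as imbalanced-learn's `SMOTE` is usually preferable to a hand-rolled one.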
Acknowledgements
This work was supported by the Major Science and Technology Program of Xinjiang Production and Construction Corps (grant number 2021AA006) and the Natural Science Program of Shihezi University (grant number KX01230101).
About this article
Cite this article
Chao, X., Zhang, L. Few-shot imbalanced classification based on data augmentation. Multimedia Systems 29, 2843–2851 (2023). https://doi.org/10.1007/s00530-021-00827-0