Few-shot imbalanced classification based on data augmentation
Multimedia Systems (IF 3.9), Pub Date: 2021-07-03, DOI: 10.1007/s00530-021-00827-0
Xuewei Chao, Lixin Zhang

Few-shot imbalanced classification tasks are common in real-world applications because data distributions are unbalanced and rare classes have few samples. Traditional machine learning algorithms are known to perform poorly on imbalanced classification: they tend to ignore the few minority-class samples in order to achieve good overall accuracy. To address this few-shot problem, this study proposes a novel data augmentation method, called H-SMOTE, that rebalances the original imbalanced data in a stable and reasonable way. Extensive experiments were carried out on 12 open datasets with imbalance rates ranging from 3.8 to 16.4. Two typical classifiers, SVM and Random Forest, were selected to test the performance and generalization of the proposed H-SMOTE, and the typical data oversampling algorithm SMOTE was adopted as the comparison baseline. Averaged over the experiments, the proposed H-SMOTE outperforms typical SMOTE in accuracy (by 2.58%), recall (0.67%), F-measure (2.33%), G-mean (2.58%), and AUC (2.5%). In addition, the distribution of the dataset augmented by H-SMOTE is more uniform and stable. This work thus provides a useful data augmentation method for few-shot imbalanced classification, which can also be generalized to many areas of multimedia systems.
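The abstract does not describe how H-SMOTE works internally, so the sketch below illustrates only the baseline it is compared against: standard SMOTE, which creates a synthetic minority sample by interpolating between a minority point and one of its nearest minority neighbours. It also shows how an imbalance rate and the G-mean metric might be computed. The function names, the toy data, and the definition of the imbalance rate as the majority-to-minority count ratio are assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def imbalance_ratio(labels, minority_label):
    """Majority-to-minority count ratio (assumed definition of 'imbalance rate')."""
    n_min = sum(1 for y in labels if y == minority_label)
    n_maj = len(labels) - n_min
    return n_maj / n_min


def smote(minority, n_new, k=5, seed=0):
    """Baseline SMOTE sketch (not H-SMOTE): interpolate toward a near minority neighbour."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding the point itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def g_mean(y_true, y_pred, positive):
    """Geometric mean of sensitivity (minority recall) and specificity (majority recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))


# Toy data (hypothetical): 12 majority labels, 3 minority samples in 2-D.
labels = [0] * 12 + [1] * 3
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]

ratio = imbalance_ratio(labels, minority_label=1)
new_points = smote(minority, n_new=9, k=2)  # 3 + 9 = 12 minority samples after rebalancing
score = g_mean([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1], positive=1)
```

Because every synthetic point lies on a segment between two existing minority samples, this baseline can only fill in the region already spanned by the minority class. G-mean is a natural companion metric here because it drops to zero when a classifier ignores the minority class entirely, which plain accuracy does not penalise.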




Updated: 2021-07-04