BCGAN: A CGAN-based over-sampling model using the boundary class for data balancing
The Journal of Supercomputing (IF 2.5), Pub Date: 2021-03-05, DOI: 10.1007/s11227-021-03688-6
Minjae Son, Seungwon Jung, Seungmin Jung, Eenjun Hwang

A class imbalance problem occurs when a dataset is divided into one majority class and one minority class. This problem is critical in machine learning because it induces bias in the trained models. One popular solution is to use a sampling technique that balances the class distribution by either under-sampling the majority class or over-sampling the minority class. However, existing over-sampling techniques suffer from overfitting and noisy data generation. In this paper, we propose an over-sampling scheme based on a borderline class and a conditional generative adversarial network (CGAN). More specifically, we define the borderline class from the minority class data that lie near the majority class. Then, we use the CGAN to generate data for the borderline class to balance the dataset. To demonstrate the performance of the proposed scheme, we conducted various experiments on diverse imbalanced datasets. We report some of the results.
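The abstract does not state how "near the majority class" is measured, so the following is only a rough sketch of the borderline-labeling step under an assumed rule: a minority sample is treated as borderline when at least `min_majority_neighbors` of its `k` nearest neighbors belong to the majority class. The function names, the label encoding (0 = majority, 1 = minority, 2 = borderline), the parameters, and the toy data are all hypothetical, and the CGAN training and generation step is not shown.

```python
import math


def k_nearest(idx, points, k):
    """Return indices of the k points closest to points[idx] (Euclidean distance)."""
    dists = [(math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx]
    dists.sort()
    return [j for _, j in dists[:k]]


def label_borderline(points, labels, k=3, min_majority_neighbors=1):
    """Relabel minority samples (1) that sit near majority samples (0) as borderline (2).

    Assumed rule, not taken from the paper: a minority sample is borderline
    when at least `min_majority_neighbors` of its k nearest neighbors are
    majority samples.
    """
    new_labels = list(labels)
    for i, y in enumerate(labels):
        if y != 1:
            continue
        neighbors = k_nearest(i, points, k)
        majority_count = sum(1 for j in neighbors if labels[j] == 0)
        if majority_count >= min_majority_neighbors:
            new_labels[i] = 2
    return new_labels


def samples_needed(labels):
    """Count the synthetic samples required for the non-majority data to match the majority."""
    majority = sum(1 for y in labels if y == 0)
    non_majority = len(labels) - majority
    return max(0, majority - non_majority)


# Toy 2-D data (hypothetical): eight majority points near the origin,
# four minority points far away, and one minority point inside the majority region.
points = [
    (0, 0), (0, 1), (1, 0), (0.5, 0.5), (1, 1.2), (0.2, 0.8), (0.3, 0.2), (0.8, 0.3),
    (5, 5), (5, 6), (6, 5), (6, 6), (1.1, 1.0),
]
labels = [0] * 8 + [1] * 5

relabeled = label_borderline(points, labels, k=3)
print(relabeled)
print(samples_needed(relabeled))
```

In this sketch, the three-class labels would serve as the conditioning input of a CGAN, and `samples_needed` gives the number of samples to request from the generator for the borderline class. Whether the paper balances the classes in exactly this way is an assumption.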




Updated: 2021-03-05