Important sampling based active learning for imbalance classification
Science China Information Sciences (IF 8.8), Pub Date: 2020-07-07, DOI: 10.1007/s11432-019-2771-0
Xinyue Wang, Bo Liu, Siyu Cao, Liping Jing, Jian Yu

Imbalanced data distributions hinder the learning performance of classifiers. A popular family of methods addresses this problem through sampling (oversampling the minority class and undersampling the majority class) to make the imbalanced data relatively balanced. However, these methods usually rely on a single technique, either oversampling or undersampling, so they struggle when the imbalance ratio (the number of majority-class instances divided by the number of minority-class instances) is large. In this paper, we propose an active learning framework that handles imbalanced data by alternately performing important sampling (ALIS), which consists of selecting important majority-class instances and generating informative minority-class instances. In ALIS, the two sampling strategies influence each other: the selected majority-class instances provide clearer information for the next oversampling step, while the generated minority-class instances provide richer information for the next undersampling step. Extensive experiments on real-world datasets covering a wide range of imbalance ratios show that ALIS outperforms state-of-the-art methods on several well-known evaluation metrics.
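The abstract does not say how ALIS scores majority-class instances or how it generates minority-class ones, so the following is only a minimal Python sketch of the general idea of alternating the two sampling steps, under loudly stated assumptions: a nearest-centroid model stands in for the classifier, "important" majority instances are assumed to be those closest to its decision boundary, and new minority instances are produced by SMOTE-style interpolation toward the nearest minority neighbour. All function names (`imbalance_ratio`, `margin`, `alternate_sampling`) are hypothetical; this is not the authors' algorithm.

```python
import math
import random


def imbalance_ratio(labels):
    """Majority-class count (label 0) divided by minority-class count (label 1)."""
    n_min = sum(1 for y in labels if y == 1)
    n_maj = len(labels) - n_min
    return n_maj / n_min


def centroid(points):
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def margin(x, c_maj, c_min):
    # ASSUMPTION: a nearest-centroid classifier stands in for the real model.
    # A small |margin| means x lies close to the current decision boundary.
    return math.dist(x, c_maj) - math.dist(x, c_min)


def alternate_sampling(majority, minority, rounds=3, k=2, m=2, seed=0):
    """Hypothetical sketch of alternating undersampling/oversampling (NOT ALIS itself).

    Requires a non-empty majority set and at least two minority points.
    Each round:
      1. undersampling: move the k pool majority points nearest the boundary
         into the selected majority set;
      2. oversampling: create m synthetic minority points by interpolating the
         most uncertain minority points toward their nearest minority neighbour.
    """
    rng = random.Random(seed)
    pool = list(majority)  # copy so the caller's list is not modified
    rng.shuffle(pool)
    selected_maj = [pool.pop() for _ in range(min(k, len(pool)))]  # random warm start
    minority = list(minority)
    for _ in range(rounds):
        # Step 1: pick "important" majority points with the current model.
        c_maj, c_min = centroid(selected_maj), centroid(minority)
        pool.sort(key=lambda x: abs(margin(x, c_maj, c_min)))
        selected_maj += pool[:k]
        del pool[:k]
        # Step 2: refit on the updated majority set, then generate minority points.
        c_maj = centroid(selected_maj)
        seeds = sorted(minority, key=lambda x: abs(margin(x, c_maj, c_min)))[:m]
        for s in seeds:
            neighbour = min((p for p in minority if p is not s),
                            key=lambda p: math.dist(s, p))
            lam = rng.random()  # interpolation weight in [0, 1)
            minority.append([a + lam * (b - a) for a, b in zip(s, neighbour)])
    return selected_maj, minority


if __name__ == "__main__":
    majority = [[x / 10, y / 10] for x in range(10) for y in range(10)]  # 100 points
    minority = [[0.9, 0.9], [0.95, 0.8], [0.8, 0.95]]
    print(round(imbalance_ratio([0] * len(majority) + [1] * len(minority)), 2))  # 33.33
    maj, mino = alternate_sampling(majority, minority, rounds=3, k=2, m=2)
    print(len(maj), len(mino))  # 8 9
```

Because each step uses a model refit on the other step's output, selection and generation affect each other, which mirrors in a simplified way the interaction the abstract describes; a real implementation would substitute the paper's own classifier, selection criterion, and generator.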

Updated: 2020-07-13