Penalized multiple distribution selection method for imbalanced data classification
Knowledge-Based Systems (IF 8.8), Pub Date: 2020-04-03, DOI: 10.1016/j.knosys.2020.105833
Ge Shi , Chong Feng , Wenfu Xu , Lejian Liao , Heyan Huang

In real-world datasets, the amount of data varies significantly across categories, which biases learning toward the prominent classes and hinders overall classification performance. In this paper, we prove that traditional classification methods that use a single softmax distribution are limited in modeling complex and imbalanced data, and we propose a general Multiple Distribution Selection (MDS) method for imbalanced data classification. MDS models imbalanced data with a mixture distribution composed of a single softmax distribution and a set of degenerate distributions. Furthermore, a dynamic distribution selection method based on L1 regularization is proposed to automatically determine the weights of the distributions. In addition, a corresponding two-stage optimization algorithm is designed to estimate the model parameters. Extensive experiments on three widely used benchmark datasets (IMDB, ACE2005, 20NewsGroups) show that the proposed mixture method outperforms previous methods. Moreover, under a highly imbalanced setting, the method achieves an absolute F1 gain of up to 4.1 over high-performing baselines.
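To make the idea of mixing a softmax distribution with degenerate distributions concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulation, so the function names (`mixture_probs`, `penalized_nll`), the choice of one degenerate (one-hot) distribution per class, the constraint that the weights are non-negative and sum to 1, and the decision to apply the L1 penalty only to the degenerate weights are all assumptions made for illustration. The two-stage optimization is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_probs(logits, softmax_weight, degenerate_weights):
    """Mix one softmax distribution with one one-hot distribution per class.

    Assumption (not from the paper): degenerate_weights[c] is the mixture
    weight of the distribution that puts all its mass on class c, and all
    weights are non-negative and sum to 1.
    """
    if len(degenerate_weights) != len(logits):
        raise ValueError("need one degenerate weight per class")
    weights = [softmax_weight] + list(degenerate_weights)
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    p = softmax(logits)
    # A one-hot distribution on class c contributes its weight to class c only.
    return [softmax_weight * p_c + w_c
            for p_c, w_c in zip(p, degenerate_weights)]


def penalized_nll(logits, label, softmax_weight, degenerate_weights, lam):
    """Negative log-likelihood of `label` plus an L1 penalty.

    Assumption: the penalty covers only the degenerate weights, so that
    unneeded degenerate components are pushed toward zero weight.
    """
    probs = mixture_probs(logits, softmax_weight, degenerate_weights)
    nll = -math.log(probs[label])
    penalty = lam * sum(abs(w) for w in degenerate_weights)
    return nll + penalty


# Usage: two classes with equal logits; 20% of the mass is moved to class 1.
probs = mixture_probs([0.0, 0.0], 0.8, [0.0, 0.2])
loss = penalized_nll([0.0, 0.0], 1, 0.8, [0.0, 0.2], lam=0.5)
```

In this sketch the degenerate weight on class 1 raises its probability above what the softmax alone assigns, while the penalty term charges for using that extra component, which is the trade-off a sparsity-inducing L1 term is meant to control.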




Updated: 2020-04-03