Accurate and efficient sequential ensemble learning for highly imbalanced multi-class data.
Neural Networks (IF 7.8). Pub Date: 2020-05-19. DOI: 10.1016/j.neunet.2020.05.010
Chi-Man Vong, Jie Du

Multi-class classification of highly imbalanced data is a challenging task in which multiple issues must be resolved simultaneously: (i) accuracy in classifying highly imbalanced multi-class data; (ii) training efficiency on large data; and (iii) sensitivity to a high imbalance ratio (IR). In this paper, a novel sequential ensemble learning (SEL) framework is designed to resolve these issues simultaneously. The SEL framework offers a significant advantage over traditional AdaBoost: the majority samples can be divided into multiple small, disjoint subsets for training multiple weak learners without compromising accuracy (which AdaBoost cannot do). To ensure the class-balance and majority-disjoint properties of the subsets, a learning strategy called balanced and majority-disjoint subsets division (BMSD) is developed. Unfortunately, it is difficult to derive a general learner combination method (LCM) for every kind of weak learner; in this work, LCM is specifically designed for the extreme learning machine and is called LCM-ELM. The proposed SEL framework with BMSD and LCM-ELM is compared with state-of-the-art methods over 16 benchmark datasets. In the experiments on highly imbalanced multi-class data (IR up to 14K; data size up to 493K), (i) the proposed methods improve performance on different measures, including G-mean, macro-F, micro-F, and MAUC; and (ii) training time is significantly reduced.
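To make the subset-division idea concrete, the following is a minimal, hypothetical sketch of a BMSD-style split (the function name `bmsd_split` and the chunking heuristic are illustrative assumptions, not the authors' actual algorithm): the majority-class samples are partitioned into disjoint chunks of roughly the size of the smallest class, and each chunk is paired with all samples of the remaining classes to yield more class-balanced training subsets for the weak learners.

```python
import numpy as np

def bmsd_split(X, y, rng=None):
    """Illustrative sketch of a balanced, majority-disjoint division.

    The majority class is split into disjoint chunks (~minority-class
    size each); every chunk is combined with all non-majority samples,
    so subsets are balanced and share no majority samples.
    """
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    maj = classes[np.argmax(counts)]      # majority class label
    min_count = counts.min()              # size of the smallest class
    maj_idx = rng.permutation(np.flatnonzero(y == maj))
    rest_idx = np.flatnonzero(y != maj)
    # number of disjoint majority chunks, each about min_count samples
    n_chunks = max(1, len(maj_idx) // min_count)
    subsets = []
    for chunk in np.array_split(maj_idx, n_chunks):
        idx = np.concatenate([chunk, rest_idx])
        subsets.append((X[idx], y[idx]))
    return subsets
```

Each returned subset can then train one weak learner; because the majority chunks are disjoint, every majority sample is seen exactly once across the ensemble while all minority samples appear in every subset.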




Updated: 2020-05-19