Adaptive ensemble of classifiers with regularization for imbalanced data classification

Information Fusion (IF 14.7), Pub Date: 2020-12-13, DOI: 10.1016/j.inffus.2020.10.017
Chen Wang, Chengyuan Deng, Zhoulu Yu, Dafeng Hui, Xiaofeng Gong, Ruisen Luo

Dynamic ensemble selection of classifiers is an effective approach to classifying label-imbalanced data. However, the technique is prone to overfitting because it lacks regularization methods and depends on the local geometry of the data. This study focuses on binary imbalanced data classification and proposes a novel dynamic ensemble method, adaptive ensemble of classifiers with regularization (AER), to overcome these limitations. The method addresses overfitting from a new perspective, that of implicit regularization. Specifically, it leverages the properties of stochastic gradient descent to obtain the minimum-norm solution, thereby achieving regularization; it also interpolates the ensemble weights by exploiting the global geometry of the data to further prevent overfitting. According to our theoretical proofs, the seemingly complicated AER paradigm, in addition to its regularization capabilities, can reduce the asymptotic time and memory complexities relative to several other algorithms. We evaluate AER on seven benchmark imbalanced datasets from the UCI machine learning repository and on one artificially generated GMM-based dataset with five variations. The results show that the proposed algorithm outperforms the major existing algorithms on multiple metrics in most cases, and two hypothesis tests (McNemar's and Wilcoxon tests) further verify the statistical significance. In addition, the method has other desirable properties, such as particular advantages in handling highly imbalanced data, and it pioneers research on regularization for dynamic ensemble methods.
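The implicit-regularization property that the abstract refers to can be illustrated on a toy problem. The sketch below is not the AER algorithm; it only shows the general, well-known behaviour that stochastic gradient descent started from zero on an underdetermined, consistent least-squares problem converges to the minimum-norm solution. The matrix `A`, the vector `b`, the step size, and the function names `sgd_least_squares` and `min_norm_solution` are all assumptions chosen for illustration.

```python
import random


def min_norm_solution(A, b):
    """Closed-form minimum-norm solution x = A^T (A A^T)^{-1} b for a 2-row A."""
    # Entries of the 2x2 Gram matrix A A^T.
    g11 = sum(v * v for v in A[0])
    g12 = sum(u * v for u, v in zip(A[0], A[1]))
    g22 = sum(v * v for v in A[1])
    det = g11 * g22 - g12 * g12
    # Solve (A A^T) y = b with the explicit 2x2 inverse.
    y0 = (g22 * b[0] - g12 * b[1]) / det
    y1 = (-g12 * b[0] + g11 * b[1]) / det
    # Map back to the original space: x = A^T y.
    return [y0 * u + y1 * v for u, v in zip(A[0], A[1])]


def sgd_least_squares(A, b, lr=0.1, steps=5000, seed=0):
    """Plain SGD on 0.5 * (a_i . x - b_i)^2, sampling one row per step."""
    rng = random.Random(seed)
    # Zero initialisation keeps every iterate in the row space of A,
    # which is why the limit is the minimum-norm solution.
    x = [0.0] * len(A[0])
    for _ in range(steps):
        i = rng.randrange(len(A))
        residual = sum(a * xi for a, xi in zip(A[i], x)) - b[i]
        x = [xi - lr * residual * a for xi, a in zip(x, A[i])]
    return x


# Hypothetical system: 2 equations, 3 unknowns, so infinitely many solutions.
A = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
b = [1.0, 2.0]

x_sgd = sgd_least_squares(A, b)
x_min = min_norm_solution(A, b)
print("SGD solution:     ", [round(v, 6) for v in x_sgd])
print("Min-norm solution:", [round(v, 6) for v in x_min])
```

Another exact solution of this system is `[1, 0, 2]`, but its norm is larger than that of the solution SGD reaches. No explicit penalty term is needed to select the small-norm solution, which is the sense in which the regularization is implicit.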

Updated: 2020-12-16
