A Nearest Neighbor Algorithm for Imbalanced Classification
International Journal on Artificial Intelligence Tools (IF 1.1) Pub Date: 2021-05-28, DOI: 10.1142/s0218213021500135
Rémi Viola 1, 2 , Rémi Emonet 1 , Amaury Habrard 1 , Guillaume Metzler 3 , Sébastien Riou 2 , Marc Sebban 1

Due to the inability of accuracy-driven methods to address the challenging problem of learning from imbalanced data, several alternative measures have been proposed in the literature, such as the Area Under the ROC Curve (AUC), the Average Precision (AP), the F-measure, and the G-Mean. However, these measures are neither smooth, convex, nor separable, making their direct optimization hard in practice. In this paper, we tackle the challenging problem of imbalanced learning from a nearest-neighbor (NN) classification perspective, where the minority examples typically belong to the class of interest. Based on simple geometrical ideas, we introduce an algorithm that rescales the distance between a query sample and any positive training example. This modifies the Voronoi regions, and thus the decision boundaries, of the NN classifier. We provide a theoretical justification of this scaling scheme, which inherently aims at reducing the False Negative rate while controlling the number of False Positives. We further formally establish a link between the proposed method and cost-sensitive learning. An extensive experimental study conducted on many public imbalanced datasets shows that our method is very effective compared with popular nearest-neighbor algorithms, is comparable to state-of-the-art sampling methods, and even yields the best performance when combined with them.
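The core geometric idea — rescaling distances between a query and the positive training examples so that the minority class's Voronoi regions grow — can be sketched as follows. This is an illustrative reconstruction, not the paper's exact algorithm: the scaling rule (multiplying distances to positives by a factor `gamma < 1`) and the hyperparameter name `gamma` are assumptions made here for clarity.

```python
import numpy as np

def scaled_1nn_predict(X_train, y_train, X_query, gamma=0.5):
    """1-NN prediction where distances to positive (minority, label 1)
    training examples are multiplied by gamma < 1, enlarging their
    Voronoi regions and thereby reducing false negatives.

    Illustrative sketch only: the exact scaling scheme and the choice
    of gamma are assumptions, not taken verbatim from the paper.
    """
    preds = []
    for q in X_query:
        # Euclidean distances from the query to every training point
        d = np.linalg.norm(X_train - q, axis=1)
        # Shrink distances to positive examples by the factor gamma
        d = np.where(y_train == 1, gamma * d, d)
        # Predict the label of the (rescaled) nearest neighbor
        preds.append(y_train[np.argmin(d)])
    return np.array(preds)
```

With `gamma = 1` this reduces to standard 1-NN; decreasing `gamma` trades false negatives for false positives, which is the cost-sensitive behavior the abstract alludes to.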

Updated: 2021-05-28