A Survey of Nearest Neighbor Algorithms for Solving the Class Imbalanced Problem
Wireless Communications and Mobile Computing ( IF 2.146 ) Pub Date : 2021-03-03 , DOI: 10.1155/2021/5520990
Bo Sun 1 , Haiyan Chen 2

Nearest neighbor (NN) is a simple and widely used classifier that can achieve performance comparable to that of more complex classifiers, including decision trees and artificial neural networks. For this reason, NN has been listed as one of the top 10 algorithms in machine learning and data mining. However, in many classification problems, such as medical diagnosis and intrusion detection, the collected training sets are usually class imbalanced. In class imbalanced data, positive examples are heavily outnumbered by negative ones, yet they usually carry more meaningful information and are more important than negative examples. Like other classical classifiers, NN was proposed under the assumption that the training set has an approximately balanced class distribution, which leads to unsatisfactory performance on imbalanced data. In addition, under class imbalance, the global resampling strategies that suit decision trees and artificial neural networks often do not work well for NN, which is a local information-oriented classifier. To address this problem, researchers have carried out much work on NN over the past decade. This paper presents a comprehensive survey of these works, organized by the perspective each takes, and analyzes and compares their characteristics. Finally, several future research directions are pointed out.
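
The abstract's point that NN decides from local information, and can therefore be swamped by the majority class, can be illustrated with a small toy sketch. The data, the function name `knn_predict`, and the choices of k below are illustrative assumptions and are not taken from the surveyed paper.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Plain k-NN majority vote on 1-D points.

    train is a list of (x, label) pairs; this helper is hypothetical.
    """
    # Keep the k training points closest to the query.
    neighbors = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    # Each neighbor casts one unweighted vote for its label.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical imbalanced training set: 2 positives, 8 negatives.
train = [(1.0, "pos"), (1.2, "pos")] + [
    (x, "neg") for x in (1.5, 1.6, 1.7, 1.8, 2.0, 2.2, 2.5, 3.0)
]

query = 1.1  # lies between the two positive examples
print(knn_predict(train, query, k=1))  # pos
print(knn_predict(train, query, k=5))  # neg
```

With k=1 the query is labeled positive, but with k=5 the neighborhood already contains three negatives against the only two positives available, so the unweighted vote flips to negative even though the query sits among the positive examples.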

Last updated: 2021-03-03