Effects of Distance Measure Choice on K-Nearest Neighbor Classifier Performance: A Review.
Big Data (IF 2.6). Pub Date: 2019-12-01. DOI: 10.1089/big.2018.0175
Haneen Arafat Abu Alfeilat, Ahmad B. A. Hassanat, Omar Lasassmeh, Ahmad S. Tarawneh, Mahmoud Bashir Alhasanat, Hamzeh S. Eyal Salman, V. B. Surya Prasath

The K-nearest neighbor (KNN) classifier is one of the simplest and most common classifiers, yet its performance competes with the most complex classifiers in the literature. The core of this classifier depends mainly on measuring the distance or similarity between the tested example and the training examples. This raises a major question: among the large number of available distance and similarity measures, which should be used for the KNN classifier? This review attempts to answer that question by evaluating the performance (measured by accuracy, precision, and recall) of the KNN classifier with a large number of distance measures, tested on a number of real-world datasets, with and without different levels of added noise. The experimental results show that the performance of the KNN classifier depends significantly on the distance used, with large gaps between the performances of different distances. We found that a recently proposed non-convex distance performed best on most datasets compared with the other tested distances. In addition, the performance of the KNN classifier degraded by only about 20% even when the noise level reached 90%, and this held for all the distances used. This means that the KNN classifier using any of the top 10 distances tolerates noise to a certain degree. Moreover, the results show that some distances are less affected by the added noise than others.
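To make the mechanism the abstract describes concrete, below is a minimal sketch of a KNN classifier with a pluggable distance function. The Euclidean and Manhattan distances shown are standard stand-ins of our choosing, not the specific measures benchmarked in the paper (the best-performing non-convex distance is defined in the full text); the toy data and function names are illustrative assumptions.

```python
import numpy as np

# Two standard distance measures; any function of the same
# signature can be plugged into knn_predict below.
def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    return np.sum(np.abs(x - y))

def knn_predict(X_train, y_train, x_test, k=3, distance=euclidean):
    # Core of KNN: measure the distance from the tested example
    # to every training example.
    dists = np.array([distance(x, x_test) for x in X_train])
    # Majority vote over the labels of the k nearest neighbors.
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy usage: two 2-D classes.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.1])))                        # -> 0
print(knn_predict(X, y, np.array([0.8, 0.9]), distance=manhattan))    # -> 1
```

Swapping the `distance` argument is all it takes to change the classifier's behavior, which is exactly the degree of freedom whose effect the review quantifies.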

Updated: 2019-12-01