A Generalized Weighted Distance k-Nearest Neighbor for Multi-label Problems
Pattern Recognition (IF 8) Pub Date: 2021-06-01, DOI: 10.1016/j.patcog.2020.107526
Niloofar Rastin , Mansoor Zolghadri Jahromi , Mohammad Taheri

Abstract In multi-label classification, each instance is associated with a set of pre-specified labels. One common approach is to use the Binary Relevance (BR) paradigm, which learns each label separately with a base classifier. Using k-Nearest Neighbor (kNN) as the base classifier (denoted BRkNN) is a simple, interpretable, and powerful approach. However, binary relevance induces a highly imbalanced binary view of the dataset for each label, and kNN is known to perform poorly on imbalanced data. One way to address this is to define the distance function in a parametric form and use the training data to adjust the parameters (i.e., to adjust the boundaries between classes) by optimizing a performance measure suited to imbalanced data, such as the F-measure. The Prototype Weighting (PW) scheme presented in the literature (Paredes & Vidal, 2006) uses gradient descent to learn these parameters by minimizing the classification error rate on the training data. This paper presents a generalized version of PW. First, instead of minimizing only the error rate as in PW, the generalized PW also supports other objective functions built from the elements of the confusion matrix (including the F-measure). Second, PW, originally formulated for 1NN, is extended to the general case of kNN (i.e., k ≥ 1). For problems with highly overlapping classes, this extension is expected to perform better, since k > 1 produces smoother decision boundaries, which in turn can improve generalization. In multi-label problems with many labels, or problems with highly overlapping classes, the proposed generalized PW is expected to significantly improve performance, as many decision boundaries are involved. The performance of the proposed method has been compared with state-of-the-art multi-label methods, including six kNN-based lazy classifiers. Experiments show that the proposed method significantly outperforms the other methods.
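To make the BRkNN setup concrete, the following is a minimal sketch of binary relevance with a per-feature weighted Euclidean distance. This is an illustrative simplification, not the paper's method: the function and variable names are hypothetical, and the weights are fixed here, whereas the paper learns the distance parameters by gradient descent on a confusion-matrix-based objective such as the F-measure.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, weights, k=3):
    """Predict one binary label by a k-NN majority vote under a
    per-feature weighted Euclidean distance (a simplified stand-in
    for the paper's learned distance parameters)."""
    d = np.sqrt((((X_train - x) ** 2) * weights).sum(axis=1))
    nearest = np.argsort(d)[:k]           # indices of the k closest points
    return int(2 * y_train[nearest].sum() > k)  # majority vote

def br_knn_predict(X_train, Y_train, x, weights_per_label, k=3):
    """Binary Relevance: one independent weighted k-NN vote per label,
    each label allowed its own distance weights."""
    return np.array([
        weighted_knn_predict(X_train, Y_train[:, j], x, weights_per_label[j], k)
        for j in range(Y_train.shape[1])
    ])

# Tiny illustrative dataset: 2 features, 2 labels (data is made up).
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
w = [np.ones(2), np.ones(2)]  # uniform weights; the paper optimizes these
pred = br_knn_predict(X, Y, np.array([0.05, 0.1]), w, k=3)  # → array([1, 0])
```

In the paper's generalized PW, the uniform `w` above would instead be adjusted via gradient descent so that the resulting decision boundaries optimize an imbalance-aware measure (e.g., F-measure) rather than plain error rate.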
