On Convergence of the Class Membership Estimator in Fuzzy k-Nearest Neighbor Classifier
IEEE Transactions on Fuzzy Systems ( IF 11.9 ) Pub Date : 2019-06-01 , DOI: 10.1109/tfuzz.2018.2874017
Imon Banerjee , Sankha Subhra Mullick , Swagatam Das

The fuzzy $k$-nearest neighbor classifier (F$k$NN) improves upon the flexibility of the $k$-nearest neighbor classifier by considering each class as a fuzzy set and estimating the membership of an unlabeled data instance in each of the classes. However, the question of validating the quality of the class memberships estimated by F$k$NN for a regular multiclass classification problem remains largely unanswered. In this paper, we attempt to address this issue by first proposing a novel direction for evaluating a fuzzy classifier, highlighting the importance of focusing on the class memberships estimated by F$k$NN instead of its misclassification error. This leads us to novel theoretical upper bounds on the bias and the mean squared error, respectively, of the class memberships estimated by F$k$NN. Additionally, the proposed upper bounds are shown to converge toward zero with increasing availability of labeled data points, under some elementary assumptions on the class distribution and the membership function. The major advantages of this analysis are its simplicity, its direct extension to multiclass problems, its parameter independence, and its practical implications for explaining the behavior of F$k$NN in diverse situations (such as in the presence of class imbalance). Furthermore, we provide a detailed simulation study on artificial and real data sets to empirically support our claims.
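For readers unfamiliar with the estimator the abstract refers to, the following is a minimal sketch of the classic Keller-style F$k$NN membership rule (inverse-distance weighting of the $k$ nearest labeled neighbors with fuzzifier $m$). This is the standard textbook formulation; the paper's exact estimator and assumptions may differ in detail. All function and variable names here are illustrative.

```python
import numpy as np

def fknn_membership(X_train, y_train, x, k=3, m=2.0, n_classes=None):
    """Estimate fuzzy class memberships of a query point x.

    Uses the common inverse-distance F-kNN rule: each of the k
    nearest labeled neighbors votes for its class with weight
    1 / d(x, neighbor)^(2 / (m - 1)), and the votes are normalized
    so the memberships sum to one.
    """
    if n_classes is None:
        n_classes = int(y_train.max()) + 1
    # Euclidean distances from x to every training point
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]            # indices of the k nearest neighbors
    # Inverse-distance weights; the epsilon guards against a zero distance
    w = 1.0 / np.maximum(d[idx], 1e-12) ** (2.0 / (m - 1.0))
    u = np.zeros(n_classes)
    for label, weight in zip(y_train[idx], w):
        u[label] += weight
    return u / u.sum()                 # memberships sum to 1
```

The convergence results in the paper concern how such estimated memberships behave as the number of labeled points grows, rather than the classifier's misclassification error alone.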

Updated: 2019-06-01