Core Clustering as a Tool for Tackling Noise in Cluster Labels
Journal of Classification (IF 2), Pub Date: 2019-03-30, DOI: 10.1007/s00357-019-9303-4
Renato Cordeiro de Amorim, Vladimir Makarenkov, Boris Mirkin

Real-world data sets often contain mislabelled entities. This can be particularly problematic if the data set is being used by a supervised classification algorithm at its learning phase. In this case, the accuracy of this classification algorithm, when applied to unlabelled data, is likely to suffer considerably. In this paper, we introduce a clustering-based method capable of reducing the number of mislabelled entities in data sets. Our method can be summarised as follows: (i) cluster the data set; (ii) select the entities that have the most potential to be assigned to correct clusters; (iii) use the entities of the previous step to define the core clusters and map them to the labels using a confusion matrix; (iv) use the core clusters and our cluster membership criterion to correct the labels of the remaining entities. We perform numerous experiments to validate our method empirically using k-nearest neighbour classifiers as a benchmark. We experiment with both synthetic and real-world data sets with different proportions of mislabelled entities. Our experiments demonstrate that the proposed method produces promising results. Thus, it could be used as a preprocessing data correction step of a supervised machine learning algorithm.
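
The four steps (i) to (iv) can be illustrated with a rough Python sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm, the core-selection rule, or the cluster membership criterion, so every concrete choice below is an assumption made for illustration. The sketch assumes k-means with farthest-point initialisation for step (i), "the `core_fraction` of entities closest to their centroid" for step (ii), a majority vote over each confusion-matrix row for step (iii), and "nearest core-cluster mean" for step (iv). The names `kmeans`, `correct_labels`, and `core_fraction` are hypothetical.

```python
from collections import Counter, defaultdict

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(data, k, iters=50):
    """Plain Lloyd's k-means (assumed clustering step, not from the paper)."""
    # Farthest-point initialisation keeps the sketch deterministic.
    centroids = [data[0]]
    while len(centroids) < k:
        centroids.append(
            max(data, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    assign = [0] * len(data)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in data]
        new = []
        for j in range(k):
            members = [p for p, a in zip(data, assign) if a == j]
            new.append(mean(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return assign, centroids

def correct_labels(data, noisy_labels, k, core_fraction=0.5):
    # (i) cluster the data set
    assign, centroids = kmeans(data, k)
    # (ii) assumed core rule: keep the entities closest to their centroid
    cores = {}
    for j in range(k):
        idx = [i for i, a in enumerate(assign) if a == j]
        if idx:
            idx.sort(key=lambda i: sq_dist(data[i], centroids[j]))
            cores[j] = idx[:max(1, int(len(idx) * core_fraction))]
    # (iii) confusion matrix of core cluster vs. given label,
    #       then map each core cluster to its most frequent label
    confusion = defaultdict(Counter)
    for j, members in cores.items():
        for i in members:
            confusion[j][noisy_labels[i]] += 1
    cluster_label = {j: row.most_common(1)[0][0]
                     for j, row in confusion.items()}
    core_mean = {j: mean([data[i] for i in m]) for j, m in cores.items()}
    # Core entities take the label mapped to their core cluster.
    corrected = list(noisy_labels)
    in_core = set()
    for j, members in cores.items():
        for i in members:
            corrected[i] = cluster_label[j]
            in_core.add(i)
    # (iv) assumed membership criterion: nearest core-cluster mean
    for i, p in enumerate(data):
        if i not in in_core:
            nearest = min(core_mean, key=lambda j: sq_dist(p, core_mean[j]))
            corrected[i] = cluster_label[nearest]
    return corrected

# Toy data: two well-separated groups, one flipped label in each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5), (10.2, 10.8)]
noisy = ["a", "a", "b", "a", "a", "a", "b", "b", "a", "b", "b", "b"]
print(correct_labels(points, noisy, k=2))
```

On this toy input the two flipped labels (the third and ninth entities) are restored, because each lies far from the other group's core. With overlapping classes or heavier label noise the outcome depends strongly on the assumed core rule and membership criterion, which is where the paper's actual definitions would matter.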

Updated: 2019-03-30