Identifying noisy labels with a transductive semi-supervised leave-one-out filter, Pattern Recognition Letters



Identifying noisy labels with a transductive semi-supervised leave-one-out filter
Pattern Recognition Letters (IF 3.9), Pub Date: 2020-09-23, DOI: 10.1016/j.patrec.2020.09.024
Bruno Klaus de Aquino Afonso, Lilian Berton

Obtaining data with meaningful labels is often costly and error-prone. In this situation, semi-supervised learning (SSL) approaches are attractive, as they leverage assumptions about the unlabeled data to make up for the limited number of labels. However, in real-world situations, we cannot assume that the labeling process is infallible, and the accuracy of many SSL classifiers decreases significantly in the presence of label noise. In this work, we introduce $LGC_{LVOf}$, a leave-one-out filtering approach based on the Local and Global Consistency (LGC) algorithm. Our method aims to detect and remove wrong labels, and thus can be used as a preprocessing step for any SSL classifier. Given the propagation matrix, detecting noisy labels takes $O(cl)$ per step, where $c$ is the number of classes and $l$ is the number of labels. Moreover, one does not need to compute the whole matrix, but only an $l \times l$ submatrix corresponding to interactions between labeled instances. As a result, our approach is best suited to datasets with a large amount of unlabeled data but few labels. Results are provided for a number of datasets, including MNIST and ISOLET. $LGC_{LVOf}$ appears to be equally or more precise than the adapted gradient-based filter, and thus can be used in practice for active learning, where it may iteratively send labels for re-evaluation. We show that the best-case accuracy obtained by embedding $LGC_{LVOf}$ into LGC is comparable to the best case of $\ell_1$-based classifiers designed to be robust to label noise.
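The leave-one-out idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes the standard LGC propagation matrix $P = (I - \alpha S)^{-1}$ with $S = D^{-1/2} W D^{-1/2}$, and it flags a label in a single pass whenever the other labeled instances vote for a different class, whereas the abstract describes a step-wise filter. The function names `lgc_submatrix`, `loo_suspects`, and `demo`, the value `alpha=0.9`, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def lgc_submatrix(W, labeled_idx, alpha=0.9):
    """Return the l x l block of P = (I - alpha * S)^-1 for labeled instances.

    Only the l columns of P that belong to labeled instances are solved for,
    so the full n x n propagation matrix is never formed.
    """
    n = W.shape[0]
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 W D^-1/2
    E = np.zeros((n, len(labeled_idx)))
    E[labeled_idx, np.arange(len(labeled_idx))] = 1.0  # selector columns
    cols = np.linalg.solve(np.eye(n) - alpha * S, E)   # n x l
    return cols[labeled_idx, :]                        # l x l


def loo_suspects(P_ll, labels, n_classes):
    """Positions (within the labeled set) whose label disagrees with the
    class predicted from all *other* labeled instances."""
    Y = np.eye(n_classes)[labels]                      # one-hot, l x c
    # Dropping instance i's own contribution P_ii * Y_i is the same as
    # propagating with its label row set to zero, i.e. leaving it out.
    loo_scores = P_ll @ Y - np.diag(P_ll)[:, None] * Y
    return np.flatnonzero(loo_scores.argmax(axis=1) != labels)


def demo():
    # Toy data (assumption): two well-separated 1-D clusters of 10 points.
    x = np.concatenate([np.linspace(0, 1, 10), np.linspace(10, 11, 10)])
    W = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)  # Gaussian affinities
    np.fill_diagonal(W, 0.0)
    labeled_idx = np.array([0, 2, 4, 6, 10, 12, 14, 16])
    noisy_labels = np.array([0, 1, 0, 0, 1, 1, 1, 1])  # position 1 is flipped
    P_ll = lgc_submatrix(W, labeled_idx)
    return loo_suspects(P_ll, noisy_labels, n_classes=2)


print(demo())
```

On this toy input the only flagged position should be the flipped one (position 1 in the labeled set). The score matrix depends only on the $l \times l$ block and the $l \times c$ label matrix, which is why a setting with many unlabeled points but few labels is the favorable case.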


Updated: 2020-10-11
