Improving Face Recognition by Clustering Unlabeled Faces in the Wild,arXiv - CS - Computer Vision and Pattern Recognition

arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2020-07-14, DOI: arxiv-2007.06995
Aruni RoyChowdhury, Xiang Yu, Kihyuk Sohn, Erik Learned-Miller, Manmohan Chandraker

While deep face recognition has benefited significantly from large-scale labeled data, current research is focused on leveraging unlabeled data to further boost performance, reducing the cost of human annotation. Prior work has mostly been in controlled settings, where the labeled and unlabeled data sets have no overlapping identities by construction. This is not realistic in large-scale face recognition, where one must contend with such overlaps, the frequency of which increases with the volume of data. Ignoring identity overlap leads to significant labeling noise, as data from the same identity is split into multiple clusters. To address this, we propose a novel identity separation method based on extreme value theory. It is formulated as an out-of-distribution detection algorithm, and greatly reduces the problems caused by overlapping-identity label noise. Considering cluster assignments as pseudo-labels, we must also overcome the labeling noise from clustering errors. We propose a modulation of the cosine loss, where the modulation weights correspond to an estimate of clustering uncertainty. Extensive experiments on both controlled and real settings demonstrate our method's consistent improvements over supervised baselines, e.g., 11.6% improvement on IJB-A verification.
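
The abstract does not give the exact form of the modulated cosine loss, so the following is only a minimal sketch of the general idea: a large-margin cosine loss in which each pseudo-labeled sample is scaled by a weight reflecting how much its cluster assignment is trusted. The function name, the `scale` and `margin` defaults, and the convention that weights lie in [0, 1] are all assumptions made for illustration, not details taken from the paper.

```python
import math


def weighted_cosine_margin_loss(cosines, labels, weights, scale=30.0, margin=0.35):
    """Mean of per-sample large-margin cosine losses, each scaled by a weight.

    cosines[i][j]: cosine similarity between sample i and class j (assumed given).
    labels[i]:     (pseudo-)label index of sample i.
    weights[i]:    hypothetical confidence in that label, 1.0 = fully trusted.
    """
    total = 0.0
    for row, y, w in zip(cosines, labels, weights):
        # Subtract the margin from the target-class cosine only, then rescale.
        logits = [scale * (c - margin if j == y else c) for j, c in enumerate(row)]
        # Numerically stable log-sum-exp for the softmax normalizer.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(v - m) for v in logits))
        # Cross-entropy of the target class, modulated by the sample weight.
        total += w * (log_z - logits[y])
    return total / len(labels)


# Two samples with the same pseudo-label; the second one looks mis-clustered.
cos = [[0.9, 0.1], [0.2, 0.8]]
labels = [0, 0]
trusted = weighted_cosine_margin_loss(cos, labels, [1.0, 1.0])
downweighted = weighted_cosine_margin_loss(cos, labels, [1.0, 0.0])
print(trusted > downweighted)
```

In this sketch, a sample whose pseudo-label is likely wrong contributes little to the loss when its weight is small, which is one way a clustering-uncertainty estimate could limit the effect of label noise.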

Updated: 2020-07-16
