Self-supervised deep subspace clustering network for faces in videos
The Visual Computer (IF 3.5), Pub Date: 2020-10-07, DOI: 10.1007/s00371-020-01984-5
Yunhao Qiu, Pengyi Hao

Video face clustering is a challenging task with wide applications. Unlike ordinary image clustering, faces in videos usually appear as a series of tracks, which provide prior knowledge. Specifically, faces from the same track are considered to be the same person, while faces from different tracks that appear in the same frame are considered to be different people. Based on this prior knowledge, we propose the self-supervised deep subspace clustering network (SDSCN). SDSCN adopts an autoencoder to nonlinearly map the faces into a latent space and adds a fully connected layer between the encoder and decoder to explore the self-expressiveness property. Prior knowledge is automatically incorporated into the loss function to guide training. We further propose an efficient training strategy for our network and clustering. Experiments on two public datasets (BBT0101 and Notting-Hill) demonstrate the advantages of our method. Specifically, our method achieves about 3–17% improvement in clustering accuracy on BBT0101 and about 6–23% improvement on Notting-Hill compared to state-of-the-art methods.
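
The track-based prior described in the abstract can be illustrated with a short sketch. The code below is not the authors' implementation: the function names `build_constraints` and `sketch_loss`, the weight `lam`, and the specific penalty term are assumptions made for illustration only. It derives must-link pairs (same track) and cannot-link pairs (tracks that share a frame) from per-face track and frame indices, then evaluates a generic self-expressive residual, where each latent code is approximated as a combination of the others, plus a hypothetical penalty on coefficients that connect cannot-link faces.

```python
import numpy as np


def build_constraints(track_ids, frame_ids):
    """Derive pairwise constraints from face tracks.

    track_ids[i] and frame_ids[i] are the track and frame of face i.
    Returns two boolean N x N matrices (must_link, cannot_link).
    """
    track_ids = np.asarray(track_ids)
    frame_ids = np.asarray(frame_ids)
    n = len(track_ids)

    # Faces in the same track are the same person (self-pairs excluded).
    same_track = track_ids[:, None] == track_ids[None, :]
    must_link = same_track & ~np.eye(n, dtype=bool)

    # Set of frames covered by each track.
    tracks = np.unique(track_ids)
    frames_of = {t: set(frame_ids[track_ids == t].tolist()) for t in tracks}

    # Two different tracks that share a frame show different people, so
    # every face of one track is cannot-link with every face of the other.
    cannot_link = np.zeros((n, n), dtype=bool)
    for a in tracks:
        for b in tracks:
            if a != b and frames_of[a] & frames_of[b]:
                cannot_link |= (track_ids[:, None] == a) & (track_ids[None, :] == b)
    return must_link, cannot_link


def sketch_loss(Z, C, cannot_link, lam=1.0):
    """Illustrative objective (an assumption, not the paper's loss).

    Z: N x d latent codes; C: N x N self-expression coefficients.
    """
    # Self-expressiveness: each code should be reconstructed from the others.
    residual = np.sum((Z - C @ Z) ** 2)
    # Hypothetical prior term: discourage links between cannot-link faces.
    penalty = np.sum(np.abs(C[cannot_link]))
    return residual + lam * penalty
```

Propagating the cannot-link relation to whole tracks follows from the first rule: if two tracks co-occur in any frame, all of their faces belong to different people. In a full network the coefficient matrix `C` would be the weights of the fully connected layer between encoder and decoder; here it is just an array passed in by the caller.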

Updated: 2020-10-07