Self-supervised deep subspace clustering network for faces in videos

Qiu, Yunhao; Hao, Pengyi

doi:10.1007/s00371-020-01984-5

Original article
Published: 07 October 2020

Volume 37, pages 2253–2261, (2021)

The Visual Computer


Abstract

Video face clustering is a challenging task with wide applications. Unlike the images in ordinary image clustering, faces in videos usually exist as a series of tracks, which provide prior knowledge. Specifically, faces from the same track are considered to be the same person, while faces from different tracks that appear in the same frame are considered to be different people. Based on this prior knowledge, we propose the self-supervised deep subspace clustering network (SDSCN). SDSCN adopts an autoencoder to nonlinearly map the faces into a latent space and adds a fully connected layer between the encoder and decoder to explore the self-expressiveness property. The prior knowledge is automatically incorporated into the loss function to guide training. We further propose an efficient training strategy for our network and for clustering. Experiments on two public datasets (BBT0101 and Notting-Hill) demonstrate the advantages of our method. Specifically, our method achieves about 3–17% improvement in clustering accuracy on BBT0101 and about 6–23% improvement on Notting-Hill compared with state-of-the-art methods.
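The two ideas in the abstract, track-derived constraints and a self-expressive layer, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the loss, so the function names (`track_constraints`, `self_expressive_loss`), the weights `lam_reg` and `lam_prior`, and the specific penalty on cannot-link pairs are all assumptions made for illustration. The cannot-link rule follows the literal face-level reading of the abstract (same frame, different track).

```python
import numpy as np


def track_constraints(track_ids, frame_ids):
    """Build pairwise constraint masks from track metadata (hypothetical helper).

    must_link[i, j]   is True when faces i and j (i != j) share a track.
    cannot_link[i, j] is True when faces i and j appear in the same frame
                      but belong to different tracks.
    """
    track_ids = np.asarray(track_ids)
    frame_ids = np.asarray(frame_ids)
    same_track = track_ids[:, None] == track_ids[None, :]
    same_frame = frame_ids[:, None] == frame_ids[None, :]
    must_link = same_track & ~np.eye(len(track_ids), dtype=bool)
    cannot_link = same_frame & ~same_track
    return must_link, cannot_link


def self_expressive_loss(Z, C, cannot_link, lam_reg=1.0, lam_prior=1.0):
    """Illustrative loss on latent codes Z (n x d) and coefficients C (n x n).

    Assumed form (not taken from the paper):
      0.5 * ||Z - C Z||_F^2        each code is rebuilt from the other codes
      + lam_reg * ||C||_F^2        keeps the coefficients small
      + lam_prior * sum of (|C| + |C|^T) over cannot-link pairs
    """
    C = C - np.diag(np.diag(C))          # zero the diagonal: no self-reconstruction
    recon = 0.5 * np.sum((Z - C @ Z) ** 2)
    reg = np.sum(C ** 2)
    affinity = np.abs(C) + np.abs(C).T   # symmetric affinity used for clustering
    prior = np.sum(affinity[cannot_link])
    return recon + lam_reg * reg + lam_prior * prior


# Toy data: two tracks of two faces each, co-occurring in frames 0 and 1.
track_ids = [0, 0, 1, 1]
frame_ids = [0, 1, 0, 1]
must_link, cannot_link = track_constraints(track_ids, frame_ids)

# Latent codes: faces in the same track share a code.
Z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

# Coefficients that link faces within a track vs. across tracks.
C_within = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
C_across = np.array([[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)

print(self_expressive_loss(Z, C_within, cannot_link))
print(self_expressive_loss(Z, C_across, cannot_link))
```

In this toy case the within-track coefficients receive the lower loss, because they reconstruct every code exactly and place no affinity on cannot-link pairs. In a network of the kind the abstract describes, `C` would be the weights of the fully connected layer between encoder and decoder, and the symmetric affinity would typically be passed to spectral clustering.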



Acknowledgements

This work was supported by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LY18F020034 and the Natural Science Foundation of China under Grant No. 61801428.

Author information

Authors and Affiliations

School of Mathematical Science, Zhejiang University, Hangzhou, China
Yunhao Qiu
College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China
Pengyi Hao


Corresponding author

Correspondence to Pengyi Hao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Yunhao Qiu and Pengyi Hao have contributed equally.


About this article

Cite this article

Qiu, Y., Hao, P. Self-supervised deep subspace clustering network for faces in videos. Vis Comput 37, 2253–2261 (2021). https://doi.org/10.1007/s00371-020-01984-5


Accepted: 19 September 2020
Published: 07 October 2020
Issue Date: August 2021
DOI: https://doi.org/10.1007/s00371-020-01984-5
