Joint learning for face alignment and face transfer with depth image

Wang, Xiaoli; Zheng, Yinglin; Zeng, Ming; Cheng, Xuan; Lu, Wei

doi:10.1007/s11042-020-08873-y

Published: 15 May 2020

Multimedia Tools and Applications, Volume 79, pages 33993–34010 (2020)

Abstract

Face alignment and cross-modal face transfer are two important tasks in automatic face analysis and have been studied extensively in computer vision. Recently, deep neural networks have attracted much research attention for both tasks. With the prevalence of consumer depth sensors, depth-based face alignment and cross-modal (between image and depth) face transfer are becoming increasingly important. Unlike existing RGB-image-based tasks, depth-based tasks face a main challenge: the lack of annotated data. To address this challenge, we observe that the two tasks are closely related and that their learning processes may benefit each other. This paper develops a joint multi-task learning algorithm for depth-based face alignment and face transfer using a deep convolutional neural network (CNN). The proposed approach allows the CNN model to share visual knowledge and information between the two tasks simultaneously. We validate the approach on a dataset of 10,000 face depth images. Our experiments show that the proposed approach outperforms state-of-the-art algorithms, and that learning the two related tasks simultaneously improves the performance of each individual task.
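The abstract does not state the training objective. A common way to train one CNN on two related tasks is to minimize a weighted sum of the per-task losses, so that gradients from both tasks update the layers the tasks share. The sketch below illustrates only that general idea under loudly stated assumptions: a mean-squared landmark error for alignment, an L1 per-pixel error for face transfer, and a weight `lam`. The function names, the loss choices, and `lam` are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple

# Hypothetical types (assumptions): a landmark is an (x, y) coordinate and an
# image is a 2-D grid of intensities, e.g. a rendered face or a depth map.
Point = Tuple[float, float]
Image = Sequence[Sequence[float]]


def landmark_loss(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Assumed alignment term: mean squared Euclidean error over landmarks."""
    if not pred or len(pred) != len(target):
        raise ValueError("landmark lists must be non-empty and equal in length")
    total = sum((px - tx) ** 2 + (py - ty) ** 2
                for (px, py), (tx, ty) in zip(pred, target))
    return total / len(pred)


def transfer_loss(pred: Image, target: Image) -> float:
    """Assumed transfer term: mean absolute (L1) per-pixel error."""
    if not pred or len(pred) != len(target):
        raise ValueError("images must be non-empty and have the same height")
    total, count = 0.0, 0
    for row_p, row_t in zip(pred, target):
        if len(row_p) != len(row_t):
            raise ValueError("images must have the same width")
        total += sum(abs(p - t) for p, t in zip(row_p, row_t))
        count += len(row_p)
    if count == 0:
        raise ValueError("images must contain at least one pixel")
    return total / count


def joint_loss(lm_pred: Sequence[Point], lm_gt: Sequence[Point],
               img_pred: Image, img_gt: Image, lam: float = 1.0) -> float:
    """Hypothetical multi-task objective: L = L_align + lam * L_transfer.

    If both task heads sit on a shared encoder, minimizing this sum sends
    gradients from both tasks into the shared layers.
    """
    return landmark_loss(lm_pred, lm_gt) + lam * transfer_loss(img_pred, img_gt)


if __name__ == "__main__":
    lm_pred, lm_gt = [(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.0), (2.0, 1.0)]
    img_pred, img_gt = [[0.0, 0.5], [1.0, 1.0]], [[0.0, 0.0], [1.0, 0.5]]
    print(joint_loss(lm_pred, lm_gt, img_pred, img_gt, lam=2.0))
```

In this sketch `lam` trades off the two tasks: a larger value emphasizes transfer quality over landmark accuracy. The losses and weighting the authors actually use may differ.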

Notes

Hereafter, we use "domain-to-domain image translation" and "image-to-image translation" interchangeably with "cross-modal face transfer".

Author information

Authors and Affiliations

School of Informatics, Xiamen University, Xiamen, China
Xiaoli Wang, Yinglin Zheng, Ming Zeng & Xuan Cheng
School of Information, Renmin University, Beijing, China
Wei Lu

Corresponding author

Correspondence to Ming Zeng.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the National Natural Science Foundation of China (No. 61402387, 61702432), the Fundamental Research Funds for the Central Universities of China (No. 20720180070, 20720190003), and the International Cooperation Project and the Guiding Project of Fujian Province, China (No. 2018I0016, 2018H0037).

About this article

Cite this article

Wang, X., Zheng, Y., Zeng, M. et al. Joint learning for face alignment and face transfer with depth image. Multimed Tools Appl 79, 33993–34010 (2020). https://doi.org/10.1007/s11042-020-08873-y

Received: 16 February 2019
Revised: 12 February 2020
Accepted: 19 March 2020
Published: 15 May 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s11042-020-08873-y
