Joint learning for face alignment and face transfer with depth image

Multimedia Tools and Applications

Abstract

Face alignment and cross-modal face transfer are two important tasks for automatic face analysis in computer vision, and both have been studied extensively. Recently, deep neural networks have attracted much research attention for both tasks. With the prevalence of consumer depth sensors, depth-based face alignment and cross-modal (image and depth) face transfer are increasingly important. Unlike existing RGB-image-based tasks, the main challenge of depth-based tasks is the lack of annotated data. To address this challenge, we observe that the two tasks are closely related and that their learning processes may benefit each other. This paper develops a joint multi-task learning algorithm for both depth-based face alignment and face transfer using a deep convolutional neural network (CNN). The proposed approach allows the CNN model to share visual knowledge and information between the two tasks during training. We use a dataset of 10,000 face depth images for validation. Our experiments show that the proposed approach outperforms state-of-the-art algorithms. The results also show that learning these two related tasks simultaneously improves the performance of each individual task.
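The idea of training two tasks on a shared representation can be illustrated with a minimal sketch. It is not the network or the loss used in the paper, which the abstract does not specify: the linear encoder, the two linear heads, the choice of a squared error for landmarks and an absolute error for the transferred image, and the names `shared_encoder`, `linear_head`, `joint_loss`, and `transfer_weight` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def shared_encoder(depth: Sequence[float], weights: Matrix) -> Vector:
    """Hypothetical shared layer: linear map followed by ReLU.

    Both task heads read these features, so gradients from either
    task would update the same weights during training.
    """
    return [max(0.0, sum(w * x for w, x in zip(row, depth))) for row in weights]


def linear_head(features: Sequence[float], weights: Matrix) -> Vector:
    """Hypothetical task-specific head: a single linear layer."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error, used here for landmark coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def l1(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error, used here for the transferred image."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def joint_loss(
    depth: Sequence[float],
    landmarks: Sequence[float],
    image: Sequence[float],
    enc_w: Matrix,
    align_w: Matrix,
    transfer_w: Matrix,
    transfer_weight: float = 1.0,
) -> Tuple[float, float, float]:
    """Return (total, alignment, transfer) losses for one sample.

    The total is a weighted sum, a common (assumed) way to combine
    two task objectives computed from one shared encoder.
    """
    features = shared_encoder(depth, enc_w)
    align = mse(linear_head(features, align_w), landmarks)
    transfer = l1(linear_head(features, transfer_w), image)
    return align + transfer_weight * transfer, align, transfer


# Toy example: a 2-pixel "depth image", 1 landmark coordinate, 2-pixel target image.
total, align, transfer = joint_loss(
    depth=[1.0, 2.0],
    landmarks=[3.0],
    image=[1.0, 4.0],
    enc_w=[[1.0, 0.0], [0.0, 1.0]],
    align_w=[[1.0, 1.0]],
    transfer_w=[[1.0, 0.0], [0.0, 1.0]],
    transfer_weight=0.5,
)
```

In a real system the encoder and heads would be convolutional and trained by gradient descent; the point of the sketch is only that a single set of encoder weights receives a training signal from both the alignment and the transfer objective.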

Notes

  1. Hereafter, we use domain-to-domain image translation and image-to-image translation interchangeably with cross-modal face transfer.

Author information

Corresponding author

Correspondence to Ming Zeng.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the National Natural Science Foundation of China (No. 61402387, 61702432), the Fundamental Research Funds for Central Universities of China (No. 20720180070, 20720190003), and the International Cooperation Project and the Guiding Project of Fujian Province in China (No. 2018I0016, 2018H0037).

About this article

Cite this article

Wang, X., Zheng, Y., Zeng, M. et al. Joint learning for face alignment and face transfer with depth image. Multimed Tools Appl 79, 33993–34010 (2020). https://doi.org/10.1007/s11042-020-08873-y

  • DOI: https://doi.org/10.1007/s11042-020-08873-y