Abstract
Face alignment and cross-modal face transfer are two important tasks in automatic face analysis, and both have been studied extensively in computer vision. Recently, deep neural networks have attracted much research attention for both tasks. With the prevalence of consumer depth sensors, depth-based face alignment and cross-modal (image and depth) face transfer are increasingly important. Unlike existing RGB-image-based tasks, the main challenge of depth-based tasks is the lack of annotated data. To address this challenge, we observe that the two tasks are closely related and that their learning processes may benefit each other. This paper develops a joint multi-task learning algorithm for depth-based face alignment and face transfer using a deep convolutional neural network (CNN). The proposed approach allows the CNN model to share visual knowledge and information between the two tasks. We use a dataset of 10,000 face depth images for validation. Our experiments show that the proposed approach outperforms state-of-the-art algorithms, and that learning the two related tasks simultaneously improves the performance of each individual task.
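The idea of training two tasks against shared features can be illustrated with a combined objective. The sketch below is not the paper's formulation: the squared error for landmarks, the absolute error for the transferred image, and the `transfer_weight` parameter are all assumptions chosen for illustration, and the function names are hypothetical.

```python
# Illustrative sketch only: a joint objective for two tasks that would share
# one encoder. The loss forms and `transfer_weight` are assumptions, not the
# formulation used in the paper.

def mean_squared_error(pred, target):
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mean_absolute_error(pred, target):
    """Mean absolute error between two equal-length, non-empty sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def joint_loss(landmarks_pred, landmarks_true,
               image_pred, image_true, transfer_weight=1.0):
    """Weighted sum of an alignment loss and a transfer loss.

    Minimising this single scalar lets gradients from both tasks update
    whatever parameters the two task heads have in common.
    """
    alignment = mean_squared_error(landmarks_pred, landmarks_true)
    transfer = mean_absolute_error(image_pred, image_true)
    return alignment + transfer_weight * transfer


if __name__ == "__main__":
    # Toy values: flattened landmark coordinates and flattened pixel values.
    print(joint_loss([0.0, 2.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.0],
                     transfer_weight=0.5))
```

Setting `transfer_weight` to zero reduces the objective to the alignment task alone, which gives a simple way to compare joint training against single-task training in this toy setting.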
Notes
Hereafter, we use the terms domain-to-domain image translation and image-to-image translation interchangeably with cross-modal face transfer.
Additional information
This work was supported by the National Natural Science Foundation of China (No. 61402387, 61702432), the Fundamental Research Funds for Central Universities of China (No. 20720180070, 20720190003), and the International Cooperation Project and the Guiding Project of Fujian Province in China (No. 2018I0016, 2018H0037).
Cite this article
Wang, X., Zheng, Y., Zeng, M. et al. Joint learning for face alignment and face transfer with depth image. Multimed Tools Appl 79, 33993–34010 (2020). https://doi.org/10.1007/s11042-020-08873-y
DOI: https://doi.org/10.1007/s11042-020-08873-y