Deep learning for face image synthesis and semantic manipulations: a review and future perspectives

Artificial Intelligence Review

Abstract

Image synthesis using representations learned by deep neural networks has gained wide attention in recent years. Among the different categories of natural images, face images are particularly important because of their broad range of applications. However, face images are very challenging to synthesize because of their highly complicated hierarchical structure and the uniqueness of the information contained in each individual face. This paper provides a comprehensive review of recent developments in, and applications of, face synthesis and semantic manipulations using deep learning, and discusses future perspectives for improving face perception.

Fig. 1
Fig. 2. Reprinted from Zhmoginov and Sandler (2016) with permission
Fig. 3. Reproduced using DFI source code (deepfeatinterp 2017) with permission
Fig. 4. Reproduced and adapted using Deep Image Analogy source code (msracver 2018) copyrighted © 2018 MSRA CVer under an MIT license
Fig. 5. Reprinted from Cheung et al. (2014) with permission
Fig. 6
Fig. 7
Fig. 8
Fig. 9. Reprinted from Progressive Growing of GANs source code (tkarras 2018) copyrighted © 2018, NVIDIA CORPORATION under a Creative Commons licence (Creative Commons disclaims all liability for damages resulting from their use to the fullest extent possible)
Fig. 10. Reproduced using StyleGAN source code (NVlabs 2019) copyrighted © 2019, NVIDIA CORPORATION under a Creative Commons licence (Creative Commons disclaims all liability for damages resulting from their use to the fullest extent possible)
Fig. 11
Fig. 12
Fig. 13
Fig. 14. Reprinted from Berthelot et al. (2017) with permission
Fig. 15
Fig. 16

References

  • Antipov G, Baccouche M, Dugelay JL (2017) Face aging with conditional generative adversarial networks. In: 2017 IEEE international conference on image processing (ICIP). IEEE, pp 2089–2093

  • Berthelot D, Schumm T, Metz L (2017) Began: boundary equilibrium generative adversarial networks. arXiv:1703.10717

  • Brock A, Lim T, Ritchie JM, Weston N (2016) Neural photo editing with introspective adversarial networks. arXiv:1609.07093

  • Bruce V, Young A (1986) Understanding face recognition. Br J Psychol 77(3):305–327

  • Brundage M, Avin S, Clark J, Toner H, Eckersley P, Garfinkel B, Dafoe A, Scharre P, Zeitzoff T, Filar B et al (2018) The malicious use of artificial intelligence: forecasting, prevention, and mitigation. arXiv:1802.07228

  • Chen X, Duan Y, Houthooft R, Schulman J, Sutskever I, Abbeel P (2016) Infogan: interpretable representation learning by information maximizing generative adversarial nets. In: Advances in neural information processing systems, pp 2172–2180

  • Cheung B, Livezey JA, Bansal AK, Olshausen BA (2014) Discovering hidden factors of variation in deep networks. arXiv:1412.6583

  • Cole F, Belanger D, Krishnan D, Sarna A, Mosseri I, Freeman WT (2017) Synthesizing normalized faces from facial identity features. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3703–3712

  • Dahl R, Norouzi M, Shlens J (2017) Pixel recursive super resolution. In: Proceedings of the IEEE international conference on computer vision, pp 5439–5448

  • deepfakes (2019) deepfakes/faceswap: deepfakes software for all. https://github.com/deepfakes/faceswap. Accessed 18 Apr 2020

  • deepfeatinterp (2017) paulu/deepfeatinterp: deep feature interpolation (cvpr 2017). https://github.com/paulu/deepfeatinterp

  • Dinh L, Krueger D, Bengio Y (2014) Nice: non-linear independent components estimation. arXiv:1410.8516

  • Dinh L, Sohl-Dickstein J, Bengio S (2016) Density estimation using real nvp. arXiv:1605.08803

  • Dumoulin V, Belghazi I, Poole B, Mastropietro O, Lamb A, Arjovsky M, Courville A (2016) Adversarially learned inference. arXiv:1606.00704

  • Edwards H, Storkey A (2016) Towards a neural statistician. arXiv:1606.02185

  • Fei-Fei L, Fergus R, Perona P (2006) One-shot learning of object categories. IEEE Trans Pattern Anal Mach Intell 28(4):594–611

  • Frigo O, Sabater N, Delon J, Hellier P (2016) Split and match: example-based adaptive patch sampling for unsupervised style transfer. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 553–561

  • Frigo O, Sabater N, Delon J, Hellier P (2019) Video style transfer by consistent adaptive patch sampling. Vis Comput 35(3):429–443

  • Gatys LA, Ecker AS, Bethge M (2015) A neural algorithm of artistic style. arXiv:1508.06576

  • Gatys LA, Ecker AS, Bethge M (2016) Image style transfer using convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2414–2423

  • Gauthier J (2014) Conditional generative adversarial nets for convolutional face generation. Class Proj Stanf CS231N Convolutional Neural Netw Vis Recognit Winter Semester 2014(5):2

  • glow (2018) Glow: better reversible generative models. https://openai.com/blog/glow/. Accessed 18 Apr 2020

  • Goodfellow (2019) Ian goodfellow on twitter: “4.5 years of gan progress on face generation. https://t.co/kiqkuyulmc, https://t.co/s4absu536b, https://t.co/8di6k6bxvc, https://t.co/uefhewds2m, https://t.co/s6hkqz9glz... https://t.co/bqyv6zgftb”. https://twitter.com/goodfellow_ian/status/1084973596236144640?lang=en

  • Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, pp 2672–2680

  • Gross R, Matthews I, Cohn J, Kanade T, Baker S (2010) Multi-pie. Image Vis Comput 28(5):807–813

  • Haxby JV, Hoffman EA, Gobbini MI (2000) The distributed human neural system for face perception. Trends Cogn Sci 4(6):223–233

  • Haxby JV, Hoffman EA, Gobbini MI (2002) Human neural systems for face recognition and social communication. Biol Psychiatry 51(1):59–67

  • Hinton (2018) Ovs — what’s wrong with convolutional nets? — video detail. https://techtv.mit.edu/videos/782615f9abc64fbbafb5d0d3c3387392/. Accessed 18 Apr 2020

  • Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507

  • Huang GB, Ramesh M, Berg T, Learned-Miller E (2007) Labeled faces in the wild: a database for studying face recognition in unconstrained environments. University of Massachusetts, Amherst, Tech Rep 07-49

  • Inceptionism (2015) Google ai blog: Inceptionism: going deeper into neural networks. https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html

  • Isola P, Zhu JY, Zhou T, Efros AA (2017) Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1125–1134

  • Karras T, Aila T, Laine S, Lehtinen J (2017) Progressive growing of gans for improved quality, stability, and variation. arXiv:1710.10196

  • Karras T, Laine S, Aila T (2019) A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4401–4410

  • Kaur P, Zhang H, Dana KJ (2017) Photo-realistic facial texture transfer. arXiv:1706.04306

  • Kim T, Cha M, Kim H, Lee JK, Kim J (2017) Learning to discover cross-domain relations with generative adversarial networks. In: Proceedings of the 34th international conference on machine learning-volume 70, JMLR. org, pp 1857–1865

  • Kim H, Garrido P, Tewari A, Xu W, Thies J, Niessner M, Pérez P, Richardt C, Zollhöfer M, Theobalt C (2018) Deep video portraits. ACM Trans Graph (TOG) 37(4):163

  • Kingma DP, Dhariwal P (2018) Glow: generative flow with invertible 1 x 1 convolutions. In: Advances in neural information processing systems, pp 10215–10224

  • Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv:1312.6114

  • Korshunova I, Shi W, Dambre J, Theis L (2017) Fast face-swap using convolutional neural networks. In: Proceedings of the IEEE international conference on computer vision, pp 3677–3685

  • Lake BM, Salakhutdinov R, Tenenbaum JB (2015) Human-level concept learning through probabilistic program induction. Science 350(6266):1332–1338

  • Lake BM, Ullman TD, Tenenbaum JB, Gershman SJ (2017) Building machines that learn and think like people. Behav Brain Sci 40:e253

  • Larsen ABL, Sønderby SK, Larochelle H, Winther O (2015) Autoencoding beyond pixels using a learned similarity metric. arXiv:1512.09300

  • Huang GB, Learned-Miller E (2014) Labeled faces in the wild: updates and new reporting procedures. University of Massachusetts, Amherst, Tech Rep UM-CS-2014-003

  • LeCun (2015) What’s wrong with deep learning?—techtalks.tv. http://techtalks.tv/talks/whats-wrong-with-deep-learning/61639/

  • LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436

  • Li C, Wand M (2016) Combining Markov random fields and convolutional neural networks for image synthesis. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2479–2486

  • Li X, Liu S, Kautz J, Yang MH (2019) Learning linear transformations for fast image and video style transfer. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3809–3817

  • Liao J, Yao Y, Yuan L, Hua G, Kang SB (2017) Visual attribute transfer through deep image analogy. arXiv:1705.01088

  • Liu MY, Tuzel O (2016) Coupled generative adversarial networks. In: Advances in neural information processing systems, pp 469–477

  • Liu Z, Luo P, Wang X, Tang X (2015) Deep learning face attributes in the wild. In: Proceedings of international conference on computer vision (ICCV)

  • Liu S, Ou X, Qian R, Wang W, Cao X (2016) Makeup like a superstar: deep localized makeup transfer network. arXiv:1604.07102

  • Liu MY, Breuel T, Kautz J (2017) Unsupervised image-to-image translation networks. In: Advances in neural information processing systems, pp 700–708

  • Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv:1411.1784

  • msracver (2018) msracver/deep-image-analogy: the source code of ’visual attribute transfer through deep image analogy’. https://github.com/msracver/Deep-Image-Analogy

  • Natsume R, Yatagawa T, Morishima S (2018) Rsgan: face swapping and editing using face and hair representation in latent spaces. arXiv:1804.03447

  • Nirkin Y, Keller Y, Hassner T (2019) Fsgan: subject agnostic face swapping and reenactment. In: Proceedings of the IEEE international conference on computer vision, pp 7184–7193

  • NVlabs (2019) Nvlabs/stylegan: stylegan—official tensorflow implementation. https://github.com/NVlabs/stylegan

  • Perarnau G, Van De Weijer J, Raducanu B, Álvarez JM (2016) Invertible conditional gans for image editing. arXiv:1611.06355

  • Pumarola A, Agudo A, Martinez AM, Sanfeliu A, Moreno-Noguer F (2018) Ganimation: anatomically-aware facial animation from a single image. In: Proceedings of the European conference on computer vision (ECCV), pp 818–833

  • Radford A, Metz L, Chintala S (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434

  • Face Research (2016) Experiments about face and voice perception. http://faceresearch.org. Accessed 18 Apr 2020

  • Rezende DJ, Mohamed S (2015) Variational inference with normalizing flows. arXiv:1505.05770

  • Rezende DJ, Mohamed S, Wierstra D (2014) Stochastic backpropagation and approximate inference in deep generative models. arXiv:1401.4082

  • Rossion B (2014) Understanding face perception by means of human electrophysiology. Trends Cogn Sci 18(6):310–318

  • Royer A, Bousmalis K, Gouws S, Bertsch F, Mosseri I, Cole F, Murphy K (2017) Xgan: unsupervised image-to-image translation for many-to-many mappings. arXiv:1711.05139

  • Ruder M, Dosovitskiy A, Brox T (2016) Artistic style transfer for videos. In: German conference on pattern recognition. Springer, pp 26–36

  • Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252

  • Sabour S, Frosst N, Hinton GE (2017) Dynamic routing between capsules. In: Advances in neural information processing systems, pp 3856–3866

  • Salimans T, Karpathy A, Chen X, Kingma DP (2017) Pixelcnn++: improving the pixelcnn with discretized logistic mixture likelihood and other modifications. arXiv:1701.05517

  • Sanchez E, Valstar M (2018) Triple consistency loss for pairing distributions in gan-based face synthesis. arXiv:1811.03492

  • Schroff F, Kalenichenko D, Philbin J (2015) Facenet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 815–823

  • Selim A, Elgharib M, Doyle L (2016) Painting style transfer for head portraits using convolutional neural networks. ACM Trans Graph (ToG) 35(4):129

  • Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556

  • Susskind JM, Anderson AK, Hinton GE (2010) The Toronto face database. Department of Computer Science, University of Toronto, Toronto, ON, Canada, Tech Rep 3

  • Suwajanakorn S, Seitz SM, Kemelmacher-Shlizerman I (2017) Synthesizing obama: learning lip sync from audio. ACM Trans Graph (TOG) 36(4):95

  • ThisPersonDoesNotExist (2019) This person does not exist. https://www.thispersondoesnotexist.com/. Accessed 18 Apr 2020

  • tkarras (2018) tkarras/progressive growing of gans for improved quality, stability, and variation. https://github.com/tkarras/progressive_growing_of_gans

  • Tran L, Yin X, Liu X (2017) Disentangled representation learning gan for pose-invariant face recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1415–1424

  • Tran LQ, Yin X, Liu X (2018) Representation learning by rotating your faces. IEEE Trans Pattern Anal Mach Intell 41(12):3007–3021

  • Tsao DY, Livingstone MS (2008) Mechanisms of face perception. Annu Rev Neurosci 31:411–437

  • Upchurch P, Gardner J, Pleiss G, Pless R, Snavely N, Bala K, Weinberger K (2017) Deep feature interpolation for image content changes. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7064–7073

  • Van den Oord A, Kalchbrenner N, Espeholt L, Vinyals O, Graves A et al (2016a) Conditional image generation with pixelcnn decoders. In: Advances in neural information processing systems, pp 4790–4798

  • Van den Oord A, Kalchbrenner N, Kavukcuoglu K (2016b) Pixel recurrent neural networks. arXiv:1601.06759

  • VanRullen R (2017) Perception science in the age of deep neural networks. Front Psychol 8:142

  • Vinyals O, Blundell C, Lillicrap T, Wierstra D et al (2016) Matching networks for one shot learning. In: Advances in neural information processing systems, pp 3630–3638

  • Wang TC, Liu MY, Zhu JY, Tao A, Kautz J, Catanzaro B (2018) High-resolution image synthesis and semantic manipulation with conditional gans. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8798–8807

  • Wolf L, Hassner T, Maoz I (2011) Face recognition in unconstrained videos with matched background similarity. In: 2011 IEEE conference on computer vision and pattern recognition (CVPR). IEEE

  • Wu W, Zhang Y, Li C, Qian C, Change Loy C (2018) Reenactgan: learning to reenact faces via boundary transfer. In: Proceedings of the European conference on computer vision (ECCV), pp 603–619

  • Yan Z, Zhou XS (2017) How intelligent are convolutional neural networks? arXiv:1709.06126

  • Yi Z, Zhang H, Tan P, Gong M (2017) Dualgan: unsupervised dual learning for image-to-image translation. In: Proceedings of the IEEE international conference on computer vision, pp 2849–2857

  • Zhmoginov A, Sandler M (2016) Inverting face embeddings with convolutional neural networks. arXiv:1606.04189

  • Zhu JY, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision, pp 2223–2232

Author information

Corresponding author

Correspondence to Mahla Abdolahnejad.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Abdolahnejad, M., Liu, P.X. Deep learning for face image synthesis and semantic manipulations: a review and future perspectives. Artif Intell Rev 53, 5847–5880 (2020). https://doi.org/10.1007/s10462-020-09835-4

  • DOI: https://doi.org/10.1007/s10462-020-09835-4
