2020 Special Issue
Generating photo-realistic training data to improve face recognition accuracy
Introduction
Recent progress in machine learning has made possible the development of face recognition systems that can match face images as well as, or better than, humans. For this reason, these systems have become a valuable tool in forensics and security. However, state-of-the-art face recognition systems based on convolutional neural networks (CNNs) need to be trained with very large datasets of face images. In this work we aim to reduce the data requirements of face recognition systems by synthesising artificial face images.
Image synthesis is a widely studied topic in computer vision. In particular, face image synthesis has gained a lot of attention because of its diverse practical applications. These include facial image editing (Antipov et al., 2017, Brock et al., 2017, Choi et al., 2018, Lample et al., 2017, Larsen et al., 2016, Perarnau et al., 2016, Shu et al., 2017, Yan et al., 2016, Zhang et al., 2017), face de-identification (Brkic et al., 2017, Meden et al., 2018, Meden et al., 2017, Wu et al., 2019), data augmentation (Banerjee et al., 2017, Kortylewski et al., 2018, Masi et al., 2019, Masi et al., 2016, Mokhayeri et al., 2018, Osadchy et al., 2017, Zhao et al., 2019), face frontalisation (Hassner et al., 2015, Huang et al., 2017, Tran et al., 2019, Yim et al., 2015, Zhu et al., 2015, Zhu et al., 2013, Zhu et al., 2014) and artistic applications (e.g. video games and advertisements).
In this work, we focus on the applicability of face image synthesis for data augmentation. It is widely known that training data is one of the most important factors that affect the accuracy of deep learning models. The datasets used for training need to be large and contain sufficient variation to allow the resulting models to learn features that generalise well to unseen samples. In the case of face recognition, the datasets must contain many different subjects, as well as many different images per subject. The first requirement enables a model to learn inter-class discriminative features that can generalise to subjects not in the training set. The second requirement enables a model to learn features that are robust to intra-class variations. Even though there are several public large-scale datasets (Bansal et al., 2017, Cao et al., 2018, Guo et al., 2016, Nech and Kemelmacher-Shlizerman, 2017, Parkhi et al., 2015, Sun et al., 2014, Yi et al., 2014) that can be used to train CNN-based face recognition models, these datasets are nowhere near the size or quality of commercial datasets. For example, the largest publicly available dataset contains about 10M images of 100K different subjects (Guo et al. 2016), whereas Google’s FaceNet (Schroff et al. 2015) was trained with a private dataset containing between 100M and 200M face images of about 8M different subjects. Another issue is the presence of long-tail distributions in some publicly available datasets, i.e. datasets in which there are many subjects with very few images. Such unbalanced datasets can make the training process difficult and result in models that achieve lower accuracy than those trained with smaller but balanced datasets (Zhao et al., 2019, Zhou et al., 2015). In addition, some publicly available datasets (e.g. Guo et al. 2016) contain many mislabelled samples that can decrease face recognition accuracy if not discarded from the training set. 
Since collecting large-scale, good quality face datasets is a very expensive and labour-intensive task, we propose a method for generating photo-realistic face images that can be used to effectively increase the depth (number of images per subject) and width (number of subjects) of existing face datasets.
An approach that has recently gained popularity for augmenting face datasets is the use of 3D morphable models (Blanz and Vetter 2003). In this approach, new faces of existing subjects can be synthesised by fitting a 3D morphable model to existing images and modifying a variety of parameters to generate new poses and expressions (Masi et al., 2019, Masi et al., 2016, Mokhayeri et al., 2018, Zhao et al., 2019). It is also possible to generate images with other variations using this approach. For example, Mokhayeri et al. (2018) incorporated a reflectance model to generate images under different lighting conditions, and Kortylewski et al. (2018) randomly sampled 3D face shapes and colours to generate faces of new subjects. The main drawback of methods based on 3D morphable models is that the generated images often look unnatural and lack the level of detail found in real images. Another recent approach, based on blending small triangular regions from different training images, was proposed in Banerjee et al. (2017). Although this method seemed to produce photo-realistic faces, the authors limited their work to frontal face images. In contrast, our approach makes use of generative adversarial networks (GANs) (Goodfellow et al., 2014, Schmidhuber, 2020), which have recently been shown to produce photo-realistic in-the-wild images often indistinguishable from real images (Karras et al. 2018). Another advantage of using GANs is that they are end-to-end trainable models that do not require any domain-specific processing, as opposed to methods based on 3D modelling or face triangulation.
Many methods based on GANs have been proposed for manipulating attributes of existing face images, including age (Antipov et al., 2017, Yan et al., 2016, Zhang et al., 2017), facial expressions (Choi et al., 2018, Ding et al., 2018, Yan et al., 2016, Zhou and Shi, 2017), and other attributes such as hairstyle, glasses, makeup, facial hair, skin colour or gender (Brock et al., 2017, Choi et al., 2018, He et al., 2017, Lample et al., 2017, Larsen et al., 2016, Lu et al., 2018, Perarnau et al., 2016, Shen and Liu, 2017, Shu et al., 2017). While these methods can be used to increase the depth of a dataset, it remains unclear how to increase the width of a dataset, i.e. how to generate faces of new subjects. Our proposed GAN generates faces from a latent representation comprising two Gaussian-distributed components that encode identity-related attributes and non-identity-related attributes respectively. In this way, multiple face images of the same subject can be generated by fixing the identity component and varying the non-identity component, while images of new subjects can be generated by sampling new identity components. The method most closely related to ours is the semantically decomposed GAN (SD-GAN) proposed in Donahue et al. (2018). SD-GANs are trained with pairs of real images from the same subject and pairs of images generated with the same identity-related attributes but different non-identity-related attributes. A discriminator learns to reject pairs of images either when they do not look photo-realistic or when they do not appear to belong to the same subject. One of the main differences of our method with respect to SD-GANs is that it allows the generation of face images of subjects that exist in the training set. In other words, our method can increase both the width and the depth of a given face dataset. Furthermore, our proposed GAN is arguably simpler to implement than SD-GAN and easier to incorporate into other GAN architectures.
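The two-component latent sampling scheme described above can be sketched as follows. This is a minimal illustration of how latent codes would be drawn to increase depth (fixed identity, varied non-identity) and width (fresh identity per subject); the dimensionalities and helper names are hypothetical, and a trained generator would map each code to an image.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
DIM_ID, DIM_NID = 128, 128  # hypothetical latent dimensionalities

def sample_identity():
    """Draw an identity-related latent vector from a standard Gaussian prior."""
    return rng.standard_normal(DIM_ID)

def sample_non_identity():
    """Draw a non-identity-related latent vector (pose, expression, etc.)."""
    return rng.standard_normal(DIM_NID)

def latent_codes_for_subject(n_images):
    """Increase dataset *depth*: one fixed identity code, many non-identity codes."""
    z_id = sample_identity()
    return [np.concatenate([z_id, sample_non_identity()]) for _ in range(n_images)]

def latent_codes_for_new_subjects(n_subjects, n_images):
    """Increase dataset *width*: a fresh identity code for each synthetic subject."""
    return [latent_codes_for_subject(n_images) for _ in range(n_subjects)]

# 3 synthetic subjects, 5 images each; every code has length DIM_ID + DIM_NID
codes = latent_codes_for_new_subjects(n_subjects=3, n_images=5)
```

Within one subject the first `DIM_ID` entries of every code are identical, which is precisely what lets the generator render the same identity under different non-identity attributes.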
To demonstrate the efficacy of our method, we trained several CNN-based face recognition models with different combinations of real and synthetic data. In most cases, the models trained with a combination of real and synthetic data outperformed the models trained with real data alone.
Our main contributions can be summarised as:
- A novel face image synthesis method based on GANs that allows the disentangling of identity-related attributes from non-identity-related attributes.
- A data augmentation approach that uses the proposed GAN to increase the depth and width of existing face datasets.
- An experimental demonstration that the proposed data augmentation approach can increase the accuracy of a face recognition model over training with real images alone.
The rest of this paper is organised as follows: Section 2 provides the background needed to understand our proposed GAN. Section 3 explains each part of our proposed GAN and the loss functions used for training. Section 4 discusses our experimental results, both in terms of the quality of the synthetic images generated by our proposed GAN and the accuracy achieved by datasets augmented with the synthetic images. Finally, our conclusions are presented in Section 5.
Background
Generative adversarial networks (GANs) generate data by sampling from a probability distribution $p_g$ that is trained to match a true data-generating distribution $p_{\mathrm{data}}$. This is done by mapping a vector of random latent variables $\mathbf{z} \sim p_{\mathbf{z}}$ to a sample $G(\mathbf{z})$ through a generator network $G$, where $p_{\mathbf{z}}$ is a prior distribution that can be easily sampled (e.g. Gaussian or uniform). The generator is trained to fool a discriminator network $D$ that tries to determine whether a sample is real or generated
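The adversarial game described here is conventionally written as the minimax objective of Goodfellow et al. (2014); using the standard notation (generator $G$, discriminator $D$, data distribution $p_{\mathrm{data}}$, latent prior $p_{\mathbf{z}}$):

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{\mathbf{x} \sim p_{\mathrm{data}}}\!\left[\log D(\mathbf{x})\right]
  + \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}}\!\left[\log\left(1 - D(G(\mathbf{z}))\right)\right]
```

At the optimum of this game, $p_g$ matches $p_{\mathrm{data}}$ and the discriminator can do no better than chance.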
Proposed method
In this section, we first explain our choice of GAN architecture and type of conditional GAN, and then our proposed modifications to disentangle identity-related attributes from non-identity-related attributes. Finally, we explain how we use the proposed GAN for augmenting existing face datasets.
Experiments
In this section, we start by providing a qualitative analysis of the synthetic images generated by our proposed GAN. Next, we explore the feasibility of augmenting face datasets with synthetic images, both in terms of width and depth. The augmented datasets are used to train CNN-based face recognition models (henceforth referred to as discriminative models) to determine whether they achieve a higher accuracy than models trained with real images alone.
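The bookkeeping behind such an augmented training set can be sketched as follows. This is a hypothetical illustration, not the paper's actual pipeline: `augment_labels` only tracks subject labels and sample provenance, standing in for a GAN generator that would produce the synthetic images themselves.

```python
def augment_labels(real_subjects, extra_per_subject, new_subjects, images_per_new):
    """Return (subject_label, source) pairs for an augmented training set.

    real_subjects: dict mapping integer subject id -> number of real images.
    Depth is increased by adding synthetic images of existing subjects;
    width is increased by adding entirely new synthetic subjects.
    """
    samples = []
    for sid, n_real in real_subjects.items():
        samples += [(sid, "real")] * n_real
        samples += [(sid, "synthetic")] * extra_per_subject   # depth
    next_id = max(real_subjects) + 1
    for k in range(new_subjects):                             # width
        samples += [(next_id + k, "synthetic")] * images_per_new
    return samples

# Two real subjects (4 and 6 images), +2 synthetic images each, +3 new
# synthetic subjects with 5 images each.
s = augment_labels({0: 4, 1: 6}, extra_per_subject=2, new_subjects=3, images_per_new=5)
```

The resulting label list can then be fed to a standard classification-style training loop for the discriminative model, with real and synthetic samples treated identically.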
We use the Face-Resnet architecture as our discriminative model.
Conclusions
In this paper, we have studied the feasibility of augmenting face datasets with photo-realistic synthetic images. In particular, we have presented a new type of conditional GAN that can generate photo-realistic face images from two latent vectors encoding identity-related attributes and non-identity-related attributes respectively. By fixing the latent vector of identity-related attributes and varying the latent vector of non-identity-related attributes, our proposed GAN can generate images of the same subject with different non-identity-related attributes.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This research did not receive any specific grant from funding agencies in public, commercial, or not-for-profit sectors.
References (63)
- Schmidhuber, J. (2020). Generative Adversarial Networks are special cases of Artificial Curiosity (1990) and also closely related to Predictability Minimization (1991). Neural Networks.
- Antipov, G., Baccouche, M., & Dugelay, J.-L. (2017). Face aging with conditional generative adversarial networks. In...
- Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein generative adversarial networks. In Proc. int. conf. mach....
- Banerjee, S., Bernhard, J. S., Scheirer, W. J., Bowyer, K. W., & Flynn, P. J. (2017). SREFI: Synthesis of realistic...
- Bansal, A., Nanduri, A., Castillo, C. D., Ranjan, R., & Chellappa, R. (2017). UMDFaces: An annotated face dataset for...
- Blanz, V., & Vetter, T. (2003). Face recognition based on fitting a 3D morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Brkic, K., Sikiric, I., Hrkac, T., & Kalafatic, Z. (2017). I know that person: generative full body and face...
- Brock, A., Donahue, J., & Simonyan, K. (2018). Large scale GAN training for high fidelity natural image synthesis. In...
- Brock, A., Lim, T., Ritchie, J. M., & Weston, N. (2017). Neural photo editing with introspective adversarial networks....
- Cao, Q., Shen, L., Xie, W., Parkhi, O. M., & Zisserman, A. (2018). VGGFace2: A dataset for recognising faces across...
- Donahue et al. (2018). Semantically decomposing the latent spaces of generative adversarial networks.
- Goodfellow, I. (2017). NIPS 2016 tutorial: Generative adversarial networks.
- He et al. (2017). AttGAN: Facial attribute editing by only changing what you want.
- Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2018). Progressive growing of GANs for improved quality, stability, and variation.
- Kingma, D. P., & Welling, M. (2014). Auto-encoding variational Bayes.
- Kortylewski et al. (2018). Training deep face recognition systems with synthetic data.
- Makhzani, A., et al. (2016). Adversarial autoencoders.