Training bidirectional generative adversarial networks with hints

https://doi.org/10.1016/j.patcog.2020.107320

Highlights

  • The BiGAN has an encoder, in addition to the generator and discriminator of the GAN.

  • This encoder coupled with the generator allows defining extra loss terms as hints.

  • We experiment on five image data sets, MNIST, UT-Zap50K, GTSRB, Cifar10, and CelebA.

  • With these different hints, BiGAN generates higher quality and more diverse images.

Abstract

The generative adversarial network (GAN) is composed of a generator and a discriminator where the generator is trained to transform random latent vectors to valid samples from a distribution and the discriminator is trained to separate such “fake” examples from true examples of the distribution, which in turn forces the generator to generate better fakes. The bidirectional GAN (BiGAN) also has an encoder working in the inverse direction of the generator to produce the latent space vector for a given example. This added encoder allows defining auxiliary reconstruction losses as hints for a better generator. On five widely-used data sets, we showed that BiGANs trained with the Wasserstein loss and augmented with hints learn better generators in terms of image generation quality and diversity, as measured numerically by the 1-nearest neighbor test, Fréchet inception distance, and reconstruction error, and qualitatively by visually analyzing the generated samples.

Introduction

In generative modeling, we have a data set {x^t}_t drawn from an unknown probability distribution p(x) and we would like to be able to generate new x that also look like they have been drawn from p(x). For example, the x^t may be the face images of a collection of people and we would like to be able to generate new face images; these new synthetic images would be legitimate faces but of people who do not exist.

The typical approach would be to learn some estimator of p(x) (e.g., using a Gaussian distribution) and then to sample from that estimator. The approach we take in this paper defines generative modeling as a mapping task where a generator function takes some low-dimensional z drawn from a given p(z) as input and transforms it into a valid instance x from p(x). All the structure that p(x) has (for example, all the requirements of being a face image) needs to be captured during learning so that the newly generated x also reflect that structure.

In many real-world applications that involve, for instance, images, speech, or text, our observations x are high-dimensional; at the same time, we know that not all of these dimensions are necessary or independent. An important research area in machine learning is hence dimensionality reduction, where we want to map x to a much lower-dimensional z-space with minimal loss of information, and many methods, e.g., principal components analysis (PCA), have been proposed to learn such a mapping. In a generative model, we posit that the dimensions of z are latent factors that interact to generate the observed x; one example model is factor analysis (FA), which goes in the opposite direction of PCA.

Unsupervised dimensionality reduction can be learned using the neural network architecture called the autoencoder (AE) (Fig. 1). The encoder part compresses x to z (as in PCA) and the decoder part generates x from z (as in FA). The two networks, back to back, are trained to reconstruct the input, that is, to minimize the difference between the output of the decoder and the input to the encoder. In the simplest case, both the encoder and the decoder are one-layer (i.e., linear) networks, and it has been shown that the encoder then spans the same subspace as PCA; with the encoder and the decoder having more layers, the AE realizes nonlinear dimensionality reduction, with z corresponding to more interesting abstract features of the input.

Typically, the encoder and the decoder are taken to be inverses of each other in terms of network architecture. For example, with image data, the encoder starts with one or more convolution layers that successively downsample, followed by one or more dense layers decreasing dimensionality at each layer; the decoder inverts this path, starting with one or more dense layers that increase dimensionality at each layer and ending with one or more upsampling deconvolution layers that generate the image back again.
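To make the mirrored design concrete, here is a minimal PyTorch sketch of such a convolutional autoencoder. The layer sizes (32 × 32 RGB input, a 64-dimensional z) are illustrative assumptions, not the architectures used in the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvAutoencoder(nn.Module):
    """Mirrored encoder/decoder; sizes here are illustrative, not the paper's."""
    def __init__(self, z_dim=64):
        super().__init__()
        # Encoder: convolutions downsample 32x32x3 -> 8x8x64, then a dense layer maps to z.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),   # 32x32 -> 16x16
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 16x16 -> 8x8
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, z_dim),
        )
        # Decoder: the inverse path, a dense layer then transposed convolutions upsample back.
        self.decoder = nn.Sequential(
            nn.Linear(z_dim, 64 * 8 * 8),
            nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),   # 16x16 -> 32x32
            nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z)

# Training minimizes the reconstruction error between input and output.
model = ConvAutoencoder()
x = torch.rand(16, 3, 32, 32)                # a dummy mini-batch
loss = F.mse_loss(model(x), x)
```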

The autoencoder is not a generative model; for any x, we can find the corresponding z and then reconstruct x, but we have no way of generating new x outside of the training set. In the variational autoencoder (VAE) [1], we consider the z^t as random variables sampled from a known distribution p(z) (e.g., Gaussian), and we add an extra term to the reconstruction error to enforce this. Once training is done, we can sample from this p(z) and use the decoder to generate new x.
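For reference, the criterion of [1] combines exactly these two terms; this is the standard textbook form of the objective, not a derivation specific to this paper:

```latex
% Evidence lower bound maximized by the VAE (Kingma and Welling [1]):
% a reconstruction term plus a KL term pulling q(z|x) toward the prior p(z).
\mathcal{L}_{\text{VAE}}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  - D_{\text{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)
```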

In this paper, we extend the generative adversarial network (GAN) [2] that has recently been shown to work better than the VAE as a generative model. The original GAN model is composed of two networks, a generator G and a discriminator D (Fig. 2). Both G and D are deep neural networks with convolutional and dense layers as appropriate. The generator takes a latent vector z as input and generates an observation vector x, where z are low-dimensional and are sampled from an assumed probability distribution p(z) (e.g., multivariate Gaussian with independent features). Once training is done, we can generate new x by sampling new z from p(z) and passing them through G.

The samples generated by G are called fakes; they are the adversarial counterparts to the true x^t that we have in our training set. The aim of the discriminator is to tell the true and fake samples apart as well as possible, and that is how it is trained. The aim of the generator on the other hand is to generate fakes so well that the discriminator cannot tell them apart from the true samples. The two networks G and D play an adversarial game and gradually improve their abilities: As G gets to generate better fakes, D gets better at detecting them, which in turn forces G to get even better, and so on.

The following log-likelihood criterion is maximized by D and minimized by G:

$$\mathcal{L}_{\text{GAN}} = \sum_{x^t \in X} \log D(x^t) + \sum_{z^t \sim p(z)} \log\left(1 - D(G(z^t))\right)$$

Here, the x^t are the true samples drawn from the training set X and the G(z^t) are the fake samples with z^t sampled from p(z).
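In code, this criterion translates into alternating updates of D and G. The following PyTorch sketch uses placeholder one-layer networks and illustrative hyperparameters as assumptions; it also uses the common non-saturating generator loss rather than literally minimizing log(1 − D(G(z))):

```python
import torch
import torch.nn.functional as F

# Placeholder networks; any discriminator/generator pair with these input and
# output shapes would do. z_dim and the architectures are assumptions.
z_dim = 64
net_G = torch.nn.Sequential(torch.nn.Linear(z_dim, 784), torch.nn.Tanh())
net_D = torch.nn.Sequential(torch.nn.Linear(784, 1))    # outputs a logit

opt_D = torch.optim.Adam(net_D.parameters(), lr=2e-4)
opt_G = torch.optim.Adam(net_G.parameters(), lr=2e-4)

def train_step(x_real):
    batch = x_real.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # D step: maximize log D(x) + log(1 - D(G(z))), i.e., minimize the
    # binary cross-entropy with targets real=1, fake=0.
    z = torch.randn(batch, z_dim)
    x_fake = net_G(z).detach()               # do not backprop into G here
    loss_D = F.binary_cross_entropy_with_logits(net_D(x_real), ones) + \
             F.binary_cross_entropy_with_logits(net_D(x_fake), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # G step: fool D; the "non-saturating" variant maximizes log D(G(z))
    # instead of minimizing log(1 - D(G(z))), which gives stronger gradients.
    z = torch.randn(batch, z_dim)
    loss_G = F.binary_cross_entropy_with_logits(net_D(net_G(z)), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()

train_step(torch.rand(16, 784) * 2 - 1)      # dummy batch in [-1, 1]
```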

Since its inception, the GAN model and its many variants have been successfully used in many applications, in image, video, text and music generation; see [3] for a survey.

Despite their various successful applications, it has been seen that training GANs is difficult and several empirical tips and tricks have been proposed to improve convergence, such as label smoothing, mini-batch discrimination, and feature matching [4].

Our approach in this paper involves adding an auxiliary loss term to that of the GAN and optimizing the resulting augmented criterion in training. This added term provides a "hint" that directs the learning process towards a better generator. We propose a general framework that defines how such hints can be included in training and show four variants. To be able to define such hints, we use the bidirectional form of the GAN. The original GAN can generate x for any z but does not have an inverse mapper for generating the corresponding z for a given x. The bidirectional GAN (BiGAN) [5], [6] also includes an encoder component, and this encoder allows us to define various loss functions to train better generators. This new encoder component (x → z), which is also implemented as a deep neural network, works just like the encoder of the AE, and the generator of the GAN (z → x) works just like the decoder of the AE; we use this correspondence in defining the auxiliary loss functions.
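As a concrete illustration, the sketch below shows one plausible hint of this kind: a reconstruction error in the original image space, in the spirit of the data-space (DS) variant mentioned in the conclusions. The encoder/generator stand-ins, the squared-error norm, and the weighting scheme are assumptions here, not the paper's exact definitions:

```python
import torch
import torch.nn.functional as F

# Hypothetical encoder/generator placeholders; in a BiGAN, E maps x -> z and
# G maps z -> x, so G(E(x)) is a reconstruction of x.
z_dim, x_dim = 64, 784
E = torch.nn.Linear(x_dim, z_dim)
G = torch.nn.Linear(z_dim, x_dim)

def hint_loss_data_space(x_real):
    """Image-space reconstruction hint: penalize the error between x and G(E(x)).

    A sketch in the spirit of the paper's data-space (DS) variant; the exact
    norm and weighting used in the paper are not shown in this excerpt.
    """
    return F.mse_loss(G(E(x_real)), x_real)

# The hint enters training as an auxiliary term added to the adversarial loss:
#   loss_total = loss_adversarial + lambda_hint * hint_loss_data_space(x)
# where lambda_hint is a weighting hyperparameter (an assumption here).
```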

In addition to training a better generator, having an encoder can also be useful in other scenarios: Once we have such a mechanism, by investigating how z changes as x is changed, we can assign meaning to the different dimensions of z [7], which allows knowledge extraction: for example, we can do "vector algebra" where an abstract feature such as putting on glasses corresponds to adding a vector in the z-space. Because such an encoder works as a dimensionality reducer trained with unlabeled data, it can also be used as a preprocessor before a later classifier or regressor in a semi-supervised setting.
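The "vector algebra" idea can be made concrete in a few lines; the encoder E, generator G, and image tensors below are hypothetical stand-ins for a trained BiGAN:

```python
import torch

# Placeholder encoder/generator; in practice these would be a trained BiGAN pair.
z_dim, x_dim = 64, 784
E = torch.nn.Linear(x_dim, z_dim)
G = torch.nn.Linear(z_dim, x_dim)

# The z-space direction for an attribute such as glasses can be estimated as
# the difference between the mean codes of faces with and without it.
faces_with_glasses = torch.rand(100, x_dim)      # dummy stand-ins for real images
faces_without_glasses = torch.rand(100, x_dim)
v_glasses = E(faces_with_glasses).mean(0) - E(faces_without_glasses).mean(0)

# Adding that direction to a new face's code and decoding "puts on glasses".
x_new = torch.rand(1, x_dim)
x_with_glasses = G(E(x_new) + v_glasses)
```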

In Section 2.1, we discuss the BiGAN model trained using the original log-likelihood criterion as well as with the Wasserstein loss, introducing the Wasserstein BiGAN. We introduce the auxiliary reconstruction criteria for training BiGANs in Section 3. Our experimental results on five image data sets are given in Section 4 and we conclude in Section 5.

Section snippets

The bidirectional GAN

The original GAN can generate x for any z but does not have an inverse mapper for generating the corresponding z for any given x. The Bidirectional GAN (BiGAN) [5] and the equivalent Adversarially Learned Inference (ALI) [6] models were proposed independently and also contain an encoder component E mapping true x to z (Fig. 3). Unlike the GAN, where the discriminator sees only x as input, in the BiGAN, D sees both x and z, i.e., the observation and its latent representation together. For a true
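A minimal sketch of such a joint discriminator is given below; concatenating a flattened x with z is one common fusion choice and is an assumption here, not necessarily the paper's architecture:

```python
import torch
import torch.nn as nn

class JointDiscriminator(nn.Module):
    """BiGAN discriminator scoring (x, z) pairs rather than x alone.

    Concatenating a flattened x with z is one simple fusion; the paper's
    exact architecture is not shown in this excerpt.
    """
    def __init__(self, x_dim=784, z_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),   # one logit: real pair (x, E(x)) vs fake pair (G(z), z)
        )

    def forward(self, x, z):
        return self.net(torch.cat([x.flatten(1), z], dim=1))

D = JointDiscriminator()
score = D(torch.rand(8, 784), torch.randn(8, 64))
```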

Motivation

Training a GAN is difficult for a number of reasons:

  1. Though adversarial training casts it as a supervised problem, training a generator is in fact an unsupervised learning task, and unsupervised learning is known to be more difficult because there is less feedback.

  2. There are two models, D and G, to train, and hence the problem of model selection is doubled. Both are typically implemented by many-layered deep networks and the depth and width of both should be

Setting

We use five well-known real-world image data sets frequently used to test GANs: MNIST, UT-Zap50K shoes, the German Traffic Sign Recognition Benchmark (GTSRB), Cifar10, and CelebA. MNIST consists of 60,000 handwritten grayscale digit images, each of size 28 × 28, which we resize to 32 × 32 for convenience. The UT-Zap50K data set contains 50,025 shoe images in RGB; the images are of varying sizes and we resize them to 32 × 32. The training set of GTSRB contains 39,209 traffic sign images which we
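The resizing described here can be done with standard torchvision transforms; the following sketch for MNIST is illustrative, and the normalization choice is an assumption rather than the paper's preprocessing:

```python
import torchvision
import torchvision.transforms as T

# Resize everything to 32x32 as described above; the normalization to [-1, 1]
# is an illustrative assumption, not necessarily the paper's preprocessing.
transform = T.Compose([
    T.Resize((32, 32)),
    T.ToTensor(),
    T.Normalize(mean=[0.5], std=[0.5]),
])

mnist = torchvision.datasets.MNIST(root="data", train=True,
                                   download=True, transform=transform)
```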

Conclusions

We applied the Wasserstein loss to the BiGAN and also extended it with additional loss criteria. From our experiments on the MNIST, UT-Zap50K, GTSRB, Cifar10, and CelebA data sets, we have reached the following findings:

  • The autoencoder structure of BiGAN allows defining a reconstruction error which can be used to define different loss criteria as hints. We see that suitably defined hints lead to improved quality in generation.

  • We find that the variant that works in the original image space (DS) and

Acknowledgements

This work is partially supported by Boğaziçi University Research Funds with Grant Number 18A01P7. We also thank TETAM for the computing facilities provided.


References (32)

  • A. Atapour-Abarghouei et al., Generative adversarial framework for depth filling via Wasserstein metric, cosine transform and domain transfer, Pattern Recognit. (2019).
  • W. Xu et al., Toward learning a unified many-to-many mapping for diverse image translation, Pattern Recognit. (2019).
  • D.P. Kingma, M. Welling, Auto-encoding variational Bayes, arXiv:1312.6114.
  • I. Goodfellow et al., Generative adversarial nets, Advances in Neural Information Processing Systems 27 (2014).
  • Y. Hong, U. Hwang, J. Yoo, S. Yoon, How generative adversarial networks and their variants work: an overview.
  • M. Arjovsky et al., Towards principled methods for training generative adversarial networks, International Conference on Learning Representations (2017).
  • J. Donahue, P. Krähenbühl, T. Darrell, Adversarial feature learning, arXiv:1605.09782.
  • V. Dumoulin, I. Belghazi, B. Poole, O. Mastropietro, A. Lamb, M. Arjovsky, A. Courville, Adversarially learned inference.
  • A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks.
  • M. Arjovsky et al., Wasserstein generative adversarial networks, International Conference on Machine Learning (2017).
  • T. Karras, T. Aila, S. Laine, J. Lehtinen, Progressive growing of GANs for improved quality, stability, and variation.
  • I. Gulrajani et al., Improved training of Wasserstein GANs, Advances in Neural Information Processing Systems 30 (2017).
  • G.-J. Qi, Loss-sensitive generative adversarial networks on Lipschitz densities, arXiv:1701.06264.
  • C. Szegedy et al., Rethinking the inception architecture for computer vision, IEEE Conference on Computer Vision and Pattern Recognition (2016).
  • A.B.L. Larsen, S.K. Sønderby, H. Larochelle, O. Winther, Autoencoding beyond pixels using a learned similarity metric.
  • A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, B. Frey, Adversarial autoencoders, arXiv:1511.05644.

Uras Mutlu received his B.Sc. degree in computer engineering from Istanbul Technical University in 2016 and his M.S. degree in computer engineering from Boǧaziçi University in 2019. He is currently a Ph.D. student with research interests in generative adversarial networks, computer vision, natural language processing, and deep learning in general.

Ethem Alpaydın received his Ph.D. degree from Ecole Polytechnique Fédérale de Lausanne, Switzerland, in 1990, and was a postdoc at the International Computer Science Institute, Berkeley, in 1991. He is a Professor in the Department of Computer Engineering, Boǧaziçi University, Istanbul, and a Member of the Science Academy, Istanbul. He was a Visiting Researcher at MIT in 1994, at IDIAP in 1998, and at TU Delft in 2014, and a Fulbright scholar in 1997. The third edition of his book Introduction to Machine Learning was published by The MIT Press in 2014.
