2020 Special Issue
Latent Dirichlet allocation based generative adversarial networks
Introduction
Generating realistic images has been actively pursued in both the machine learning and computer vision communities in recent years. Generative adversarial networks (GANs) (Arjovsky et al., 2015, Berthelot et al., 2017, Goodfellow et al., 2014, Hoang et al., 2018, Mao et al., 2017, Miyato et al., 2018, Zhang et al., 2019) provide a promising way to achieve this goal, and their remarkable ability has powered a wide range of applications, from image generation (Gulrajani et al., 2017, Yang et al., 2018) and image-to-image translation (Liu et al., 2017, Tran et al., 2017, Yi et al., 2017, Zhu et al., 2017) to text-to-image generation (Reed et al., 2016, Zhang et al., 2017) and visual recognition (Li et al., 2017). Generally, GANs are required to generate outputs that fit the multi-modal distribution of real images.
However, this goal is hard to achieve if the underlying structure of the data is unclear (Pan et al., 2016). This partially explains why mode collapse and mode dropping occur during GAN training (Arjovsky et al., 2015, Heusel et al., 2017). Typically, employing multiple generators $G_1,\dots,G_K$ can produce more diverse and varied images (Arora et al., 2017, Hoang et al., 2018, Tolstikhin et al., 2017), where the generative distribution is an ensemble of sub-modal distributions $p_k$, that is, $p_{\mathrm{model}}(x)=\sum_{k=1}^{K}\pi_k\,p_k(x)$ with mixing weights $\pi_k$. However, these methods still ignore the underlying structure of images, resulting in problems like mode dropping. The absence of a sample likelihood and of posteriors over latent variables in GANs seems to prohibit the forward step needed to fix this drawback. Moreover, these approaches lack model interpretation: it is unclear how the multiple generators correlate with the data structure.
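The multi-generator mixture above can be sketched concretely: each sample is drawn by first picking a generator index with the mixing weights, then transforming shared latent noise through that generator. The following is a minimal numpy sketch with toy affine "generators" on 2-D data; the mode centres and weights are purely illustrative stand-ins for trained networks $G_k$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for K = 3 generators: each maps 2-D Gaussian noise onto
# one mode of the data distribution. In a real multi-generator GAN these
# would be neural networks G_k(z).
generators = [
    lambda z: z + np.array([4.0, 0.0]),   # mode centred at (4, 0)
    lambda z: z + np.array([-4.0, 0.0]),  # mode centred at (-4, 0)
    lambda z: z + np.array([0.0, 4.0]),   # mode centred at (0, 4)
]
pi = np.array([0.5, 0.3, 0.2])            # mixing weights, sum to 1

def sample(n):
    """Draw n samples from the mixture p(x) = sum_k pi_k * p_k(x)."""
    ks = rng.choice(len(generators), size=n, p=pi)  # generator index per sample
    z = rng.standard_normal((n, 2))                 # shared latent noise
    x = np.array([generators[k](z[i]) for i, k in enumerate(ks)])
    return x, ks

x, ks = sample(10_000)
```

Because every generator receives noise from the same unimodal prior, nothing in this construction ties a generator to a particular data mode — which is exactly the gap the structure prior discussed next is meant to close.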
In this paper, we construct a probabilistic GAN framework for multi-modal image generation, which not only integrates a data-structure prior but also has an explicit model interpretation (see Fig. 1). We motivate our construction through a Bayesian network. In the following, we list our three main contributions.
First, we introduce the Dirichlet prior and frame a new probabilistic GAN: the latent Dirichlet allocation based GAN (LDAGAN). (i) Unlike existing GANs, which model the latent space with a simple unimodal distribution (Berthelot et al., 2017, Goodfellow et al., 2014, Gulrajani et al., 2017, Miyato et al., 2018, Peng et al., 2019), we define additional discrete variables in the latent space to represent the structure (i.e., modes) of the data. Moreover, we impose a hierarchical prior on this structure, which enlarges the possible solution space for the multi-modal structure prior (see Fig. 2(b)). (ii) Guided by the data-structure prior, we build a structured GAN that precisely fits complex image data: each generator is responsible for only one image mode. (iii) Through Bayesian networks, we explicitly interpret how the multiple generators correlate with the data structure.
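In LDA-style notation, the hierarchical generative process is: draw mixing weights from a Dirichlet prior, draw a discrete mode assignment per sample, then push continuous noise through the generator for that mode. A minimal numpy sketch of this process follows; the number of modes, the concentration parameters, and the toy generator are all hypothetical, and the exact plate structure (e.g. how often the mixing weights are resampled) follows the paper — here they are drawn once per batch for simplicity.

```python
import numpy as np

rng = np.random.default_rng(1)

K = 3                    # number of modes / generators (illustrative)
alpha = np.full(K, 0.5)  # Dirichlet concentration parameters (hypothetical)

def toy_generator(k, z):
    """Stand-in for the k-th neural generator G_k: shifts noise to mode k."""
    return z + 5.0 * k

def generate(n):
    # 1. Draw mixing weights from the hierarchical Dirichlet prior.
    theta = rng.dirichlet(alpha)
    # 2. Draw a discrete mode assignment for each sample.
    k = rng.choice(K, size=n, p=theta)
    # 3. Draw continuous latent noise and push it through the chosen generator.
    z = rng.standard_normal(n)
    x = toy_generator(k, z)
    return theta, k, x

theta, k, x = generate(5000)
```

The discrete variable `k` is what makes each generator responsible for exactly one mode; with a small concentration `alpha`, the Dirichlet prior also allows the mixing weights themselves to be sparse or skewed rather than fixed and uniform.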
Second, since GANs do not have an explicit sample likelihood, a natural question arises: how can we estimate the model parameters and the posterior distributions of the latent variables in LDAGAN? To this end, we take an important step: (i) We formulate a likelihood function for the LDAGAN model by virtue of the discriminator. (ii) With this likelihood, we present a variational inference algorithm for GANs and solve for the model parameters via the EM algorithm. (iii) We adopt stochastic variational inference (Hoffman et al., 2013) to keep the training of LDAGAN efficient.
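The paper's actual E-step is built on the discriminator-derived likelihood; the sketch below shows only the generic variational mixture-model pattern such an algorithm instantiates, with the per-mode log-likelihood scores taken as given inputs. Function names and the worked numbers are hypothetical.

```python
import numpy as np

def responsibilities(log_lik, pi):
    """E-step: r[n, k] proportional to pi[k] * exp(log_lik[n, k]).

    log_lik stands in for the discriminator-derived per-mode log-likelihood
    of sample n under generator k; here it is just a given array.
    Computed in log space for numerical stability.
    """
    a = np.log(pi) + log_lik
    a -= a.max(axis=1, keepdims=True)        # log-sum-exp stabilisation
    r = np.exp(a)
    return r / r.sum(axis=1, keepdims=True)

def dirichlet_update(r, alpha):
    """Posterior Dirichlet parameters: prior counts plus expected assignments."""
    return alpha + r.sum(axis=0)

# Tiny worked example: two samples, three modes.
log_lik = np.array([[0.0, -5.0, -5.0],       # sample 0 strongly prefers mode 0
                    [-5.0, 0.0, -5.0]])      # sample 1 strongly prefers mode 1
pi = np.array([1 / 3, 1 / 3, 1 / 3])
r = responsibilities(log_lik, pi)
alpha_post = dirichlet_update(r, np.ones(3))
```

In a stochastic variational setting, `dirichlet_update` would accumulate scaled mini-batch counts rather than a full-dataset sum, which is what keeps training tractable.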
Third, we extend the LDAGAN framework to combine with other single-generator GANs, such as SNGAN (Miyato et al., 2018) and VGAN (Peng et al., 2019). We achieve state-of-the-art performance on both the CIFAR-10 (Krizhevsky & Hinton, 2009) and CIFAR-100 (Krizhevsky & Hinton, 2009) datasets. For example, our method achieves a Fréchet Inception Distance of 10.4 on CIFAR-10, which, to our knowledge, is the best result reported in the literature with a ResNet architecture.
The theoretical analysis in this paper may further inspire the GAN community to study how to integrate GANs with other forms of prior knowledge, pursuing more realistic generative performance and explicit interpretability, since this problem has not yet been well explored.
Related work
This section reviews related work on variants of generative adversarial networks (GANs).
Latent Dirichlet allocation based GAN (LDAGAN)
As previously mentioned, existing GANs applied to multi-modal image generation tend to exhibit two drawbacks: (i) they ignore the multi-modal structure prior of images (see Fig. 2(a)), and (ii) they lack model interpretability. This partially explains why mode collapse and mode dropping occur during GAN training. The inability to obtain an explicit image likelihood and posterior distributions of the latent variables in GANs seems to prohibit the forward step to model the data
Learning
This section describes the learning of the discriminator, generators and Dirichlet parameters in LDAGAN.
Extension of LDAGAN
In addition to implementing LDAGAN with the DCGAN architecture (Liu et al., 2017), we also investigate the benefits of extending LDAGAN with other state-of-the-art single-generator GANs. These GANs often obtain outstanding performance by changing the GAN loss or modifying the optimization strategy. Combining LDAGAN with them may further improve its generative performance and help to investigate thoroughly the benefit of introducing a multi-modal structure prior into GANs.
LDAGAN-SN
Experiments
We carried out experiments on both synthetic data and real-world data.
Conclusion and future work
A multi-modal structure prior was introduced to construct a probabilistic GAN framework for multi-modal image generation. By virtue of Bayesian networks, we explicitly modeled the underlying structure of vision data during generation and interpreted how the multiple generators correlate with the data structure. Variational inference and parameter estimation were operationalized by utilizing the discriminator to build a likelihood over the model parameters. Moreover, the proposed method was extended to combine
Acknowledgments
This work is partially supported by the National Natural Science Foundation of China (Nos. 61572111 and 61806043) and the China Postdoctoral Science Foundation (Nos. 2016M602674 and 2017M623007).
References (45)
- et al. (2020). Two birds with one stone: Iteratively learn facial attributes with GANs. Neurocomputing.
- et al. (2016). Mixture of grouped regressors and its application to visual mapping. Pattern Recognition.
- Arjovsky, M., Chintala, S., & Bottou, L. (2015). Wasserstein GAN. In ICML (pp. …).
- Arora, S., Ge, R., Liang, Y., Ma, T., & Zhang, Y. (2017). Generalization and equilibrium in generative adversarial…
- Bai, H., Chen, Z., Lyu, M. R., King, I., & Xu, Z. (2018). Neural relational topic models for scientific article…
- et al. (2017). BEGAN: Boundary equilibrium generative adversarial networks.
- et al. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research.
- Brock, A., Donahue, J., & Simonyan, K. (2019). Large scale GAN training for high fidelity natural image synthesis. In…
- Chen, T., Zhai, X., Ritter, M., Lucic, M., & Houlsby, N. (2019). Self-supervised GANs via auxiliary rotation loss. In…
- Hoffman et al. (2013). Stochastic variational inference. Journal of Machine Learning Research.
- More is the same; phase transitions and mean field theories. Journal of Statistical Physics.
- Adversarial message passing for graphical models.
- Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images. Technical report.
Cited by (5)

- DivGAN: A diversity enforcing generative adversarial network for mode collapse reduction. Artificial Intelligence, 2023.
- DART: Domain-Adversarial Residual-Transfer networks for unsupervised cross-domain image classification. Neural Networks, 2020.
- Current status, application, and challenges of the interpretability of generative adversarial network models. Computational Intelligence, 2023.
- A stochastic algorithm for solving the posterior inference problem in topic models. Telkomnika (Telecommunication Computing Electronics and Control), 2022.