Neural Networks

Volume 132, December 2020, Pages 461-476

2020 Special Issue

Latent Dirichlet allocation based generative adversarial networks

https://doi.org/10.1016/j.neunet.2020.08.012

Abstract

Generative adversarial networks (GANs) have been extensively studied in recent years and have powered a wide range of applications, ranging from image generation and image-to-image translation to text-to-image generation and visual recognition. These methods typically model the mapping from latent space to image with one or more generators. However, they have two notable drawbacks: (i) they ignore the multi-modal structure of images, and (ii) they lack model interpretability. In particular, existing methods mostly assume that one or more generators can cover all image modes, even though the structure of the data is unknown. Consequently, mode dropping and mode collapse often occur during GAN training. Despite its importance for generation, exploiting the structure of data has remained largely unexplored. In this work, aiming at generating multi-modal images with an explicitly interpretable model, we develop the theory of integrating GANs with a data structure prior, and propose latent Dirichlet allocation based generative adversarial networks (LDAGAN). The framework can be combined with a variety of state-of-the-art single-generator GANs and achieves improved performance. Extensive experiments on synthetic and real datasets demonstrate the efficacy of LDAGAN for multi-modal image generation. An implementation of LDAGAN is available at https://github.com/Sumching/LDAGAN.

Introduction

Generating realistic images has been actively pursued in both the machine learning and computer vision communities in recent years. Generative adversarial networks (GANs) (Arjovsky et al., 2017, Berthelot et al., 2017, Goodfellow et al., 2014, Hoang et al., 2018, Mao et al., 2017, Miyato et al., 2018, Zhang et al., 2019) provide a promising way to achieve this goal, and their remarkable ability has powered a wide range of applications, ranging from image generation (Gulrajani et al., 2017, Yang et al., 2018) and image-to-image translation (Liu et al., 2017, Tran et al., 2017, Yi et al., 2017, Zhu et al., 2017) to text-to-image generation (Reed et al., 2016, Zhang et al., 2017) and visual recognition (Li et al., 2017). In general, GANs are required to generate outputs that fit the multi-modal distribution of real images.

However, this goal is hard to achieve when the underlying structure of the data is unclear (Pan et al., 2016). This partially explains why mode collapse and mode dropping occur during GAN training (Arjovsky et al., 2017, Heusel et al., 2017). Typically, employing multiple generators $G_k$, $k=1,\dots,K$, can produce more diverse and varied images (Arora et al., 2017, Hoang et al., 2018, Tolstikhin et al., 2017), where the generative distribution $p_g(x)$ is a mixture of sub-modal distributions $p_{g_k}(x)$, $k=1,\dots,K$; that is, $p_g(x)=\sum_{k=1}^{K}\pi_k\,p_{g_k}(x)$. However, these methods still ignore the underlying structure of images, which results in problems such as mode dropping. The absence of a sample likelihood and of posterior distributions over latent variables in GANs seems to prohibit progress on fixing this drawback. Moreover, these approaches lack model interpretation: it is unclear how the multiple generators correlate to the data structure.
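To make the mixture formulation concrete, the following is a minimal sketch of how sampling from such a multi-generator model proceeds; the function and argument names (`sample_mixture`, `generators`, `pi`) are hypothetical stand-ins, not the interface of any of the cited methods.

```python
import torch

def sample_mixture(generators, pi, num_samples, latent_dim):
    """Draw samples from p_g(x) = sum_k pi_k * p_{g_k}(x).

    generators: list of K networks, each mapping a latent batch to images.
    pi:         length-K tensor of mixture weights (summing to one).
    """
    # Choose a component k ~ Categorical(pi) for every sample.
    ks = torch.multinomial(pi, num_samples, replacement=True)
    samples = []
    for k in ks.tolist():
        z = torch.randn(1, latent_dim)      # unimodal latent prior
        samples.append(generators[k](z))    # x ~ p_{g_k}(x)
    return torch.cat(samples, dim=0)
```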

In this paper, we construct a probabilistic GAN framework for multi-modal image generation, which not only integrates with data structure prior, but also has explicit model interpretation (see Fig. 1). We motivate our construction through a Bayesian network. In the following, we list the three main contributions.

First, we introduce the Dirichlet prior and frame a new probabilistic GAN: latent Dirichlet allocation based GAN (LDAGAN). (i) Unlike existing GANs, which model the latent space with a simple unimodal distribution (Berthelot et al., 2017, Goodfellow et al., 2014, Gulrajani et al., 2017, Miyato et al., 2018, Peng et al., 2019), we define additional discrete variables in the latent space to represent the structure (i.e. modes) of the data. Moreover, we impose a hierarchical prior on the structure, which enlarges the possible solution space for the multi-modal structure prior (see Fig. 2(b)). (ii) Building on the data structure prior, we construct a structured GAN that precisely fits complex image data: each generator is responsible for only one image mode. (iii) Through Bayesian networks, we explicitly interpret how the multiple generators correlate to the data structure.
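As a rough illustration of this hierarchical construction, the sketch below draws mode proportions from a Dirichlet prior, then a discrete mode assignment, and finally renders an image with the mode-specific generator. The names are hypothetical, and the exact parameterization is the one defined in the paper, not this simplification.

```python
import torch
from torch.distributions import Categorical, Dirichlet

def ldagan_draw(generators, alpha, latent_dim):
    """One draw from the hierarchical generative process sketched above:
        pi ~ Dirichlet(alpha)   # prior over mode proportions
        k  ~ Categorical(pi)    # discrete mode (structure) variable
        z  ~ N(0, I)            # continuous latent code
        x  = G_k(z)             # generator responsible for mode k
    """
    pi = Dirichlet(alpha).sample()
    k = int(Categorical(pi).sample())
    z = torch.randn(1, latent_dim)
    return generators[k](z), k
```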

Second, since GANs do not have an explicit sample likelihood, a natural question arises: how can we estimate the model parameters and the posterior distributions of the latent variables in LDAGAN? To this end, we take an important step: (i) we formulate a likelihood function for the LDAGAN model by virtue of the discriminator in GANs; (ii) with this likelihood, we present a variational inference algorithm in GANs and solve for the model parameters via the EM algorithm; (iii) we adopt stochastic variational inference (Hoffman et al., 2013) to keep the training of LDAGAN computationally efficient.
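Purely as intuition for steps (i) and (ii): a discriminator-based likelihood admits an EM-style E-step in which the mode responsibilities of a generated sample weight discriminator evidence by the mixing proportions. The sketch below is an assumption-laden simplification with hypothetical names, not the paper's exact variational update.

```python
import torch

def mode_responsibilities(discriminator, generators, z, pi):
    """Hypothetical E-step sketch: score each generator's output for a
    shared latent code z with the discriminator, then combine the scores
    with the mixing weights pi,
        gamma_k ∝ pi_k * sigmoid(D(G_k(z))).
    The sigmoid turns the discriminator logit into a crude likelihood proxy.
    """
    with torch.no_grad():
        scores = torch.stack(
            [torch.sigmoid(discriminator(g(z))).mean() for g in generators]
        )
    unnorm = pi * scores
    return unnorm / unnorm.sum()   # responsibilities over the K modes
```

In an M-step, such responsibilities would weight the gradients applied to each generator, and the Dirichlet parameters would be re-estimated from their aggregate; stochastic variational inference replaces full-dataset statistics with minibatch estimates.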

Third, we extend the LDAGAN framework to combine with other single-generator GANs, such as SNGAN (Miyato et al., 2018) and VGAN (Peng et al., 2019). We achieve state-of-the-art performance on both the CIFAR-10 (Krizhevsky & Hinton, 2009) and CIFAR-100 (Krizhevsky & Hinton, 2009) datasets. For example, our method achieves a Fréchet Inception Distance of 10.4 on CIFAR-10, which, to our knowledge, is currently the best result reported in the literature with a ResNet architecture.
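For reference, the Fréchet Inception Distance (Heusel et al., 2017) fits Gaussians to Inception features of real and generated samples and measures (lower is better)
$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right),$$
where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the feature means and covariances of the real and generated data.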

Since the problem of integrating GANs with prior knowledge has thus far been little explored, the theoretical analysis in this paper may further inspire the GAN community to study how to combine GANs with other forms of prior knowledge, in pursuit of more realistic generation and explicit interpretability.

Section snippets

Related work

This section reviews related work on variants of generative adversarial networks (GANs).

Latent Dirichlet allocation based GAN (LDAGAN)

As previously mentioned, existing GANs applied to multi-modal image generation exhibit two drawbacks: (i) they ignore the multi-modal structure prior of images (see Fig. 2(a)), and (ii) they lack model interpretability. This partially explains why mode collapse and mode dropping occur during GAN training. The inability to obtain an explicit image likelihood and posterior distributions over latent variables in GANs seems to prohibit progress on modeling the data structure.

Learning

This section describes the learning of the discriminator, generators and Dirichlet parameters α in LDAGAN.
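At a high level, training alternates discriminator and generator updates in the usual adversarial fashion; the precise objectives, the responsibility weighting, and the update for α are derived in this section. The following is a minimal sketch under a standard non-saturating GAN loss with hypothetical names, omitting the LDAGAN-specific weighting and the α update.

```python
import torch
import torch.nn.functional as F

def adversarial_step(discriminator, generators, real_batch,
                     d_opt, g_opts, latent_dim):
    """One simplified alternating update for a K-generator GAN."""
    n = real_batch.size(0)
    # --- discriminator update on real vs. pooled fake samples ---
    fakes = torch.cat([g(torch.randn(n // len(generators), latent_dim))
                       for g in generators])
    d_opt.zero_grad()
    out_real = discriminator(real_batch)
    out_fake = discriminator(fakes.detach())
    d_loss = (F.binary_cross_entropy_with_logits(out_real, torch.ones_like(out_real))
              + F.binary_cross_entropy_with_logits(out_fake, torch.zeros_like(out_fake)))
    d_loss.backward()
    d_opt.step()
    # --- per-generator updates (non-saturating loss) ---
    for g, opt in zip(generators, g_opts):
        opt.zero_grad()
        out = discriminator(g(torch.randn(n, latent_dim)))
        g_loss = F.binary_cross_entropy_with_logits(out, torch.ones_like(out))
        g_loss.backward()
        opt.step()
```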

Extension of LDAGAN

In addition to implementing LDAGAN with the DCGAN architecture (Liu et al., 2017), we also investigate the benefits of extending LDAGAN with other state-of-the-art single-generator GANs. These GANs often attain outstanding performance by changing the GAN loss or modifying the optimization strategy. Combining LDAGAN with them may further improve its generative performance and help to investigate thoroughly the benefit of introducing a multi-modal structure prior into GANs.

LDAGAN-SN
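As context for this variant: SNGAN (Miyato et al., 2018) stabilizes training by wrapping each discriminator layer in spectral normalization, which rescales each weight matrix by its largest singular value and thereby bounds the Lipschitz constant of D. Below is a minimal sketch, assuming a simple convolutional discriminator rather than the paper's actual ResNet architecture.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

def sn_discriminator(in_channels=3):
    """Toy SNGAN-style discriminator for 32x32 inputs (e.g. CIFAR-10):
    every learnable layer is spectrally normalized."""
    return nn.Sequential(
        spectral_norm(nn.Conv2d(in_channels, 64, 4, stride=2, padding=1)),
        nn.LeakyReLU(0.1),
        spectral_norm(nn.Conv2d(64, 128, 4, stride=2, padding=1)),
        nn.LeakyReLU(0.1),
        spectral_norm(nn.Conv2d(128, 256, 4, stride=2, padding=1)),
        nn.LeakyReLU(0.1),
        nn.Flatten(),
        spectral_norm(nn.Linear(256 * 4 * 4, 1)),  # one real/fake logit
    )
```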

Experiments

We carried out experiments on both synthetic data and real-world data.

Conclusion and future work

A multi-modal structure prior was introduced to construct a probabilistic GAN framework for multi-modal image generation. By virtue of Bayesian networks, we explicitly modeled the underlying structure of visual data during generation and interpreted how the multiple generators correlate to the data structure. Variational inference and parameter estimation were carried out by using the discriminator to build a likelihood over the model parameters. Moreover, the proposed method was extended to combine with state-of-the-art single-generator GANs.

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (Nos. 61572111 and 61806043), and the China Postdoctoral Science Foundation (Nos. 2016M602674 and 2017M623007).

References (45)

  • Ma, D., et al. (2020). Two birds with one stone: Iteratively learn facial attributes with GANs. Neurocomputing.
  • Pan, L., et al. (2016). Mixture of grouped regressors and its application to visual mapping. Pattern Recognition.
  • Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein GAN. In ICML (pp....
  • Arora, S., Ge, R., Liang, Y., Ma, T., & Zhang, Y. (2017). Generalization and equilibrium in generative adversarial...
  • Bai, H., Chen, Z., Lyu, M. R., King, I., & Xu, Z. (2018). Neural relational topic models for scientific article...
  • Berthelot, D., et al. (2017). BEGAN: Boundary equilibrium generative adversarial networks.
  • Blei, D. M., et al. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research.
  • Brock, A., Donahue, J., & Simonyan, K. (2019). Large scale GAN training for high fidelity natural image synthesis. In...
  • Chen, T., Zhai, X., Ritter, M., Lucic, M., & Houlsby, N. (2019). Self-supervised GANs via auxiliary rotation loss. In...
  • Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image...
  • Eghbal-zadeh, H., Zellinger, W., & Widmer, G. (2019). Mixture density generative adversarial networks. In CVPR (pp....
  • Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014)....
  • Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., & Courville, A. C. (2017). Improved training of Wasserstein GANs....
  • Heim, E. (2019). Constrained generative adversarial networks for interactive image generation. In CVPR (pp....
  • Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2017). GANs trained by a two time-scale...
  • Hoang, Q., Nguyen, T. D., Le, T., & Phung, D. (2018). MGAN: Training generative adversarial nets with multiple...
  • Hoffman, M. D., et al. (2013). Stochastic variational inference. Journal of Machine Learning Research.
  • Kadanoff, L. P. (2003). More is the same; Phase transitions and mean field theories. Journal of Statistical Physics.
  • Karaletsos, T. (2016). Adversarial message passing for graphical models.
  • Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2018). Progressive growing of GANs for improved quality, stability,...
  • Krizhevsky, A., et al. (2009). Learning multiple layers of features from tiny images. Technical report.