Paired-D++ GAN for image manipulation with text

Original Paper · Machine Vision and Applications

Abstract

Image manipulation with text aims to semantically modify the appearance of an object in a source image according to a given text describing novel visual attributes, while retaining information irrelevant to the text, such as the background. The task has a wide range of applications, such as intelligent image editing, and is helpful to those who are not skilled at painting. We propose a generative adversarial network with a pair of discriminators of different architectures, named Paired-D++ GAN, for image manipulation with text, where the two discriminators make different judgments: one for foreground synthesis and the other for background synthesis. The generator of Paired-D++ GAN has an encoder–decoder architecture with skip connections and synthesizes an object's appearance matching the given text description while preserving the other parts of the source image. The two discriminators separately judge the foreground and the background of the synthesized image against the given text description and the given source image. Paired-D++ GAN is trained effectively through unconditional and conditional adversarial learning in a simultaneous three-player minimax game. Comprehensive experimental results on the Caltech-200 bird dataset and the Oxford-102 flower dataset show that Paired-D++ GAN semantically synthesizes images matching an input text description while retaining the background of the source image better than state-of-the-art methods.
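To make the three-player setup described above concrete, here is a minimal PyTorch sketch of a generator with a skip connection, a text-conditional foreground discriminator, and an unconditional background discriminator trained jointly. All module shapes, layer counts, names such as `ForegroundD`/`BackgroundD`, and the hinge-style losses are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the three-player minimax game from the abstract.
# Shapes, layer counts, and losses are illustrative assumptions only.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Encoder-decoder with a skip connection, conditioned on a text embedding."""
    def __init__(self, text_dim=128):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU())
        self.fuse = nn.Sequential(nn.Conv2d(128 + text_dim, 128, 3, 1, 1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU())
        # decoder input channels are doubled by the skip connection from enc1
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(64 + 64, 3, 4, 2, 1), nn.Tanh())

    def forward(self, img, txt):
        h1 = self.enc1(img)                       # (B, 64, H/2, W/2)
        h2 = self.enc2(h1)                        # (B, 128, H/4, W/4)
        t = txt[:, :, None, None].expand(-1, -1, h2.size(2), h2.size(3))
        h = self.fuse(torch.cat([h2, t], 1))      # inject text features
        d = self.dec1(h)
        return self.dec2(torch.cat([d, h1], 1))   # skip connection

class ForegroundD(nn.Module):
    """Conditional discriminator: does the object match the text description?"""
    def __init__(self, text_dim=128):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
                                  nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
                                  nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(128 + text_dim, 1)

    def forward(self, img, txt):
        return self.head(torch.cat([self.conv(img), txt], 1))

class BackgroundD(nn.Module):
    """Unconditional discriminator: is the image (background) realistic?"""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
                                 nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(128, 1))

    def forward(self, img):
        return self.net(img)

def train_step(G, D_fg, D_bg, opt_g, opt_d, img, txt):
    """One step of the simultaneous three-player minimax game (hinge losses)."""
    fake = G(img, txt)
    # Discriminators: push real scores up, fake scores down.
    # (Mismatched image/text negative pairs are omitted for brevity.)
    opt_d.zero_grad()
    d_loss = (torch.relu(1 - D_fg(img, txt)).mean()
              + torch.relu(1 + D_fg(fake.detach(), txt)).mean()
              + torch.relu(1 - D_bg(img)).mean()
              + torch.relu(1 + D_bg(fake.detach())).mean())
    d_loss.backward()
    opt_d.step()
    # Generator: fool both discriminators at once.
    opt_g.zero_grad()
    g_loss = -(D_fg(fake, txt).mean() + D_bg(fake).mean())
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

# Usage with assumed 64x64 images and 128-d sentence embeddings:
G, D_fg, D_bg = Generator(), ForegroundD(), BackgroundD()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(list(D_fg.parameters()) + list(D_bg.parameters()),
                         lr=2e-4, betas=(0.5, 0.999))
img, txt = torch.randn(4, 3, 64, 64), torch.randn(4, 128)
print(train_step(G, D_fg, D_bg, opt_g, opt_d, img, txt))
```

Pairing a conditional foreground critic with an unconditional background critic gives the generator two separate gradient signals, one for "match the description" and one for "keep everything the text does not mention realistic", which is the intuition behind the paired-discriminator design.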

Author information

Correspondence to Duc Minh Vo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Vo, D.M., Sugimoto, A. Paired-D++ GAN for image manipulation with text. Machine Vision and Applications 33, 45 (2022). https://doi.org/10.1007/s00138-022-01298-7
