Multi-Sentence Auxiliary Adversarial Networks for Fine-Grained Text-to-Image Synthesis
IEEE Transactions on Image Processing ( IF 10.8 ) Pub Date : 2021-02-02 , DOI: 10.1109/tip.2021.3055062
Yanhua Yang , Lei Wang , De Xie , Cheng Deng , Dacheng Tao

Due to the development of Generative Adversarial Networks (GANs), significant progress has been achieved in the text-to-image synthesis task. However, most previous works have focused only on learning the semantic consistency between paired images and sentences, without exploring the semantic correlation between different yet related sentences that describe the same image, which leads to significant visual variation among the synthesized images. Accordingly, in this article, we propose a new method for text-to-image synthesis, dubbed Multi-sentence Auxiliary Generative Adversarial Networks (MA-GAN); this approach not only improves generation quality but also guarantees generation similarity across related sentences by exploring the semantic correlation between different sentences describing the same image. More specifically, we propose a Single-sentence Generation and Multi-sentence Discrimination (SGMD) module that explores the semantic correlation between multiple related sentences in order to reduce the variation between their generated images and enhance the reliability of the generated results. Moreover, a Progressive Negative Sample Selection (PNSS) mechanism is designed to mine more suitable negative samples for training, which effectively promotes fine-grained discrimination ability in the generative model and facilitates the generation of more detailed results. Extensive experiments on the Oxford-102 and CUB datasets reveal that our MA-GAN significantly outperforms state-of-the-art methods.
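The abstract describes PNSS only at a high level: negative samples are mined progressively so that discrimination becomes harder, and therefore more fine-grained, as training advances. A minimal sketch of one plausible curriculum-style selection is shown below, assuming sentence embeddings compared by cosine similarity; the function name, the `progress` parameter, and the windowing scheme are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def select_negatives(anchor, candidates, progress, k=4):
    """Pick k negative text embeddings for an anchor sentence.

    progress in [0, 1] controls difficulty: early in training (0) the
    least-similar (easy) candidates are chosen; late in training (1)
    the most-similar (hard) candidates are chosen.
    """
    # Normalize so the dot product is cosine similarity.
    a = anchor / np.linalg.norm(anchor)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sims = c @ a                      # cosine similarity to the anchor
    order = np.argsort(sims)          # easiest (least similar) first
    # Slide a window of size k toward the hard end as training progresses.
    start = int(progress * (len(order) - k))
    return order[start:start + k]
```

With two-dimensional toy embeddings, `progress=0.0` returns the indices least similar to the anchor, while `progress=1.0` returns the most similar ones, mirroring the easy-to-hard schedule the mechanism suggests.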

Updated: 2021-02-16