Paired-D++ GAN for image manipulation with text
Machine Vision and Applications (IF 2.4), Pub Date: 2022-04-08, DOI: 10.1007/s00138-022-01298-7
Duc Minh Vo, Akihiro Sugimoto

Image manipulation with text aims to semantically modify the appearance of an object in a source image according to a given text describing novel visual attributes, while retaining other irrelevant information in the image, such as the background. This has a wide range of applications, such as intelligent image manipulation, and is helpful to those who are not good at painting. We propose a generative adversarial network with a pair of discriminators of different architectures, namely Paired-D++ GAN, for image manipulation with text, in which the two discriminators make different judgments: one for foreground synthesis and the other for background synthesis. The generator of Paired-D++ GAN has an encoder–decoder architecture with skip-connections and synthesizes an object’s appearance matching the given text description while preserving other parts of the source image. The two discriminators judge the foreground and background of the synthesized image separately so that it meets the given input text description and the given source image. Paired-D++ GAN is trained using an effective unconditional and conditional adversarial learning process in a simultaneous three-player minimax game. Our comprehensive experimental results on the Caltech-200 bird dataset and the Oxford-102 flower dataset show that, in comparison with state-of-the-art methods, Paired-D++ GAN can semantically synthesize images that match an input text description while retaining the background of the source image.
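To make the three-player setup more concrete, the following is a minimal sketch of how one generator might be scored against two discriminators at once. It is not the authors' implementation: the abstract does not give the loss functions, so the binary cross-entropy form, the non-saturating generator loss, the weights `w_fg` and `w_bg`, and all numeric values are assumptions chosen for illustration.

```python
import math


def bce(prob, target):
    """Binary cross-entropy for one probability, clamped away from 0 and 1."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def discriminator_loss(p_real, p_fake):
    """Generic GAN discriminator loss: push real toward 1 and fake toward 0."""
    return bce(p_real, 1.0) + bce(p_fake, 0.0)


def generator_loss(p_fake_fg, p_fake_bg, w_fg=1.0, w_bg=1.0):
    """Generator loss against both discriminators (assumed non-saturating form).

    w_fg and w_bg are hypothetical weights, not values from the paper.
    """
    return w_fg * bce(p_fake_fg, 1.0) + w_bg * bce(p_fake_bg, 1.0)


# Hypothetical discriminator outputs (probability of "real") for one example.
fg_real, fg_fake = 0.9, 0.2  # foreground discriminator
bg_real, bg_fake = 0.8, 0.4  # background discriminator

loss_d_fg = discriminator_loss(fg_real, fg_fake)
loss_d_bg = discriminator_loss(bg_real, bg_fake)
loss_g = generator_loss(fg_fake, bg_fake)

print(f"D_fg loss: {loss_d_fg:.4f}")
print(f"D_bg loss: {loss_d_bg:.4f}")
print(f"G loss:    {loss_g:.4f}")
```

In this sketch each discriminator minimizes its own loss while the generator minimizes a weighted sum over both, so a synthesized image is penalized if either the foreground or the background judgment rejects it.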

Updated: 2022-04-08
