Image manipulation with natural language using Two-sided Attentive Conditional Generative Adversarial Network
Neural Networks ( IF 7.8 ) Pub Date : 2020-09-12 , DOI: 10.1016/j.neunet.2020.09.002
Dawei Zhu , Aditya Mogadala , Dietrich Klakow

Altering the content of an image with photo-editing tools is a tedious task for an inexperienced user, especially when modifying the visual attributes of a specific object without affecting other constituents such as the background. To simplify the process of image manipulation and to give users more control, it is preferable to use a simpler interface such as natural language, which also makes it possible to semantically modify parts of an image according to a given text. In this paper, we therefore address the challenge of manipulating images using natural language descriptions. We propose the Two-sidEd Attentive conditional Generative Adversarial Network (TEA-cGAN) to generate semantically manipulated images. TEA-cGAN's contribution is two-fold. First, it attends to the locations that need to be modified during generation, introducing two architectures that provide fine-grained attention in both the generator and the discriminator of a Generative Adversarial Network (GAN). Specifically, the Single-scale architecture used in the generator focuses on modifying only the text-relevant regions of an image while leaving other regions untouched, and the Multi-scale architecture extends this idea by taking different scales of image features into account. Second, it generates higher-resolution images (e.g., 256 × 256), which offer better quality and stability. Quantitative and qualitative experiments on the CUB and Oxford-102 datasets confirm that both TEA-cGAN architectures outperform existing methods when generating 128 × 128 images as well as higher-resolution 256 × 256 images.
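The core idea of the fine-grained attention described above — softly selecting the text-relevant regions so that the rest of the image passes through unchanged — can be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' implementation: the feature shapes, the dot-product similarity, and the sigmoid gating are all assumptions made for demonstration.

```python
import numpy as np

def text_conditioned_attention(img_feats, txt_emb):
    """Compute a spatial attention map from image features and a text embedding.

    img_feats: (H, W, C) image feature map (assumed shape)
    txt_emb:   (C,) embedding of the edit instruction (assumed shape)
    Returns an (H, W) map in (0, 1): high where the text is relevant.
    """
    scores = img_feats @ txt_emb             # (H, W) dot-product similarity
    return 1.0 / (1.0 + np.exp(-scores))     # sigmoid -> soft attention map

def blend(original, edited, attn):
    """Keep text-irrelevant regions from the original image and take
    text-relevant regions from the generator's edited output."""
    a = attn[..., None]                      # broadcast over color channels
    return a * edited + (1.0 - a) * original

# Toy example with random features (shapes are illustrative only)
rng = np.random.default_rng(0)
H, W, C = 8, 8, 16
img_feats = rng.normal(size=(H, W, C))
txt_emb = rng.normal(size=(C,))
attn = text_conditioned_attention(img_feats, txt_emb)

original = rng.uniform(size=(H, W, 3))
edited = rng.uniform(size=(H, W, 3))
out = blend(original, edited, attn)
```

Where attention is near zero, `out` equals `original` exactly, which is how regions untouched by the text description are preserved; a multi-scale variant would compute such maps at several feature resolutions.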




Updated: 2020-09-12