Unified Generative Adversarial Networks for Controllable Image-to-Image Translation.
IEEE Transactions on Image Processing (IF 10.6). Pub Date: 2020-09-11. DOI: 10.1109/tip.2020.3021789
Hao Tang, Hong Liu, Nicu Sebe

We propose a unified Generative Adversarial Network (GAN) for controllable image-to-image translation, i.e., transferring an image from a source to a target domain guided by controllable structures. In addition to conditioning on a reference image, we show how the model can generate images conditioned on controllable structures, e.g., class labels, object keypoints, human skeletons, and scene semantic maps. The proposed model consists of a single generator and a discriminator taking a conditional image and the target controllable structure as input. In this way, the conditional image can provide appearance information and the controllable structure can provide the structure information for generating the target result. Moreover, our model learns the image-to-image mapping through three novel losses, i.e., color loss, controllable structure guided cycle-consistency loss, and controllable structure guided self-content preserving loss. Also, we present the Fréchet ResNet Distance (FRD) to evaluate the quality of the generated images. Experiments on two challenging image translation tasks, i.e., hand gesture-to-gesture translation and cross-view image translation, show that our model generates convincing results and significantly outperforms other state-of-the-art methods on both tasks. Meanwhile, the proposed framework is a unified solution, thus it can be applied to solving other controllable structure guided image translation tasks such as landmark guided facial expression translation and keypoint guided person image generation. To the best of our knowledge, we are the first to make one GAN framework work on all such controllable structure guided image translation tasks. Code is available at https://github.com/Ha0Tang/GestureGAN.
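To make the described objective concrete, below is a minimal PyTorch-style sketch of the three losses from the abstract. The module names (G, D, feature_extractor), the loss forms, and the weights are illustrative assumptions for exposition, not the authors' released implementation (which lives in the linked repository).

```python
# Sketch of the training losses described in the abstract; exact forms
# and weights are assumptions, not the paper's implementation.
import torch
import torch.nn.functional as F

def translation_losses(G, D, x_src, s_src, s_tgt, x_tgt, feature_extractor,
                       lambda_color=1.0, lambda_cyc=10.0, lambda_content=1.0):
    # Generator maps (conditional image, target structure) -> target image:
    # the image supplies appearance, the structure supplies layout.
    x_fake = G(x_src, s_tgt)

    # Adversarial term: the discriminator also sees the conditional image
    # and the target structure (a hedged reading of the abstract).
    adv = -D(x_fake, x_src, s_tgt).mean()

    # "Color loss": pixel-level reconstruction against the ground truth,
    # sketched here as plain L1; the paper's exact form may differ.
    color = F.l1_loss(x_fake, x_tgt)

    # Controllable structure guided cycle-consistency: translating back
    # with the source structure should recover the input image.
    x_cyc = G(x_fake, s_src)
    cyc = F.l1_loss(x_cyc, x_src)

    # Controllable structure guided self-content preserving loss: keep
    # deep content features of input and output close.
    content = F.l1_loss(feature_extractor(x_fake), feature_extractor(x_src))

    return adv + lambda_color * color + lambda_cyc * cyc + lambda_content * content
```

The abstract likewise introduces FRD without spelling out its formula. A plausible reading, by analogy with the familiar FID metric, is the standard Fréchet distance between Gaussians fitted to features from a pretrained ResNet rather than an Inception network; the sketch below assumes exactly that and nothing more.

```python
# Frechet distance between Gaussians fitted to two feature sets, as used
# by FID; applying it to ResNet features to approximate FRD is an
# assumption based on the metric's name.
import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """feats_real, feats_fake: (N, D) arrays of pooled ResNet activations."""
    mu1, sigma1 = feats_real.mean(axis=0), np.cov(feats_real, rowvar=False)
    mu2, sigma2 = feats_fake.mean(axis=0), np.cov(feats_fake, rowvar=False)
    diff = mu1 - mu2
    # Matrix square root of the product of the two covariances.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from numerics
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```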

Updated: 2020-09-22