Dual Attention GANs for Semantic Image Synthesis, arXiv - CS - Multimedia



arXiv - CS - Multimedia, Pub Date: 2020-08-29, DOI: arxiv-2008.13024
Hao Tang, Song Bai, Nicu Sebe

In this paper, we focus on the semantic image synthesis task, which aims to translate semantic label maps into photo-realistic images. Existing methods lack effective semantic constraints to preserve semantic information and ignore the structural correlations in both the spatial and channel dimensions, which leads to blurry, artifact-prone results. To address these limitations, we propose a novel Dual Attention GAN (DAGAN) that synthesizes photo-realistic, semantically consistent images with fine details from the input layouts, without imposing extra training overhead or modifying the network architectures of existing methods. We also propose two novel modules, a position-wise Spatial Attention Module (SAM) and a scale-wise Channel Attention Module (CAM), to capture semantic structure attention in the spatial and channel dimensions, respectively. Specifically, SAM selectively correlates the pixels at each position through a spatial attention map, so that pixels with the same semantic label are related to each other regardless of their spatial distance. Meanwhile, CAM selectively emphasizes the scale-wise features at each channel through a channel attention map, which integrates associated features among all channel maps regardless of their scales. Finally, we sum the outputs of SAM and CAM to further improve the feature representation. Extensive experiments on four challenging datasets show that DAGAN achieves remarkably better results than state-of-the-art methods while using fewer model parameters. The source code and trained models are available at https://github.com/Ha0Tang/DAGAN.
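The abstract describes SAM as relating every position to every other position through a spatial attention map, and CAM as mixing all channel maps through a channel attention map, with the two outputs summed. The following is a minimal, parameter-free sketch of that idea in plain Python, assuming DANet-style dot-product attention over a flattened C x N feature map (N = H * W positions). It is not the DAGAN implementation from the linked repository: the real modules are learned layers, and this toy omits any learned projections, treats each channel map as a single unit rather than modeling the multi-scale features the paper's CAM refers to, and uses hypothetical names (`spatial_attention`, `channel_attention`, `dual_attention`).

```python
import math
from typing import List

# Assumed layout: C rows (channels) x N columns (flattened H*W positions).
Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention(x: Matrix) -> Matrix:
    """SAM-like step (assumed dot-product form): each position attends to all N positions."""
    c, n = len(x), len(x[0])
    # energy[i][j] = similarity of the feature vectors at positions i and j
    energy = [[sum(x[k][i] * x[k][j] for k in range(c)) for j in range(n)]
              for i in range(n)]
    attn = [softmax(row) for row in energy]  # N x N, each row sums to 1
    # out[k][i] = sum_j attn[i][j] * x[k][j]; distance between i and j plays no role
    return [[sum(attn[i][j] * x[k][j] for j in range(n)) for i in range(n)]
            for k in range(c)]


def channel_attention(x: Matrix) -> Matrix:
    """CAM-like step (assumed dot-product form): each channel map attends to all C maps."""
    c, n = len(x), len(x[0])
    # energy[p][q] = similarity of channel maps p and q over all positions
    energy = [[sum(x[p][t] * x[q][t] for t in range(n)) for q in range(c)]
              for p in range(c)]
    attn = [softmax(row) for row in energy]  # C x C, each row sums to 1
    # out[p][t] = sum_q attn[p][q] * x[q][t]
    return [[sum(attn[p][q] * x[q][t] for q in range(c)) for t in range(n)]
            for p in range(c)]


def dual_attention(x: Matrix) -> Matrix:
    """Element-wise sum of the two attention outputs, as the abstract describes."""
    sam = spatial_attention(x)
    cam = channel_attention(x)
    return [[a + b for a, b in zip(rs, rc)] for rs, rc in zip(sam, cam)]


if __name__ == "__main__":
    # Toy 2-channel feature map on a 2x2 grid, flattened to 4 positions.
    # Positions 0 and 2 share one feature vector, positions 1 and 3 share another.
    features = [
        [1.0, 0.0, 1.0, 0.0],
        [0.0, 1.0, 0.0, 1.0],
    ]
    fused = dual_attention(features)
    print(len(fused), len(fused[0]))  # 2 4
```

In this toy input, positions 0 and 2 have the same feature vector, so the spatial step gives them identical outputs even though they are not adjacent, which mirrors the abstract's point that same-label pixels are related regardless of spatial distance. Plain dot-product attention was chosen only because it is the simplest form that shows this behavior.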


Updated: 2020-09-01
