Class-Balanced Text to Image Synthesis With Attentive Generative Adversarial Network
IEEE Multimedia (IF 2.3) Pub Date: 2021-01-05, DOI: 10.1109/mmul.2020.3048939
Min Wang, Congyan Lang, Liqian Liang, Gengyu Lyu, Songhe Feng, Tao Wang

Although the text-to-image synthesis task has shown significant progress, generating high-quality images remains challenging. In this article, we first propose an attention-driven, cycle-refinement generative adversarial network, AGAN-v1, which bridges the domain gap between visual contents and semantic concepts by constructing spatial configurations of objects. The generation of image contours is its core component, in which an attention mechanism refines the local details of images by attending to the objects that complement each subregion. Second, we propose an advanced class-balanced generative adversarial network, AGAN-v2, to address the problem of long-tailed data distributions; importantly, it is the first method to address this problem in the text-to-image synthesis task. AGAN-v2 introduces a reweighting scheme that adopts the effective number of samples of each class to rebalance the generative loss. Extensive quantitative and qualitative experiments on the CUB and MS-COCO datasets demonstrate that AGAN-v2 significantly outperforms state-of-the-art methods.
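The two technical ingredients of the abstract can be made concrete. First, the attention step: each image subregion attends over the word (object) features of the caption and receives a word-context vector that guides local refinement. Below is a minimal NumPy sketch of word-to-subregion attention in the spirit of attentional GANs; the function name, feature shapes, and the plain dot-product scoring are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def word_region_attention(word_feats, region_feats):
    """word_feats: (T, D) word/object embeddings of the caption.
    region_feats: (N, D) features of the N image subregions.
    Returns an (N, D) word-context vector for each subregion,
    which can be fused with region_feats to refine local details."""
    scores = region_feats @ word_feats.T   # (N, T) region-word similarity
    attn = softmax(scores, axis=1)         # each subregion attends over words
    return attn @ word_feats               # (N, D) attended context
```

Second, the class-balanced reweighting: the effective number of samples of a class with n instances is E_n = (1 - β^n) / (1 - β), and the per-class loss weight is proportional to its inverse, so tail classes contribute more to the generative loss. A short sketch under the usual choices from the class-balanced-loss literature (the β value and the normalization here are conventional defaults, not values reported in the paper):

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights from the effective number of samples,
    E_n = (1 - beta**n) / (1 - beta); weight is proportional to 1 / E_n."""
    n = np.asarray(samples_per_class, dtype=np.float64)
    effective_num = (1.0 - np.power(beta, n)) / (1.0 - beta)
    weights = 1.0 / effective_num
    # Normalize so the weights sum to the number of classes,
    # keeping the loss scale comparable to the unweighted case.
    return weights * len(n) / weights.sum()

# Example: a head class with 5000 samples vs. a tail class with 50;
# the tail class receives a far larger weight on its generative loss.
print(class_balanced_weights([5000, 500, 50]))
```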
