Attn-Eh ALN: complex text-to-image generation with attention-enhancing adversarial learning networks
Journal of Electronic Imaging (IF 1.1) | Pub Date: 2020-12-22 | DOI: 10.1117/1.jei.29.6.063014
Cunyi Lin, Xianwei Rong, Ming Liu, Xiaoyan Yu
Abstract. Text-to-image generation can be widely applied in various fields, such as scene retrieval and computer-aided design. Existing approaches can generate realistic images from simple text descriptions, whereas rendering images from complex text descriptions is still not satisfactory for practical applications. To generate accurate high-resolution images from complex texts, we propose an attention-enhancing adversarial learning network (Attn-Eh ALN) based on conditional generative adversarial networks and the attention mechanism. The model consists of an encoding module and a generative module. In the encoding module, we propose a local attention-driven encoding network that assigns different weights to the words in the text to enhance the semantic representation of specific object features. The attention mechanism is employed to capture more detail while preserving global information, making the details in the generated images more fine-grained. In the discriminating stage, we use multiple discriminators to judge the realness of the generated images, avoiding the bias caused by a single discriminator. Moreover, a semantic similarity judgment module is introduced to improve the semantic consistency between the text description and the visual content. Experimental results on benchmark datasets indicate that Attn-Eh ALN compares favorably with other state-of-the-art methods in both qualitative and quantitative assessments.
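The word-level attention described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the function name and dimensions are hypothetical. It shows the generic pattern of weighting word embeddings per image region via a softmax over region-word similarities, so that each region attends to the words most relevant to it:

```python
import numpy as np

def word_attention(region_feats, word_embs):
    """Compute a word-context vector for each image region.

    region_feats: (R, D) array of image region features.
    word_embs:    (T, D) array of word embeddings for the text.
    Returns an (R, D) array: each row is a softmax-weighted
    combination of word embeddings for that region.
    """
    scores = region_feats @ word_embs.T           # (R, T) similarity scores
    scores -= scores.max(axis=1, keepdims=True)   # for numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over the T words
    return attn @ word_embs                       # (R, D) word contexts

rng = np.random.default_rng(0)
regions = rng.standard_normal((4, 8))   # 4 regions, 8-dim features
words = rng.standard_normal((6, 8))     # 6 words, 8-dim embeddings
context = word_attention(regions, words)
print(context.shape)
```

In attention-driven text-to-image models of this family, such per-region word contexts are typically fed into the generator so that fine-grained details (e.g. color or texture words) influence the corresponding image regions.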

Updated: 2020-12-22