Text to photo-realistic image synthesis via chained deep recurrent generative adversarial network
Journal of Visual Communication and Image Representation (IF 2.6). Pub Date: 2021-01-01. DOI: 10.1016/j.jvcir.2020.102955
Min Wang, Congyan Lang, Songhe Feng, Tao Wang, Yi Jin, Yidong Li

Despite the promising progress made in recent years, automatically generating high-resolution, realistic images from text descriptions remains a challenging task due to the semantic gap between human-written descriptions and the diversity of visual appearances. Most existing approaches generate rough images from the given text descriptions, while the relationship between sentence semantics and visual content is not holistically exploited. In this paper, we propose a novel chained deep recurrent generative adversarial network (CDRGAN) for synthesizing images from text descriptions. Our model uses carefully designed chained deep recurrent generators that simultaneously recover global image structure and local details. Specifically, our method not only considers the logical relationships among image pixels but also removes computational bottlenecks through parameter sharing. We evaluate our method on three public benchmarks: the CUB, Oxford-102, and MS COCO datasets. Experimental results show that our method consistently and significantly outperforms state-of-the-art approaches across different evaluation metrics.
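The abstract describes the architecture only at a high level, so the paper itself should be consulted for the exact design. As a rough illustration of the two ideas named above, a chain of recurrent generator stages that refines a coarse image into progressively finer ones, and parameter sharing along that chain, the following is a minimal PyTorch sketch. All class names, dimensions, and the stage layout here are hypothetical, not taken from the paper.

```python
# Illustrative sketch only (not the authors' CDRGAN code): a chain of
# generator stages that share one set of weights, refining a coarse
# text-conditioned feature map into a coarse-to-fine image pyramid.
import torch
import torch.nn as nn


class SharedRefineStage(nn.Module):
    """One refinement stage. The SAME instance is reused at every scale,
    so its weights are shared along the chain (a plausible reading of the
    'parameter sharing' the abstract credits with removing bottlenecks)."""

    def __init__(self, feat_ch: int, text_dim: int):
        super().__init__()
        self.fuse = nn.Conv2d(feat_ch + text_dim, feat_ch, 3, padding=1)
        self.refine = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),  # grow resolution x2
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
            nn.BatchNorm2d(feat_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # Broadcast the sentence embedding over the spatial grid, then fuse
        # it with the current feature map before refining.
        b, _, h, w = feat.shape
        t = text_emb[:, :, None, None].expand(b, -1, h, w)
        feat = torch.relu(self.fuse(torch.cat([feat, t], dim=1)))
        return self.refine(feat)


class ChainedRecurrentGenerator(nn.Module):
    def __init__(self, z_dim=100, text_dim=128, feat_ch=64, n_stages=3):
        super().__init__()
        self.feat_ch = feat_ch
        self.n_stages = n_stages
        # Project noise + text embedding to a coarse 4x4 feature map
        # (the global image structure).
        self.stem = nn.Linear(z_dim + text_dim, feat_ch * 4 * 4)
        self.stage = SharedRefineStage(feat_ch, text_dim)  # one shared stage
        self.to_rgb = nn.Conv2d(feat_ch, 3, 3, padding=1)  # side-output head

    def forward(self, z: torch.Tensor, text_emb: torch.Tensor):
        feat = self.stem(torch.cat([z, text_emb], dim=1))
        feat = feat.view(-1, self.feat_ch, 4, 4)
        images = []
        for _ in range(self.n_stages):           # chained refinement loop
            feat = self.stage(feat, text_emb)    # same weights every step
            images.append(torch.tanh(self.to_rgb(feat)))
        return images  # coarse-to-fine pyramid: 8x8, 16x16, 32x32


if __name__ == "__main__":
    g = ChainedRecurrentGenerator()
    z, txt = torch.randn(2, 100), torch.randn(2, 128)
    for img in g(z, txt):
        print(img.shape)  # [2, 3, 8, 8] -> [2, 3, 16, 16] -> [2, 3, 32, 32]
```

Because the single SharedRefineStage instance is applied at every scale, the parameter count stays constant as more stages are chained, which is one way sharing parameters along the chain can keep the model compact as resolution grows.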

Updated: 2021-01-07