Adversarial text-to-image synthesis: A review, Neural Networks

Adversarial text-to-image synthesis: A review
Neural Networks (IF 7.8), Pub Date: 2021-08-08, DOI: 10.1016/j.neunet.2021.07.019
Stanislav Frolov, Tobias Hinz, Federico Raue, Jörn Hees, Andreas Dengel

With the advent of generative adversarial networks, synthesizing images from text descriptions has recently become an active research area. It offers a flexible and intuitive approach to conditional image generation, and recent years have brought significant progress in visual realism, diversity, and semantic alignment. However, the field still faces several challenges that require further research, such as generating high-resolution images with multiple objects and developing suitable and reliable evaluation metrics that correlate with human judgement. In this review, we contextualize the state of the art of adversarial text-to-image synthesis models, trace their development since their inception five years ago, and propose a taxonomy based on the level of supervision. We critically examine current strategies for evaluating text-to-image synthesis models, highlight their shortcomings, and identify new areas of research, ranging from the development of better datasets and evaluation metrics to possible improvements in architectural design and model training. This review complements previous surveys on generative adversarial networks by focusing on text-to-image synthesis, which we believe will help researchers further advance the field.

Updated: 2021-09-06