End-to-End Text-to-Image Synthesis with Spatial Constraints
ACM Transactions on Intelligent Systems and Technology (IF 7.2). Pub Date: 2020-05-26. DOI: 10.1145/3391709
Min Wang, Congyan Lang, Liqian Liang, Songhe Feng, Tao Wang, Yutong Gao
Although the performance of automatically generating high-resolution, realistic images from text descriptions has been significantly boosted, many challenging issues in image synthesis have not been fully investigated, owing to shape variations, viewpoint changes, pose changes, and the relations among multiple objects. In this article, we propose a novel end-to-end approach for text-to-image synthesis with spatial constraints that mines object spatial location and shape information. Instead of learning a hierarchical mapping from text to image, our algorithm directly generates multi-object, fine-grained images under the guidance of generated semantic layouts. By fusing text semantics and spatial information in a synthesis module and jointly fine-tuning them with the generated multi-scale semantic layouts, the proposed networks show impressive performance in text-to-image synthesis for complex scenes. We evaluate our method on both the single-object CUB dataset and the multi-object MS-COCO dataset. Comprehensive experimental results demonstrate that our method consistently and significantly outperforms state-of-the-art approaches across different evaluation metrics.
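The fusion step described above — combining text semantics with a spatial layout before synthesis — can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's actual architecture: the function name, shapes, and the simple tile-and-concatenate scheme are assumptions, standing in for whatever fusion module the authors use.

```python
import numpy as np

def fuse_text_and_layout(text_emb: np.ndarray, layout: np.ndarray) -> np.ndarray:
    """Fuse a global text embedding (D,) with a semantic layout map (C, H, W).

    The text vector is broadcast to every spatial location and stacked on the
    layout channels, yielding a (D + C, H, W) feature volume that carries both
    textual semantics and per-pixel object spatial constraints — the kind of
    conditioning signal a layout-guided generator would consume.
    """
    d = text_emb.shape[0]
    c, h, w = layout.shape
    tiled = np.broadcast_to(text_emb[:, None, None], (d, h, w))  # (D, H, W)
    return np.concatenate([tiled, layout], axis=0)               # (D + C, H, W)

# Toy inputs (illustrative sizes only):
text_emb = np.random.randn(128)        # sentence embedding from a text encoder
layout = np.zeros((10, 64, 64))        # 10-class semantic layout on a 64x64 grid
layout[3, 16:48, 16:48] = 1.0          # one object region labeled as class 3

fused = fuse_text_and_layout(text_emb, layout)
print(fused.shape)  # (138, 64, 64)
```

In a real generator this fused volume would feed convolutional up-sampling blocks at multiple scales, matching the multi-scale semantic layouts mentioned in the abstract.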
