Object-Centric Image Generation from Layouts
arXiv - CS - Machine Learning. Pub Date: 2020-03-16. DOI: arxiv-2003.07449
Tristan Sylvain and Pengchuan Zhang and Yoshua Bengio and R Devon Hjelm and Shikhar Sharma

Despite recent impressive results on single-object and single-domain image generation, the generation of complex scenes with multiple objects remains challenging. In this paper, we start with the idea that a model must be able to understand individual objects and the relationships between objects in order to generate complex scenes well. Our layout-to-image generation method, which we call Object-Centric Generative Adversarial Network (or OC-GAN), relies on a novel Scene-Graph Similarity Module (SGSM). The SGSM learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout fidelity. We also propose changes to the conditioning mechanism of the generator that enhance its awareness of object instances. Apart from improving image quality, our contributions mitigate two failure modes of previous approaches: (1) spurious objects being generated without corresponding bounding boxes in the layout, and (2) overlapping bounding boxes in the layout leading to merged objects in images. Extensive quantitative evaluation and ablation studies demonstrate the impact of our contributions, with our model outperforming previous state-of-the-art approaches on both the COCO-Stuff and Visual Genome datasets. Finally, we address an important limitation of the evaluation metrics used in previous work by introducing SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited to multi-object images.
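To make the SceneFID idea concrete: where standard FID compares feature statistics of whole images, an object-centric variant compares statistics of object crops taken from the layout's bounding boxes. The sketch below is a minimal illustration of that idea, not the paper's exact pipeline; the helper names (`crop_objects`, `scene_fid`) and the `feature_fn` argument (e.g. Inception-v3 pooled activations) are assumptions introduced here for clarity.

```python
# Minimal sketch of an object-centric FID in the spirit of SceneFID.
# Assumptions: boxes are [x0, y0, x1, y1] pixel coordinates, and
# `feature_fn` maps a list of crops to an (n, d) feature matrix
# (e.g. Inception-v3 activations, as in standard FID).
import numpy as np
from scipy import linalg

def crop_objects(image, boxes):
    """Extract one crop per bounding box from an HxWxC image."""
    return [image[y0:y1, x0:x1] for (x0, y0, x1, y1) in boxes]

def frechet_distance(feats_a, feats_b):
    """Standard Frechet distance between two sets of feature vectors."""
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):  # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    diff = mu_a - mu_b
    return diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean)

def scene_fid(real_pairs, fake_pairs, feature_fn):
    """FID over object crops. `*_pairs` are (image, boxes) tuples."""
    real_crops = [c for img, bxs in real_pairs for c in crop_objects(img, bxs)]
    fake_crops = [c for img, bxs in fake_pairs for c in crop_objects(img, bxs)]
    return frechet_distance(feature_fn(real_crops), feature_fn(fake_crops))
```

Because each object contributes its own sample to the statistics, a spurious or merged object hurts the score directly, even when the full image would look plausible to a whole-image FID.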

Updated: 2020-03-18