Scene Graph to Image Generation with Contextualized Object Layout Refinement
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2020-09-23, arXiv:2009.10939
Maor Ivgi, Yaniv Benny, Avichai Ben-David, Jonathan Berant, and Lior Wolf

Generating high-quality images from scene graphs, that is, graphs that describe multiple entities in complex relations, is a challenging task that has recently attracted substantial interest. Prior work trained such models with supervised learning, where the goal is to produce the exact target image layout for each scene graph, and relied on predicting object locations and shapes independently and in parallel. However, scene graphs are underspecified, so the same scene graph often corresponds to many target images in the training data. This leads to generated images with high inter-object overlap, empty areas, blurry objects, and overall compromised quality. In this work, we propose a method that alleviates these issues by generating all object layouts together and reducing the reliance on such supervision. Our model predicts layouts directly from embeddings (without predicting intermediate boxes) by gradually upsampling, refining, and contextualizing object layouts. It is trained with a novel adversarial loss that optimizes the interaction between object pairs. This improves coverage and removes overlaps, while maintaining sensible contours and respecting object relations. We show empirically on the COCO-STUFF dataset that our approach substantially improves the quality of generated layouts as well as overall image quality. Our evaluation shows that we improve layout coverage by almost 20 points and reduce object overlap to negligible amounts. This leads to better image generation, relation fulfillment, and object quality.
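The abstract describes two technical ideas: jointly refining per-object layout maps with shared context (rather than predicting each object independently), and a pairwise objective that trades coverage against overlap. Below is a minimal, hypothetical sketch of how such a contextualized refiner and a differentiable coverage/overlap penalty could look. The class and function names, tensor shapes, and the max-pooled context are illustrative assumptions, not the authors' architecture, and the penalty is a simple stand-in for the paper's adversarial loss over object pairs.

```python
# Illustrative sketch only; loosely follows the abstract, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextualizedLayoutRefiner(nn.Module):
    """Predicts per-object layout masks directly from object embeddings
    (no intermediate bounding boxes), gradually upsampling and refining
    them while sharing context across all objects in the scene."""

    def __init__(self, embed_dim=128, hidden=64, stages=3, base_res=8):
        super().__init__()
        self.base_res = base_res
        # Project each object embedding to a coarse base_res x base_res map.
        self.to_map = nn.Linear(embed_dim, hidden * base_res * base_res)
        # One refinement conv per upsampling stage; its input is the object's
        # own map concatenated with a context map pooled over all objects.
        self.refine = nn.ModuleList(
            nn.Conv2d(hidden * 2, hidden, kernel_size=3, padding=1)
            for _ in range(stages)
        )
        self.to_mask = nn.Conv2d(hidden, 1, kernel_size=1)

    def forward(self, obj_embeddings):
        # obj_embeddings: (num_objects, embed_dim) for one scene graph.
        n = obj_embeddings.size(0)
        h = self.to_map(obj_embeddings).view(n, -1, self.base_res, self.base_res)
        for conv in self.refine:
            h = F.interpolate(h, scale_factor=2, mode="nearest")
            # Context: max over all objects, broadcast back to each object,
            # so every object "sees" where the others are being placed.
            context = h.max(dim=0, keepdim=True).values.expand_as(h)
            h = F.relu(conv(torch.cat([h, context], dim=1)))
        return torch.sigmoid(self.to_mask(h)).squeeze(1)  # (n, H, W) masks


def overlap_and_coverage_penalty(masks):
    """Differentiable stand-in for the pairwise objective the abstract
    describes: penalize inter-object overlap and reward image coverage.
    (The paper uses an adversarial loss over object pairs; this is not it.)"""
    summed = masks.sum(dim=0)                       # (H, W) stacked mass
    overlap = F.relu(summed - 1.0).mean()           # mass where objects stack
    coverage = 1.0 - summed.clamp(max=1.0).mean()   # uncovered image area
    return overlap + coverage


# Example usage on random embeddings for a 5-object scene:
refiner = ContextualizedLayoutRefiner()
masks = refiner(torch.randn(5, 128))                # -> (5, 64, 64)
loss = overlap_and_coverage_penalty(masks)
```

Max-pooling over all objects' feature maps at each stage is one simple way to let every object's layout react to the others, which is the intuition behind generating all layouts together rather than independently and in parallel.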

Updated: 2020-09-25