Object-level Scene Context Prediction, IEEE Transactions on Pattern Analysis and Machine Intelligence

IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8) Pub Date: 2021-04-27, DOI: 10.1109/tpami.2021.3075676
Xiaotian Qiao, Quanlong Zheng, Ying Cao, Rynson W H Lau

Contextual information plays an important role in solving various image and scene understanding tasks. Prior works have focused on extracting contextual information from an image and using it to infer the properties of some object(s) in the image or to understand the scene behind the image, e.g., context-based object detection, recognition, and semantic segmentation. In this paper, we consider the inverse problem: how to hallucinate the missing contextual information from the properties of standalone objects. We refer to it as object-level scene context prediction. This problem is difficult, as it requires extensive knowledge of the complex and diverse relationships among objects in the scene. We propose a deep neural network that takes as input the properties (i.e., category, shape, and position) of a few standalone objects and predicts an object-level scene layout that compactly encodes the semantics and structure of the scene context in which the given objects appear. Quantitative experiments and user studies demonstrate that our model can generate more plausible scene contexts than the baselines. Our model also enables the synthesis of realistic scene images from partial scene layouts. Finally, we validate that our model internally learns features useful for scene recognition and fake scene detection.
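The abstract states that the network's input is the category, shape, and position of a few standalone objects. As an illustration only, here is a minimal sketch of how such input could be packed into a partial layout, assuming a single-channel label map in which `0` marks cells whose scene context is missing. The representation and the names `StandaloneObject`, `rasterize_partial_layout`, and `missing_fraction` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical label for cells whose scene context is missing and must be predicted.
UNKNOWN = 0


@dataclass
class StandaloneObject:
    category: int           # semantic class id (assumed >= 1 so it differs from UNKNOWN)
    mask: List[List[int]]   # binary shape mask, 1 = object pixel
    x: int                  # canvas column of the mask's top-left corner
    y: int                  # canvas row of the mask's top-left corner


def rasterize_partial_layout(objects: List[StandaloneObject],
                             height: int, width: int) -> List[List[int]]:
    """Paint the given objects into a height x width label map.

    Cells not covered by any object stay UNKNOWN. Objects later in the list
    overwrite earlier ones where they overlap (a simple assumed occlusion order).
    """
    layout = [[UNKNOWN] * width for _ in range(height)]
    for obj in objects:
        for dy, row in enumerate(obj.mask):
            for dx, bit in enumerate(row):
                r, c = obj.y + dy, obj.x + dx
                # Skip mask pixels that fall outside the canvas.
                if bit and 0 <= r < height and 0 <= c < width:
                    layout[r][c] = obj.category
    return layout


def missing_fraction(layout: List[List[int]]) -> float:
    """Share of cells a context-prediction model would have to hallucinate."""
    cells = [v for row in layout for v in row]
    return sum(v == UNKNOWN for v in cells) / len(cells)


if __name__ == "__main__":
    # A two-pixel-tall object of (hypothetical) category 3 in column 1 of a 3x3 canvas.
    person = StandaloneObject(category=3, mask=[[1], [1]], x=1, y=0)
    layout = rasterize_partial_layout([person], height=3, width=3)
    print(layout)                    # [[0, 3, 0], [0, 3, 0], [0, 0, 0]]
    print(missing_fraction(layout))  # 7 of 9 cells are unknown
```

This covers only the input side of the problem. The network that fills in the `UNKNOWN` cells with a plausible object-level layout is the paper's contribution and is not reproduced here.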

Updated: 2021-04-27