Object-level Scene Context Prediction, IEEE Transactions on Pattern Analysis and Machine Intelligence

Object-level Scene Context Prediction
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 2021-04-27, DOI: 10.1109/tpami.2021.3075676
Xiaotian Qiao, Quanlong Zheng, Ying Cao, Rynson W.H. Lau

Contextual information plays an important role in solving various image and scene understanding tasks. Prior works have focused on extracting contextual information from an image and using it to infer the properties of objects in the image or to understand the scene behind the image, e.g., context-based object detection, recognition, and semantic segmentation. In this paper, we consider the inverse problem: how to hallucinate the missing contextual information from the properties of standalone objects. We refer to it as object-level scene context prediction. This problem is difficult, as it requires extensive knowledge of the complex and diverse relationships among objects in the scene. We propose a deep neural network that takes as input the properties (i.e., category, shape, and position) of a few standalone objects and predicts an object-level scene layout that compactly encodes the semantics and structure of the scene context in which the given objects reside. Quantitative experiments and user studies demonstrate that our model can generate more plausible scene contexts than the baselines. Our model also enables the synthesis of realistic scene images from partial scene layouts. Finally, we validate that our model internally learns useful features for scene recognition and fake scene detection.

Updated: 2021-04-27
