Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition
International Journal of Computer Vision (IF 19.5), Pub Date: 2021-09-28, DOI: 10.1007/s11263-021-01517-0
Chuanxia Zheng 1, Guoxian Song 1, Tat-Jen Cham 1, Duy-Son Dao 2, Jianfei Cai 2

Existing scene understanding systems mainly focus on recognizing the visible parts of a scene, ignoring the intact appearance of physical objects in the real world. Concurrently, image completion has aimed to create plausible appearance for the invisible regions, but requires a manual mask as input. In this work, we propose a higher-level scene understanding system to tackle both visible and invisible parts of objects and backgrounds in a given scene. In particular, we build a system to decompose a scene into individual objects, infer their underlying occlusion relationships, and even automatically learn which parts of the objects are occluded and need to be completed. To disentangle the occlusion relationships of all objects in a complex scene, we use the fact that a front object that is not occluded is easy to identify, detect, and segment. Our system interleaves the two tasks of instance segmentation and scene completion through multiple iterations, solving for objects layer by layer. We first provide a thorough experiment using a new realistically rendered dataset with ground truths for all invisible regions. To bridge the domain gap to real imagery, where ground truths are unavailable, we then train another model with the pseudo-ground-truths generated from our trained synthesis model. We demonstrate results on a wide variety of datasets and show significant improvement over the state of the art.
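
The layer-by-layer idea in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the paper uses learned networks for segmentation and completion, whereas the code below assumes hypothetical callbacks (`segment_front`, `complete`) and a hand-made one-row scene (`make_toy_scene`) whose hidden geometry is known, purely to show the control flow of peeling off unoccluded objects and then re-completing the scene.

```python
from typing import Callable, Dict, List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]
Image = Dict[Pixel, str]  # pixel -> label of the surface visible at that pixel


def decompose(image: Image,
              segment_front: Callable[[Image], Dict[str, Mask]],
              complete: Callable[[Image, Dict[str, Mask]], Image],
              max_layers: int = 10) -> List[Dict[str, Mask]]:
    """Peel fully visible objects off the scene, one layer per iteration."""
    layers: List[Dict[str, Mask]] = []
    for _ in range(max_layers):
        front = segment_front(image)    # objects with nothing in front of them
        if not front:
            break                       # only background is left
        layers.append(front)
        image = complete(image, front)  # reveal what the removed objects hid
    return layers


def make_toy_scene():
    """Hypothetical stand-ins for the learned models, built from known geometry."""
    full = {                            # complete (amodal) mask of each object
        "cup": {(0, 0), (0, 1)},
        "book": {(0, 1), (0, 2)},
        "table": {(0, 2), (0, 3)},
        "lamp": {(0, 4)},
    }
    depth = {"cup": 0, "lamp": 0, "book": 1, "table": 2}  # smaller = closer
    remaining = set(full)

    def render() -> Image:
        image: Image = {}
        for pixel in [(0, col) for col in range(6)]:
            covering = [name for name in remaining if pixel in full[name]]
            image[pixel] = min(covering, key=depth.get) if covering else "bg"
        return image

    def segment_front(image: Image) -> Dict[str, Mask]:
        # An object counts as unoccluded when every pixel of its full mask shows it.
        return {name: full[name] for name in remaining
                if all(image[p] == name for p in full[name])}

    def complete(image: Image, front: Dict[str, Mask]) -> Image:
        # Toy "completion": drop the front objects and re-render the rest.
        remaining.difference_update(front)
        return render()

    return render, segment_front, complete


render, segment_front, complete = make_toy_scene()
layers = decompose(render(), segment_front, complete)
for depth_index, layer in enumerate(layers):
    print(depth_index, sorted(layer))
```

In this toy scene the cup and lamp are unoccluded and come off first, which exposes the book, which in turn exposes the table. The oracle callbacks read the hidden geometry directly; in a real system both steps would have to be predicted from the image alone.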




Updated: 2021-09-28