Configurable 3D Scene Synthesis and 2D Image Rendering with Per-pixel Ground Truth Using Stochastic Grammars
International Journal of Computer Vision (IF 11.6). Pub Date: 2018-06-30. DOI: 10.1007/s11263-018-1103-5
Chenfanfu Jiang, Siyuan Qi, Yixin Zhu, Siyuan Huang, Jenny Lin, Lap-Fai Yu, Demetri Terzopoulos, Song-Chun Zhu

We propose a systematic learning-based approach to the generation of massive quantities of synthetic 3D scenes and arbitrary numbers of photorealistic 2D images thereof, with associated ground truth information, for the purposes of training, benchmarking, and diagnosing learning-based computer vision and robotics algorithms. In particular, we devise a learning-based pipeline of algorithms capable of automatically generating and rendering a potentially infinite variety of indoor scenes by using a stochastic grammar, represented as an attributed Spatial And-Or Graph, in conjunction with state-of-the-art physics-based rendering. Our pipeline is capable of synthesizing scene layouts with high diversity, and it is configurable inasmuch as it enables the precise customization and control of important attributes of the generated scenes. It renders photorealistic RGB images of the generated scenes while automatically synthesizing detailed, per-pixel ground truth data, including visible surface depth and normal, object identity, and material information (detailed to object parts), as well as environments (e.g., illuminations and camera viewpoints). We demonstrate the value of our synthesized dataset by improving performance in certain machine-learning-based scene understanding tasks—depth and surface normal prediction, semantic segmentation, reconstruction, etc.—and by providing benchmarks for and diagnostics of trained models by modifying object attributes and scene properties in a controllable manner.
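The attributed Spatial And-Or Graph underlying the pipeline alternates And-nodes, which decompose a scene into all of their children, with Or-nodes, which select exactly one alternative according to branching probabilities, and attributed terminals at the leaves. Below is a minimal Python sketch of top-down sampling from such a grammar; the node names, probabilities, and attributes are toy placeholders for illustration, not the grammar or parameters learned in the paper.

```python
import random

# Toy attributed And-Or grammar (illustrative only).
# And-nodes expand to every child; Or-nodes pick one child by probability.
GRAMMAR = {
    "scene":           ("and", ["room", "furniture_group"]),
    "room":            ("or",  [("bedroom", 0.5), ("office", 0.5)]),
    "furniture_group": ("and", ["bed_or_desk", "chair"]),
    "bed_or_desk":     ("or",  [("bed", 0.6), ("desk", 0.4)]),
}

# Terminals carry sampled attributes (widths in meters here).
TERMINALS = {
    "bedroom": lambda: {"type": "bedroom"},
    "office":  lambda: {"type": "office"},
    "bed":     lambda: {"type": "bed",   "width": random.uniform(1.4, 2.0)},
    "desk":    lambda: {"type": "desk",  "width": random.uniform(1.0, 1.6)},
    "chair":   lambda: {"type": "chair", "width": random.uniform(0.4, 0.6)},
}

def sample(symbol):
    """Recursively expand a symbol: And -> all children, Or -> one child."""
    if symbol in TERMINALS:
        return [TERMINALS[symbol]()]
    kind, children = GRAMMAR[symbol]
    if kind == "and":
        return [obj for child in children for obj in sample(child)]
    # Or-node: draw one branch according to its branching probability.
    names, weights = zip(*children)
    choice = random.choices(names, weights=weights, k=1)[0]
    return sample(choice)

if __name__ == "__main__":
    print(sample("scene"))  # e.g. a bedroom containing a bed and a chair
```

In the full pipeline, each sampled layout is then passed to the physics-based renderer to produce the RGB images and the per-pixel ground-truth channels (depth, normals, object identity, materials) described above.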

Updated: 2018-06-30