Augmented Reality Meets Computer Vision: Efficient Data Generation for Urban Driving Scenes
International Journal of Computer Vision (IF 11.6), Pub Date: 2018-03-07, DOI: 10.1007/s11263-018-1070-x
Hassan Abu Alhaija, Siva Karthik Mustikovela, Lars Mescheder, Andreas Geiger, Carsten Rother

The success of deep learning in computer vision is based on the availability of large annotated datasets. To lower the need for hand-labeled images, virtually rendered 3D worlds have recently gained popularity. Unfortunately, creating realistic 3D content is challenging on its own and requires significant human effort. In this work, we propose an alternative paradigm which combines real and synthetic data for learning semantic instance segmentation and object detection models. Exploiting the fact that not all aspects of the scene are equally important for this task, we propose to augment real-world imagery with virtual objects of the target category. Capturing real-world images at large scale is easy and cheap, and directly provides real background appearances without the need for creating complex 3D models of the environment. We present an efficient procedure to augment these images with virtual objects. In contrast to modeling complete 3D environments, our data augmentation approach requires only a few user interactions in combination with 3D models of the target object category. Leveraging our approach, we introduce a novel dataset of augmented urban driving scenes with 360-degree images that are used as environment maps to create realistic lighting and reflections on rendered objects. We analyze the significance of realistic object placement by comparing manual placement by humans to automatic methods based on semantic scene analysis. This allows us to create composite images which exhibit both realistic background appearance and a large number of complex object arrangements. Through an extensive set of experiments, we determine the set of parameters that produces augmented data which maximally enhances the performance of instance segmentation models. Further, we demonstrate the utility of the proposed approach for training standard deep models for semantic instance segmentation and object detection of cars in outdoor driving scenarios.
We test the models trained on our augmented data on the KITTI 2015 dataset, which we have annotated with pixel-accurate ground truth, and on the Cityscapes dataset. Our experiments demonstrate that the models trained on augmented imagery generalize better than those trained on fully synthetic data or models trained on limited amounts of annotated real data.
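The core augmentation idea described above is to render a virtual object with its alpha mask and composite it over a real background photograph. As a minimal illustration of that compositing step (not the authors' full pipeline, which additionally uses environment-map lighting and semantic placement; function and array names here are hypothetical), a standard alpha-over blend can be written as:

```python
import numpy as np

def composite_over(background, render_rgba):
    """Alpha-composite a rendered RGBA object layer over a real background.

    background:  H x W x 3 float array in [0, 1] (the real photograph)
    render_rgba: H x W x 4 float array in [0, 1]; alpha is 0 outside the object
    """
    rgb = render_rgba[..., :3]
    alpha = render_rgba[..., 3:4]  # keep a trailing axis so it broadcasts over RGB
    return alpha * rgb + (1.0 - alpha) * background

# Tiny synthetic example: a black 4x4 background with a 2x2 opaque white "object".
bg = np.zeros((4, 4, 3))
layer = np.zeros((4, 4, 4))
layer[1:3, 1:3, :3] = 1.0  # object color
layer[1:3, 1:3, 3] = 1.0   # fully opaque inside the object footprint
out = composite_over(bg, layer)
# Pixels under the object take the rendered color; the rest keep the background.
```

The per-pixel alpha mask is what makes the approach cheap: only the target-category object needs a 3D model and a render pass, while the background appearance comes for free from the real image.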

Updated: 2018-03-07