PennSyn2Real: Training Object Recognition Models Without Human Labeling,IEEE Robotics and Automation Letters



IEEE Robotics and Automation Letters (IF 5.2), Pub Date: 2021-03-31, DOI: 10.1109/lra.2021.3070249
Ty Nguyen, Ian D. Miller, Avi Cohen, Dinesh Thakur, Arjun Guru, Shashank Prasad, Camillo J. Taylor, Pratik Chaudhari, Vijay Kumar

Scalable training data generation is a critical problem in deep learning. We propose PennSyn2Real, a photo-realistic synthetic dataset consisting of more than 100,000 4K images of more than 20 types of micro aerial vehicles (MAVs). The dataset can be used to generate an arbitrary number of training images for high-level computer vision tasks such as MAV detection and classification. Our data generation framework bootstraps chroma-keying, a mature cinematography technique, with a motion tracking system, providing artifact-free, curated, annotated images. Our system therefore allows object orientations and lighting to be controlled. This framework is easy to set up and can be applied to a broad range of objects, reducing the gap between synthetic and real-world data. We show that synthetic data generated using this framework can be used directly to train CNN models for common object recognition tasks such as detection and segmentation. We demonstrate competitive performance in comparison with training using only real images. Furthermore, bootstrapping the generated synthetic data in few-shot learning can significantly improve overall performance, reducing the number of training samples required to achieve the desired accuracy.
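
To make the chroma-keying idea concrete, the following is a minimal sketch of how a green-screen image might be keyed, composited onto a new background, and annotated without human labeling. It is not the authors' pipeline: the green-dominance rule, its thresholds, and the names `is_green_screen` and `key_and_composite` are assumptions chosen for illustration, and the motion tracking component described in the abstract is not modeled. Images are plain nested lists of RGB tuples so that the example runs without third-party libraries.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0-255
Image = List[List[Pixel]]      # row-major list of rows
BBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def is_green_screen(p: Pixel, dominance: float = 1.5, min_green: int = 100) -> bool:
    """Hypothetical keying rule: green is bright and clearly dominates red and blue."""
    r, g, b = p
    return g >= min_green and g > dominance * r and g > dominance * b


def key_and_composite(fg: Image, bg: Image) -> Tuple[Image, List[List[bool]], Optional[BBox]]:
    """Key out the green screen in fg, paste the object onto bg, and derive labels.

    Assumes fg and bg have identical dimensions.
    """
    height, width = len(fg), len(fg[0])
    # mask[y][x] is True where the pixel belongs to the object (not green screen).
    mask = [[not is_green_screen(fg[y][x]) for x in range(width)] for y in range(height)]
    # Keep object pixels, replace keyed pixels with the new background.
    composite = [
        [fg[y][x] if mask[y][x] else bg[y][x] for x in range(width)]
        for y in range(height)
    ]
    # The segmentation mask gives the detection bounding box for free.
    ys = [y for y in range(height) if any(mask[y])]
    xs = [x for x in range(width) if any(mask[y][x] for y in range(height))]
    bbox = (min(xs), min(ys), max(xs), max(ys)) if xs else None
    return composite, mask, bbox


# Toy 4x3 frame: a grey "object" in front of a pure green screen.
GREEN, GREY, SKY = (0, 255, 0), (90, 90, 90), (135, 206, 235)
foreground = [
    [GREEN, GREEN, GREEN, GREEN],
    [GREEN, GREY, GREY, GREEN],
    [GREEN, GREEN, GREY, GREEN],
]
background = [[SKY] * 4 for _ in range(3)]

composite, mask, bbox = key_and_composite(foreground, background)
print(bbox)
```

Because the mask and bounding box fall out of the keying step itself, the same foreground frame can be pasted onto many different backgrounds, each yielding a new labeled training image. A practical implementation would likely work in a hue-based color space with an array library and handle soft edges and green spill, none of which this sketch attempts.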


Updated: 2021-04-27
