Semiautomatic Labeling for Deep Learning in Robotics
IEEE Transactions on Automation Science and Engineering (IF 5.9) Pub Date: 2019-09-18, DOI: 10.1109/tase.2019.2938316
Daniele De Gregorio , Alessio Tonioni , Gianluca Palli , Luigi Di Stefano

In this article, we propose augmented reality semiautomatic labeling (ARS), a semiautomatic method that leverages a robot to move a 2-D camera, providing precise camera tracking, and an augmented reality pen (ARP) to define the initial object bounding box, in order to create large labeled data sets with minimal human intervention. By removing from humans the burden of generating annotated data, we make deep learning applied to computer vision, which typically requires very large data sets, truly automated and reliable. With the ARS pipeline, we effortlessly created two novel data sets, one of electromechanical components (industrial scenario) and the other of fruits (daily-living scenario), and robustly trained two state-of-the-art object detectors based on convolutional neural networks, namely you only look once (YOLO) and single shot detector (SSD). Whereas conventional manual annotation of 1000 frames took us slightly more than 10 h, the proposed ARS approach annotates 9 sequences of about 35 000 frames in less than 1 h, a gain factor of about 450. Moreover, both the precision and recall of object detection increase by about 15% with respect to manual labeling. All our software is available as a robot operating system (ROS) package in a public repository, alongside the novel annotated data sets. Note to Practitioners—This article was motivated by the lack of a simple and effective solution for generating data sets to train data-driven models, such as modern deep neural networks, so as to make them accessible in an industrial environment. Specifically, a deep learning robot guidance vision system would require such a large amount of manually labeled images that it would be too expensive and impractical for a real use case, where system reconfigurability is a fundamental requirement.
With our system, on the other hand, especially in the field of industrial robotics, the cost of image labeling can for the first time be reduced to nearly zero, thus paving the way for self-reconfiguring systems with very high performance (as demonstrated by our experimental results). One limitation of this approach is the need for a manual method to detect the objects of interest in the preliminary stage of the pipeline (ARP or graphical interface). A feasible extension, related to the field of collaborative robotics, could exploit the robot itself, manually moved by the user, for this preliminary stage as well, so as to eliminate any source of inaccuracy.
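The core geometric idea behind this kind of semiautomatic labeling is that, once the object's 3-D bounding box is defined (e.g. with the ARP) and the camera pose at every frame is known from robot kinematics, the 2-D label in each frame follows by projection rather than manual drawing. The sketch below illustrates that step only; it is not the authors' implementation, and the function name, pinhole-camera assumptions, and example intrinsics are ours.

```python
import numpy as np

def project_box(corners_world, T_world_cam, K):
    """Project the 8 corners of a 3-D box into an image and return the
    enclosing 2-D axis-aligned bounding box (u_min, v_min, u_max, v_max).

    corners_world : (8, 3) box corners in the world frame (e.g. from an ARP)
    T_world_cam   : (4, 4) camera pose in the world frame (e.g. from robot
                    forward kinematics at this frame)
    K             : (3, 3) pinhole intrinsic matrix
    """
    # World -> camera transform is the inverse of the camera pose.
    T_cam_world = np.linalg.inv(T_world_cam)
    pts = np.hstack([corners_world, np.ones((8, 1))])  # homogeneous coords
    pts_cam = (T_cam_world @ pts.T)[:3]                # (3, 8), camera frame
    uv = K @ pts_cam
    uv = uv[:2] / uv[2]                                # perspective divide
    return uv[0].min(), uv[1].min(), uv[0].max(), uv[1].max()
```

Running this once per frame over a robot-recorded sequence yields thousands of 2-D boxes from a single manually specified 3-D box, which is the source of the roughly 450x annotation speed-up reported above (a real pipeline would additionally clip boxes to the image and handle occlusion).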

Updated: 2024-08-22