Learning an end-to-end spatial grasp generation and refinement algorithm from simulation, Machine Vision and Applications

Learning an end-to-end spatial grasp generation and refinement algorithm from simulation
Machine Vision and Applications (IF 2.4), Pub Date: 2020-10-20, DOI: 10.1007/s00138-020-01127-9
Peiyuan Ni, Wenguang Zhang, Xiaoxiao Zhu, Qixin Cao

Novel object grasping is an important technology for robot manipulation in unstructured environments. Most current approaches require a grasp sampling process to obtain grasp candidates, combined with a local feature extractor based on deep learning. However, this pipeline is time-consuming, especially when grasp points are sparse, such as at the edge of a bowl. To tackle this problem, our algorithm takes the whole sparse point cloud as input and requires no sampling or search process. Our method consists of two steps. The first step predicts poses, categories and scores (qualities) using a SPH3D-GCN network. The second step is an iterative grasp pose refinement, which refines the best grasp generated in the first step. The weight sizes for these two steps are only about 0.81M and 0.52M, respectively, and a whole prediction, including the iterative grasp pose refinement, takes about 73 ms on a GeForce 840M GPU. Moreover, to generate training data for multi-object scenes, we generate a single-object dataset (79 objects from the YCB object set, 23.7k grasps) and a multi-object dataset (20k point clouds with annotations and masks), combined with grasp planning for thin structures. Our experiments show that our method achieves a 76.67% success rate and a 94.44% completion rate, outperforming current state-of-the-art methods.
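The two-step structure described in the abstract (select the best-scoring predicted grasp, then refine it iteratively) can be illustrated with a minimal sketch. This is not the authors' implementation: the 6-vector pose layout, the function names, and `toy_predict_delta` (a stand-in for a learned refinement network) are all assumptions made for illustration, and no SPH3D-GCN network is involved.

```python
import numpy as np


def select_best_grasp(poses, scores):
    """Step 1 (sketch): pick the candidate pose with the highest quality score."""
    return np.asarray(poses, dtype=float)[int(np.argmax(scores))]


def refine_grasp(pose, points, predict_delta, max_iters=5, tol=1e-3):
    """Step 2 (sketch): repeatedly apply a predicted correction to the pose.

    Assumption: a pose is a 6-vector (x, y, z, roll, pitch, yaw) and the
    correction is simply added to it; a real system would compose rotations
    properly. Stops early once the correction norm drops below `tol`.
    """
    pose = np.asarray(pose, dtype=float).copy()
    for _ in range(max_iters):
        delta = predict_delta(points, pose)
        pose = pose + delta
        if np.linalg.norm(delta) < tol:
            break
    return pose


def toy_predict_delta(points, pose):
    """Hypothetical stand-in for a learned refinement network.

    Moves the grasp position halfway toward the point-cloud centroid and
    leaves the orientation unchanged.
    """
    delta = np.zeros_like(pose)
    delta[:3] = 0.5 * (points.mean(axis=0) - pose[:3])
    return delta


if __name__ == "__main__":
    # Toy point cloud and two candidate grasps with made-up scores.
    points = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0],
                       [0.0, 2.0, 0.0], [2.0, 2.0, 0.0]])
    poses = np.zeros((2, 6))
    poses[1, :3] = [3.0, 3.0, 0.0]
    scores = np.array([0.2, 0.9])

    best = select_best_grasp(poses, scores)
    refined = refine_grasp(best, points, toy_predict_delta, max_iters=20)
    print("best:", best)
    print("refined:", refined)
```

The point of the sketch is only the control flow: a single forward pass yields scored candidates without any sampling or search, and refinement is applied to the one selected grasp rather than to every candidate.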

Updated: 2020-10-20
