A Reinforcement Learning Method Using Multifunctional Principal Component Analysis for Human-Like Grasping
IEEE Transactions on Cognitive and Developmental Systems (IF 5.0) Pub Date: 2020-04-17, DOI: 10.1109/tcds.2020.2988641
Marco Monforte, Fanny Ficuciello

Postural synergies allow a rich set of hand configurations to be represented in a lower-dimensional space than the original joint space. In our previous works, we have shown that this representation can be extended to trajectories by means of multivariate functional principal component analysis, obtaining a set of basis functions able to represent grasping movements learned from human demonstration. In this article, we introduce a human cognition-inspired approach for generalizing and improving robot grasping skills in this motion-synergies subspace. The use of a reinforcement learning (RL) algorithm allows the robot to explore the surrounding space and improve its capability to reach and grasp objects. The learning method is policy improvement with path integrals, running in the policy space. The policy is bootstrapped with synergy coefficients obtained from neural networks, and its reward is based on a force-closure grasp quality index computed at the end of the task, measuring how firm the grip is. We finally show that combining neural networks and RL gives the robot manipulator a good initial estimate of the grasping configuration and faster convergence to an optimal grasp compared with a database approach, which is a less general solution in the presence of new objects.
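To make the described pipeline concrete, the following is a minimal sketch of a black-box, PI²-style policy improvement step over a low-dimensional vector of synergy coefficients, bootstrapped from an initial estimate (in the paper, a neural-network prediction). The grasp_quality function, the dimensionality, and all numerical values here are hypothetical placeholders, not the authors' implementation; in the paper the reward comes from a force-closure grasp quality index evaluated at the end of the executed grasp.

```python
import numpy as np

def grasp_quality(theta):
    """Placeholder for the force-closure grasp quality index.

    In the paper this is computed at the end of the grasp execution;
    here a synthetic quadratic surrogate keeps the sketch runnable.
    """
    target = np.array([0.8, -0.3, 0.5, 0.1])   # hypothetical optimum
    return -np.sum((theta - target) ** 2)       # higher is better

def pi2_update(theta, n_rollouts=20, sigma=0.05, lam=0.1):
    """One path-integral-style policy improvement step.

    theta : current synergy coefficients (the policy parameters).
    """
    # Explore: perturb the synergy coefficients with Gaussian noise.
    eps = sigma * np.random.randn(n_rollouts, theta.size)
    rollouts = theta + eps

    # Evaluate: cost = negative grasp quality of each perturbed grasp.
    costs = np.array([-grasp_quality(r) for r in rollouts])

    # Path-integral weighting: exponentiated, normalized costs.
    c = (costs - costs.min()) / (costs.max() - costs.min() + 1e-12)
    w = np.exp(-c / lam)
    w /= w.sum()

    # Update: reward-weighted average of the sampled perturbations.
    return theta + w @ eps

# Bootstrapped initial guess (in the paper: a neural-network output).
theta = np.array([0.6, -0.1, 0.4, 0.0])
for _ in range(50):
    theta = pi2_update(theta)
print("refined synergy coefficients:", theta)
```

The refined coefficients would then be mapped back through the functional principal components to reconstruct the joint-space grasping trajectory; that reconstruction step is omitted here.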

Updated: 2020-04-17