Robot Policy Improvement With Natural Evolution Strategies for Stable Nonlinear Dynamical System
IEEE Transactions on Cybernetics ( IF 11.8 ) Pub Date : 2022-08-05 , DOI: 10.1109/tcyb.2022.3192049
Yingbai Hu 1 , Guang Chen 1 , Zhijun Li 2 , Alois Knoll 1

Robot learning through kinesthetic teaching is a promising way of cloning human behaviors, but compounding errors limit its performance on complex tasks when only a small amount of data is available. To improve the robustness and adaptability of imitation learning, a hierarchical learning strategy is proposed: low-level learning consists solely of behavioral cloning by supervised learning, and high-level learning performs policy improvement. First, a Gaussian mixture model (GMM)-based dynamical system is formulated to encode a motion from the demonstration. We then use the Lyapunov stability theorem to derive sufficient conditions on the GMM parameters that guarantee the global stability of the dynamical system from any initial state. Because imitation learning must reason about the motion well into the future across a wide range of tasks, policy improvement is important for making the learning method more adaptable. Finally, a method based on exponential natural evolution strategies is proposed to optimize the parameters of the dynamical system together with the stiffness of variable impedance control. The exploration noise is constrained by the stability conditions of the dynamical system in the exploration space, which guarantees global stability. Empirical evaluations are conducted on manipulators in different scenarios, including motion planning with obstacle avoidance and stiffness learning.
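The idea of restricting exploration noise to a stable parameter region can be illustrated with a small sketch. It is not the authors' method. It assumes a single linear dynamical system x_dot = A (x - x_goal) instead of the GMM-based system, uses the standard sufficient condition that the symmetric part of A is negative definite (Lyapunov function V = ||x - x_goal||^2), and replaces exponential natural evolution strategies with a simplified isotropic Gaussian search with fixed step size and rank-based utilities. All function names, the via-point cost, and the parameter values are hypothetical.

```python
import numpy as np


def is_stable(A, tol=1e-9):
    """Assumed sufficient condition: the symmetric part of A is negative definite."""
    sym = 0.5 * (A + A.T)
    return bool(np.linalg.eigvalsh(sym).max() < -tol)


def rollout_cost(A, x0, x_goal, via_point, dt=0.01, steps=200):
    """Euler-integrate x_dot = A (x - x_goal); penalize missing a via-point and the goal."""
    x = np.array(x0, dtype=float)
    closest = np.linalg.norm(x - via_point)
    for _ in range(steps):
        x = x + dt * (A @ (x - x_goal))
        closest = min(closest, np.linalg.norm(x - via_point))
    return float(closest + np.linalg.norm(x - x_goal))


def improve_policy(A_init, cost_fn, iters=30, pop=10, sigma=0.3, lr=0.5, seed=0):
    """Simplified evolution-strategy search that only ever evaluates stable candidates."""
    if not is_stable(A_init):
        raise ValueError("A_init must satisfy the stability condition")
    rng = np.random.default_rng(seed)
    # Rank-based utilities: the best samples get positive weight, the rest negative.
    util = np.maximum(0.0, np.log(pop / 2 + 1) - np.log(np.arange(1, pop + 1)))
    util = util / util.sum() - 1.0 / pop
    A = np.array(A_init, dtype=float)
    best_A, best_cost = A.copy(), cost_fn(A)
    for _ in range(iters):
        noises, costs = [], []
        while len(noises) < pop:
            eps = rng.standard_normal(A.shape)
            candidate = A + sigma * eps
            if is_stable(candidate):  # reject noise that leaves the stable region
                noises.append(eps)
                costs.append(cost_fn(candidate))
        order = np.argsort(costs)  # lowest cost first
        step = sum(u * noises[i] for u, i in zip(util, order))
        A_new = A + lr * sigma * step
        if is_stable(A_new):  # keep the search mean inside the stable region too
            A = A_new
            c = cost_fn(A)
            if c < best_cost:
                best_A, best_cost = A.copy(), c
    return best_A, best_cost


if __name__ == "__main__":
    x0, goal, via = np.array([1.0, 1.0]), np.zeros(2), np.array([0.2, 0.8])
    A_best, c_best = improve_policy(-2.0 * np.eye(2), lambda A: rollout_cost(A, x0, goal, via))
    print(A_best, c_best, is_stable(A_best))
```

Rejecting unstable samples is only one way to respect such a condition; projecting samples back into the stable set or parameterizing the system so that every sample is stable by construction are alternatives. The abstract does not say which mechanism the paper uses.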

Updated: 2022-08-05