Reinforcement Learning Control of Robotic Knee With Human-in-the-Loop by Flexible Policy Iteration.
IEEE Transactions on Neural Networks and Learning Systems (IF 10.4), Pub Date: 2021-05-06, DOI: 10.1109/tnnls.2021.3071727
Xiang Gao, Jennie Si, Yue Wen, Minhan Li, He Huang

Motivated by the real challenges of a human-robot system, we develop new designs that are data efficient and carry system-level performance guarantees such as stability and optimality. Existing approximate/adaptive dynamic programming (ADP) results that treat system performance theoretically do not readily yield practically useful learning control algorithms for this problem, while reinforcement learning (RL) algorithms that address data efficiency usually lack performance guarantees for the controlled system. This study fills these gaps by introducing new features to the policy iteration algorithm. We introduce flexible policy iteration (FPI), which can flexibly integrate experience replay and supplemental values from prior experience into the RL controller. We establish system-level performance properties, including convergence of the approximate value function, (sub)optimality of the solution, and stability of the system. We demonstrate the effectiveness of FPI via realistic simulations of the human-robot system. The problem addressed in this study may be difficult to solve with design methods based on classical control theory, because it is nearly impossible to obtain a customized mathematical model of a human-robot system either online or offline. The results also indicate the great potential of RL control for solving realistic and challenging problems with high-dimensional control inputs.
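As a rough illustration of the idea of policy iteration that reuses a stored batch of experience, the sketch below runs least-squares policy iteration on a scalar linear system with quadratic cost. It is not the FPI algorithm from the paper: the system (`a`, `b`), the cost weights (`q`, `r`), the quadratic Q-function features, and all function names are assumptions chosen for illustration, and the human-robot system in the paper is far more complex.

```python
import numpy as np

def features(x, u):
    # Quadratic features so that Q(x, u) = h11*x^2 + 2*h12*x*u + h22*u^2.
    return np.stack([x**2, 2 * x * u, u**2], axis=-1)

def evaluate_policy(batch, k, q, r):
    # Policy evaluation for u = -k*x from stored transitions (x, u, x_next).
    # Solves Q(x, u) - Q(x_next, -k*x_next) = stage cost in the least-squares sense.
    x, u, x_next = batch
    u_next = -k * x_next
    A = features(x, u) - features(x_next, u_next)
    cost = q * x**2 + r * u**2
    theta, *_ = np.linalg.lstsq(A, cost, rcond=None)
    return theta

def policy_iteration(batch, k0, q=1.0, r=1.0, iters=20):
    # k0 must be a stabilizing gain (assumption of this sketch).
    k = k0
    for _ in range(iters):
        h11, h12, h22 = evaluate_policy(batch, k, q, r)
        k = h12 / h22  # greedy improvement: minimize Q(x, u) over u
    return k

# Hypothetical system x_next = a*x + b*u; the learner only sees the transitions.
rng = np.random.default_rng(0)
a, b = 1.1, 1.0
x = rng.uniform(-1.0, 1.0, 200)
u = rng.uniform(-1.0, 1.0, 200)
batch = (x, u, a * x + b * u)  # collected once, replayed at every iteration

k_learned = policy_iteration(batch, k0=0.5)
print("learned feedback gain:", k_learned)
```

The same batch of transitions is reused at every evaluation step, which is the sense in which this toy example replays experience. In this noise-free linear setting the learned gain can be checked against the solution of the scalar Riccati equation.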

Updated: 2021-05-06