Reinforcement Learning Control of Robotic Knee With Human-in-the-Loop by Flexible Policy Iteration
IEEE Transactions on Neural Networks and Learning Systems (IF 10.2), Pub Date: 2021-05-06, DOI: 10.1109/tnnls.2021.3071727
Xiang Gao, Jennie Si, Yue Wen, Minhan Li, He Huang

We are motivated by the real challenges presented in a human–robot system to develop new designs that are efficient at the data level and provide performance guarantees, such as stability and optimality, at the system level. Existing approximate/adaptive dynamic programming (ADP) results that consider system performance theoretically do not readily provide practically useful learning control algorithms for this problem, and reinforcement learning (RL) algorithms that address data efficiency usually lack performance guarantees for the controlled system. This study fills these important voids by introducing innovative features into the policy iteration algorithm. We introduce flexible policy iteration (FPI), which can flexibly and organically integrate experience replay and supplemental values from prior experience into the RL controller. We show system-level performance, including convergence of the approximate value function, (sub)optimality of the solution, and stability of the system. We demonstrate the effectiveness of FPI via realistic simulations of the human–robot system. Notably, the problem faced in this study may be difficult to address with design methods based on classical control theory, as it is nearly impossible to obtain a customized mathematical model of a human–robot system either online or offline. The results also indicate the great potential of RL control for solving realistic and challenging problems with high-dimensional control inputs.
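The abstract does not spell out the FPI algorithm itself; as a rough, hypothetical illustration of the two ingredients it names (experience replay and reuse of value estimates within policy iteration), the sketch below runs generic approximate policy iteration with a replay buffer on a small randomly generated MDP. The toy environment, tabular setting, and all hyperparameters are assumptions for illustration, not the paper's method or the actual robotic-knee setup.

```python
import numpy as np

# Minimal sketch: policy iteration with experience replay on a toy MDP.
# Everything here (MDP size, rewards, learning rate, sweep counts) is an
# illustrative assumption, not the FPI design from the paper.
rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9

# Random toy dynamics standing in for the human-robot system (assumption).
P = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))  # transition probs
R = rng.normal(size=(N_STATES, N_ACTIONS))                        # rewards

def collect(policy, buffer, n=200):
    """Roll out the current policy; store transitions in the replay buffer."""
    s = int(rng.integers(N_STATES))
    for _ in range(n):
        a = policy[s]
        s2 = int(rng.choice(N_STATES, p=P[s, a]))
        buffer.append((s, a, R[s, a], s2))
        s = s2

def evaluate(policy, buffer, q, sweeps=50, lr=0.1):
    """Policy evaluation that reuses all replayed experience, including
    transitions generated under earlier policies (valid for Q-evaluation,
    since the bootstrap follows the *current* policy at the next state)."""
    for _ in range(sweeps):
        for s, a, r, s2 in buffer:
            target = r + GAMMA * q[s2, policy[s2]]
            q[s, a] += lr * (target - q[s, a])
    return q

policy = np.zeros(N_STATES, dtype=int)
q = np.zeros((N_STATES, N_ACTIONS))  # could be warm-started from prior experience
buffer = []
for it in range(20):
    collect(policy, buffer)
    q = evaluate(policy, buffer, q)
    new_policy = q.argmax(axis=1)    # greedy policy improvement
    if np.array_equal(new_policy, policy):
        break                        # policy is stable: (approximate) convergence
    policy = new_policy
print("greedy policy:", policy)
```

Carrying the Q-table across iterations and replaying old transitions are what make each policy-evaluation step cheap in fresh data, which is the data-efficiency motivation the abstract attributes to FPI; the convergence and stability guarantees discussed in the paper are not captured by this toy sketch.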

Updated: 2021-05-06