Prediction of Reward Functions for Deep Reinforcement Learning via Gaussian Process Regression
IEEE/ASME Transactions on Mechatronics (IF 6.1), Pub Date: 2020-05-11, DOI: 10.1109/tmech.2020.2993564
Jaehyun Lim, Seungchul Ha, Jongeun Choi

Inverse reinforcement learning (IRL) is a technique for automatic reward acquisition; however, it is difficult to apply to high-dimensional problems with unknown dynamics. This article proposes an efficient way to solve the IRL problem based on sparse Gaussian process (GP) prediction with $l_1$-regularization, using only a highly limited number of expert demonstrations. A GP model is trained to predict a reward function from trajectory-reward pair data generated by deep reinforcement learning with different reward functions. The trained GP successfully predicts the reward functions of human experts from their collected demonstration trajectory datasets. To demonstrate the approach, it is applied to obstacle-avoidance navigation of a mobile robot. The experimental results clearly show that the robot can clone the experts' optimality in obstacle-avoiding navigation trajectories using only a very small number of expert demonstrations (e.g., $\leq 6$). Therefore, the proposed approach shows great potential for complex real-world applications in an expert-data-efficient manner.
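The abstract only outlines the method at a high level. As a rough illustration of the core idea it describes (a GP regressor mapping trajectory data to reward-function parameters, combined with an $l_1$-style sparsification step, trained on trajectory-reward pairs and then queried with a handful of expert demonstrations), here is a minimal Python sketch. The feature map, kernel choice, Lasso-based feature selection, and all data shapes are placeholder assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the paper's code): GP regression from trajectory features
# to reward-function parameters, with an l1-regularized feature-selection step
# loosely mirroring the l1-regularization mentioned in the abstract.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel
from sklearn.linear_model import Lasso

def trajectory_features(traj):
    """Hypothetical feature map: summarize a (T x state_dim) trajectory by
    simple statistics; the paper's actual features will differ."""
    return np.concatenate([traj.mean(axis=0), traj.std(axis=0), traj[-1]])

# Training data: (trajectory, reward-parameter) pairs. In the paper these come
# from deep RL policies trained under different reward functions; here we use
# random stand-ins just to show the shapes involved.
rng = np.random.default_rng(0)
n_pairs, T, state_dim, reward_dim = 50, 100, 4, 6
trajs = [rng.normal(size=(T, state_dim)) for _ in range(n_pairs)]
reward_params = rng.normal(size=(n_pairs, reward_dim))

X = np.stack([trajectory_features(t) for t in trajs])   # trajectory features
Y = reward_params                                        # reward parameters

# l1-regularized step to keep only informative features (sparsity).
lasso = Lasso(alpha=0.1).fit(X, Y)
active = np.any(np.abs(lasso.coef_) > 1e-6, axis=0)
if not active.any():          # fall back to all features if Lasso zeros out
    active[:] = True

# GP regression from the selected trajectory features to reward parameters.
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X[:, active], Y)

# Prediction from a small number of expert demonstrations (e.g., <= 6).
expert_trajs = [rng.normal(size=(T, state_dim)) for _ in range(6)]
X_expert = np.stack([trajectory_features(t) for t in expert_trajs])[:, active]
pred_mean, pred_std = gp.predict(X_expert, return_std=True)
print("predicted reward parameters per demo:", pred_mean.shape)  # (6, reward_dim)
```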

Updated: 2020-05-11