Inverse Reinforcement Learning for Adversarial Apprentice Games
IEEE Transactions on Neural Networks and Learning Systems (IF 10.2) Pub Date: 2021-10-10, DOI: 10.1109/tnnls.2021.3114612
Bosen Lian, Wenqian Xue, Frank L. Lewis, Tianyou Chai

This article proposes new inverse reinforcement learning (RL) algorithms to solve our defined Adversarial Apprentice Games for nonlinear learner and expert systems. The games are solved by having a learner extract the unknown cost function of an expert from the expert's demonstrated behaviors. We first develop a model-based inverse RL algorithm that consists of two learning stages: an optimal control learning stage and a second learning stage based on inverse optimal control. This algorithm also clarifies the relationships between inverse RL and inverse optimal control. Then, we propose a new model-free integral inverse RL algorithm to reconstruct the unknown expert cost function. The model-free algorithm needs only online demonstrated expert trajectory data and the learner's trajectory data, without knowledge of the system dynamics of either the learner or the expert. These two algorithms are further implemented using neural networks (NNs). In Adversarial Apprentice Games, the learner and the expert are allowed to suffer different adversarial attacks during the learning process. A two-player zero-sum game is formulated for each of these two agents and is solved as a subproblem for the learner in inverse RL. Furthermore, it is shown that the cost functions that the learner learns in order to mimic the expert's behavior are stabilizing but not unique. Finally, simulations and comparisons show the effectiveness and the superiority of the proposed algorithms.
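The abstract gives no equations, but the core idea, recovering a cost that explains an expert's behavior in a two-player zero-sum (controller vs. adversary) setting, can be sketched on a toy linear-quadratic version of the game. The snippet below is an illustrative construction under our own assumptions, not the paper's algorithm: the matrices A, B, D, Q_true, R, the attenuation level gamma, and the recovery-by-Riccati-identity scheme are all hypothetical. It mimics the two-stage structure: estimate the expert's feedback gain from demonstrations, then use an inverse optimal control identity on the game algebraic Riccati equation (ARE) to back out a state weight that reproduces the expert's behavior.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

rng = np.random.default_rng(0)

# ---- Toy linear "expert" playing a zero-sum game against an adversarial input d
# (all numbers below are illustrative assumptions, not taken from the paper).
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])     # control channel
D = np.array([[0.1], [0.1]])     # adversarial attack channel
Q_true = np.diag([5.0, 1.0])     # expert's state weight, unknown to the learner
R = np.array([[1.0]])
gamma = 5.0                      # attenuation level of the zero-sum game

# Game ARE: A'P + PA + Q - P(B R^{-1} B' - gamma^{-2} D D')P = 0, solved via the
# usual H-infinity trick of augmenting the input matrix with a sign-indefinite
# R block; this typically works when gamma is large enough.
B_aug = np.hstack([B, D])
R_aug = np.block([[R, np.zeros((1, 1))],
                  [np.zeros((1, 1)), -gamma**2 * np.eye(1)]])
P_e = solve_continuous_are(A, B_aug, Q_true, R_aug)
K_e = np.linalg.solve(R, B.T @ P_e)   # expert's minimizing feedback u = -K_e x

# ---- Demonstrations: sampled expert states and controls.
X = rng.standard_normal((2, 50))
U = -K_e @ X

# ---- Stage 1 surrogate: estimate the expert's gain from data by least squares.
K_hat = -np.linalg.lstsq(X.T, U.T, rcond=None)[0].T

# ---- Stage 2 (inverse optimal control): find a symmetric P with B'P = R K_hat,
# then read off a state weight Q_hat that makes the game ARE hold exactly.
basis = [np.array([[1., 0.], [0., 0.]]),
         np.array([[0., 1.], [1., 0.]]),
         np.array([[0., 0.], [0., 1.]])]
M = np.column_stack([(B.T @ E).ravel() for E in basis])
coef = np.linalg.lstsq(M, (R @ K_hat).ravel(), rcond=None)[0]
P_hat = sum(c * E for c, E in zip(coef, basis))
Q_hat = -(A.T @ P_hat + P_hat @ A) \
        + P_hat @ B @ np.linalg.solve(R, B.T) @ P_hat \
        - gamma**-2 * P_hat @ D @ D.T @ P_hat

print("expert gain K_e :", K_e)
print("recovered gain  :", np.linalg.solve(R, B.T @ P_hat))
print("Q_hat (differs from Q_true):\n", Q_hat)
```

Note that Q_hat generally differs from Q_true even though both produce the same expert feedback, which is consistent with the abstract's observation that the learned cost functions are stabilizing but not unique.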

Updated: 2021-10-10