Falsification-Based Robust Adversarial Reinforcement Learning
arXiv - CS - Robotics. Pub Date: 2020-07-01, DOI: arxiv-2007.00691
Xiao Wang, Saasha Nair, and Matthias Althoff

Reinforcement learning (RL) has achieved tremendous progress in solving various sequential decision-making problems, e.g., control tasks in robotics. However, RL methods often fail to generalize to safety-critical scenarios because policies overfit to their training environments. Previously, robust adversarial reinforcement learning (RARL) was proposed to train an adversarial network that applies disturbances to a system, which improves robustness in test scenarios. A drawback of neural-network-based adversaries is that integrating system requirements is difficult without handcrafting sophisticated reward signals. Safety falsification methods allow one to find a set of initial conditions and an input sequence such that the system violates a given property formulated in temporal logic. In this paper, we propose falsification-based RARL (FRARL), the first generic framework that integrates temporal-logic falsification into adversarial learning to improve policy robustness. With the falsification method, no extra reward function needs to be constructed for the adversary. We evaluate our approach on a braking assistance system and an adaptive cruise control system for autonomous vehicles. Experiments show that policies trained with a falsification-based adversary generalize better and violate the safety specification less often in test scenarios than policies trained without an adversary or with an adversarial network.
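To make the alternation between policy training and falsification concrete, below is a minimal, self-contained sketch on a toy car-following scenario. Everything here is an illustrative assumption rather than the paper's implementation: the dynamics in simulate, the constant D_MIN, the random-search falsifier, and the grid-search "retraining" step are simple stand-ins for the RL policy updates and the temporal-logic falsification methods the paper relies on.

```python
import random

D_MIN = 2.0          # required minimum gap [m] (hypothetical spec constant)
DT, HORIZON = 0.1, 50

def simulate(brake_gain, disturbance):
    """Roll out a toy car-following scenario and return the gap trace.
    The lead car's acceleration sequence is the adversary's input."""
    gap, ego_v, lead_v = 20.0, 15.0, 15.0
    gaps = []
    for a_lead in disturbance:
        lead_v = max(0.0, lead_v + a_lead * DT)
        # simple proportional braking policy (the "protagonist")
        ego_a = brake_gain * (gap - 10.0)            # track a 10 m gap
        ego_v = max(0.0, ego_v + max(-8.0, min(ego_a, 2.0)) * DT)
        gap += (lead_v - ego_v) * DT
        gaps.append(gap)
    return gaps

def robustness(gaps):
    """Quantitative semantics of the STL property G (gap > D_MIN):
    positive iff the trace satisfies the spec; the falsifier minimizes it."""
    return min(g - D_MIN for g in gaps)

def falsify(brake_gain, n_samples=500):
    """Random-search falsifier: look for a lead-car acceleration sequence
    that drives the robustness negative (i.e., violates the spec)."""
    best_dist, best_rob = None, float("inf")
    for _ in range(n_samples):
        dist = [random.uniform(-8.0, 2.0) for _ in range(HORIZON)]
        rob = robustness(simulate(brake_gain, dist))
        if rob < best_rob:
            best_dist, best_rob = dist, rob
    return best_dist, best_rob

# FRARL-style alternation: improve the policy parameter against the worst
# disturbance found so far (grid search stands in for RL policy updates).
gain = 0.5
for it in range(5):
    adv, rob = falsify(gain)
    print(f"iter {it}: gain={gain:.2f}, falsifier robustness={rob:.2f}")
    if rob >= 0:
        break  # no counterexample found among the sampled traces
    # "retrain": pick the gain that maximizes robustness on the counterexample
    gain = max((g / 10 for g in range(1, 21)),
               key=lambda g: robustness(simulate(g, adv)))
```

Note the design point this illustrates: the adversary's objective is simply to minimize the quantitative robustness of the temporal-logic specification, so no handcrafted adversary reward is needed, which is the motivation the abstract gives for replacing an adversarial network with a falsifier.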

Updated: 2020-07-20