Assured learning-enabled autonomy: A metacognitive reinforcement learning framework
International Journal of Adaptive Control and Signal Processing (IF 3.1), Pub Date: 2021-09-06, DOI: 10.1002/acs.3326
Aquib Mustafa, Majid Mazouchi, Subramanya Nageshrao, Hamidreza Modares

Reinforcement learning (RL) agents with pre-specified reward functions cannot provide guaranteed safety across the variety of circumstances that an uncertain system might encounter. To guarantee performance while assuring satisfaction of safety constraints across such circumstances, this article presents an assured autonomous control framework that empowers RL algorithms with metacognitive learning capabilities. More specifically, the reward function parameters of the RL agent are adapted in a metacognitive decision-making layer to assure the feasibility of the RL agent, that is, to assure that the policy learned by the RL agent satisfies safety constraints specified by signal temporal logic while achieving as much performance as possible. The metacognitive layer monitors for any possible future safety violation under the actions of the RL agent and employs a higher-layer Bayesian RL algorithm to proactively adapt the reward function of the lower-layer RL agent. To minimize higher-layer Bayesian RL intervention, the metacognitive layer leverages a fitness function as a metric for evaluating the success of the lower-layer RL agent in satisfying the safety and liveness specifications, and the higher-layer Bayesian RL intervenes only if there is a risk of lower-layer RL failure. Finally, a simulation example is provided to validate the effectiveness of the proposed approach.
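
The abstract describes a two-layer scheme: a lower-layer RL agent optimizes a parameterized reward, while a metacognitive layer tracks a fitness measure of safety/liveness satisfaction and, only when failure looks likely, invokes a higher-layer Bayesian RL step to re-tune the reward parameters. The Python sketch below is only an illustration of that control flow under simplified assumptions; the policy, the STL monitor, the fitness function, and the Bayesian update are all placeholders, and every name (LowerLayerAgent, fitness, bayesian_reward_update, safety_weight) is hypothetical rather than taken from the paper.

```python
# Illustrative sketch only: the paper's STL monitoring and Bayesian RL steps are stubbed out.
import random

# --- Lower layer: an RL agent whose reward is parameterized -------------------
class LowerLayerAgent:
    def __init__(self, reward_params):
        self.reward_params = reward_params  # e.g., weight on the safety-margin penalty

    def reward(self, state, action):
        # Parameterized reward: a performance term minus a weighted safety-margin penalty.
        performance = -abs(state - 1.0)
        safety_penalty = max(0.0, abs(state) - 0.8)   # stand-in for an STL robustness margin
        return performance - self.reward_params["safety_weight"] * safety_penalty

    def rollout(self, horizon=50):
        # Crude stand-in for executing the learned policy; in the framework the policy
        # would be trained against self.reward, here it is just a bounded random walk.
        state, states, total_reward = 0.0, [], 0.0
        for _ in range(horizon):
            action = random.uniform(-0.2, 0.3)
            state += action
            total_reward += self.reward(state, action)
            states.append(state)
        return states, total_reward

# --- Metacognitive layer: monitor fitness, intervene only when needed ---------
def fitness(states, safe_bound=0.8):
    # Worst-case margin to the safety constraint |x| <= safe_bound,
    # a simple proxy for an STL robustness degree.
    return min(safe_bound - abs(s) for s in states)

def bayesian_reward_update(params, observed_fitness):
    # Placeholder for the higher-layer Bayesian RL step: increase the safety weight
    # in proportion to how small (or negative) the observed constraint margin was.
    params = dict(params)
    params["safety_weight"] += max(0.0, 0.1 - observed_fitness)
    return params

params = {"safety_weight": 1.0}
for episode in range(10):
    agent = LowerLayerAgent(params)
    states, _ = agent.rollout()
    f = fitness(states)
    if f < 0.05:  # risk of future safety violation -> metacognitive intervention
        params = bayesian_reward_update(params, f)
        print(f"episode {episode}: fitness {f:.3f} -> safety_weight adapted to {params['safety_weight']:.2f}")
    else:
        print(f"episode {episode}: fitness {f:.3f}, no intervention")
```

The design choice mirrored here is that the higher layer acts only when the fitness margin signals a risk of lower-layer failure, rather than re-shaping the reward after every episode.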

Updated: 2021-09-06