Bellman's principle of optimality and deep reinforcement learning for time-varying tasks
International Journal of Control (IF 2.1), Pub Date: 2021-04-16, DOI: 10.1080/00207179.2021.1913516
Alessandro Giuseppi, Antonio Pietrabissa

ABSTRACT

This paper presents, to the best of the authors' knowledge, the first framework to address time-varying objectives in finite-horizon Deep Reinforcement Learning (DeepRL), based on a switching control solution grounded in Bellman's principle of optimality. By augmenting the state space of the system with information on its visit time, the DeepRL agent is able to solve problems in which its task changes dynamically within the same episode. To address the scalability problems caused by the state-space augmentation, we propose a procedure that partitions the episode length to define separate sub-problems, which are then solved by specialised DeepRL agents. Contrary to standard solutions, with the proposed approach the DeepRL agents correctly estimate the value function at each time step and are hence able to solve time-varying tasks. Numerical simulations validate the approach in a classic RL environment.
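As a rough illustration of the two ideas summarised in the abstract, the Python sketch below shows how a raw state can be augmented with its (normalised) visit time and how a finite horizon can be partitioned into segments, each served by a specialised agent selected by a simple switching rule. This is a minimal sketch under assumed names and parameters (HORIZON, NUM_SEGMENTS, SpecialisedAgent, the dummy dynamics), not the authors' implementation; in the paper the partitioning procedure is derived from Bellman's principle of optimality rather than fixed a priori.

import numpy as np

HORIZON = 100          # finite episode length T (hypothetical value)
NUM_SEGMENTS = 4       # number of sub-problems the horizon is split into
SEGMENT_LEN = HORIZON // NUM_SEGMENTS

def augment_state(state, t, horizon=HORIZON):
    """Append the normalised visit time to the raw state vector."""
    return np.concatenate([state, [t / horizon]])

class SpecialisedAgent:
    """Placeholder for a DeepRL agent trained only on one time segment."""
    def __init__(self, segment_index):
        self.segment_index = segment_index

    def act(self, augmented_state):
        # A real agent would evaluate its policy network here; we act randomly.
        return np.random.choice(2)

agents = [SpecialisedAgent(k) for k in range(NUM_SEGMENTS)]

def switching_policy(state, t):
    """Select the specialised agent responsible for the current time segment."""
    k = min(t // SEGMENT_LEN, NUM_SEGMENTS - 1)
    return agents[k].act(augment_state(state, t))

# Toy rollout with a 2-dimensional dummy state, just to show the control flow.
state = np.zeros(2)
for t in range(HORIZON):
    action = switching_policy(state, t)
    state = state + (1.0 if action == 1 else -1.0) * 0.01  # dummy dynamics

Normalising the time index keeps the augmented component on the same scale as typical observations, and restricting each agent to one segment keeps the per-agent state space small, which is the scalability benefit the abstract refers to.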


