Max-plus approximation for reinforcement learning
Automatica (IF 4.8), Pub Date: 2021-04-15, DOI: 10.1016/j.automatica.2021.109623
Vinicius Mariano Gonçalves

Max-Plus algebra has been applied in several contexts, most notably the control of discrete event systems. In this article, we discuss another application closely related to control: the use of Max-Plus algebra concepts in reinforcement learning. Max-Plus algebra and reinforcement learning are strongly linked because the latter depends on the Bellman equation, which in some cases is a linear equation in the Max-Plus sense. This fact motivates applying Max-Plus algebra to approximate the value function, which is central to the Bellman equation and hence to reinforcement learning. This article proposes conditions under which this approximation can be carried out simply and in the spirit of reinforcement learning: explore the environment, receive the rewards, and use this information to improve the estimate of the value function. The proposed conditions relate two matrices and impose on them a relationship analogous to the concept of weak inverses in traditional algebra.
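To make the link between Max-Plus algebra and the Bellman equation concrete, the following is a minimal sketch (not from the paper) assuming a toy undiscounted deterministic MDP: with a ⊕ b = max(a, b) and a ⊗ b = a + b, the Bellman backup V(s) ← max_s' [r(s, s') + V(s')] is exactly a Max-Plus matrix–vector product, and is therefore linear in the Max-Plus sense. The 3-state matrix `A` below is an invented example.

```python
import numpy as np

NEG_INF = -np.inf  # Max-Plus "zero": absorbing for ⊗, neutral for ⊕


def maxplus_matvec(A, v):
    """Max-Plus product: (A ⊗ v)[i] = max_j (A[i, j] + v[j])."""
    return np.max(A + v[None, :], axis=1)


# Hypothetical deterministic MDP with 3 states.
# A[i, j] = one-step reward for moving from state i to state j
# (NEG_INF where no action leads from i to j).
A = np.array([
    [NEG_INF, 1.0,     NEG_INF],
    [NEG_INF, NEG_INF, 2.0],
    [0.0,     NEG_INF, NEG_INF],
])

# Each undiscounted Bellman backup is one Max-Plus matrix-vector product.
v = np.zeros(3)
for _ in range(5):
    v = maxplus_matvec(A, v)
```

Max-Plus linearity here means the backup distributes over ⊕: applying it to the pointwise maximum of two value vectors gives the pointwise maximum of the backed-up vectors. This is the property that makes approximating the value function by a Max-Plus combination of basis functions attractive.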
