Max-plus approximation for reinforcement learning
Automatica (IF 4.8), Pub Date: 2021-04-15, DOI: 10.1016/j.automatica.2021.109623
Vinicius Mariano Gonçalves

Max-Plus algebra has been applied in several contexts, most notably the control of discrete event systems. In this article, we discuss another application closely related to control: the use of Max-Plus algebra concepts in reinforcement learning. Max-Plus algebra and reinforcement learning are strongly linked because the latter depends on the Bellman equation, which in some cases is a linear equation in the Max-Plus sense. This fact motivates applying Max-Plus algebra to approximate the value function, which is central to the Bellman equation and hence to reinforcement learning. This article proposes conditions under which this approximation can be carried out simply and in the spirit of reinforcement learning: explore the environment, receive the rewards, and use this information to improve the estimate of the value function. The proposed conditions relate two matrices and impose on them a relationship analogous to the concept of weak inverses in traditional algebra.
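To make the link between Max-Plus algebra and the Bellman equation concrete, the following is a minimal sketch (not from the paper) assuming a toy undiscounted deterministic MDP: with a ⊕ b = max(a, b) and a ⊗ b = a + b, the Bellman backup V(s) ← max_s' [r(s, s') + V(s')] is exactly a Max-Plus matrix–vector product, and is therefore linear in the Max-Plus sense. The 3-state matrix `A` below is an invented example.

```python
import numpy as np

NEG_INF = -np.inf  # Max-Plus "zero": absorbing for ⊗, neutral for ⊕


def maxplus_matvec(A, v):
    """Max-Plus product: (A ⊗ v)[i] = max_j (A[i, j] + v[j])."""
    return np.max(A + v[None, :], axis=1)


# Hypothetical deterministic MDP with 3 states.
# A[i, j] = one-step reward for moving from state i to state j
# (NEG_INF where no action leads from i to j).
A = np.array([
    [NEG_INF, 1.0,     NEG_INF],
    [NEG_INF, NEG_INF, 2.0],
    [0.0,     NEG_INF, NEG_INF],
])

# Each undiscounted Bellman backup is one Max-Plus matrix-vector product.
v = np.zeros(3)
for _ in range(5):
    v = maxplus_matvec(A, v)
```

Max-Plus linearity here means the backup distributes over ⊕: applying it to the pointwise maximum of two value vectors gives the pointwise maximum of the backed-up vectors. This is the property that makes approximating the value function by a Max-Plus combination of basis functions attractive.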
