Action Redundancy in Reinforcement Learning
arXiv - CS - Artificial Intelligence Pub Date : 2021-02-22 , DOI: arxiv-2102.11329
Nir Baram, Guy Tennenholtz, Shie Mannor

Maximum Entropy (MaxEnt) reinforcement learning is a powerful learning paradigm that seeks to maximize return under entropy regularization. However, action entropy does not necessarily coincide with state entropy, e.g., when multiple actions produce the same transition. Instead, we propose to maximize the transition entropy, i.e., the entropy of next states. We show that transition entropy can be decomposed into two terms: model-dependent transition entropy and action redundancy. In particular, we explore the latter in both deterministic and stochastic settings and develop tractable approximation methods in a near model-free setup. We construct algorithms to minimize action redundancy and demonstrate their effectiveness on a synthetic environment with multiple redundant actions as well as contemporary benchmarks in Atari and Mujoco. Our results suggest that action redundancy is a fundamental problem in reinforcement learning.
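As a rough sketch of the kind of decomposition the abstract refers to (the notation and grouping below are our own illustration, not necessarily the paper's exact definitions), the chain rule of conditional entropy splits the next-state entropy under a policy $\pi$ into a model-dependent term and a term reflecting how many distinct actions induce the same transition:

\[
\mathcal{H}(s' \mid s)
  = \underbrace{\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[\mathcal{H}(s' \mid s, a)\big]}_{\text{model-dependent transition entropy}}
  + \underbrace{\mathcal{H}(a \mid s) - \mathcal{H}(a \mid s, s')}_{\text{action entropy minus redundancy}}
\]

The term $\mathcal{H}(a \mid s, s')$ vanishes when the action is uniquely recoverable from the observed transition; when several actions lead to the same next state it grows, so a plain action-entropy bonus overstates the entropy of next states. That gap is, roughly, what the abstract calls action redundancy.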

Updated: 2021-02-24