Reinforcement Learning with Probabilistically Complete Exploration
arXiv - CS - Robotics, Pub Date: 2020-01-20, DOI: arxiv-2001.06940
Philippe Morere, Gilad Francis, Tom Blau, Fabio Ramos

Balancing exploration and exploitation remains a key challenge in reinforcement learning (RL). State-of-the-art RL algorithms suffer from high sample complexity, particularly in the sparse reward case, where they can do no better than to explore in all directions until the first positive rewards are found. To mitigate this, we propose Rapidly Randomly-exploring Reinforcement Learning (R3L). We formulate exploration as a search problem and leverage widely-used planning algorithms such as Rapidly-exploring Random Tree (RRT) to find initial solutions. These solutions are used as demonstrations to initialize a policy, then refined by a generic RL algorithm, leading to faster and more stable convergence. We provide theoretical guarantees of R3L exploration finding successful solutions, as well as bounds for its sampling complexity. We experimentally demonstrate the method outperforms classic and intrinsic exploration techniques, requiring only a fraction of exploration samples and achieving better asymptotic performance.
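The abstract outlines a three-phase pipeline: RRT-style exploration in state space to find a first successful trajectory, policy initialization from that trajectory treated as a demonstration, and refinement with a generic RL algorithm. The sketch below illustrates the first two phases on a toy 2D point-mass task with a sparse goal reward; the environment, step size, goal-biased sampling, and least-squares behavioral-cloning fit are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Toy sparse-reward setting: a 2D point mass must reach GOAL; only reaching
# the goal region gives reward, so undirected exploration is inefficient.
GOAL = np.array([8.0, 8.0])
GOAL_RADIUS = 0.5
STEP = 0.5
rng = np.random.default_rng(0)

def rrt_explore(start, max_iters=5000, goal_bias=0.05):
    """Phase 1: RRT search in state space until a first successful trajectory is found."""
    nodes = [np.asarray(start, dtype=float)]
    parents = [-1]
    actions = [None]
    for _ in range(max_iters):
        # Sample a random target state (with a small goal bias, a common RRT variant).
        target = GOAL if rng.random() < goal_bias else rng.uniform(-1.0, 10.0, size=2)
        i = int(np.argmin([np.linalg.norm(n - target) for n in nodes]))
        direction = target - nodes[i]
        action = STEP * direction / (np.linalg.norm(direction) + 1e-8)
        new = nodes[i] + action  # steer the nearest node toward the sample
        nodes.append(new)
        parents.append(i)
        actions.append(action)
        if np.linalg.norm(new - GOAL) < GOAL_RADIUS:
            # Backtrack from the goal to recover the (state, action) demonstration.
            demo, j = [], len(nodes) - 1
            while parents[j] != -1:
                demo.append((nodes[parents[j]], actions[j]))
                j = parents[j]
            return demo[::-1]
    return None

def init_policy(demo):
    """Phase 2: initialize a policy from the demonstration (least-squares behavioral cloning)."""
    S = np.array([s for s, _ in demo])
    A = np.array([a for _, a in demo])
    S1 = np.hstack([S, np.ones((len(S), 1))])  # linear features plus a bias term
    W, *_ = np.linalg.lstsq(S1, A, rcond=None)
    return lambda s: np.append(s, 1.0) @ W     # pi(s) -> action

demo = rrt_explore(start=[0.0, 0.0])
if demo is not None:
    policy = init_policy(demo)
    print("demonstration length:", len(demo))
    print("policy action at start state:", policy(np.zeros(2)))
    # Phase 3 (not shown): refine this initialized policy with a generic RL
    # algorithm, which is where the reported convergence gains come from.
```

In the full method, the initialized policy is handed to an off-the-shelf RL algorithm for fine-tuning; the paper's theoretical results concern the probabilistic completeness and sample bounds of the exploration phase sketched above.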

Updated: 2020-01-22