Derivative-free reinforcement learning: a review, Frontiers of Computer Science


Derivative-free reinforcement learning: a review
Frontiers of Computer Science (IF 3.4) Pub Date: 2021-09-01, DOI: 10.1007/s11704-020-0241-4
Hong Qian, Yang Yu


Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments. In an unknown environment, the agent needs to explore the environment while exploiting the information it has collected, which usually makes for a sophisticated problem. Derivative-free optimization, meanwhile, is capable of solving sophisticated problems. It commonly uses a sampling-and-updating framework to iteratively improve the solution, in which exploration and exploitation also need to be well balanced. Derivative-free optimization therefore deals with a core issue similar to that of reinforcement learning, and it has been introduced into reinforcement learning approaches under the names of learning classifier systems and neuroevolution/evolutionary reinforcement learning. Although such methods have been developed for decades, derivative-free reinforcement learning has recently been attracting increasing attention. However, a recent survey on this topic is still lacking. In this article, we summarize the methods of derivative-free reinforcement learning to date and organize them by aspect, including parameter updating, model selection, exploration, and parallel/distributed methods. Moreover, we discuss some current limitations and possible future directions, hoping that this article will bring more attention to this topic and serve as a catalyst for developing novel and efficient approaches.
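To make the sampling-and-updating framework mentioned in the abstract concrete, here is a minimal sketch under stated assumptions. It uses a cross-entropy-style update chosen purely for illustration; it is not taken from the article and is not claimed to be any specific method the review covers. The names `toy_return` and `sample_and_update` are hypothetical, and `toy_return` is a stand-in for the return an agent would collect from an environment.

```python
import random
import statistics


def toy_return(params):
    """Hypothetical stand-in for an episode return; highest at (1.0, -2.0)."""
    x, y = params
    return -((x - 1.0) ** 2 + (y + 2.0) ** 2)


def sample_and_update(evaluate, dim, iterations=50, population=40, elite=8, seed=0):
    """Cross-entropy-style search: sample candidates, then refit to the best ones."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [2.0] * dim
    for _ in range(iterations):
        # Sampling: draw candidates around the current mean.
        # A larger std means more exploration.
        candidates = [
            [rng.gauss(m, s) for m, s in zip(mean, std)]
            for _ in range(population)
        ]
        # Candidates are ranked by evaluated return only; no gradient is used.
        candidates.sort(key=evaluate, reverse=True)
        best = candidates[:elite]
        # Updating: refit the sampling distribution to the elite candidates.
        # This is the exploitation step; the small floor keeps std above zero.
        mean = [statistics.fmean([c[i] for c in best]) for i in range(dim)]
        std = [statistics.pstdev([c[i] for c in best]) + 1e-3 for i in range(dim)]
    return mean


best_params = sample_and_update(toy_return, dim=2)
print(best_params)
```

In a reinforcement learning setting, `params` would typically be policy parameters and `evaluate` would run one or more episodes and return the collected reward; both of those mappings are assumptions of this sketch.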


Updated: 2021-09-02
