Reward Space Noise for Exploration in Deep Reinforcement Learning
International Journal of Pattern Recognition and Artificial Intelligence (IF 0.9). Pub Date: 2021-05-21. DOI: 10.1142/s0218001421520133
Chuxiong Sun, Rui Wang, Qian Li, Xiaohui Hu

A fundamental challenge in reinforcement learning (RL) is achieving efficient exploration in initially unknown environments. Most state-of-the-art RL algorithms drive exploration with action space noise. These classical strategies are computationally efficient and straightforward to implement, but they may fail to explore effectively in complex environments. To address this issue, we propose a novel strategy named reward space noise (RSN) for farsighted and consistent exploration in RL. By injecting stochasticity into the reward space, we change the agent's understanding of the environment and thereby perturb its behavior. We find that this simple strategy achieves consistent exploration and scales to complex domains without intensive computational cost. To demonstrate the effectiveness and scalability of the proposed method, we implement a deep Q-learning agent with reward noise and evaluate its exploratory performance on a set of Atari games that are challenging for the naive ϵ-greedy strategy. The results show that reward noise outperforms action noise in most games and performs comparably in the others. Concretely, in early training the best exploratory performance of reward noise is clearly better than that of action noise, which shows that reward noise quickly explores valuable states and aids in finding the optimal policy. Moreover, the average scores and learning efficiency of reward noise remain higher than those of action noise throughout training, indicating that reward noise yields more stable and consistent performance.
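To make the contrast concrete, the sketch below compares action space noise (ϵ-greedy) with reward space noise using tabular Q-learning on a toy chain MDP. This is not the paper's implementation (the authors use a deep Q-learning agent on Atari): the zero-mean Gaussian noise model, the reward_sigma parameter, the chain environment, and all hyperparameters are illustrative assumptions made for the example.

import numpy as np

# Illustrative sketch (not the paper's implementation): tabular Q-learning
# on a toy chain MDP, contrasting action space noise (epsilon-greedy) with
# reward space noise (RSN). The zero-mean Gaussian reward noise and every
# hyperparameter below are assumptions made for this example.

N_STATES, N_ACTIONS = 10, 2    # chain of 10 states; actions: 0 = left, 1 = right
GAMMA, ALPHA = 0.99, 0.1       # discount factor, learning rate

def step(s, a):
    # Deterministic chain: only reaching the rightmost state pays off.
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == N_STATES - 1 else 0.0)

def train(mode, episodes=200, eps=0.1, reward_sigma=0.5, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        s = 0
        for _ in range(50):                       # cap episode length
            if mode == "action_noise" and rng.random() < eps:
                a = int(rng.integers(N_ACTIONS))  # epsilon-greedy: random action
            else:
                a = int(np.argmax(Q[s]))          # greedy w.r.t. current Q
            s_next, r = step(s, a)
            if mode == "reward_noise":
                # RSN idea: perturb the reward before forming the TD target,
                # so stochasticity enters through the agent's value estimates
                # (its "understanding of the environment"), not its actions.
                r += rng.normal(0.0, reward_sigma)
            Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])
            s = s_next
    return Q

for mode in ("action_noise", "reward_noise"):
    Q = train(mode)
    print(f"{mode}: greedy value at start state = {Q[0].max():.3f}")

Note the design difference this toy exposes: in the "reward_noise" branch the policy stays purely greedy, and exploration arises because the perturbed rewards reshape the Q-values, whereas ϵ-greedy leaves the value estimates untouched and randomizes the actions themselves.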

Updated: 2021-05-21