Generalization and Regularization in DQN
arXiv - CS - Artificial Intelligence. Pub Date: 2018-09-29, arXiv: 1810.00123
Jesse Farebrother, Marlos C. Machado, Michael Bowling

Deep reinforcement learning algorithms have shown an impressive ability to learn complex control policies in high-dimensional tasks. However, despite the ever-increasing performance on popular benchmarks, policies learned by deep reinforcement learning algorithms can struggle to generalize when evaluated in remarkably similar environments. In this paper we propose a protocol to evaluate generalization in reinforcement learning through different modes of Atari 2600 games. With that protocol we assess the generalization capabilities of DQN, one of the most traditional deep reinforcement learning algorithms, and we provide evidence suggesting that DQN overspecializes to the training environment. We then comprehensively evaluate the impact of dropout and $\ell_2$ regularization, as well as the impact of reusing learned representations to improve the generalization capabilities of DQN. Despite regularization being largely underutilized in deep reinforcement learning, we show that it can, in fact, help DQN learn more general features. These features can be reused and fine-tuned on similar tasks, considerably improving DQN's sample efficiency.
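To make the regularization techniques discussed in the abstract concrete, the sketch below shows a Nature-DQN-style Q-network with dropout on the hidden fully-connected layer and $\ell_2$ regularization applied through the optimizer's weight decay. This is a minimal illustration in PyTorch, not the authors' exact configuration: the dropout rate, weight-decay coefficient, learning rate, and the `pretrained_net` used for representation reuse are all illustrative placeholders.

```python
import torch
import torch.nn as nn


class DQNNetwork(nn.Module):
    """Nature-DQN style Q-network with dropout on the hidden
    fully-connected layer. The dropout rate is a placeholder,
    not a value taken from the paper."""

    def __init__(self, num_actions: int, dropout_rate: float = 0.1):
        super().__init__()
        # Convolutional encoder over a stack of 4 grayscale 84x84 frames.
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # Value head with dropout before the final Q-value layer.
        self.head = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Dropout(p=dropout_rate),
            nn.Linear(512, num_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))


q_net = DQNNetwork(num_actions=18)

# l2 regularization implemented as weight decay in the optimizer;
# the coefficient and learning rate are illustrative placeholders.
optimizer = torch.optim.RMSprop(q_net.parameters(), lr=2.5e-4, weight_decay=1e-4)

# Representation reuse: copy convolutional weights learned on one game mode
# and fine-tune the network on a different mode. `pretrained_net` is a
# hypothetical network trained earlier on the source mode.
# q_net.features.load_state_dict(pretrained_net.features.state_dict())
```

A sketch along these lines separates the two ideas the abstract evaluates: regularizing the training objective (dropout and weight decay) versus transferring and fine-tuning learned convolutional features across game modes.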

Updated: 2020-01-22