Attentive multi-view reinforcement learning
International Journal of Machine Learning and Cybernetics (IF 3.1), Pub Date: 2020-05-04, DOI: 10.1007/s13042-020-01130-6
Yueyue Hu, Shiliang Sun, Xin Xu, Jing Zhao

Reinforcement learning from scratch typically takes millions of steps, because the agent's observation experience is limited. More precisely, the representation learned by a single deep network is often insufficient for a reinforcement learning agent. In this paper, we propose a novel multi-view deep attention network (MvDAN), which introduces multi-view representation learning into the reinforcement learning framework for the first time. Under this multi-view function-approximation scheme, the proposed model approximates multiple view-specific policy or value functions in parallel, each built on its own mid-level representation, and integrates them with an attention mechanism to generate a comprehensive strategy. Furthermore, we develop a multi-view generalized policy improvement procedure that jointly optimizes all of the view-specific policies rather than a single one. Experimental results on eight Atari benchmarks show that MvDAN outperforms state-of-the-art single-view function-approximation methods, converging faster and training more stably.
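As a rough illustration of the architecture the abstract describes, the sketch below fuses several view-specific Q-heads with a learned attention over views. It is a minimal, hypothetical reconstruction in PyTorch, not the authors' released code: the class and parameter names (MultiViewAttentionQNetwork, obs_dim, num_views, hidden_dim) are ours, and we assume each view encoder receives the same raw observation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewAttentionQNetwork(nn.Module):
    """Hypothetical sketch: parallel view-specific Q-functions fused by attention."""

    def __init__(self, obs_dim: int, num_views: int, num_actions: int,
                 hidden_dim: int = 128):
        super().__init__()
        # One encoder per view yields a view-specific mid-level representation.
        self.encoders = nn.ModuleList(
            [nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
             for _ in range(num_views)]
        )
        # Each view has its own Q-head, approximated in parallel.
        self.q_heads = nn.ModuleList(
            [nn.Linear(hidden_dim, num_actions) for _ in range(num_views)]
        )
        # Scores one attention logit per view representation.
        self.attn = nn.Linear(hidden_dim, 1)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, obs_dim); assume every view encoder sees the same observation.
        feats = torch.stack([enc(obs) for enc in self.encoders], dim=1)  # (B, V, H)
        weights = F.softmax(self.attn(feats).squeeze(-1), dim=1)         # (B, V)
        q_views = torch.stack(
            [head(feats[:, i]) for i, head in enumerate(self.q_heads)], dim=1
        )                                                                # (B, V, A)
        # Comprehensive Q-values: attention-weighted sum over the views.
        return (weights.unsqueeze(-1) * q_views).sum(dim=1)              # (B, A)
```

In use, the network maps a batch of observations to a single set of Q-values, e.g. net = MultiViewAttentionQNetwork(obs_dim=84, num_views=3, num_actions=6); q = net(torch.randn(32, 84)), with greedy actions from q.argmax(dim=1). The paper's multi-view generalized policy improvement, as described in the abstract, goes further by jointly optimizing all view-specific policies rather than improving a single one.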


