MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning,arXiv - CS - Machine Learning

MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning
arXiv - CS - Machine Learning. Pub Date: 2020-06-30, DOI: arxiv-2006.16908
Elise van der Pol, Daniel E. Worrall, Herke van Hoof, Frans A. Oliehoek, Max Welling

This paper introduces MDP homomorphic networks for deep reinforcement learning. MDP homomorphic networks are neural networks that are equivariant under symmetries in the joint state-action space of an MDP. Current approaches to deep reinforcement learning do not usually exploit knowledge about such structure. By building this prior knowledge into policy and value networks using an equivariance constraint, we can reduce the size of the solution space. We specifically focus on group-structured symmetries (invertible transformations). Additionally, we introduce an easy method for constructing equivariant network layers numerically, so the system designer need not solve the constraints by hand, as is typically done. We construct MDP homomorphic MLPs and CNNs that are equivariant under either a group of reflections or rotations. We show that such networks converge faster than unstructured baselines on CartPole, a grid world and Pong.
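Below is a minimal sketch of what "constructing equivariant network layers numerically" can look like. It uses a standard technique, not necessarily the authors' exact procedure. Random weight matrices are projected onto the equivariant subspace by averaging over the group (symmetrization), and an SVD then extracts a basis for that subspace. The sketch assumes a CartPole-like reflection symmetry: mirroring the cart maps the state s to -s and swaps the left/right actions. The representations, the names `symmetrize`, `equivariant_basis`, and `EquivariantLinear`, and the tolerance are hypothetical choices made for illustration. The layer is linear, has no bias, and is not the authors' code.

```python
import numpy as np

# Hypothetical CartPole-style symmetry group {identity, mirror}.
# State s = (x, x_dot, theta, theta_dot); mirroring maps s -> -s.
# Action logits (left, right) are swapped under the same mirror.
IN_REPS = [np.eye(4), -np.eye(4)]                                 # K_g acting on states
OUT_REPS = [np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])]         # L_g acting on action logits


def symmetrize(W):
    """Group-average W: S(W) = (1/|G|) * sum_g L_g^{-1} W K_g.
    The result satisfies S(W) @ K_g == L_g @ S(W) for every g."""
    total = sum(np.linalg.inv(L) @ W @ K for K, L in zip(IN_REPS, OUT_REPS))
    return total / len(IN_REPS)


def equivariant_basis(n_samples=100, tol=1e-8, seed=0):
    """Numerically find a basis of equivariant 2x4 weight matrices,
    so the constraint never has to be solved by hand."""
    rng = np.random.default_rng(seed)
    samples = np.stack([symmetrize(rng.standard_normal((2, 4))).ravel()
                        for _ in range(n_samples)])
    # Right singular vectors with non-negligible singular values span the
    # subspace that all symmetrized matrices live in.
    _, s, vt = np.linalg.svd(samples, full_matrices=False)
    rank = int(np.sum(s > tol * s[0]))
    return vt[:rank].reshape(rank, 2, 4)


class EquivariantLinear:
    """Linear layer whose weight is a learnable combination of basis matrices,
    so it stays equivariant for any choice of coefficients."""

    def __init__(self, basis, seed=1):
        self.basis = basis
        self.coeffs = np.random.default_rng(seed).standard_normal(len(basis))

    def weight(self):
        return np.tensordot(self.coeffs, self.basis, axes=1)   # shape (2, 4)

    def __call__(self, s):
        return self.weight() @ s


basis = equivariant_basis()
layer = EquivariantLinear(basis)
s = np.array([0.1, -0.3, 0.05, 0.2])
print(len(basis))                                       # 4
print(np.allclose(layer(-s), OUT_REPS[1] @ layer(s)))   # True
```

For this group the constraint forces the "right" row of the weight matrix to be the negation of the "left" row. The free parameters therefore drop from 8 to 4, which is a small instance of the reduced solution space described above. Building a full network would also require equivariant nonlinearities and biases, which this sketch omits.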

Updated: 2020-07-01