Multi-Task Reinforcement Learning as a Hidden-Parameter Block MDP
arXiv - CS - Artificial Intelligence | Pub Date: 2020-07-14, DOI: arxiv-2007.07206
Amy Zhang, Shagun Sodhani, Khimya Khetarpal, Joelle Pineau

Multi-task reinforcement learning is a rich paradigm in which information from previously seen environments can be leveraged for better performance and improved sample efficiency in new environments. In this work, we draw on the common structure underlying a family of Markov decision processes (MDPs) to improve performance in the few-shot regime. Combining structural assumptions from Hidden-Parameter MDPs and Block MDPs, we propose a new framework, HiP-BMDP, and an approach for learning a common representation and universal dynamics model. We provide transfer and generalization bounds based on task and state similarity, along with sample complexity bounds that depend on the aggregate number of samples across tasks rather than on the number of tasks, a significant improvement over prior work. To demonstrate the efficacy of the proposed method, we empirically compare against other multi-task and meta-reinforcement learning baselines and show improvements.
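Under the HiP-BMDP assumption, all tasks share a latent state space and a transition function, and per-task variation is captured by a low-dimensional hidden parameter. The PyTorch snippet below is a minimal sketch of such a universal dynamics model, not the authors' implementation; the class and parameter names (HiPBMDPDynamics, theta_dim, num_tasks) are hypothetical.

```python
import torch
import torch.nn as nn

class HiPBMDPDynamics(nn.Module):
    """Sketch of a universal dynamics model in the HiP-BMDP setting:
    tasks share a state encoder (the Block-MDP part: observations map
    to a common latent state space) and a transition network, while
    each task k has its own learned hidden parameter theta_k
    (the Hidden-Parameter-MDP part)."""

    def __init__(self, obs_dim, action_dim, latent_dim, theta_dim, num_tasks):
        super().__init__()
        # Shared encoder from observations to latent states.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # One learned hidden parameter per task.
        self.theta = nn.Embedding(num_tasks, theta_dim)
        # Shared transition model conditioned on (z, a, theta_k).
        self.transition = nn.Sequential(
            nn.Linear(latent_dim + action_dim + theta_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, obs, action, task_id):
        z = self.encoder(obs)                    # latent state z_t
        theta_k = self.theta(task_id)            # task-specific parameter
        # Predict the next latent state for task k.
        return self.transition(torch.cat([z, action, theta_k], dim=-1))
```

In a sketch like this, training would minimize a one-step latent prediction loss jointly across all tasks; for few-shot transfer to a new task, one could freeze the shared encoder and transition network and fit only a fresh theta embedding from a small amount of new-task data.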

Updated: 2020-07-29