Model-based Multi-Agent Reinforcement Learning with Cooperative Prioritized Sweeping
arXiv - CS - Artificial Intelligence. Pub Date: 2020-01-15, DOI: arxiv-2001.07527. Eugenio Bargiacchi, Timothy Verstraeten, Diederik M. Roijers, Ann Nowé
We present a new model-based reinforcement learning algorithm, Cooperative
Prioritized Sweeping, for efficient learning in multi-agent Markov decision
processes. The algorithm allows for sample-efficient learning on large problems
by exploiting a factorization to approximate the value function. Our approach
only requires knowledge about the structure of the problem in the form of a
dynamic decision network. Using this information, our method learns a model of
the environment and performs temporal difference updates which affect multiple
joint states and actions at once. Batch updates are additionally performed
which efficiently back-propagate knowledge throughout the factored Q-function.
Our method outperforms the state-of-the-art sparse cooperative Q-learning
algorithm, both on the well-known SysAdmin benchmark and on randomized
environments.
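The abstract states that the method only needs the problem structure, given as a dynamic decision network, and learns a model of the environment from it. As a minimal sketch of what such a factored model can look like, assuming discrete state factors and a hypothetical `parents` argument that lists, for each next-state factor, the state factors and agents it depends on, the transition probabilities can be estimated by counting. This is a generic maximum-likelihood estimator written for illustration, not the paper's implementation.

```python
from collections import defaultdict


class FactoredModel:
    """Count-based (maximum-likelihood) factored transition model.

    parents[i] = (state_factor_indices, agent_indices) that next-state factor i
    depends on in the (assumed, user-supplied) dynamic decision network.
    """

    def __init__(self, parents):
        self.parents = parents
        # counts[i][parent_values][next_value] -> number of observations
        self.counts = [defaultdict(lambda: defaultdict(int)) for _ in parents]

    def _parent_values(self, i, s, a):
        s_idx, a_idx = self.parents[i]
        return tuple(s[k] for k in s_idx) + tuple(a[j] for j in a_idx)

    def observe(self, s, a, s_next):
        # Each next-state factor only updates the table row for its own parents.
        for i in range(len(self.parents)):
            self.counts[i][self._parent_values(i, s, a)][s_next[i]] += 1

    def prob(self, s, a, s_next):
        # P(s' | s, a) = prod_i P(s'_i | parents_i(s, a)) under the DDN factorization.
        p = 1.0
        for i in range(len(self.parents)):
            row = self.counts[i][self._parent_values(i, s, a)]
            total = sum(row.values())
            if total == 0:
                return 0.0  # unseen parent configuration: no estimate yet
            p *= row.get(s_next[i], 0) / total
        return p


# Two agents, two binary state factors: factor 0 depends on (s0, a0),
# factor 1 depends on (s0, s1, a1).
model = FactoredModel([((0,), (0,)), ((0, 1), (1,))])
for s_next in [(1, 0), (1, 0), (0, 0)]:
    model.observe((0, 0), (1, 0), s_next)
print(model.prob((0, 0), (1, 0), (1, 0)))  # 0.6666666666666666 (= 2/3 * 3/3)
```

Because each factor's table is indexed only by its own parents, one observed transition informs every joint state and action that shares those parent values, which keeps the model small on large problems.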
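The abstract describes temporal difference updates that affect many joint states and actions at once because the Q-function is factored. A minimal sketch of that effect follows. It assumes a hypothetical linear decomposition into local tables with hand-picked `scopes`, a brute-force greedy joint action, and a plain Q-learning-style semi-gradient update; all of these are illustrative assumptions, not the exact update rule of Cooperative Prioritized Sweeping.

```python
import itertools
from collections import defaultdict


class FactoredQ:
    """Q(s, a) approximated as a sum of local tables Q_e(s[S_e], a[A_e]).

    scopes[e] = (state_factor_indices, agent_indices) for component e (hypothetical).
    """

    def __init__(self, scopes, n_agents, n_actions):
        self.scopes = scopes
        self.n_agents = n_agents
        self.n_actions = n_actions  # same action count for every agent, for simplicity
        self.tables = [defaultdict(float) for _ in scopes]

    def _key(self, e, s, a):
        s_idx, a_idx = self.scopes[e]
        return tuple(s[i] for i in s_idx), tuple(a[j] for j in a_idx)

    def value(self, s, a):
        return sum(t.get(self._key(e, s, a), 0.0) for e, t in enumerate(self.tables))

    def greedy(self, s):
        # Brute-force max over all joint actions: only viable for a few agents.
        # Coordination-graph methods such as variable elimination avoid full enumeration.
        joint = itertools.product(range(self.n_actions), repeat=self.n_agents)
        return max(joint, key=lambda a: self.value(s, a))

    def td_update(self, s, a, r, s_next, alpha=0.1, gamma=0.9):
        delta = r + gamma * self.value(s_next, self.greedy(s_next)) - self.value(s, a)
        # Semi-gradient update with one-hot features: each local entry moves by alpha * delta.
        for e, t in enumerate(self.tables):
            t[self._key(e, s, a)] += alpha * delta
        return delta


q = FactoredQ([((0,), (0,)), ((1,), (1,))], n_agents=2, n_actions=2)
q.td_update(s=(0, 0), a=(1, 0), r=1.0, s_next=(1, 1))
# The joint pair ((0, 1), (1, 1)) was never visited, yet it shares component 0's
# local entry (s0=0, a0=1) with the updated pair, so its estimate changed too.
print(q.value((0, 1), (1, 1)))  # 0.1
```

A single update thus generalizes to every joint state and action that overlaps with the updated pair on at least one component's scope.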
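Prioritized sweeping, the idea the method builds on, orders model-based backups by the magnitude of their expected change and propagates them to predecessor state-action pairs. The sketch below shows only the classic single-agent, tabular form with a deterministic last-outcome model, as presented in Sutton and Barto's Reinforcement Learning: An Introduction. It does not attempt the cooperative, factored variant the abstract describes, and the class name and parameters (`theta`, `n_planning`) are illustrative assumptions.

```python
import heapq
import itertools
from collections import defaultdict


class PrioritizedSweeping:
    """Classic tabular prioritized sweeping with a deterministic learned model."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, theta=1e-4, n_planning=10):
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.theta, self.n_planning = theta, n_planning
        self.Q = defaultdict(float)    # (s, a) -> value estimate
        self.model = {}                # (s, a) -> (r, s_next), last observed outcome
        self.preds = defaultdict(set)  # s_next -> {(s, a)} seen to lead there
        self.pq = []                   # heap of (-priority, tie_breaker, s, a)
        self._tick = itertools.count()

    def _td_error(self, s, a, r, s_next):
        best_next = max(self.Q[(s_next, b)] for b in self.actions)
        return r + self.gamma * best_next - self.Q[(s, a)]

    def _push(self, s, a, priority):
        if priority > self.theta:
            heapq.heappush(self.pq, (-priority, next(self._tick), s, a))

    def observe(self, s, a, r, s_next):
        # Record the real transition, then queue (s, a) by the size of its TD error.
        self.model[(s, a)] = (r, s_next)
        self.preds[s_next].add((s, a))
        self._push(s, a, abs(self._td_error(s, a, r, s_next)))
        # Planning: back up the highest-priority pairs first, then queue their predecessors.
        for _ in range(self.n_planning):
            if not self.pq:
                break
            _, _, ps, pa = heapq.heappop(self.pq)
            pr, pn = self.model[(ps, pa)]
            self.Q[(ps, pa)] += self.alpha * self._td_error(ps, pa, pr, pn)
            for bs, ba in self.preds[ps]:
                br, bn = self.model[(bs, ba)]
                if bn == ps:  # skip predecessors whose latest outcome changed
                    self._push(bs, ba, abs(self._td_error(bs, ba, br, ps)))


# Chain 0 -> 1 -> 2 with a single action and reward 1.0 on the final step.
agent = PrioritizedSweeping(actions=[0])
agent.observe(0, 0, 0.0, 1)  # zero TD error: nothing queued yet
agent.observe(1, 0, 1.0, 2)  # reward is swept back to state 0 in the same call
print(agent.Q[(1, 0)], agent.Q[(0, 0)])  # 0.5 0.225
```

One rewarded transition updates both the pair that produced it and, through the predecessor map, the earlier pair that leads to it, without the agent revisiting state 0.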
Updated: 2020-01-22