DFAC Framework: Factorizing the Value Function via Quantile Mixture for Multi-Agent Distributional Q-Learning
arXiv - CS - Multiagent Systems Pub Date : 2021-02-16 , DOI: arxiv-2102.07936
Wei-Fang Sun, Cheng-Kuang Lee, Chun-Yi Lee

In fully cooperative multi-agent reinforcement learning (MARL) settings, environments are highly stochastic due to each agent's partial observability and the continuously changing policies of the other agents. To address these issues, we integrate distributional RL with value function factorization methods by proposing a Distributional Value Function Factorization (DFAC) framework that generalizes expected value function factorization methods to their DFAC variants. DFAC extends the individual utility functions from deterministic variables to random variables, and models the quantile function of the total return as a quantile mixture. To validate DFAC, we demonstrate its ability to factorize a simple two-step matrix game with stochastic rewards, and perform experiments on all Super Hard tasks of the StarCraft Multi-Agent Challenge, showing that DFAC outperforms expected value function factorization baselines.
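The quantile-mixture idea in the abstract can be sketched numerically: the total return's quantile function is formed as a non-negative weighted sum of the agents' utility quantile functions evaluated at shared quantile levels. The weights and toy values below are hypothetical illustrations, not the authors' DFAC implementation (which learns these components with neural networks):

```python
import numpy as np

def quantile_mixture(agent_quantiles, weights, bias=0.0):
    """Combine per-agent quantile values into a total-return quantile
    function via a weighted sum (a quantile mixture).

    agent_quantiles: array of shape (n_agents, n_taus); row i holds
        agent i's utility quantile function at shared tau levels.
    weights: non-negative per-agent mixing weights, shape (n_agents,).
    """
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0), "quantile mixtures need non-negative weights"
    # A non-negative weighted sum of non-decreasing quantile functions
    # is itself non-decreasing, hence a valid quantile function.
    return weights @ np.asarray(agent_quantiles, dtype=float) + bias

# Toy example: two agents, quantiles evaluated at tau = 0.25, 0.5, 0.75.
z1 = np.array([0.0, 1.0, 2.0])   # agent 1 utility quantiles (hypothetical)
z2 = np.array([1.0, 1.0, 3.0])   # agent 2 utility quantiles (hypothetical)
z_tot = quantile_mixture([z1, z2], weights=[0.5, 0.5])
print(z_tot)  # [0.5 1.  2.5]
```

Because each summand is non-decreasing in tau, the mixture preserves monotonicity, which is what lets the factorized total return remain a well-defined distribution.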

Updated: 2021-02-17