A Q-values Sharing Framework for Multiagent Reinforcement Learning under Budget Constraint, arXiv - CS - Multiagent Systems


arXiv - CS - Multiagent Systems. Pub Date: 2020-11-29, DOI: arxiv-2011.14281
Changxi Zhu, Ho-fung Leung, Shuyue Hu, Yi Cai

In the teacher-student framework, a more experienced agent (the teacher) helps accelerate the learning of another agent (the student) by suggesting actions to take in certain states. In cooperative multiagent reinforcement learning (MARL), where agents need to cooperate with one another, a student may fail to cooperate well with others even when it follows the teacher's suggested actions, because the policies of all agents keep changing before convergence. When the number of times that agents can communicate with one another is limited (i.e., there is a budget constraint), an advising strategy that uses actions as advice may not be good enough. We propose a partaker-sharer advising framework (PSAF) for cooperative MARL agents learning under a budget constraint. In PSAF, each Q-learner can decide when to ask for Q-values and when to share its own Q-values. We perform experiments on three typical multiagent learning problems. Evaluation results show that PSAF outperforms existing advising methods under both unlimited and limited budgets, and we analyze the impact of advising actions and of sharing Q-values on agents' learning.
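The idea of agents asking for and sharing Q-values under a budget can be illustrated with a minimal sketch. The abstract does not say how a Q-learner decides when to ask or when to share, so the visit-count threshold used here is purely an assumption, and every name (`BudgetedQLearner`, `advise`, `min_visits`, the separate ask and share budgets) is hypothetical rather than taken from the paper.

```python
import random
from collections import defaultdict


class BudgetedQLearner:
    """Tabular Q-learner that can ask for and share Q-values under a budget.

    Illustrative only: the ask/share criteria are assumptions, not the
    criteria used by PSAF.
    """

    def __init__(self, n_actions, ask_budget, share_budget,
                 alpha=0.1, gamma=0.9, epsilon=0.1, min_visits=3):
        self.n_actions = n_actions
        self.ask_budget = ask_budget      # how many times it may ask
        self.share_budget = share_budget  # how many times it may share
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.min_visits = min_visits      # assumed confidence threshold
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.visits = defaultdict(int)

    def wants_to_ask(self, state):
        # Assumption: ask only in rarely visited states, while budget remains.
        return self.ask_budget > 0 and self.visits[state] < self.min_visits

    def maybe_share(self, state):
        # Assumption: share only for well-visited states, while budget remains.
        if self.share_budget > 0 and self.visits[state] >= self.min_visits:
            self.share_budget -= 1
            return list(self.q[state])
        return None

    def receive(self, state, q_values):
        # Adopt the shared Q-values for this state and pay one unit of budget.
        self.q[state] = list(q_values)
        self.ask_budget -= 1

    def act(self, state, rng=random):
        # Epsilon-greedy action selection over the agent's own Q-values.
        if rng.random() < self.epsilon:
            return rng.randrange(self.n_actions)
        row = self.q[state]
        return row.index(max(row))

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        self.visits[state] += 1
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def advise(partaker, sharers, state):
    """Let the partaker take Q-values from the first sharer willing to share."""
    if not partaker.wants_to_ask(state):
        return False
    for sharer in sharers:
        shared = sharer.maybe_share(state)
        if shared is not None:
            partaker.receive(state, shared)
            return True
    return False
```

In this sketch the partaker receives a whole row of Q-values rather than a single suggested action, so it still selects its own action afterwards and continues to update the received values with its own experience.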


Updated: 2020-12-01
