A Q-values Sharing Framework for Multi-agent Reinforcement Learning under Budget Constraint
ACM Transactions on Autonomous and Adaptive Systems (IF 2.2) Pub Date: 2021-04-19, DOI: 10.1145/3447268
Changxi Zhu, Ho-Fung Leung, Shuyue Hu, Yi Cai
In a teacher-student framework, a more experienced agent (teacher) helps accelerate the learning of another agent (student) by suggesting actions to take in certain states. In cooperative multi-agent reinforcement learning (MARL), where agents must cooperate with one another, a student could fail to cooperate effectively with others even by following a teacher’s suggested actions, as the policies of all agents can change before convergence. When the number of times that agents communicate with one another is limited (i.e., there are budget constraints), an advising strategy that uses actions as advice could be less effective. We propose a partaker-sharer advising framework (PSAF) for cooperative MARL agents learning with budget constraints. In PSAF, each Q-learner can decide when to ask for and share its Q-values. We perform experiments in three typical multi-agent learning problems. The evaluation results indicate that the proposed PSAF approach outperforms existing advising methods under both constrained and unconstrained budgets. Moreover, we analyse the influence of advising actions and sharing Q-values on agent learning.

Updated: 2021-04-19