Reward Design in Cooperative Multi-agent Reinforcement Learning for Packet Routing
arXiv - CS - Multiagent Systems. Pub Date: 2020-03-05. DOI: arxiv-2003.03433
Hangyu Mao, Zhibo Gong, and Zhen Xiao

In cooperative multi-agent reinforcement learning (MARL), designing a suitable reward signal that accelerates learning and stabilizes convergence is a critical problem. A global reward signal assigns the same reward to all agents without distinguishing their contributions, while a local reward signal gives each agent a different reward based solely on its individual behavior. Both assignment approaches have shortcomings: the former may encourage lazy agents, while the latter may produce selfish agents. In this paper, we study the reward design problem in cooperative MARL using packet routing environments. First, we show that the two reward signals above are prone to producing suboptimal policies. Then, motivated by several observations and considerations, we design mixed reward signals that can be used off the shelf to learn better policies. Finally, we turn the mixed reward signals into adaptive counterparts, which achieve the best results in our experiments. Other reward signals are also discussed. As reward design is a fundamental problem in RL, and especially in MARL, we hope that MARL researchers will rethink the reward signals used in their systems.
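To make the distinction between the three signal types concrete, below is a minimal sketch (not from the paper) of how a global, a local, and a fixed-weight mixed reward signal could be computed for a set of cooperating agents. The mean-based global reward, the mixing weight `w`, and the `reward_signals` helper are all illustrative assumptions; an adaptive variant would adjust `w` during training, and the exact forms used in the paper are not specified in this abstract.

```python
import numpy as np

def reward_signals(local_rewards, w=0.5):
    """Illustrative global / local / mixed reward signals for N cooperating agents.

    local_rewards : per-agent rewards based on individual behavior
                    (e.g., traffic each router delivered without drops).
    w             : hypothetical mixing weight; an adaptive variant would
                    adjust it during training rather than fixing it.
    """
    local_rewards = np.asarray(local_rewards, dtype=float)

    # Global signal: every agent receives the same team-level reward,
    # taken here as the mean of the local rewards (one common choice).
    global_signal = np.full_like(local_rewards, local_rewards.mean())

    # Local signal: each agent is rewarded only for its own behavior.
    local_signal = local_rewards

    # Mixed signal: a convex combination of the two, trading off
    # "lazy agents" (pure global) against "selfish agents" (pure local).
    mixed_signal = w * global_signal + (1.0 - w) * local_signal

    return global_signal, local_signal, mixed_signal

if __name__ == "__main__":
    # Three routers with different individual delivery rewards.
    g, l, m = reward_signals([1.0, 0.2, 0.6], w=0.5)
    print("global:", g)   # [0.6 0.6 0.6]
    print("local: ", l)   # [1.  0.2 0.6]
    print("mixed: ", m)   # [0.8 0.4 0.6]
```

In this toy example, the agent that contributed least (local reward 0.2) still receives 0.6 under the pure global signal, which is what can encourage laziness, while the pure local signal ignores the team outcome entirely; the mixed signal sits between the two.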

Updated: 2020-03-10