A Fuzzy Curiosity-Driven Mechanism for Multi-Agent Reinforcement Learning
International Journal of Fuzzy Systems (IF 3.6) Pub Date: 2021-02-13, DOI: 10.1007/s40815-020-01035-0
Wenbai Chen, Haobin Shi, Jingchen Li, Kao-Shing Hwang

Many works provide intrinsic rewards to deal with sparse rewards in reinforcement learning. Owing to the non-stationarity of multi-agent systems, however, it is impractical to apply these existing methods directly to multi-agent reinforcement learning. In this paper, a fuzzy curiosity-driven mechanism is proposed for multi-agent reinforcement learning, by which agents can explore more efficiently in scenarios with sparse extrinsic rewards. First, we improve the variational auto-encoder so that it predicts the next state from the agents' joint state and joint action. Then several fuzzy partitions are built over the next joint state, with the aim of assigning the prediction error to different agents. With the proposed method, each agent in the multi-agent environment receives its own individual intrinsic reward. We elaborate on the proposed method for partially observable and fully observable environments separately. Experimental results show that agents learn joint policies more efficiently with the proposed fuzzy curiosity-driven mechanism, and that it also helps agents find better policies during training.
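A minimal sketch of the idea described in the abstract, assuming a joint forward model (here a fixed random linear map standing in for the paper's modified variational auto-encoder) and Gaussian membership functions as the fuzzy partitions. All dimensions, function names, and the choice of membership function are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical dimensions: 3 agents, each with a 4-dim local state and 2-dim action.
N_AGENTS, STATE_DIM, ACTION_DIM = 3, 4, 2
rng = np.random.default_rng(0)

# Stand-in forward model: a fixed random linear map from the joint
# state-action vector to the predicted next joint state. In the paper this
# role is played by an improved variational auto-encoder.
W = rng.standard_normal((N_AGENTS * STATE_DIM,
                         N_AGENTS * (STATE_DIM + ACTION_DIM))) * 0.1

def predict_next_joint_state(joint_state, joint_action):
    """Predict the next joint state from the joint state and joint action."""
    x = np.concatenate([joint_state.ravel(), joint_action.ravel()])
    return (W @ x).reshape(N_AGENTS, STATE_DIM)

def fuzzy_memberships(next_joint_state, centers, sigma=1.0):
    """Gaussian membership of the next joint state in each fuzzy partition
    (one partition per agent), normalised so the memberships sum to 1."""
    flat = next_joint_state.ravel()
    d2 = ((centers - flat) ** 2).sum(axis=1)        # squared distance to each centre
    m = np.exp(-d2 / (2.0 * sigma ** 2))
    return m / m.sum()

def intrinsic_rewards(joint_state, joint_action, next_joint_state, centers):
    """Split the forward-prediction error among agents via the fuzzy
    partitions, so each agent receives its own curiosity reward."""
    pred = predict_next_joint_state(joint_state, joint_action)
    error = ((pred - next_joint_state) ** 2).mean()  # total prediction error
    return error * fuzzy_memberships(next_joint_state, centers)

# Toy usage on a random transition, with one partition centre per agent.
s  = rng.standard_normal((N_AGENTS, STATE_DIM))
a  = rng.standard_normal((N_AGENTS, ACTION_DIM))
s2 = rng.standard_normal((N_AGENTS, STATE_DIM))
centers = rng.standard_normal((N_AGENTS, N_AGENTS * STATE_DIM))
print(intrinsic_rewards(s, a, s2, centers))          # one intrinsic reward per agent
```

In an actual training loop these per-agent intrinsic rewards would be added to the sparse extrinsic reward before each agent's policy update; the sketch only shows how a single joint prediction error can be divided among agents through fuzzy memberships.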



