Multi-Agent Reward-Iteration Fuzzy Q-Learning
International Journal of Fuzzy Systems (IF 3.6) Pub Date: 2021-04-13, DOI: 10.1007/s40815-021-01063-4
Lixiong Leng , Jingchen Li , Jinhui Zhu , Kao-Shing Hwang , Haobin Shi

Fuzzy Q-learning extends Q-learning to continuous state spaces and has been applied to a wide range of applications such as robot control. In a multi-agent system, however, the non-stationary environment makes it difficult for the joint policy to converge. To give agents more suitable rewards in a multi-agent environment, a multi-agent reward-iteration fuzzy Q-learning (RIFQ) method is proposed for multi-agent cooperative tasks. The state space is divided into three channels by the proposed fuzzy-logic state-divider. The reward of each agent is reshaped iteratively according to its state, and an update sequence is constructed by computing the relations among the states of different agents; the value functions are then updated top-down. By replacing the reward given by the environment with the reshaped reward, agents avoid the most unreasonable punishments and receive rewards selectively. RIFQ provides a feasible reward relationship for multiple agents, which makes multi-agent training more stable. Several simulation experiments show that RIFQ is not limited by the number of agents and converges faster than the baselines.
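The abstract describes the approach only at a high level. As a rough illustration of the two ingredients it combines, the sketch below shows a fuzzy Q-learning update (triangular memberships over a scalar state, one Q-row per fuzzy rule) together with a toy reward-reshaping function that mixes the environment reward with a teammate-dependent term. All names, membership functions, and the reshaping rule are assumptions made for illustration; this is not the authors' RIFQ implementation.

```python
import numpy as np

# Minimal sketch of fuzzy Q-learning with a reshaped reward.
# The fuzzy sets, reshaping rule, and hyperparameters are assumptions.

N_SETS = 5          # fuzzy sets (rules) per state dimension
N_ACTIONS = 3
ALPHA, GAMMA = 0.1, 0.95

centers = np.linspace(0.0, 1.0, N_SETS)   # centers of triangular membership functions

def memberships(s):
    """Normalized triangular membership degrees of a scalar state s in [0, 1]."""
    width = centers[1] - centers[0]
    mu = np.maximum(0.0, 1.0 - np.abs(s - centers) / width)
    return mu / mu.sum()

class FuzzyQAgent:
    def __init__(self):
        self.q = np.zeros((N_SETS, N_ACTIONS))   # one Q-row per fuzzy rule

    def act(self, s, eps=0.1):
        if np.random.rand() < eps:
            return np.random.randint(N_ACTIONS)
        # Defuzzified Q-values: membership-weighted sum over rules.
        return int(np.argmax(memberships(s) @ self.q))

    def update(self, s, a, r, s_next):
        mu, mu_next = memberships(s), memberships(s_next)
        target = r + GAMMA * np.max(mu_next @ self.q)
        td_error = target - mu @ self.q[:, a]
        # Credit each rule in proportion to its firing strength.
        self.q[:, a] += ALPHA * td_error * mu

def reshape_reward(env_reward, own_state, teammate_state, beta=0.5):
    """Toy reward reshaping (assumption): blend the environment reward
    with a term reflecting how close the two agents' states are."""
    closeness = 1.0 - abs(own_state - teammate_state)
    return (1.0 - beta) * env_reward + beta * closeness
```

In RIFQ the reshaped reward is computed iteratively and the agents' value functions are updated in an order derived from the relations among their states; the fixed blend above merely stands in for that mechanism.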



Updated: 2021-04-13