Decentralized Incremental Fuzzy Reinforcement Learning for Multi-Agent Systems
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (IF 1.5). Pub Date: 2020-01-13. DOI: 10.1142/s021848852050004x
Sam Hamzeloo, Mansoor Zolghadri Jahromi

We present a new incremental fuzzy reinforcement learning algorithm that finds a sub-optimal policy for infinite-horizon Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs). The algorithm addresses the high computational complexity of solving large Dec-POMDPs by generating a compact fuzzy rule-base for each agent. In our method, each agent uses its own fuzzy rule-base to make decisions. The fuzzy rules in these rule-bases are incrementally created and tuned according to the agents' experiences. Reinforcement learning is used to tune the behavior of each agent so that the global reward is maximized. In addition, we propose a method that constructs the initial rule-base for each agent from the solution of the underlying MDP; this drastically improves performance compared with random initialization of the rule-base. We assess the proposed method on several benchmark problems and compare it with state-of-the-art methods. Experimental results show that our algorithm achieves better or comparable reward, while in terms of runtime it is superior to all previous methods. Using a compact fuzzy rule-base not only reduces memory usage but also significantly speeds up the learning phase.
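To make the idea of a compact, incrementally grown fuzzy rule-base concrete, the following is a minimal sketch of one agent, not the authors' exact algorithm. It assumes Gaussian membership functions over prototype observations, a firing-strength threshold for creating new rules, and fuzzy Q-learning-style tuning of the rule consequents; the class name, parameters, and the `init_q` hook (which could be seeded from the underlying-MDP solution, as the paper's initialization suggests) are illustrative assumptions.

```python
import numpy as np

class FuzzyRuleBaseAgent:
    """Sketch of an agent holding an incrementally grown fuzzy rule-base.

    Each rule stores a prototype observation (antecedent, with a Gaussian
    membership function) and one q-value per action (consequents). A new
    rule is created only when no existing rule fires strongly enough,
    which keeps the rule-base compact.
    """

    def __init__(self, n_actions, width=0.5, fire_threshold=0.3,
                 alpha=0.1, gamma=0.95, epsilon=0.1):
        self.n_actions = n_actions
        self.width = width                  # spread of the Gaussian memberships
        self.fire_threshold = fire_threshold
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.centers = []                   # rule antecedents (prototype observations)
        self.q = []                         # per-rule action values (consequents)

    def _memberships(self, obs):
        if not self.centers:
            return np.array([])
        d = np.linalg.norm(np.array(self.centers) - np.asarray(obs, dtype=float), axis=1)
        return np.exp(-(d / self.width) ** 2)

    def _maybe_add_rule(self, obs, init_q=None):
        # Create a new rule only if no existing rule fires strongly enough.
        mu = self._memberships(obs)
        if mu.size == 0 or mu.max() < self.fire_threshold:
            self.centers.append(np.asarray(obs, dtype=float))
            self.q.append(np.zeros(self.n_actions) if init_q is None
                          else np.asarray(init_q, dtype=float))

    def q_values(self, obs):
        # Fuzzy inference: firing-strength-weighted average of rule consequents.
        mu = self._memberships(obs)
        w = mu / mu.sum()
        return w @ np.array(self.q)

    def act(self, obs):
        self._maybe_add_rule(obs)
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.n_actions)
        return int(np.argmax(self.q_values(obs)))

    def update(self, obs, action, reward, next_obs, done):
        # 'reward' is the shared global reward in the decentralized setting.
        self._maybe_add_rule(next_obs)
        target = reward if done else reward + self.gamma * self.q_values(next_obs).max()
        td = target - self.q_values(obs)[action]
        mu = self._memberships(obs)
        w = mu / mu.sum()
        for i, wi in enumerate(w):
            # Each firing rule is tuned in proportion to its firing strength.
            self.q[i][action] += self.alpha * wi * td
```

In a decentralized run, each agent would own one such rule-base over its local observations, choose actions independently via `act`, and have its consequents tuned with the common global reward via `update`.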

Updated: 2020-01-13