Efficient Multiagent Policy Optimization Based on Weighted Estimators in Stochastic Cooperative Environments
Journal of Computer Science and Technology (IF 1.9), Pub Date: 2020-03-01, DOI: 10.1007/s11390-020-9967-6
Yan Zheng, Jian-Ye Hao, Zong-Zhang Zhang, Zhao-Peng Meng, Xiao-Tian Hao

Multiagent deep reinforcement learning (MA-DRL) has received increasingly wide attention. Most existing MA-DRL algorithms, however, remain inefficient when faced with the non-stationarity that arises as agents continually change their behaviors in stochastic environments. This paper extends the weighted double estimator to multiagent domains and proposes an MA-DRL framework named Weighted Double Deep Q-Network (WDDQN). By leveraging the weighted double estimator and a deep neural network, WDDQN can not only reduce the estimation bias effectively but also handle scenarios with raw visual inputs. To achieve efficient cooperation in multiagent domains, we introduce a lenient reward network and a scheduled replay strategy. Empirical results show that WDDQN outperforms an existing DRL algorithm (double DQN) and an MA-DRL algorithm (lenient Q-learning) in terms of average reward and convergence speed, and is more likely to converge to the Pareto-optimal Nash equilibrium in stochastic cooperative environments.
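For intuition, the sketch below shows how a weighted double estimator target might be computed for a single transition, following the tabular weighted double Q-learning formulation that WDDQN builds on. The function name, the hyperparameter c, and the exact weighting rule here are illustrative assumptions, not the paper's verbatim implementation.

```python
import numpy as np

def weighted_double_q_target(q_a, q_b, reward, gamma, c=1.0):
    """Weighted double estimator target for one transition (a sketch).

    q_a, q_b : 1-D arrays of next-state action values from the two
               estimators (online/target networks in the DQN setting).
    c        : hyperparameter trading off the single and double estimators.
    """
    a_star = np.argmax(q_a)   # greedy action under estimator A
    a_low = np.argmin(q_a)    # lowest-valued action under A
    # Weight beta in [0, 1): beta -> 1 recovers the single (max)
    # estimator, beta -> 0 recovers the double estimator.
    gap = abs(q_b[a_star] - q_b[a_low])
    beta = gap / (c + gap)
    # Blend the two estimates of the greedy action's value.
    next_value = beta * q_a[a_star] + (1.0 - beta) * q_b[a_star]
    return reward + gamma * next_value

# Hypothetical usage on toy action values:
q_a = np.array([1.0, 2.5, 0.3])
q_b = np.array([0.8, 2.0, 0.5])
target = weighted_double_q_target(q_a, q_b, reward=1.0, gamma=0.99)
```

The weight beta adapts per state: when the two estimators disagree strongly about the spread of action values, the target leans toward the optimistic single estimator; otherwise it falls back toward the double estimator, which counteracts the overestimation bias the abstract refers to.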
