A TD3-based Multi-agent Deep Reinforcement Learning Method in Mixed Cooperation-Competition Environment
Neurocomputing (IF 5.5) Pub Date: 2020-10-01, DOI: 10.1016/j.neucom.2020.05.097
Fengjiao Zhang, Jie Li, Zhi Li

Abstract We study the problems of function approximation error and adaptability to complex missions in multi-agent deep reinforcement learning. This paper proposes a new multi-agent deep reinforcement learning framework named multi-agent time-delayed deep deterministic policy gradient. Our method reduces the overestimation error of the neural-network approximation and the variance of the resulting estimates by using a dual-centered critic, group target network smoothing, and delayed policy updating; experimental results show that this ultimately improves the agents' ability to adapt to complex missions. We then show that existing multi-agent algorithms that approximate the true action-value function with a neural network suffer from an unavoidable overestimation problem, and we analyze the approximation error in the multi-agent deep deterministic policy gradient algorithm both mathematically and experimentally. Finally, applying our algorithm in a mixed cooperative-competitive experimental environment further demonstrates its effectiveness and generalization, in particular improving the group's ability to adapt to complex missions and to complete more difficult ones.
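The overestimation reduction described above follows the TD3 recipe carried over to a centralized-critic setting: two critics whose minimum forms the bootstrapped target, noise-smoothed target actions, and less frequent policy updates. The sketch below illustrates one common way to compute such a per-agent target; it is a minimal PyTorch illustration under those assumptions, not the authors' implementation, and names such as matd3_target, target_critic1, and target_actors are hypothetical.

```python
# Minimal sketch of a TD3-style per-agent target with centralized critics.
# Assumes a MADDPG-like setup; all names and shapes here are illustrative.
import torch

def matd3_target(
    target_critic1,   # hypothetical centralized target critic 1
    target_critic2,   # hypothetical centralized target critic 2
    target_actors,    # list of per-agent target policies
    next_obs,         # list of per-agent next-observation tensors
    rewards_i,        # reward of agent i, shape (batch, 1)
    dones,            # terminal flags, shape (batch, 1)
    gamma=0.99,
    noise_std=0.2,
    noise_clip=0.5,
):
    with torch.no_grad():
        # Target policy smoothing: add clipped noise to each agent's target action.
        next_actions = []
        for actor, obs in zip(target_actors, next_obs):
            a = actor(obs)
            noise = (torch.randn_like(a) * noise_std).clamp(-noise_clip, noise_clip)
            next_actions.append((a + noise).clamp(-1.0, 1.0))

        # Centralized critics condition on all agents' observations and actions.
        joint_obs = torch.cat(next_obs, dim=-1)
        joint_act = torch.cat(next_actions, dim=-1)

        # Clipped double-Q: take the minimum of the two target critics
        # to counter overestimation of the action value.
        q1 = target_critic1(joint_obs, joint_act)
        q2 = target_critic2(joint_obs, joint_act)
        y = rewards_i + gamma * (1.0 - dones) * torch.min(q1, q2)
    return y
```

In the same spirit, each agent's actor and the target networks would be refreshed only once every several critic updates, which is what the delayed policy updating mentioned in the abstract refers to.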

Updated: 2020-10-01