Differential Advising in Multi-Agent Reinforcement Learning
arXiv - CS - Multiagent Systems. Pub Date: 2020-11-07, DOI: arxiv-2011.03640
Dayong Ye and Tianqing Zhu and Zishuo Cheng and Wanlei Zhou and Philip S. Yu

Agent advising is one of the main approaches to improving agent learning performance by enabling agents to share advice. Existing advising methods share a common limitation: an adviser agent can offer advice to an advisee agent only if the advice was created in the same state as the one the advisee is concerned with. In complex environments, however, requiring two states to be identical is very demanding, because a state may consist of multiple dimensions and two states are identical only if every one of these dimensions matches. This requirement may therefore limit the applicability of existing advising methods to complex environments. In this paper, inspired by the differential privacy scheme, we propose a differential advising method that relaxes this requirement by enabling agents to use advice in a state even if the advice was created in a slightly different state. Compared with existing methods, agents using the proposed method have more opportunities to take advice from others. This paper is the first to adopt the concept of differential privacy for advising in order to improve agent learning performance, rather than to address security issues. The experimental results demonstrate that the proposed method is more efficient in complex environments than existing methods.
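The core idea can be illustrated with a small sketch. The Python snippet below is only an assumption-laden illustration, not the paper's actual algorithm: the state representation (integer tuples), the L1 distance test, and the budget parameter `epsilon` are hypothetical choices made here to show how advice created in a slightly different state could be reused, whereas conventional advising would require an exact state match.

```python
import numpy as np

# Minimal sketch of the differential-advising idea, under illustrative assumptions:
# the state representation (integer tuples), the L1 distance test and the budget
# parameter `epsilon` are hypothetical, not the paper's exact mechanism.

class Adviser:
    def __init__(self, epsilon=1.0):
        self.q = {}              # state (tuple) -> vector of action values
        self.epsilon = epsilon   # how different the advisee's state may be

    def advise(self, advisee_state):
        """Suggest an action for a state that need not match any recorded state exactly."""
        best_state, best_dist = None, float("inf")
        for s in self.q:
            # L1 distance over the state dimensions
            d = sum(abs(a - b) for a, b in zip(s, advisee_state))
            if d < best_dist:
                best_state, best_dist = s, d
        # Conventional advising would demand best_dist == 0 (identical states);
        # differential advising also accepts a neighbouring state within the budget.
        if best_state is not None and best_dist <= self.epsilon:
            return int(np.argmax(self.q[best_state]))
        return None  # no usable advice


adviser = Adviser(epsilon=1.0)
adviser.q[(2, 3, 1)] = np.array([0.1, 0.9, 0.2, 0.0])
print(adviser.advise((2, 3, 2)))  # differs in one dimension -> advice still returned
print(adviser.advise((9, 9, 9)))  # too far away -> None
```

In this sketch the adviser simply returns the greedy action from the nearest recorded state; the method proposed in the paper additionally draws on differential privacy to determine which neighbouring states are admissible, which is not reproduced here.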

Updated: 2020-11-10