Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions
arXiv - CS - Computer Science and Game Theory. Pub Date: 2020-07-05, DOI: arXiv-2007.02382
Michael Chang, Sidhant Kaushik, S. Matthew Weinberg, Thomas L. Griffiths, Sergey Levine

This paper seeks to establish a framework for directing a society of simple, specialized, self-interested agents to solve what are traditionally posed as monolithic single-agent sequential decision problems. What makes it challenging to use a decentralized approach to collectively optimize a central objective is the difficulty of characterizing the equilibrium strategy profile of non-cooperative games. To overcome this challenge, we design a mechanism for defining the learning environment of each agent under which we know that the optimal solution for the global objective coincides with a Nash equilibrium strategy profile of the agents optimizing their own local objectives. The society functions as an economy of agents that learn the credit assignment process itself by buying and selling to each other the right to operate on the environment state. We derive a class of decentralized reinforcement learning algorithms that are broadly applicable not only to standard reinforcement learning but also to selecting options in semi-MDPs and dynamically composing computation graphs. Lastly, we demonstrate the potential advantages of a society's inherent modular structure for more efficient transfer learning.
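To make the "economy of agents" idea concrete, the following is a minimal toy sketch, not the paper's actual algorithm: self-interested agents bid at each step for the right to transform the environment state, and the winner's payment becomes the previous winner's revenue, so credit for reaching the goal flows backward through the chain of transactions. All names (`Agent`, `run_episode`, the tabular `valuation` update, the first-price auction, the integer-line environment) are hypothetical simplifications introduced here for illustration.

```python
import random

class Agent:
    """A self-interested bidder that learns how much acting on each state is worth.

    Hypothetical tabular learner: `valuation[state]` is both its bid and its
    running estimate of the revenue it can earn by acting on that state.
    """
    def __init__(self, name, action):
        self.name = name
        self.action = action          # the state transformation this agent sells
        self.valuation = {}           # state -> learned bid
        self.lr = 0.1                 # learning rate for the valuation update

    def bid(self, state):
        return self.valuation.get(state, 0.0)

    def update(self, state, revenue):
        # Move the bid toward the revenue actually received for acting on `state`.
        v = self.valuation.get(state, 0.0)
        self.valuation[state] = v + self.lr * (revenue - v)

def run_episode(agents, start=0, goal=3, horizon=10):
    """One episode of a first-price auction society on an integer line.

    At each step the highest bidder buys the right to act; its payment is
    passed back as revenue to the agent that sold it the state, so local
    transactions implement backward credit assignment.
    """
    state, prev_winner, prev_state = start, None, None
    for _ in range(horizon):
        best = max(a.bid(state) for a in agents)
        winner = random.choice([a for a in agents if a.bid(state) == best])
        price = winner.bid(state)
        if prev_winner is not None:
            prev_winner.update(prev_state, price)   # credit flows backward
        prev_state, prev_winner = state, winner
        state = winner.action(state)
        if state == goal:
            prev_winner.update(prev_state, 1.0)     # terminal reward as final revenue
            break
    return state
```

In this sketch no agent ever sees the global objective: the agent whose action leads toward the goal ends up holding the most valuable resale rights, so its bids rise and it keeps winning the auctions on the path to the goal. The real mechanism in the paper uses a Vickrey-style (second-price) auction to align the Nash equilibrium with the globally optimal policy; the first-price variant above is only meant to show the transaction structure.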

Updated: 2020-08-17