Communication-Efficient Hierarchical Distributed Optimization for Multi-Agent Policy Evaluation
Journal of Computational Science (IF 3.1), Pub Date: 2020-12-26, DOI: 10.1016/j.jocs.2020.101280
Jineng Ren, Jarvis Haupt, Zehua Guo

Policy evaluation problems in multi-agent reinforcement learning (MARL) have attracted growing interest recently. In this setting, agents collaborate to learn the value of a given policy from private local rewards and jointly observed state-action pairs. However, existing fully decentralized algorithms treat each agent equally, without considering the communication structure of the agents over a given network and the corresponding effects on communication and computation efficiency. In this paper, we propose a hierarchical distributed algorithm that differentiates the roles of the agents during the evaluation process. This method allows us to freely choose among various mixing schemes (and corresponding mixing matrices that are not necessarily symmetric or doubly stochastic) in order to reduce the communication and computation cost, while still converging at rates as fast as or faster than previous distributed algorithms. Theoretically, we show that the proposed method, which contains existing distributed methods as a special case, achieves the same order of convergence rate as state-of-the-art methods. Extensive numerical experiments on real datasets verify that our approach improves, sometimes significantly, over other advanced algorithms in terms of convergence and total communication efficiency.
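To make the setting described in the abstract concrete (private local rewards, jointly observed states, and a consensus-style mixing step whose mixing matrix need not be symmetric or doubly stochastic), the sketch below runs a generic decentralized TD(0) update with linear value approximation. It is an assumption-laden illustration, not the paper's hierarchical algorithm; all names, sizes, and constants are hypothetical.

```python
# Minimal, illustrative sketch (not the authors' hierarchical algorithm):
# decentralized TD(0)-style policy evaluation with linear value approximation.
# Each agent keeps a private reward signal for the jointly observed transitions
# and mixes its parameters with its neighbors through a row-stochastic mixing
# matrix W, which is neither symmetric nor doubly stochastic here.
# All sizes, constants, and the random dynamics below are assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_agents, n_states, dim = 4, 10, 5           # hypothetical problem sizes
phi = rng.standard_normal((n_states, dim))   # shared state features
gamma, alpha = 0.95, 0.05                    # discount factor, step size

# Directed-ring mixing matrix: each row sums to 1 (row stochastic), but the
# columns do not, so W is neither doubly stochastic nor symmetric.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i + 1) % n_agents] = 0.5

theta = np.zeros((n_agents, dim))            # one parameter vector per agent

for t in range(2000):
    s = rng.integers(n_states)               # jointly observed state
    s_next = rng.integers(n_states)          # next state under the fixed policy
    grads = np.zeros_like(theta)
    for i in range(n_agents):
        r_i = rng.normal(loc=float(i))       # private local reward of agent i
        td_error = r_i + gamma * phi[s_next] @ theta[i] - phi[s] @ theta[i]
        grads[i] = td_error * phi[s]         # local TD(0) direction
    # Consensus + local update: mix neighbors' parameters, then step locally.
    theta = W @ theta + alpha * grads

print("parameter spread across agents:", np.linalg.norm(theta - theta.mean(0)))
```

The sketch only shows the flat consensus pattern that such methods build on; the hierarchical scheme in the paper additionally assigns different roles to agents and, per the abstract, contains methods of this flat form as a special case.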




Updated: 2020-12-26