Decentralized Inertial Best-Response with Voluntary and Limited Communication in Random Communication Networks
arXiv - CS - Distributed, Parallel, and Cluster Computing. Pub Date: 2021-06-13. arXiv:2106.07079
Sarper Aydın, Ceyhun Eksin

Multiple autonomous agents interact over a random communication network to maximize their individual utility functions, which depend on the actions of other agents. We consider decentralized inertial best-response algorithms in which agents form beliefs about the future actions of other players based on local information, then either take the action that maximizes their expected utility with respect to these beliefs or continue with their previous action. We show that these algorithms converge to a Nash equilibrium in weakly acyclic games, provided the belief-update and information-exchange protocols successfully learn the actions of other players with positive probability in finite time in a static environment, i.e., when other agents' actions do not change. We design a decentralized fictitious play algorithm with voluntary and limited communication (DFP-VL) whose protocols satisfy this condition. In the voluntary communication protocol, each agent decides whom to exchange information with by assessing the novelty of its information and its potential effect on others' assessments of their utility functions. The limited communication protocol entails agents sending only their most frequent past action to the agents they decide to communicate with. Numerical experiments on a target assignment game demonstrate that the voluntary and limited communication protocols can more than halve the number of communication attempts while retaining the same convergence rate as DFP, in which agents constantly attempt to communicate.
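The core update described above — a fictitious-play belief formed from the empirical frequencies of opponents' observed actions, combined with an inertial best response that repeats the previous action with some probability — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; the function names, the two-player utility signature, and the `inertia` parameter are assumptions for exposition.

```python
import random
from collections import Counter

def empirical_belief(history):
    """Fictitious-play belief: empirical frequency of an opponent's past actions.

    `history` is a list of observed actions; returns a dict mapping each
    action to its relative frequency.
    """
    counts = Counter(history)
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}

def inertial_best_response(utility, actions, belief, prev_action, inertia,
                           rng=random):
    """Inertial best response (illustrative, two-player form).

    With probability `inertia`, the agent keeps its previous action;
    otherwise it plays the action maximizing expected utility under the
    current belief about the opponent's play.
    """
    if rng.random() < inertia:
        return prev_action

    def expected_utility(a):
        # Average the utility over the opponent's actions, weighted by
        # the believed frequencies.
        return sum(p * utility(a, b) for b, p in belief.items())

    return max(actions, key=expected_utility)
```

For example, in a simple coordination game (`utility(a, b) = 1 if a == b else 0`), an agent whose belief concentrates on the opponent playing action 1 best-responds with action 1 whenever the inertia draw does not fire. The "most frequent action" that agents transmit under the limited communication protocol corresponds to the mode of the same empirical history used here to build the belief.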

Updated: 2021-06-15