Solving Graph-based Public Good Games with Tree Search and Imitation Learning,arXiv - CS - Computer Science and Game Theory

Solving Graph-based Public Good Games with Tree Search and Imitation Learning
arXiv - CS - Computer Science and Game Theory. Pub Date: 2021-06-12, DOI: arxiv-2106.06762
Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi

Public goods games represent insightful settings for studying incentives for individual agents to make contributions that, while costly for each of them, benefit the wider society. In this work, we adopt the perspective of a central planner with a global view of a network of self-interested agents and the goal of maximizing some desired property in the context of a best-shot public goods game. Existing algorithms for this known NP-complete problem find solutions that are sub-optimal and cannot optimize for criteria other than social welfare. In order to efficiently solve public goods games, our proposed method directly exploits the correspondence between equilibria and the Maximal Independent Set (mIS) structural property of graphs. In particular, we define a Markov Decision Process, which incrementally generates an mIS, and adopt a planning method to search for equilibria, outperforming existing methods. Furthermore, we devise an imitation learning technique that uses demonstrations of the search to obtain a graph neural network parametrized policy which quickly generalizes to unseen game instances. Our evaluation results show that this policy is able to reach 99.5% of the performance of the planning method while being approximately three orders of magnitude faster to evaluate on the largest graphs tested. The methods presented in this work can be applied to a large class of public goods games of potentially high societal impact.
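
The correspondence between equilibria and maximal independent sets mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the adjacency-dict graph representation, and the random default for choosing the next node are all assumptions made for illustration, with the `choose` callback standing in for the search or learned policy. The equilibrium check assumes the standard best-shot setting in which an agent prefers to invest exactly when no neighbour does.

```python
import random


def build_maximal_independent_set(adj, choose=None, seed=0):
    """Incrementally build a maximal independent set of an undirected graph.

    adj:    dict mapping each node to the set of its neighbours.
    choose: hypothetical policy hook, called as choose(candidates, selected),
            returning the next node to add; defaults to a uniform random pick.
    """
    rng = random.Random(seed)
    if choose is None:
        choose = lambda candidates, selected: rng.choice(candidates)

    selected = set()
    candidates = set(adj)
    while candidates:
        node = choose(sorted(candidates), selected)
        selected.add(node)
        # The chosen node and all of its neighbours stop being valid choices.
        candidates -= {node} | adj[node]
    return selected


def is_equilibrium(adj, investors):
    """Check the assumed best-shot equilibrium condition.

    No investor has an investing neighbour (independence), and every
    non-investor has at least one investing neighbour (maximality).
    """
    for node, neighbours in adj.items():
        covered = bool(neighbours & investors)
        if node in investors and covered:
            return False  # this investor could stop and free-ride
        if node not in investors and not covered:
            return False  # this agent would gain by investing
    return True


# Path graph 0 - 1 - 2 - 3 as a small example instance.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
mis = build_maximal_independent_set(adj)
print(sorted(mis), is_equilibrium(adj, mis))
```

Each step removes the selected node and its neighbours from the candidate pool, so any order of choices ends in a set that is both independent and maximal. Different `choose` functions reach different equilibria, which is the degree of freedom a planner can use to optimize a chosen objective.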

Updated: 2021-06-15
