Robust Multi-Agent Multi-Armed Bandits
arXiv - CS - Social and Information Networks Pub Date : 2020-07-07 , DOI: arxiv-2007.03812
Daniel Vial, Sanjay Shakkottai, R. Srikant

There has been recent interest in collaborative multi-agent bandits, where groups of agents share recommendations to decrease per-agent regret. However, these works assume that each agent always recommends their individual best-arm estimates to other agents, which is unrealistic in envisioned applications (machine faults in distributed computing or spam in social recommendation systems). Hence, we generalize the setting to include honest and malicious agents who recommend best-arm estimates and arbitrary arms, respectively. We show that even with a single malicious agent, existing collaboration-based algorithms fail to improve regret guarantees over a single-agent baseline. We propose a scheme where honest agents learn who is malicious and dynamically reduce communication with them, i.e., "blacklist" them. We show that collaboration indeed decreases regret for this algorithm, when the number of malicious agents is small compared to the number of arms, and crucially without assumptions on the malicious agents' behavior. Thus, our algorithm is robust against any malicious recommendation strategy.
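The blacklisting idea the abstract describes can be illustrated with a toy simulation (this is a sketch of the general concept, not the paper's actual algorithm: the agent model, epsilon-greedy exploration, the peer names, and the distrust thresholds below are all illustrative assumptions). One honest agent explores Bernoulli arms, periodically receives best-arm recommendations from two peers, spends a pull checking each recommendation, and blacklists a peer whose recommendations repeatedly test far below the agent's own best empirical mean:

```python
import random

def simulate(n_rounds=3000, seed=0):
    """Toy sketch: one honest agent, one honest peer, one malicious peer.
    The honest agent blacklists a peer after repeated bad recommendations."""
    rng = random.Random(seed)
    means = [0.9, 0.5, 0.1]          # true Bernoulli arm means (assumed toy instance)
    counts = [0] * 3                 # pulls per arm
    sums = [0.0] * 3                 # total reward per arm
    blacklist = set()
    strikes = {"honest_peer": 0, "malicious_peer": 0}  # distrust counters

    def pull(arm):
        counts[arm] += 1
        sums[arm] += 1.0 if rng.random() < means[arm] else 0.0

    def est(arm):
        return sums[arm] / counts[arm] if counts[arm] else 0.0

    for t in range(n_rounds):
        # epsilon-greedy exploration (stand-in for the UCB-style play in the paper)
        if t < 30 or rng.random() < 0.1:
            pull(rng.randrange(3))
        else:
            pull(max(range(3), key=est))

        # every 50 rounds, query peers that are not yet blacklisted
        if t % 50 == 0:
            recs = {
                "honest_peer": max(range(3), key=est),  # mimics an honest best-arm estimate
                "malicious_peer": 2,                    # always pushes the worst arm
            }
            for peer, arm in recs.items():
                if peer in blacklist:
                    continue
                pull(arm)  # spend one pull checking the recommendation
                best = est(max(range(3), key=est))
                # a recommendation testing far below our own best earns a strike
                if counts[arm] >= 10 and est(arm) < best - 0.3:
                    strikes[peer] += 1
                    if strikes[peer] >= 3:
                        blacklist.add(peer)
    return blacklist, max(range(3), key=est)

blacklist, best_arm = simulate()
print(blacklist, best_arm)
```

Under these assumptions the malicious peer's recommendations keep pointing at the worst arm, so it accumulates strikes and ends up blacklisted, while the honest peer (whose recommendations match the agent's own best estimate) never does; this mirrors the abstract's point that communication is reduced with misbehaving agents without assuming anything about how they misbehave.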

Updated: 2020-07-09