Fair Algorithms for Multi-Agent Multi-Armed Bandits
arXiv - CS - Artificial Intelligence. Pub Date: 2020-07-13. DOI: arxiv-2007.06699. Safwan Hossain, Evi Micha, Nisarg Shah
We propose a multi-agent variant of the classical multi-armed bandit problem,
in which there are N agents and K arms, and pulling an arm generates a
(possibly different) stochastic reward to each agent. Unlike the classical
multi-armed bandit problem, the goal is not to learn the "best arm", as each
agent may perceive a different arm as best for her. Instead, we seek to learn a
fair distribution over arms. Drawing on a long line of research in economics
and computer science, we use the Nash social welfare as our notion of fairness.
We design multi-agent variants of three classic multi-armed bandit algorithms,
and show that they achieve sublinear regret, now measured in terms of the Nash
social welfare.
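To make the setup concrete, here is a minimal sketch of one way a UCB-style algorithm could be adapted to this objective. It is an illustration under assumptions not spelled out in the abstract (Bernoulli rewards, a shared confidence bonus per arm, and a crude random search over candidate arm distributions), not the authors' actual algorithms: each round it picks a distribution over arms that approximately maximizes the Nash social welfare (the product of agents' expected rewards) of the optimistic estimates, then samples an arm from it.

```python
import numpy as np

def nsw(p, mu):
    """Nash social welfare of arm distribution p: the product over agents
    of their expected reward mu_i . p, where mu is an N x K matrix of means."""
    return float(np.prod(mu @ p))

def ucb_nsw(T, mu_true, rng, n_candidates=200):
    """Hypothetical UCB-style sketch for the multi-agent bandit.

    Maintains optimistic per-agent, per-arm mean estimates; each round it
    approximately maximizes the NSW of those estimates over random candidate
    distributions (Dirichlet samples plus the point masses) and samples an
    arm from the winner. Returns the empirical mean rewards (N x K).
    """
    N, K = mu_true.shape
    counts = np.zeros(K)
    sums = np.zeros((N, K))
    # Pull each arm once to initialize the estimates.
    for a in range(K):
        r = rng.binomial(1, mu_true[:, a])  # one Bernoulli reward per agent
        counts[a] += 1
        sums[:, a] += r
    for t in range(K, T):
        means = sums / counts
        bonus = np.sqrt(2 * np.log(t + 1) / counts)  # shared UCB bonus per arm
        ucb = np.clip(means + bonus, 0.0, 1.0)
        # Candidate distributions over arms: Dirichlet samples + point masses.
        cands = np.vstack([rng.dirichlet(np.ones(K), size=n_candidates),
                           np.eye(K)])
        p = max(cands, key=lambda q: nsw(q, ucb))  # best candidate by NSW
        a = rng.choice(K, p=p)
        r = rng.binomial(1, mu_true[:, a])
        counts[a] += 1
        sums[:, a] += r
    return sums / counts
```

Note how the point-mass candidates recover the classical single-agent behavior (N = 1 reduces NSW to expected reward), while with opposed agents the NSW objective favors mixing over arms rather than committing to any single agent's best arm.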
Updated: 2020-07-15
我们提出了经典多臂老虎机问题的多智能体变体,其中有 N 个智能体和 K 个臂,并且拉动一只臂会为每个智能体生成(可能不同的)随机奖励。与经典的多臂老虎机问题不同,目标不是学习“最好的手臂”,因为每个智能体可能认为不同的手臂最适合她。相反,我们寻求学习武器的公平分配。借鉴经济学和计算机科学的长期研究,我们使用纳什社会福利作为我们的公平概念。我们设计了三种经典多臂老虎机算法的多智能体变体,并表明它们实现了亚线性后悔,现在用纳什社会福利来衡量。