Game of Thrones: Fully Distributed Learning for Multi-Player Bandits
arXiv - CS - Computer Science and Game Theory. Pub Date: 2018-10-26, DOI: arxiv-1810.11162
Ilai Bistritz; Amir Leshem

We consider an N-player multi-armed bandit game where each player chooses one out of M arms for T turns. Each player has different expected rewards for the arms, and the instantaneous rewards are independent and identically distributed or Markovian. When two or more players choose the same arm, they all receive zero reward. Performance is measured using the expected sum of regrets, compared to the optimal assignment of arms to players that maximizes the sum of expected rewards. We assume that each player only knows her own actions and the reward she received each turn. Players cannot observe the actions of other players, and no communication between players is possible. We present a distributed algorithm and prove that it achieves an expected sum of regrets of near-$O(\log T)$. This is the first algorithm to achieve near-order-optimal regret in this fully distributed scenario. All other works have assumed that either all players have the same vector of expected rewards or that communication between players is possible.
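The regret benchmark described in the abstract can be made concrete with a small sketch. This is not the authors' distributed algorithm; it only illustrates, from an omniscient viewpoint, how the optimal assignment and the expected sum of regrets would be computed for a known reward matrix. The matrix `mu`, the function names, the play `history`, and the assumption that N ≤ M are all hypothetical and chosen for illustration. The brute-force search over assignments is only practical for very small N and M.

```python
from itertools import permutations


def optimal_assignment(mu):
    """Brute-force the collision-free assignment maximizing the sum of expected rewards.

    mu[n][m] is the (hypothetical) expected reward of player n on arm m.
    Assumes the number of players does not exceed the number of arms.
    """
    n_players, n_arms = len(mu), len(mu[0])
    best_value, best_arms = float("-inf"), None
    # Each permutation assigns a distinct arm to every player.
    for arms in permutations(range(n_arms), n_players):
        value = sum(mu[n][arms[n]] for n in range(n_players))
        if value > best_value:
            best_value, best_arms = value, arms
    return best_value, best_arms


def expected_round_reward(mu, chosen):
    """Sum of expected rewards in one turn; players sharing an arm receive zero."""
    counts = {}
    for arm in chosen:
        counts[arm] = counts.get(arm, 0) + 1
    return sum(mu[n][arm] for n, arm in enumerate(chosen) if counts[arm] == 1)


def expected_sum_regret(mu, history):
    """Expected sum of regrets of a sequence of joint arm choices."""
    best_value, _ = optimal_assignment(mu)
    return sum(best_value - expected_round_reward(mu, chosen) for chosen in history)


# Hypothetical example: 2 players, 3 arms, 3 turns.
mu = [[0.9, 0.2, 0.5],
      [0.8, 0.7, 0.1]]
history = [(0, 0),  # collision on arm 0: both players receive zero
           (0, 1),  # the optimal assignment: no regret this turn
           (2, 0)]  # collision-free but suboptimal
print(optimal_assignment(mu))
print(expected_sum_regret(mu, history))
```

In the setting of the abstract, no player has access to `mu` or to the other players' choices, so a computation like this serves only as the yardstick against which a distributed algorithm is evaluated.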
Updated: 2020-01-14

 
