Multi-Player Multi-Armed Bandits With Collision-Dependent Reward Distributions
IEEE Transactions on Signal Processing (IF 4.6), Pub Date: 2021-07-01, DOI: 10.1109/tsp.2021.3093261
Chengshuai Shi, Cong Shen

We study a new stochastic multi-player multi-armed bandits (MP-MAB) problem in which the reward distribution of an arm changes when a collision occurs on it. Existing literature always assumes that the players involved receive zero reward when a collision happens, but for applications such as cognitive radio the more realistic scenario is that a collision reduces the mean reward, though not necessarily to zero. We focus on the more practical no-sensing setting, where players do not perceive collisions directly, and propose the Error-Correction Collision Communication (EC3) algorithm, which models implicit communication as a problem of reliable communication over a noisy channel. The random coding error exponent is then used to establish the optimal regret that no communication protocol can beat. Finally, optimizing the tradeoff between code length and decoding error rate leads to a regret that approaches the centralized MP-MAB regret, which represents a natural lower bound. Experiments with practical error-correction codes on both synthetic and real-world datasets demonstrate the superiority of EC3. In particular, the results show that the choice of coding scheme has a profound impact on regret performance.
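The idea of treating collisions as a noisy channel can be illustrated with a toy simulation. The sketch below is not the EC3 algorithm from the paper. It assumes Bernoulli rewards, hypothetical means `mu_free` and `mu_collide`, a simple repetition code, and a fixed decoding threshold, all chosen only for illustration. A sender signals a 1 by deliberately colliding on the receiver's arm and a 0 by staying away. The receiver never observes collisions and decodes only from its own rewards.

```python
import random

def observe_reward(bit, mu_free, mu_collide, rng):
    """Receiver's Bernoulli reward in one slot.

    Assumption: the sender collides on purpose to signal 1 (mean mu_collide)
    and stays away to signal 0 (mean mu_free). The receiver sees only the reward.
    """
    mean = mu_collide if bit == 1 else mu_free
    return 1 if rng.random() < mean else 0

def send_and_decode(bit, repeat, rng, mu_free=0.8, mu_collide=0.3):
    """Repetition code: the bit occupies `repeat` slots.

    The receiver decodes 1 when its empirical mean reward falls below the
    midpoint of the two (assumed known) means.
    """
    threshold = (mu_free + mu_collide) / 2
    total = sum(observe_reward(bit, mu_free, mu_collide, rng) for _ in range(repeat))
    return 1 if total / repeat < threshold else 0

def error_rate(repeat, trials=2000, seed=0):
    """Empirical decoding error rate for a given code length."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        bit = rng.randint(0, 1)
        if send_and_decode(bit, repeat, rng) != bit:
            errors += 1
    return errors / trials

if __name__ == "__main__":
    for repeat in (1, 5, 21, 41):
        print(repeat, error_rate(repeat))
```

Longer codewords lower the decoding error rate, but every slot spent signalling is a slot in which reward is deliberately sacrificed. That is the tradeoff between code length and decoding error rate that the abstract refers to, here in its crudest form.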

Updated: 2021-07-01