Fairness of Exposure in Stochastic Bandits
arXiv - CS - Machine Learning Pub Date : 2021-03-03 , DOI: arxiv-2103.02735
Lequn Wang, Yiwei Bai, Wen Sun, Thorsten Joachims

Contextual bandit algorithms have become widely used for recommendation in online systems (e.g. marketplaces, music streaming, news), where they now wield substantial influence on which items get exposed to the users. This raises questions of fairness to the items -- and to the sellers, artists, and writers that benefit from this exposure. We argue that the conventional bandit formulation can lead to an undesirable and unfair winner-takes-all allocation of exposure. To remedy this problem, we propose a new bandit objective that guarantees merit-based fairness of exposure to the items while optimizing utility to the users. We formulate fairness regret and reward regret in this setting, and present algorithms for both stochastic multi-armed bandits and stochastic linear bandits. We prove that the algorithms achieve sub-linear fairness regret and reward regret. Beyond the theoretical analysis, we also provide empirical evidence that these algorithms can fairly allocate exposure to different arms effectively.
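The merit-based fairness objective described above can be illustrated with a minimal sketch (this is an illustration of the general idea, not the paper's actual algorithm; the identity `merit_fn` and the use of plug-in mean estimates are assumptions): each arm's target share of exposure is proportional to a merit function of its expected reward, and the learner samples arms according to that target distribution rather than pulling only the estimated best arm.

```python
import random

def merit_based_allocation(mean_estimates, merit_fn=lambda m: m):
    """Target exposure for each arm, proportional to merit_fn of its
    estimated mean reward (winner does not take all)."""
    merits = [merit_fn(m) for m in mean_estimates]
    total = sum(merits)
    return [m / total for m in merits]

def pull_arm(mean_estimates, merit_fn=lambda m: m, rng=random):
    """Sample an arm index according to the merit-proportional target."""
    probs = merit_based_allocation(mean_estimates, merit_fn)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# With estimated means [0.25, 0.75] and identity merit, the target
# allocation gives the weaker arm 25% of exposure instead of 0%.
print(merit_based_allocation([0.25, 0.75]))
```

In a full algorithm the mean estimates would come from confidence-bound-style estimation (as in the paper's stochastic multi-armed and linear bandit settings), so the sampled allocation converges to the merit-proportional target as estimates sharpen.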

Updated: 2021-03-05