Collaborative Thompson Sampling, Mobile Networks and Applications



Collaborative Thompson Sampling
Mobile Networks and Applications (IF 2.3), Pub Date: 2020-02-10, DOI: 10.1007/s11036-019-01453-x
Zhenyu Zhu, Liusheng Huang, Hongli Xu

Thompson sampling is one of the most effective strategies for balancing the exploration-exploitation trade-off. It has been applied in a variety of domains with remarkable success. Thompson sampling makes decisions in a noisy but stationary environment by accumulating uncertain information over time to improve prediction accuracy. In highly dynamic domains, however, the environment undergoes frequent and unpredictable changes, and decisions in such an environment should rely on current information. Standard Thompson sampling may therefore perform poorly in these domains. Here we present collaborative Thompson sampling, which applies the exploration-exploitation strategy to highly dynamic settings. The algorithm takes collaborative effects into account by dynamically clustering users into groups; the feedback of all users in the same group helps to estimate the expected reward in the current context and thus to find the optimal choice. Incorporating collaborative effects into Thompson sampling allows the algorithm to capture real-time changes in the environment and adjust its decision-making strategy accordingly. We compare our algorithm with standard Thompson sampling algorithms on two real-world datasets. Our algorithm shows accelerated convergence and improved prediction performance in collaborative environments. We also provide regret analyses of our algorithm in both contextual and non-contextual settings.
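The idea of pooling feedback within a user group can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes Bernoulli (0/1) rewards, a non-contextual setting, and a fixed user-to-group mapping supplied by the caller in place of the dynamic clustering the abstract describes. The names `GroupThompsonSampler`, `simulate`, `user_groups`, and `true_rates` are hypothetical and chosen only for illustration.

```python
import random


class GroupThompsonSampler:
    """Beta-Bernoulli Thompson sampling with one shared posterior per group.

    Illustrative only: group membership is given by the caller, whereas the
    paper clusters users dynamically (not reproduced here).
    """

    def __init__(self, n_arms, seed=None):
        self.n_arms = n_arms
        self.rng = random.Random(seed)
        # group id -> per-arm [alpha, beta] parameters of a Beta posterior
        self.posteriors = {}

    def _group_params(self, group):
        # Lazily create a uniform Beta(1, 1) prior for every arm of a new group.
        if group not in self.posteriors:
            self.posteriors[group] = [[1.0, 1.0] for _ in range(self.n_arms)]
        return self.posteriors[group]

    def select_arm(self, group):
        # Draw one sample per arm from the group's posterior, play the largest.
        params = self._group_params(group)
        samples = [self.rng.betavariate(a, b) for a, b in params]
        return max(range(self.n_arms), key=samples.__getitem__)

    def update(self, group, arm, reward):
        # Feedback from any user in the group updates the shared posterior.
        params = self._group_params(group)
        if reward:
            params[arm][0] += 1.0
        else:
            params[arm][1] += 1.0


def simulate(true_rates, user_groups, rounds=2000, seed=0):
    """Run the sampler against assumed per-group Bernoulli reward rates."""
    n_arms = len(next(iter(true_rates.values())))
    sampler = GroupThompsonSampler(n_arms, seed=seed)
    env = random.Random(seed + 1)
    users = list(user_groups)
    for _ in range(rounds):
        user = env.choice(users)          # a random user arrives
        group = user_groups[user]         # look up the user's (fixed) group
        arm = sampler.select_arm(group)
        reward = env.random() < true_rates[group][arm]
        sampler.update(group, arm, reward)
    return sampler


if __name__ == "__main__":
    groups = {"u1": "A", "u2": "A", "u3": "B", "u4": "B"}
    rates = {"A": [0.9, 0.1], "B": [0.1, 0.9]}
    result = simulate(rates, groups)
    for g, params in sorted(result.posteriors.items()):
        print(g, params)
```

Because users in the same group share a posterior, each user benefits from feedback gathered by the others, which is one plausible reading of why pooling can speed up convergence. Handling a changing environment would additionally require re-clustering users over time, which this sketch leaves out.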


Updated: 2020-02-10
