Variational learning from implicit bandit feedback
Machine Learning (IF 4.3), Pub Date: 2021-07-09, DOI: 10.1007/s10994-021-06028-0
Quoc-Tuan Truong, Hady W. Lauw

Recommendations are prevalent in Web applications (e.g., search ranking, item recommendation, advertisement placement). Learning from bandit feedback is challenging because the feedback is sparse and limited to the actions the system chose to take. In this work, we focus on batch learning from the logs of recommender systems involving both bandit and organic feedback. We develop a probabilistic framework with a likelihood function that accounts not only for explicit positive observations but also for implicit negative observations inferred from the data. Moreover, we introduce a latent variable model for organic-bandit feedback to robustly capture user preference distributions. Next, we analyze the behavior of the new likelihood under two scenarios, i.e., with and without counterfactual re-weighting. For faster item ranking, we further investigate the possibility of using the Maximum-a-Posteriori (MAP) estimate instead of Monte Carlo (MC)-based approximation for prediction. Experiments on both real datasets and data from a simulation environment show substantial performance improvements over comparable baselines.
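
The following is a minimal, hypothetical Python sketch, not the authors' implementation, of two ideas the abstract names: a Bernoulli likelihood over logged positives and inferred negatives with an optional inverse-propensity (counterfactual) re-weighting term, and MAP versus Monte Carlo scoring under an assumed Gaussian variational posterior on the user factors. The factorization form, function names, and synthetic data below are illustrative assumptions.

    # Hypothetical sketch of the abstract's ideas; all modeling choices here
    # (logistic matrix factorization, Gaussian posterior) are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def weighted_log_likelihood(theta_u, V, items, labels, propensities=None):
        """Bernoulli log-likelihood over logged positives (labels=1) and
        inferred negatives (labels=0) for one user; optionally re-weighted
        by 1/propensity, the standard counterfactual (IPS) correction for
        the logging policy."""
        p = sigmoid(V[items] @ theta_u)                  # click probabilities
        ll = labels * np.log(p + 1e-12) + (1 - labels) * np.log(1 - p + 1e-12)
        if propensities is not None:                     # counterfactual mode
            ll = ll / propensities
        return ll.sum()

    def score_map(mu_u, V):
        """MAP-style scoring: rank items with the posterior mean of the user
        factors -- a single matrix product, hence fast at serving time."""
        return V @ mu_u

    def score_mc(mu_u, log_sigma_u, V, n_samples=50):
        """Monte Carlo scoring: average the predicted click probability over
        samples from the assumed Gaussian posterior of the user factors."""
        eps = rng.standard_normal((n_samples, mu_u.size))
        thetas = mu_u + np.exp(log_sigma_u) * eps        # reparameterized draws
        return sigmoid(thetas @ V.T).mean(axis=0)

    # Toy usage on synthetic data.
    K, n_items = 8, 100
    V = rng.normal(scale=0.3, size=(n_items, K))         # item factors
    mu_u = rng.normal(scale=0.3, size=K)                 # posterior mean
    log_sigma_u = np.full(K, -1.0)                       # posterior log-std

    items = rng.choice(n_items, size=20, replace=False)  # logged actions
    labels = rng.integers(0, 2, size=20).astype(float)   # clicks / non-clicks
    props = rng.uniform(0.05, 1.0, size=20)              # logging propensities

    print("LL (plain)     :", weighted_log_likelihood(mu_u, V, items, labels))
    print("LL (IPS-weight):", weighted_log_likelihood(mu_u, V, items, labels, props))
    print("Top-5 (MAP):", np.argsort(-score_map(mu_u, V))[:5])
    print("Top-5 (MC) :", np.argsort(-score_mc(mu_u, log_sigma_u, V))[:5])

Note the trade-off the abstract points to: score_map needs one matrix product, while score_mc averages over posterior samples; because the sigmoid is nonlinear, the two rankings can differ, which is why using the MAP estimate for prediction is a question worth analyzing rather than a free speedup.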



