Variational learning from implicit bandit feedback
Machine Learning (IF 4.3), Pub Date: 2021-07-09, DOI: 10.1007/s10994-021-06028-0
Quoc-Tuan Truong, Hady W. Lauw

Recommendations are prevalent in Web applications (e.g., search ranking, item recommendation, advertisement placement). Learning from bandit feedback is challenging because the feedback is sparse and limited to the actions the system chose to take. In this work, we focus on batch learning from the logs of recommender systems involving both bandit and organic feedback. We develop a probabilistic framework with a likelihood function that accounts not only for explicit positive observations but also for implicit negative observations inferred from the data. Moreover, we introduce a latent variable model for organic-bandit feedback to robustly capture user preference distributions. Next, we analyze the behavior of the new likelihood under two scenarios, i.e., with and without counterfactual re-weighting. For faster item ranking, we further investigate the possibility of using the Maximum-a-Posteriori (MAP) estimate instead of Monte Carlo (MC)-based approximation for prediction. Experiments on both real datasets and data from a simulation environment show substantial performance improvements over comparable baselines.
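
The following is a minimal, hypothetical Python sketch, not the authors' implementation, of two ideas the abstract names: a Bernoulli likelihood over logged positives and inferred negatives with an optional inverse-propensity (counterfactual) re-weighting term, and MAP versus Monte Carlo scoring under an assumed Gaussian variational posterior on the user factors. The factorization form, function names, and synthetic data below are illustrative assumptions.

    # Hypothetical sketch of the abstract's ideas; all modeling choices here
    # (logistic matrix factorization, Gaussian posterior) are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def weighted_log_likelihood(theta_u, V, items, labels, propensities=None):
        """Bernoulli log-likelihood over logged positives (labels=1) and
        inferred negatives (labels=0) for one user; optionally re-weighted
        by 1/propensity, the standard counterfactual (IPS) correction for
        the logging policy."""
        p = sigmoid(V[items] @ theta_u)                  # click probabilities
        ll = labels * np.log(p + 1e-12) + (1 - labels) * np.log(1 - p + 1e-12)
        if propensities is not None:                     # counterfactual mode
            ll = ll / propensities
        return ll.sum()

    def score_map(mu_u, V):
        """MAP-style scoring: rank items with the posterior mean of the user
        factors -- a single matrix product, hence fast at serving time."""
        return V @ mu_u

    def score_mc(mu_u, log_sigma_u, V, n_samples=50):
        """Monte Carlo scoring: average the predicted click probability over
        samples from the assumed Gaussian posterior of the user factors."""
        eps = rng.standard_normal((n_samples, mu_u.size))
        thetas = mu_u + np.exp(log_sigma_u) * eps        # reparameterized draws
        return sigmoid(thetas @ V.T).mean(axis=0)

    # Toy usage on synthetic data.
    K, n_items = 8, 100
    V = rng.normal(scale=0.3, size=(n_items, K))         # item factors
    mu_u = rng.normal(scale=0.3, size=K)                 # posterior mean
    log_sigma_u = np.full(K, -1.0)                       # posterior log-std

    items = rng.choice(n_items, size=20, replace=False)  # logged actions
    labels = rng.integers(0, 2, size=20).astype(float)   # clicks / non-clicks
    props = rng.uniform(0.05, 1.0, size=20)              # logging propensities

    print("LL (plain)     :", weighted_log_likelihood(mu_u, V, items, labels))
    print("LL (IPS-weight):", weighted_log_likelihood(mu_u, V, items, labels, props))
    print("Top-5 (MAP):", np.argsort(-score_map(mu_u, V))[:5])
    print("Top-5 (MC) :", np.argsort(-score_mc(mu_u, log_sigma_u, V))[:5])

Note the trade-off the abstract points to: score_map needs one matrix product, while score_mc averages over posterior samples; because the sigmoid is nonlinear, the two rankings can differ, which is why using the MAP estimate for prediction is a question worth analyzing rather than a free speedup.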



