Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation
arXiv - CS - Information Retrieval. Pub Date: 2019-11-10, DOI: arxiv-1911.03845
Xueying Bai, Jian Guan, Hongning Wang

Reinforcement learning is well suited for optimizing policies of recommender systems. Current solutions mostly focus on model-free approaches, which require frequent interactions with the real environment and therefore make model learning expensive. Offline evaluation methods, such as importance sampling, can alleviate these limitations, but usually require a large amount of logged data and do not work well when the action space is large. In this work, we propose a model-based reinforcement learning solution that models user-agent interaction for offline policy learning via a generative adversarial network. To reduce bias in the learned model and policy, we use a discriminator to evaluate the quality of generated data and scale the generated rewards. Our theoretical analysis and empirical evaluations demonstrate the effectiveness of our solution in learning policies from the offline and generated data.
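The abstract's key mechanism is that generated rewards are scaled by a discriminator score before they enter policy learning, so low-quality simulated interactions contribute less to the gradient. The following is a minimal sketch of that idea, not the authors' implementation: the network shapes, the one-hot item encoding, the `user_model`/`discriminator` modules, and the REINFORCE-style update are all illustrative assumptions.

```python
# Sketch: discriminator-scaled rewards for offline, model-based policy learning.
import torch
import torch.nn as nn

STATE_DIM, N_ITEMS = 32, 100  # assumed user-state and item-catalogue sizes

policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                       nn.Linear(64, N_ITEMS))                  # recommendation agent
user_model = nn.Sequential(nn.Linear(STATE_DIM + N_ITEMS, 64), nn.ReLU(),
                           nn.Linear(64, 1))                    # generator: simulated user reward
discriminator = nn.Sequential(nn.Linear(STATE_DIM + N_ITEMS, 64), nn.ReLU(),
                              nn.Linear(64, 1), nn.Sigmoid())   # real vs. generated interaction

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def policy_step(state):
    """One policy-gradient step on generated data with discriminator-scaled rewards."""
    logits = policy(state)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()                                       # recommended item index
    action_onehot = nn.functional.one_hot(action, N_ITEMS).float()
    sa = torch.cat([state, action_onehot], dim=-1)

    with torch.no_grad():
        r_gen = user_model(sa).squeeze(-1)       # reward predicted by the learned user model
        quality = discriminator(sa).squeeze(-1)  # how "real" the generated interaction looks
        r_scaled = quality * r_gen               # down-weight low-quality generated rewards

    loss = -(dist.log_prob(action) * r_scaled).mean()            # REINFORCE-style objective
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage: one update on a batch of 16 simulated user states.
policy_step(torch.randn(16, STATE_DIM))
```

In this sketch the user model and discriminator are assumed to have been trained adversarially on logged interactions beforehand; only the reward-scaling step that the abstract describes is shown here.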

Last updated: 2020-01-22