Reinforcement learning based recommender systems: A survey
arXiv - CS - Information Retrieval Pub Date : 2021-01-15 , DOI: arxiv-2101.06286
M. Mehdi Afsar, Trafford Crump, Behrouz Far

Recommender systems (RSs) are becoming an inseparable part of our everyday lives. They help us find our favorite items to purchase, our friends on social networks, and our favorite movies to watch. Traditionally, the recommendation problem was treated as a simple classification or prediction problem; however, it has been shown to be sequential in nature. Accordingly, it can be formulated as a Markov decision process (MDP), and reinforcement learning (RL) methods can be employed to solve it. In fact, recent advances in combining deep learning with traditional RL methods, i.e., deep reinforcement learning (DRL), have made it possible to apply RL to the recommendation problem with massive state and action spaces. This paper presents a survey of reinforcement learning based recommender systems (RLRSs). We first observe that algorithms developed for RLRSs can generally be classified into RL- and DRL-based methods. We then present these RL- and DRL-based methods, grouped by the specific RL algorithm (e.g., Q-learning, SARSA, and REINFORCE) used to optimize the recommendation policy. Furthermore, we provide tables with detailed information about the MDP formulation of these methods and about their evaluation schemes. Finally, we discuss important aspects and challenges that can be addressed in the future.
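To make the MDP framing concrete, here is a minimal sketch of tabular Q-learning on a toy recommendation problem. It is not taken from the survey: the user model (the user clicks only when the recommended item is the one "after" the last clicked item), the function names `train_q_learning` and `greedy_policy`, and all hyperparameters are assumptions chosen purely for illustration. The state is the last item the user clicked, the action is the item to recommend, and the reward is 1 for a click and 0 otherwise.

```python
import random


def train_q_learning(n_items=4, steps=20000, alpha=0.1, gamma=0.9,
                     epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical, deterministic user model."""
    rng = random.Random(seed)
    # q[state][action]: estimated return of recommending `action`
    # when the last clicked item is `state`.
    q = [[0.0] * n_items for _ in range(n_items)]
    state = 0
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            action = rng.randrange(n_items)
        else:
            action = max(range(n_items), key=lambda a: q[state][a])
        # Assumed toy user: clicks only on the item following the last click.
        clicked = action == (state + 1) % n_items
        reward = 1.0 if clicked else 0.0
        # The state changes only when the user clicks the recommendation.
        next_state = action if clicked else state
        # Q-learning update: bootstrap from the best action in the next state.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Return the highest-valued recommendation for each state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    q_table = train_q_learning()
    print(greedy_policy(q_table))
```

In this toy setting the learned greedy policy recommends, for each state, the item the simulated user would click next. Real RLRSs face far larger state and action spaces, which is where the DRL-based methods covered by the survey come in.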

Updated: 2021-01-19