A Case Study on Sampling Strategies for Evaluating Neural Sequential Item Recommendation Models
arXiv - CS - Information Retrieval. Pub Date: 2021-07-27, DOI: arxiv-2107.13045
Alexander Dallmann, Daniel Zoller, Andreas Hotho

At present, sequential item recommendation models are compared by calculating metrics on a small item subset (target set) to speed up computation. The target set contains the relevant item and a set of negative items that are sampled from the full item set. Two well-known strategies for sampling negative items are uniform random sampling and sampling by popularity, which better approximates the item frequency distribution in the dataset. Most recently published papers on sequential item recommendation rely on sampling by popularity to compare the evaluated models. However, recent work has already shown that an evaluation with uniform random sampling may not be consistent with the full ranking, that is, the model ranking obtained by evaluating a metric using the full item set as the target set. This raises the question of whether the ranking obtained by sampling by popularity is equal to the full ranking. In this work, we re-evaluate current state-of-the-art sequential recommender models with respect to whether these sampling strategies have an impact on the final ranking of the models. To this end, we train four recently proposed sequential recommendation models on five widely known datasets. For each dataset and model, we employ three evaluation strategies. First, we compute the full model ranking. Then we evaluate all models on target sets sampled with the two sampling strategies, uniform random sampling and sampling by popularity, at the commonly used target set size of 100, compute the model ranking for each strategy, and compare the rankings with each other. Additionally, we vary the size of the sampled target set. Overall, we find that both sampling strategies can produce rankings that are inconsistent with the full ranking of the models. Furthermore, sampling by popularity and uniform random sampling do not consistently produce the same ranking ...
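The sampled evaluation protocol described in the abstract (a target set consisting of the relevant item plus sampled negatives, scored with a cut-off metric) can be illustrated with a short sketch. The Python code below is a minimal, hypothetical illustration, not the authors' implementation; the names `sample_target_set`, `hit_at_k`, and the `item_counts` frequency table are assumptions introduced for this example.

```python
import numpy as np

def sample_target_set(relevant_item, all_items, item_counts, size=100,
                      strategy="uniform", rng=None):
    """Build a target set: the relevant item plus (size - 1) sampled negatives.

    strategy: "uniform"    -> negatives drawn uniformly at random
              "popularity" -> negatives drawn proportionally to their
                              frequency in the dataset (item_counts)
    """
    rng = rng or np.random.default_rng()
    candidates = np.array([i for i in all_items if i != relevant_item])

    if strategy == "uniform":
        probs = None  # uniform distribution over all candidate negatives
    elif strategy == "popularity":
        counts = np.array([item_counts[i] for i in candidates], dtype=float)
        probs = counts / counts.sum()
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    negatives = rng.choice(candidates, size=size - 1, replace=False, p=probs)
    return [relevant_item] + negatives.tolist()


def hit_at_k(scores, target_set, relevant_item, k=10):
    """Hit@k restricted to the target set: 1 if the relevant item is among
    the top-k items of the target set by model score, else 0."""
    ranked = sorted(target_set, key=lambda i: scores[i], reverse=True)
    return int(relevant_item in ranked[:k])
```

Averaging such a sampled metric over test interactions for each model, once per sampling strategy, and once with the full item set as the target set, yields the three model rankings that the paper compares.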

Updated: 2021-07-29