Offline evaluation options for recommender systems, Information Retrieval Journal



Information Retrieval Journal (IF 1.7), Pub Date: 2020-03-18, DOI: 10.1007/s10791-020-09371-3
Rocío Cañamares, Pablo Castells, Alistair Moffat

We undertake a detailed examination of the steps that make up offline experiments for recommender system evaluation, including: the manner in which the available ratings are filtered and split into training and test sets; the selection of a subset of the available users for the evaluation; the choice of strategy to handle the background effects that arise when the system is unable to provide scores for some items or users; the use of either full or condensed output lists for the purposes of scoring; the scoring methods themselves, including alternative top-weighted mechanisms for condensed rankings; and the application of statistical testing on a weighted-by-user or weighted-by-volume basis as a mechanism for providing confidence in measured outcomes. We carry out experiments that illustrate the impact that each of these choice points can have on the usefulness of an end-to-end system evaluation, and provide examples of possible pitfalls. In particular, we show that varying the split between training and test data, changing the evaluation metric, changing how target items are selected, or changing how empty recommendations are dealt with can give rise to comparisons that are vulnerable to misinterpretation, and may lead to different or even opposite outcomes, depending on the exact combination of settings used.
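
To make the "full or condensed output lists" choice concrete, here is a minimal sketch of how that single setting can reverse a comparison between two systems. It is not taken from the paper: the item ids, the judgments, the two rankings, and the use of precision at k are all illustrative assumptions. A condensed list is taken here to mean the ranking with every item lacking a test judgment removed before scoring.

```python
from typing import Dict, List


def precision_at_k(ranking: List[str], judgments: Dict[str, bool], k: int) -> float:
    """Fraction of the top-k slots holding a judged-relevant item.

    Items without a judgment count as non-relevant, and the denominator
    stays k even when the ranking is shorter than k.
    """
    return sum(1 for item in ranking[:k] if judgments.get(item, False)) / k


def condense(ranking: List[str], judgments: Dict[str, bool]) -> List[str]:
    """Drop items that have no test judgment, preserving rank order."""
    return [item for item in ranking if item in judgments]


# Hypothetical test judgments for one user (True = relevant).
judgments = {"a": True, "b": True, "c": False, "d": False}

# Hypothetical system outputs; "x" and "y" have no test judgment.
system_a = ["x", "y", "a", "b"]
system_b = ["a", "c", "d", "b"]

k = 2
full_a = precision_at_k(system_a, judgments, k)
full_b = precision_at_k(system_b, judgments, k)
cond_a = precision_at_k(condense(system_a, judgments), judgments, k)
cond_b = precision_at_k(condense(system_b, judgments), judgments, k)

print(f"full lists:      A={full_a:.2f}  B={full_b:.2f}")
print(f"condensed lists: A={cond_a:.2f}  B={cond_b:.2f}")
```

On this toy input, system B scores higher on full lists while system A scores higher on condensed lists, because A places two unjudged items ahead of its relevant ones. Whether unjudged items should be penalized or ignored is exactly the kind of choice point the abstract says can change the outcome.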


Updated: 2020-03-18
