Statistical biases in Information Retrieval metrics for recommender systems
Information Retrieval Journal ( IF 2.5 ) Pub Date : 2017-07-27 , DOI: 10.1007/s10791-017-9312-z
Alejandro Bellogín , Pablo Castells , Iván Cantador

There is an increasing consensus in the Recommender Systems community that the dominant error-based evaluation metrics are insufficient, and mostly inadequate, to properly assess the practical effectiveness of recommendations. Seeking to evaluate recommendation rankings—which largely determine the effective accuracy in matching user needs—rather than predicted rating values, Information Retrieval metrics have started to be applied for the evaluation of recommender systems. In this paper we analyse the main issues and potential divergences in the application of Information Retrieval methodologies to recommender system evaluation, and provide a systematic characterisation of experimental design alternatives for this adaptation. We lay out an experimental configuration framework upon which we identify and analyse specific statistical biases arising in the adaptation of Information Retrieval metrics to recommendation tasks, namely sparsity and popularity biases. These biases considerably distort the empirical measurements, hindering the interpretation and comparison of results across experiments. We develop a formal characterisation and analysis of the biases upon which we analyse their causes and main factors, as well as their impact on evaluation metrics under different experimental configurations, illustrating the theoretical findings with empirical evidence. We propose two experimental design approaches that effectively neutralise such biases to a large extent. We report experiments validating our proposed experimental variants, and comparing them to alternative approaches and metrics that have been defined in the literature with similar or related purposes.
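To make the sparsity bias discussed above concrete, the sketch below (illustrative only, not the paper's own method or data) shows the conventional adaptation of an Information Retrieval metric, precision@k, to a recommendation ranking: items without a test-set rating are counted as non-relevant, which penalises the system for missing judgments rather than actual errors. All item ids and the `precision_at_k` helper are hypothetical.

```python
# Illustrative sketch (not from the paper): precision@k over a
# recommendation ranking under sparse relevance judgments. Unrated
# items are treated as non-relevant -- the conventional but biased
# assumption that gives rise to the sparsity bias.

def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k recommended items that are known relevant.

    Items missing from `relevant` (i.e. unrated in the test set) count
    as non-relevant, even though some may in fact be relevant.
    """
    top_k = ranking[:k]
    return sum(1 for item in top_k if item in relevant) / k

# A ranking produced by some hypothetical recommender.
ranking = ["i3", "i7", "i1", "i9", "i5"]
# Items the user rated relevant in the test set; "i9" might also be
# relevant, but without a rating it counts against the system.
relevant = {"i3", "i1"}

print(precision_at_k(ranking, relevant, 3))  # 2 of the top 3 are known relevant
```

Because the measured value depends on how many judgments happen to exist, absolute scores are not comparable across datasets with different rating densities, which is one reason the paper argues for experimental designs that neutralise the bias.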

Updated: 2017-07-27