Popularity Bias in False-positive Metrics for Recommender Systems Evaluation
ACM Transactions on Information Systems (IF 5.4), Pub Date: 2021-05-25, DOI: 10.1145/3452740
Elisa Mena-Maldonado 1, Rocío Cañamares 2, Pablo Castells 2, Yongli Ren 1, Mark Sanderson 1

We investigate the impact of popularity bias on false-positive metrics in the offline evaluation of recommender systems. Unlike their true-positive complements, false-positive metrics reward systems that minimize recommendations disliked by users. Our analysis is, to the best of our knowledge, the first to show that false-positive metrics tend to penalize popular items, the opposite of the behavior of true-positive metrics, which causes the two types of metrics to tend to disagree in the presence of popularity biases. We present a theoretical analysis that identifies why the metrics disagree and determines the rare situations in which they might agree. The key lies in the relationship between the popularity and relevance distributions, in terms of their agreement and steepness, two fundamental concepts that we formalize. We then examine three well-known datasets using multiple popular true- and false-positive metrics on 16 recommendation algorithms. The datasets are chosen specifically to allow us to estimate both biased and unbiased metric values. The results of the empirical study confirm and illustrate our analytical findings. Having established the conditions under which the two types of metrics disagree, we then determine the circumstances in which researchers should use true-positive or false-positive metrics in the offline evaluation of recommender systems.
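
The disagreement described in the abstract can be illustrated with a toy sketch. It is not the paper's formal definitions or data: the metric functions, the tiny rating log, and the item and user names are all hypothetical, and it assumes that unrated items count as neither liked nor disliked, as in biased observational data. Because popular items collect more judgments of both kinds, a popularity ranking scores more hits on a true-positive metric and also more known dislikes on a false-positive one.

```python
from collections import Counter

def precision_at_k(ranking, liked, k):
    """True-positive metric: share of the top-k items the user liked."""
    return sum(1 for item in ranking[:k] if item in liked) / k

def false_positive_rate_at_k(ranking, disliked, k):
    """False-positive metric: share of the top-k items the user is known
    to dislike. Lower is better. Illustrative definition only."""
    return sum(1 for item in ranking[:k] if item in disliked) / k

# Hypothetical training log: (user, item, liked?). Item "a" is the most rated.
train = [
    ("u1", "a", True), ("u1", "b", False), ("u1", "c", True),
    ("u2", "a", False), ("u2", "b", True),
    ("u3", "a", True), ("u3", "d", False),
]

# Popularity = number of observed ratings; ties are broken alphabetically.
popularity = Counter(item for _, item, _ in train)
popular_ranking = sorted(popularity, key=lambda i: (-popularity[i], i))

# A ranking made only of unrated (long-tail) items, for comparison.
tail_ranking = ["e", "f"]

# Hypothetical held-out judgments of a test user.
test_liked = {"a"}
test_disliked = {"b"}

k = 2
pop_precision = precision_at_k(popular_ranking, test_liked, k)
pop_fp = false_positive_rate_at_k(popular_ranking, test_disliked, k)
tail_precision = precision_at_k(tail_ranking, test_liked, k)
tail_fp = false_positive_rate_at_k(tail_ranking, test_disliked, k)

print(f"popular: precision@{k}={pop_precision}, fp@{k}={pop_fp}")
print(f"tail:    precision@{k}={tail_precision}, fp@{k}={tail_fp}")
```

In this toy setting the true-positive metric prefers the popularity ranking while the false-positive metric prefers the long-tail ranking, which is the kind of disagreement the abstract refers to. Whether it holds in real data depends on the relationship between the popularity and relevance distributions.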

Updated: 2021-05-25