A Framework for Evaluating Personalized Ranking Systems by Fusing Different Evaluation Measures
Big Data Research ( IF 3.5 ) Pub Date : 2021-02-11 , DOI: 10.1016/j.bdr.2021.100211
Tome Eftimov , Bibek Paudel , Gorjan Popovski , Dragi Kocev

Personalized ranking systems, also known as recommender systems, rely on a variety of big data methods, including collaborative filtering, graph random walks, matrix factorization, and latent-factor models. With their wide use on social-network, e-commerce, and content platforms, platform operators and developers need better ways to choose the systems most suitable for their use cases. At the same time, the research literature on recommender systems describes a multitude of measures for evaluating the performance of different algorithms. For the end user, however, this large number of available measures does not provide much help in deciding which algorithm to deploy. Some of the measures are correlated, while others capture different aspects of recommendation performance, such as accuracy and diversity. To address this problem, we propose a novel benchmarking framework that fuses different evaluation measures in order to rank the recommender systems on each benchmark dataset separately. Additionally, our approach discovers sets of correlated evaluation measures as well as sets of measures that are least correlated. We investigate the robustness of the proposed methodology using published results from an experimental study involving multiple big datasets and evaluation measures. Our work provides a general framework that can handle an arbitrary number of evaluation measures and helps end users rank the systems available to them.
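The abstract does not spell out how the measures are fused or how correlated measures are grouped, so the sketch below is only an illustrative guess at the general idea, not the authors' method: it rank-averages a few hypothetical evaluation scores to produce one per-dataset ranking of systems, and clusters measures by the correlation of their scores. All system names, measure names, numbers, and the rank-averaging and hierarchical-clustering choices are assumptions made for this example.

```python
import pandas as pd
from scipy.stats import rankdata
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical scores for ONE benchmark dataset:
# rows = recommender systems, columns = evaluation measures (higher = better).
scores = pd.DataFrame(
    {
        "precision@10": [0.31, 0.28, 0.35],
        "ndcg@10":      [0.42, 0.40, 0.44],
        "diversity":    [0.55, 0.61, 0.48],
    },
    index=["system_A", "system_B", "system_C"],
)

# 1) Fuse the measures into one ranking for this dataset: rank the systems
#    on every measure, then average the per-measure ranks (rank aggregation).
per_measure_ranks = scores.apply(lambda col: rankdata(-col), axis=0)
fused = per_measure_ranks.mean(axis=1).sort_values()  # smaller = better
print("Fused per-dataset ranking:")
print(fused)

# 2) Discover groups of correlated measures: hierarchically cluster the
#    measures with 1 - correlation as the distance between them, so strongly
#    correlated measures land in the same group.
corr = scores.corr(method="pearson")
dist = squareform(1.0 - corr.values, checks=False)
groups = fcluster(linkage(dist, method="average"), t=0.5, criterion="distance")
for measure, group in zip(corr.columns, groups):
    print(f"{measure} -> measure group {group}")
```

Rank aggregation is used in the sketch only because it keeps measures on incomparable scales (e.g. nDCG versus diversity) from dominating the fused ranking; the paper may use a different fusion scheme.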




Updated: 2021-02-17