Clustering Approaches for Top-k Recommender Systems
International Journal on Artificial Intelligence Tools (IF 1.1), Pub Date: 2019-08-30, DOI: 10.1142/s0218213019500192
Nicolás Torres, Marcelo Mendoza

Clustering-based recommender systems restrict the search for similar users to small user clusters, which yields fast recommendations on large-scale datasets. The clusters can then be distributed naturally across data partitions, increasing the number of users the recommender system can handle. Unfortunately, as the number of users and items in a cluster solution grows, the precision of a clustering-based recommender system drops. We present a novel approach that introduces a cluster-based distance function for neighborhood computation. In our approach, clusters generated from the training data provide the basis for neighborhood selection. Then, to expand the search for relevant users, we use a novel measure that exploits the global cluster structure to infer distances to users outside the cluster. Empirical studies on five widely known benchmark datasets show that our proposal is very competitive in terms of precision, recall, and NDCG. The main strength of our method, however, is scalability: it reaches speedups of 20× in a sequential computing evaluation framework and up to 100× in a parallel architecture. These results show that an efficient implementation of our cluster-based CF method can handle very large datasets while still delivering good precision, avoiding the high computational cost of more sophisticated techniques.
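
The general idea of cluster-restricted neighborhood search can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not define their distance measure, so everything below is an assumption made for illustration. It uses plain k-means, Euclidean distance on raw rating vectors, zeros standing for unrated items, and a hypothetical proxy (`approx_outside_distance`) that bounds the distance to an out-of-cluster user by routing through that user's centroid.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means on user rating vectors; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Squared Euclidean distance from every user to every centroid.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return centroids, labels

def cluster_neighbors(X, labels, user, n):
    """Nearest neighbors of `user`, searched only inside the user's own cluster."""
    candidates = np.flatnonzero(labels == labels[user])
    candidates = candidates[candidates != user]
    d = np.linalg.norm(X[candidates] - X[user], axis=1)
    return candidates[np.argsort(d)[:n]]

def approx_outside_distance(X, centroids, labels, user, other):
    """Hypothetical stand-in for a cluster-based distance (an assumption, not
    the paper's measure): go from `user` to the centroid of `other`'s cluster,
    then from that centroid to `other`. By the triangle inequality this is an
    upper bound on the true Euclidean distance between the two users."""
    c = centroids[labels[other]]
    return float(np.linalg.norm(X[user] - c) + np.linalg.norm(c - X[other]))

def recommend_top_k(X, neighbors, user, k):
    """Rank items the user has not rated (zeros) by the neighbors' mean rating."""
    if len(neighbors) == 0:
        return []
    scores = X[neighbors].mean(axis=0)
    scores[X[user] > 0] = -np.inf  # skip items the user already rated
    ranked = np.argsort(-scores)
    return [int(i) for i in ranked[:k] if np.isfinite(scores[i])]

# Toy rating matrix: rows are users, columns are items, 0 means "not rated".
X = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [5, 5, 1, 0],
    [0, 0, 5, 4],
    [0, 1, 4, 5],
    [0, 0, 5, 5],
], dtype=float)

centroids, labels = kmeans(X, k=2)
neighbors = cluster_neighbors(X, labels, user=0, n=2)
print(recommend_top_k(X, neighbors, user=0, k=1))
```

In this sketch the neighbor search costs time proportional to the cluster size rather than to the total number of users, which is the source of the speedup that clustering-based approaches aim for. The second term of the proxy (centroid to user) can be precomputed once per user, so estimating distances to users in other clusters needs only one centroid distance per cluster at query time.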

Updated: 2019-08-30