Clustering Techniques to Improve Scalability and Accuracy of Recommender Systems, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (IF 1.5), Pub Date: 2021-08-02, DOI: 10.1142/s0218488521500276
Joydeep Das, Subhashis Majumder, Kalyani Mali

Recommender systems have become essential tools in the success of modern e-commerce applications. These applications typically handle large datasets and often face challenges such as data sparsity and scalability. Clustering techniques help reduce the computational time needed for recommendation and handle the sparsity problem more effectively. Traditional clustering-based recommender systems partition the user-item rating matrix into clusters and run the recommendation algorithm in each cluster separately, which decreases the overall runtime of the system. Each user or item generally belongs to at most one cluster. However, some users in a cluster (boundary users) may have preferences that are more similar to those of users in nearby clusters than to those of users in their own cluster. We therefore propose a scalable clustering-based recommendation algorithm that can switch a user from their original cluster to another cluster in order to provide more accurate recommendations. For a user who belongs to multiple clusters, we aggregate the recommendations from all of those clusters to produce that user's final set of recommendations. We propose two types of clustering, one based on ratings and the other based on frequency, and compare their performance. Finally, we explore the applicability of cluster ensemble techniques to the proposed method. Our aim is to develop a recommendation framework that scales well to large datasets without significantly degrading recommendation quality. Our experimental results demonstrate both the scalability and the efficacy of the method: it reduces the runtime of the baseline collaborative filtering (CF) algorithm by 58% to 90% on the MovieLens-10M dataset and by 42% to 84% on the MovieLens-20M dataset. In terms of the F1, MAP and NDCG metrics, its recommendation accuracy is also better than that of existing clustering-based recommender systems.
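
The abstract describes the pipeline only at a high level. The following is a minimal Python sketch of the general idea, not the authors' implementation, and it rests on several assumptions: users are clustered with k-means on their rating vectors using cosine similarity (one possible reading of "clustering on the basis of rating"); a user additionally joins any cluster whose members are, on average, more similar to them than the rest of their own cluster (a hypothetical formalisation of the boundary-user switch); plain user-based CF runs inside each cluster; and predictions from a user's clusters are merged by simple averaging. The function names (`kmeans_users`, `expand_boundary_users`, `predict_in_cluster`, `recommend`), the farthest-first seeding and the toy ratings are all illustrative.

```python
import math
from collections import defaultdict

Ratings = dict[str, dict[str, float]]  # user -> {item: rating}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse rating vectors (missing ratings count as 0)."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def kmeans_users(ratings: Ratings, k: int, iters: int = 10) -> dict[str, int]:
    """Rating-based k-means over users, comparing users to centroids by cosine similarity."""
    users = sorted(ratings)
    # Farthest-first seeding (an assumption): start from the first user, then keep
    # adding the user whose best similarity to the seeds chosen so far is lowest.
    seeds = [users[0]]
    while len(seeds) < k:
        seeds.append(min(users, key=lambda u: max(cosine(ratings[u], ratings[s]) for s in seeds)))
    centroids = [dict(ratings[s]) for s in seeds]
    assign: dict[str, int] = {}
    for _ in range(iters):
        # Assignment step: each user joins its most similar centroid.
        assign = {u: max(range(k), key=lambda c: cosine(ratings[u], centroids[c])) for u in users}
        # Update step: each centroid becomes the mean rating vector of its members.
        for c in range(k):
            members = [u for u in users if assign[u] == c]
            if not members:
                continue
            totals: dict[str, float] = defaultdict(float)
            for u in members:
                for item, r in ratings[u].items():
                    totals[item] += r
            centroids[c] = {item: t / len(members) for item, t in totals.items()}
    return assign


def expand_boundary_users(ratings: Ratings, assign: dict[str, int], k: int) -> dict[str, set[int]]:
    """Let a user also join every cluster whose members are, on average, more similar to
    them than the other members of their own cluster (hypothetical boundary rule)."""
    members: dict[int, list[str]] = defaultdict(list)
    for u, c in assign.items():
        members[c].append(u)

    def mean_sim(u: str, c: int) -> float:
        others = [v for v in members[c] if v != u]
        return sum(cosine(ratings[u], ratings[v]) for v in others) / len(others) if others else 0.0

    clusters: dict[str, set[int]] = {}
    for u, own in assign.items():
        own_sim = mean_sim(u, own)
        clusters[u] = {own} | {c for c in range(k) if c != own and mean_sim(u, c) > own_sim}
    return clusters


def predict_in_cluster(ratings: Ratings, clusters: dict[str, set[int]], u: str, c: int) -> dict[str, float]:
    """User-based CF restricted to users whose cluster set contains c."""
    num: dict[str, float] = defaultdict(float)
    den: dict[str, float] = defaultdict(float)
    for v, cs in clusters.items():
        if v == u or c not in cs:
            continue
        s = cosine(ratings[u], ratings[v])
        if s <= 0:
            continue  # only positively similar neighbours contribute
        for item, r in ratings[v].items():
            if item not in ratings[u]:  # predict only items u has not rated
                num[item] += s * r
                den[item] += s
    return {item: num[item] / den[item] for item in num}


def recommend(ratings: Ratings, clusters: dict[str, set[int]], u: str, n: int = 2) -> list[str]:
    """Average the per-cluster predictions for u and return the top-n items."""
    per_item: dict[str, list[float]] = defaultdict(list)
    for c in sorted(clusters[u]):
        for item, score in predict_in_cluster(ratings, clusters, u, c).items():
            per_item[item].append(score)
    merged = {item: sum(ps) / len(ps) for item, ps in per_item.items()}
    return sorted(merged, key=lambda item: (-merged[item], item))[:n]


ratings: Ratings = {
    "ann": {"m1": 5, "m2": 5},
    "ben": {"m1": 5, "m3": 5},
    "cat": {"m5": 5, "m6": 5},
    "dan": {"m5": 5, "m7": 5},
    "eve": {"m3": 5, "m6": 3, "m9": 5},
}
assign = kmeans_users(ratings, k=2)
clusters = expand_boundary_users(ratings, assign, k=2)
print(assign)
print(clusters)
print(recommend(ratings, clusters, "eve", n=3))
```

On this toy data, k-means places `eve` with `cat` and `dan`, but her average similarity to the other cluster is higher, so she is kept in both clusters and receives `m1` from one and `m5` from the other. The runtime saving comes from restricting the neighbour search to a cluster: if n users were split into k equal clusters (an idealised assumption), pairwise similarity work would drop from roughly n² to roughly n²/k.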
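
The abstract reports accuracy with F1, MAP and NDCG but does not state which variants were used. For reference, here is a sketch of common binary-relevance, top-k definitions in Python. The cutoff `k`, the normalisation of average precision by min(|relevant|, k), the log2(rank + 1) discount and the function names are assumptions; the paper's exact formulas may differ.

```python
import math


def precision_recall_f1_at_k(recommended: list[str], relevant: set[str], k: int) -> tuple[float, float, float]:
    """Precision@k, recall@k and their harmonic mean (F1@k)."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def average_precision_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    """Sum of precision@rank at each relevant hit, divided by min(|relevant|, k)."""
    hits, total = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


def mean_average_precision(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """MAP@k: average precision averaged over users (one (recommended, relevant) pair each)."""
    return sum(average_precision_at_k(rec, rel, k) for rec, rel in runs) / len(runs)


def ndcg_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG@k with a 1 / log2(rank + 1) discount."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, item in enumerate(recommended[:k], start=1) if item in relevant)
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0


recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
print(precision_recall_f1_at_k(recommended, relevant, k=4))
print(average_precision_at_k(recommended, relevant, k=4))
print(ndcg_at_k(recommended, relevant, k=4))
```

For this list, two of the top four items are relevant, so precision@4 = 0.5, recall@4 = 2/3 and F1@4 = 4/7 ≈ 0.571; AP@4 = (1/1 + 2/3) / 3 = 5/9, and NDCG@4 = 1.5 / (1.5 + 1/log2 3) ≈ 0.704.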

Updated: 2021-08-02