Improvement of recommendation algorithm based on Collaborative Deep Learning and its Parallelization on Spark
Journal of Parallel and Distributed Computing (IF 3.8), Pub Date: 2020-09-28, DOI: 10.1016/j.jpdc.2020.09.014
Fan Yang, Huaqiong Wang, Jianjing Fu

Collaborative Deep Learning (CDL) exploits the strong feature-learning capability of neural networks and the robustness of model fitting to address the sharp drop in Recommender System performance when data are sparse. However, when a Recommender System faces a large volume of data, model training becomes difficult to maintain and a variety of unpredictable problems arise. To address these issues, this study investigated collaborative deep learning and its parallelization and proposed an improved model, CDL-I (CDL with item private nodes), aimed at optimizing item content modeling. CDL-I improves the SDAE used in CDL by adding private network nodes: while the network parameters of the model remain shared, a private bias term is added for each item. As a result, the network can learn item content parameters in a more targeted manner, improving the model's performance on item content in the Recommender System. Furthermore, the algorithm was parallelized by splitting the model, and a parallel training method for CDL-I was proposed and ported to a Spark distributed cluster. The parameters of each part of the model are trained and optimized in parallel, increasing the scale of data the model can process and its scalability. Experiments on multiple real-world datasets verify the effectiveness and efficiency of the proposed parallel CDL-I algorithm.
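The core CDL-I change described in the abstract, a shared SDAE whose hidden representation receives an additional private bias per item, can be illustrated with a short sketch. The snippet below is a minimal illustration in PyTorch under my own assumptions (layer sizes, the noise rate, and names such as `SDAEWithItemBias` are hypothetical), not the authors' code.

```python
# Minimal sketch of the CDL-I idea: a denoising autoencoder whose encoder/decoder
# weights are shared across all items, plus a learnable private bias per item.
import torch
import torch.nn as nn

class SDAEWithItemBias(nn.Module):
    def __init__(self, n_items, in_dim, hidden_dim, noise=0.3):
        super().__init__()
        self.noise = noise
        # Shared encoder/decoder parameters (as in plain CDL).
        self.encoder = nn.Linear(in_dim, hidden_dim)
        self.decoder = nn.Linear(hidden_dim, in_dim)
        # Private per-item bias added to the hidden representation
        # (the "item private node" of CDL-I in this sketch).
        self.item_bias = nn.Embedding(n_items, hidden_dim)
        nn.init.zeros_(self.item_bias.weight)

    def forward(self, x, item_ids):
        # Denoising: corrupt the input content vector with dropout-style noise.
        corrupted = x * (torch.rand_like(x) > self.noise).float()
        h = torch.relu(self.encoder(corrupted) + self.item_bias(item_ids))
        return self.decoder(h), h  # reconstruction and latent item factor
```

Because only the bias term is item-specific, the number of extra parameters grows linearly with the catalogue size while the shared weights still benefit from all items' content.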
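The Spark parallelization can likewise be sketched. The PySpark snippet below shows one plausible scheme under stated assumptions, not the paper's exact method: item content is partitioned across executors, each partition updates the private biases of its own items locally, and the driver averages the executors' copies of the shared weights after each pass. The gradient steps and names such as `train_partition` are illustrative placeholders.

```python
# Illustrative model-split training on Spark: per-item private biases stay on
# their own partition; only the shared weights are synchronized on the driver.
import numpy as np
from pyspark import SparkContext

IN_DIM, HIDDEN_DIM = 100, 32      # illustrative layer sizes
LR = 0.01

def train_partition(rows, bcast_W):
    """Run local updates on one item partition; yield (shared W copy, item biases)."""
    W = bcast_W.value.copy()
    biases = {}
    for item_id, x in rows:
        b = biases.setdefault(item_id, np.zeros(HIDDEN_DIM))
        h = np.tanh(x @ W + b)                    # encode with shared W + private bias
        err = (h @ W.T) - x                       # tied-weight reconstruction error
        W -= LR * np.outer(x, err @ W) / IN_DIM   # placeholder step on shared weights
        biases[item_id] = b - LR * (err @ W)      # placeholder step on the private bias
    yield (W, biases)

sc = SparkContext(appName="cdl-i-parallel-sketch")
items = [(i, np.random.rand(IN_DIM)) for i in range(1000)]   # toy item content vectors
item_rdd = sc.parallelize(items, numSlices=8).cache()

W = np.random.randn(IN_DIM, HIDDEN_DIM) * 0.01
for epoch in range(5):
    bw = sc.broadcast(W)
    results = item_rdd.mapPartitions(lambda rows: train_partition(rows, bw)).collect()
    W = np.mean([w for w, _ in results], axis=0)  # synchronize the shared weights
    # In a full implementation the per-item biases would be joined back into the
    # RDD between epochs rather than re-initialized; they never need global sync.
    item_biases = {k: v for _, bs in results for k, v in bs.items()}
```

Keeping the item-private parameters on the partition that owns the item is what makes splitting the model attractive here: only the comparatively small shared weight matrices have to travel between the driver and the executors.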




Updated: 2020-11-03