BaPa: A Novel Approach of Improving Load Balance in Parallel Matrix Factorization for Recommender Systems
IEEE Transactions on Computers (IF 3.6), Pub Date: 2020-05-25, DOI: 10.1109/tc.2020.2997051
Ruixin Guo, Feng Zhang, Lizhe Wang, Wusheng Zhang, Xinya Lei, Rajiv Ranjan, Albert Y. Zomaya

A simple way to accelerate matrix factorization on big data is to parallelize it. A commonly used method divides the matrix into multiple non-intersecting blocks and processes them concurrently. This partitioning raises a load balance problem, which significantly impacts parallel performance. A general belief is that load balance across blocks cannot be achieved by balancing rows and columns separately. We challenge this belief by proposing an approach called “Balanced Partitioning (BaPa)”. We demonstrate under what circumstances independently balancing rows and columns leads to balanced intersections of rows and columns, and explain why and how. We formally prove the feasibility of BaPa by observing the variance of rating numbers across blocks, and empirically validate its soundness by applying it to two standard parallel matrix factorization algorithms, DSGD and CCD++. In addition, we establish a mathematical model of “Imbalance Degree” to further explain why BaPa works well. BaPa is applied here to synchronous parallel matrix factorization, but as a general load balance solution, it has significant application potential.
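The idea of balancing rows and columns independently and then checking the variance of rating numbers across blocks can be illustrated with a small sketch. This is not the authors' BaPa algorithm, whose details are not given in the abstract. It assumes contiguous cuts on cumulative rating counts, synthetic skewed data in which row and column indices are drawn independently, and hypothetical helper names (`uniform_groups`, `balanced_groups`, `block_counts`, `demo`).

```python
import random
from statistics import pvariance


def uniform_groups(n, parts):
    """Baseline: cut n indices into `parts` groups of equal index count."""
    return [i * parts // n for i in range(n)]


def balanced_groups(counts, parts):
    """Cut indices into `parts` contiguous groups with roughly equal rating totals."""
    total = sum(counts)
    groups, seen = [], 0
    for c in counts:
        # Place the index by where the midpoint of its ratings falls
        # in the cumulative total (integer arithmetic only).
        groups.append(min(parts - 1, (2 * seen + c) * parts // (2 * total)))
        seen += c
    return groups


def block_counts(ratings, row_group, col_group, parts):
    """Count the ratings that land in each of the parts x parts blocks."""
    blocks = [[0] * parts for _ in range(parts)]
    for u, i in ratings:
        blocks[row_group[u]][col_group[i]] += 1
    return blocks


def demo(seed=0, n_rows=200, n_cols=200, n_ratings=20000, parts=4):
    rng = random.Random(seed)
    # Hypothetical skewed data: low indices receive far more ratings,
    # and row and column indices are sampled independently (an assumption).
    ratings = [(int(n_rows * rng.random() ** 2), int(n_cols * rng.random() ** 2))
               for _ in range(n_ratings)]
    row_counts, col_counts = [0] * n_rows, [0] * n_cols
    for u, i in ratings:
        row_counts[u] += 1
        col_counts[i] += 1

    uniform = block_counts(ratings, uniform_groups(n_rows, parts),
                           uniform_groups(n_cols, parts), parts)
    balanced = block_counts(ratings, balanced_groups(row_counts, parts),
                            balanced_groups(col_counts, parts), parts)

    def var(blocks):
        # Population variance of the per-block rating counts.
        return pvariance([x for row in blocks for x in row])

    return var(uniform), var(balanced), balanced


if __name__ == "__main__":
    var_uniform, var_balanced, _ = demo()
    print("variance with uniform cuts: ", var_uniform)
    print("variance with balanced cuts:", var_balanced)
```

In this toy setting the per-block variance drops sharply once rows and columns are each cut by rating count rather than by index count. Whether the same holds on real rating matrices depends on conditions the paper analyzes and this sketch does not.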

Updated: 2020-05-25