Efficient Matrix Factorization on Heterogeneous CPU-GPU Systems,arXiv - CS - Databases

Efficient Matrix Factorization on Heterogeneous CPU-GPU Systems
arXiv - CS - Databases, Pub Date: 2020-06-24, DOI: arxiv-2006.15980
Yuanhang Yu, Dong Wen, Ying Zhang, Xiaoyang Wang, Wenjie Zhang and Xuemin Lin

Matrix factorization (MF) is widely applied in machine learning and data mining, and many algorithms have been studied for factorizing matrices. Among them, stochastic gradient descent (SGD) is a commonly used method. Heterogeneous systems with multi-core CPUs and GPUs have become increasingly promising due to the prevalence of GPUs in general-purpose data-parallel applications. Because MF is computationally expensive, we aim to improve the efficiency of SGD-based MF by exploiting the massively parallel processing power of heterogeneous multiprocessors. The main challenge for parallel SGD algorithms on heterogeneous CPU-GPU systems lies in the granularity of the matrix division and the strategy for assigning tasks. We design a novel strategy that divides the matrix into a set of blocks by considering two aspects. First, we observe that the matrix should be divided nonuniformly, with relatively large blocks assigned to GPUs to saturate their computing power. Second, beyond exploiting hardware characteristics, the workloads assigned to the two types of hardware should be balanced. To arrive at the final division strategy, we design a cost model tailored to our problem that accurately estimates the performance of each type of hardware on different data sizes. A dynamic scheduling policy is also used to further balance workloads in practice. Extensive experiments show that our proposed algorithm achieves high efficiency while maintaining high training quality.
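To make the abstract's ingredients concrete, the following is a minimal single-threaded sketch of SGD-based MF in plain Python, together with a toy helper that splits rows nonuniformly between two device types. It is not the authors' algorithm or cost model: the function names (`sgd_mf`, `rmse`, `split_rows`), the hyperparameters, and the `gpu_share` parameter (an assumed fraction of the work the GPU should receive) are all hypothetical and chosen only for illustration.

```python
import random


def sgd_mf(ratings, n_rows, n_cols, k=2, lr=0.02, reg=0.01, epochs=500, seed=0):
    """Factorize a sparse matrix, given as (row, col, value) triples,
    into P (n_rows x k) and Q (n_cols x k) so that P[u] . Q[v] ~ value."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_rows)]
    Q = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_cols)]
    samples = list(ratings)
    for _ in range(epochs):
        rng.shuffle(samples)  # visit the observed entries in random order
        for u, v, r in samples:
            pu, qv = P[u], Q[v]
            err = r - sum(a * b for a, b in zip(pu, qv))
            for f in range(k):
                pf, qf = pu[f], qv[f]
                # Gradient step on the squared error with L2 regularization.
                pu[f] += lr * (err * qf - reg * pf)
                qv[f] += lr * (err * pf - reg * qf)
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error of the factorization on the given entries."""
    se = sum((r - sum(a * b for a, b in zip(P[u], Q[v]))) ** 2
             for u, v, r in ratings)
    return (se / len(ratings)) ** 0.5


def split_rows(row_nnz, gpu_share):
    """Toy nonuniform division (an assumption, not the paper's cost model):
    return a cut index so that rows [0, cut) hold roughly `gpu_share` of all
    nonzeros and would go to the GPU; the remaining rows go to the CPU."""
    target = gpu_share * sum(row_nnz)
    acc = 0
    for i, n in enumerate(row_nnz):
        if acc >= target:
            return i
        acc += n
    return len(row_nnz)


if __name__ == "__main__":
    # Entries of the rank-1 matrix [1, 2, 3]^T [1, 2].
    data = [(i, j, (i + 1) * (j + 1)) for i in range(3) for j in range(2)]
    P, Q = sgd_mf(data, n_rows=3, n_cols=2)
    print("training RMSE:", rmse(data, P, Q))
    print("GPU gets rows below index:", split_rows([10, 10, 10, 10], 0.75))
```

In a real heterogeneous system the share given to each device would come from a measured or modeled throughput rather than a fixed constant, and blocks processed concurrently must not share rows or columns so that their updates to P and Q do not conflict.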

Updated: 2020-06-30
