SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs
IEEE Transactions on Parallel and Distributed Systems (IF 5.6). Pub Date: 2020-04-08. DOI: 10.1109/tpds.2020.2979702
Kaiwei Li, Jianfei Chen, Wenguang Chen, Jun Zhu

Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete count data such as text and images. Applications require LDA to handle both large datasets and large numbers of topics, e.g., tens of thousands of topics for industry-scale applications. Although distributed CPU systems have been used to address this problem, they are slow and resource-inefficient. GPU-based systems have emerged as a promising alternative because of their high computational power and memory bandwidth. However, existing GPU-based LDA systems can only learn thousands of topics, because they use dense data structures and their time complexity is linear in the number of topics. In this article, we propose SaberLDA, a GPU-based LDA system that implements a sparsity-aware algorithm whose time complexity is sublinear in the number of topics, allowing it to learn a large number of topics. To address the challenges introduced by sparsity, we propose a novel data layout, a warp-based sampling kernel, an efficient sparse-matrix counting method, and a fine-grained load-balancing strategy. SaberLDA achieves linear speedup on 4 GPUs and is 6-10 times faster than existing GPU systems when learning thousands of topics. It can learn 40,000 topics from a dataset of billions of tokens in two hours, which was previously achievable only with clusters of tens of CPU servers.
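
To make the sparsity idea concrete, below is a minimal CUDA sketch of warp-based sampling over a sparse topic row. It is an illustration of the general technique, not SaberLDA's actual kernel: the CSR-style layout, the kernel name warp_sample_tokens, and the toy inputs are all assumptions made here for exposition. Each 32-thread warp samples one token's topic by scanning only the nonzero entries of that word's row, so per-token cost grows with the number of nonzeros rather than the full topic count K.

```cuda
// Minimal sketch (assumption: NOT SaberLDA's actual kernel) of warp-based
// sparse sampling: one 32-thread warp draws the topic of one token from the
// nonzero entries of that word's topic-count row via inverse-CDF sampling.
#include <cstdio>
#include <cuda_runtime.h>

constexpr int WARP = 32;

__global__ void warp_sample_tokens(const int *row_ptr, const int *topic_ids,
                                   const float *weights, const int *token_word,
                                   const float *uniforms, int *token_topic,
                                   int num_tokens) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int warp_id = tid / WARP, lane = tid % WARP;
    if (warp_id >= num_tokens) return;

    int word = token_word[warp_id];
    int begin = row_ptr[word], end = row_ptr[word + 1];  // assumes end > begin

    // Warp-parallel sum of the row's unnormalized weights.
    float total = 0.f;
    for (int i = begin + lane; i < end; i += WARP) total += weights[i];
    for (int off = WARP / 2; off > 0; off >>= 1)
        total += __shfl_down_sync(0xffffffffu, total, off);
    total = __shfl_sync(0xffffffffu, total, 0);

    // Inverse-CDF draw: pick the first entry whose running prefix exceeds
    // u * total. (A production kernel would use a warp-level prefix scan;
    // the serial scan keeps the sketch short.)
    float u = uniforms[warp_id] * total, prefix = 0.f;
    int chosen = topic_ids[end - 1];           // fallback for rounding error
    for (int i = begin; i < end; ++i) {
        prefix += weights[i];
        if (u < prefix) { chosen = topic_ids[i]; break; }
    }
    if (lane == 0) token_topic[warp_id] = chosen;
}

int main() {
    // Toy input: one word whose topic row has 4 nonzeros; two tokens of it.
    int row_ptr[] = {0, 4}, topic_ids[] = {3, 17, 42, 99};
    float weights[] = {1.f, 5.f, 2.f, 2.f};    // unnormalized probabilities
    int token_word[] = {0, 0};
    float uniforms[] = {0.10f, 0.95f};         // pregenerated U(0,1) draws
    int token_topic[2];

    int *d_rp, *d_ti, *d_tw, *d_tt; float *d_w, *d_u;
    cudaMalloc(&d_rp, sizeof row_ptr);  cudaMalloc(&d_ti, sizeof topic_ids);
    cudaMalloc(&d_w, sizeof weights);   cudaMalloc(&d_tw, sizeof token_word);
    cudaMalloc(&d_u, sizeof uniforms);  cudaMalloc(&d_tt, sizeof token_topic);
    cudaMemcpy(d_rp, row_ptr, sizeof row_ptr, cudaMemcpyHostToDevice);
    cudaMemcpy(d_ti, topic_ids, sizeof topic_ids, cudaMemcpyHostToDevice);
    cudaMemcpy(d_w, weights, sizeof weights, cudaMemcpyHostToDevice);
    cudaMemcpy(d_tw, token_word, sizeof token_word, cudaMemcpyHostToDevice);
    cudaMemcpy(d_u, uniforms, sizeof uniforms, cudaMemcpyHostToDevice);

    // One warp per token: 2 tokens -> 64 threads.
    warp_sample_tokens<<<1, 2 * WARP>>>(d_rp, d_ti, d_w, d_tw, d_u, d_tt, 2);
    cudaMemcpy(token_topic, d_tt, sizeof token_topic, cudaMemcpyDeviceToHost);
    printf("sampled topics: %d %d\n", token_topic[0], token_topic[1]);
    return 0;  // (error checks and cudaFree omitted for brevity)
}
```

Assigning a whole warp, rather than a single thread, to each token keeps accesses to the sparse row coalesced and avoids divergence between threads working on rows of very different lengths, which is one way to address the load-balancing concern the abstract raises.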
