Understanding Training Efficiency of Deep Learning Recommendation Models at Scale
arXiv - CS - Hardware Architecture. Pub Date: 2020-11-11, DOI: arXiv:2011.05497
Bilge Acun, Matthew Murphy, Xiaodong Wang, Jade Nie, Carole-Jean Wu, Kim Hazelwood

The use of GPUs has proliferated for machine learning workflows and is now considered mainstream for many deep learning models. Meanwhile, when training state-of-the-art personalized recommendation models, which consume the highest number of compute cycles at our large-scale datacenters, the use of GPUs comes with various challenges because these models have both compute-intensive and memory-intensive components. The GPU performance and efficiency of these recommendation models are largely affected by model architecture configurations such as dense and sparse features and MLP dimensions. Furthermore, these models often contain large embedding tables that do not fit into limited GPU memory. The goal of this paper is to explain the intricacies of using GPUs for training recommendation models, the factors affecting hardware efficiency at scale, and lessons learned from a new scale-up GPU server design, Zion.
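
The mix the abstract describes, compute-intensive MLP layers alongside memory-intensive embedding tables for sparse categorical features, is the defining structure of DLRM-style recommendation models. The following minimal PyTorch sketch illustrates that structure; it is not the paper's implementation, and the table sizes, embedding dimension, and layer widths are invented for illustration. It shows why the two components stress hardware differently: the MLPs are dense matrix multiplies that favor GPU compute, while the embedding tables are lookup operations whose memory footprint grows with the number of rows.

# A minimal sketch of a DLRM-style model, assuming PyTorch.
# All sizes below are illustrative, not taken from the paper.
import torch
import torch.nn as nn


def mlp(dims):
    # Build a simple MLP from a list of layer widths.
    layers = []
    for in_dim, out_dim in zip(dims[:-1], dims[1:]):
        layers += [nn.Linear(in_dim, out_dim), nn.ReLU()]
    return nn.Sequential(*layers)


class TinyDLRM(nn.Module):
    # Compute-intensive MLPs plus memory-intensive embedding tables.

    def __init__(self, num_dense=13, table_sizes=(1000, 5000, 20000), emb_dim=16):
        super().__init__()
        # One embedding table per sparse feature. In production these
        # tables can hold billions of rows and exceed a single GPU's memory.
        self.tables = nn.ModuleList(
            nn.EmbeddingBag(rows, emb_dim, mode="sum") for rows in table_sizes
        )
        # Dense (continuous) features pass through the bottom MLP.
        self.bottom_mlp = mlp([num_dense, 64, emb_dim])
        # Concatenated dense + sparse representations feed the top MLP.
        top_in = emb_dim * (1 + len(table_sizes))
        self.top_mlp = nn.Sequential(mlp([top_in, 64]), nn.Linear(64, 1))

    def forward(self, dense, sparse):
        # sparse: LongTensor of shape (batch, num_tables), one index per table.
        dense_out = self.bottom_mlp(dense)
        sparse_out = [t(sparse[:, i : i + 1]) for i, t in enumerate(self.tables)]
        interaction = torch.cat([dense_out] + sparse_out, dim=1)
        return torch.sigmoid(self.top_mlp(interaction))


model = TinyDLRM()
dense = torch.randn(4, 13)
sparse = torch.stack(
    [torch.randint(0, rows, (4,)) for rows in (1000, 5000, 20000)], dim=1
)
print(model(dense, sparse).shape)  # torch.Size([4, 1])

At production scale the small table_sizes here would become tables with up to billions of rows, which is why the paper's embedding tables often cannot reside in limited GPU memory even when the MLP portion runs efficiently on the accelerator.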

Updated: 2020-11-12