PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses
arXiv - CS - Performance. Pub Date: 2021-01-20. arXiv: 2101.07956
Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei Hwu

With the increasing adoption of graph neural networks (GNNs) in the machine learning community, GPUs have become an essential tool to accelerate GNN training. However, training GNNs on very large graphs that do not fit in GPU memory is still a challenging task. Unlike conventional neural networks, mini-batching input samples in GNNs requires complicated tasks such as traversing neighboring nodes and gathering their feature values. While this process accounts for a significant portion of the training time, we find that existing GNN implementations using popular deep neural network (DNN) libraries such as PyTorch are limited to a CPU-centric approach for the entire data preparation step. This "all-in-CPU" approach has a negative impact on overall GNN training performance, as it over-utilizes CPU resources and hinders GPU acceleration of GNN training. To overcome such limitations, we introduce PyTorch-Direct, which enables a GPU-centric data accessing paradigm for GNN training. In PyTorch-Direct, GPUs are capable of efficiently accessing complicated data structures in host memory directly without CPU intervention. Our microbenchmark and end-to-end GNN training results show that PyTorch-Direct reduces data transfer time by 47.1% on average and speeds up GNN training by up to 1.6x. Furthermore, by reducing CPU utilization, PyTorch-Direct also saves system power by 12.4% to 17.5% during training. To minimize programmer effort, we introduce a new "unified tensor" type along with necessary changes to the PyTorch memory allocator, dispatch logic, and placement rules. As a result, users need to change at most two lines of their PyTorch GNN training code for each tensor object to take advantage of PyTorch-Direct.
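To make the contrast concrete, the sketch below illustrates the baseline "all-in-CPU" mini-batch feature gathering the abstract describes, using only standard PyTorch calls, and then shows, in comments, what a unified-tensor workflow might look like. The unified-tensor lines are purely illustrative assumptions; the abstract does not specify PyTorch-Direct's API, so the placement call and GPU-side gather shown there are hypothetical, not the library's actual interface.

```python
import torch

# Baseline ("all-in-CPU") mini-batch preparation for GNN training:
# the node feature table lives in host memory, the CPU gathers the
# feature rows of the sampled nodes, and only the gathered mini-batch
# is copied to the GPU.
features = torch.randn(100_000, 128)               # node feature table in host memory
batch_nodes = torch.randint(0, 100_000, (1024,))   # node IDs sampled for one mini-batch

batch_feats_cpu = features[batch_nodes]            # CPU-side gather (irregular accesses)
batch_feats_gpu = batch_feats_cpu.to('cuda')       # then a bulk host-to-device copy

# Hypothetical PyTorch-Direct style usage (illustrative names only):
# the feature table would be placed in a "unified tensor" backed by
# pinned host memory, and the GPU would gather the needed rows directly,
# without the CPU-side gather step above.
# features_unified = features.to('unified')                 # assumed placement call
# batch_feats_gpu = features_unified[batch_nodes.cuda()]    # assumed GPU-direct gather
```

The intended point of the two commented lines is the claim in the abstract that at most two lines of existing PyTorch GNN training code need to change per tensor object; the exact spelling of those lines is an assumption here.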

Updated: 2021-01-21