Computational Performance Predictions for Deep Neural Network Training: A Runtime-Based Approach
arXiv - CS - Performance Pub Date: 2021-01-31, DOI: arxiv-2102.00527
Geoffrey X. Yu, Yubo Gao, Pavel Golikov, Gennady Pekhimenko

Deep learning researchers and practitioners usually leverage GPUs to help train their deep neural networks (DNNs) faster. However, choosing which GPU to use is challenging both because (i) there are many options, and (ii) users grapple with competing concerns: maximizing compute performance while minimizing costs. In this work, we present a new practical technique to help users make informed and cost-efficient GPU selections: make performance predictions with the help of a GPU that the user already has. Our technique exploits the observation that, because DNN training consists of repetitive compute steps, predicting the execution time of a single iteration is usually enough to characterize the performance of an entire training process. We make predictions by scaling the execution time of each operation in a training iteration from one GPU to another using either (i) wave scaling, a technique based on a GPU's execution model, or (ii) pre-trained multilayer perceptrons. We implement our technique in a Python library called Surfer and find that it makes accurate iteration execution time predictions on ResNet-50, Inception v3, the Transformer, GNMT, and DCGAN across six different GPU architectures. Surfer currently supports PyTorch, is easy to use, and requires only a few lines of code.
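To make the core idea concrete, the sketch below illustrates prediction by per-operation runtime scaling: measure each operation's execution time for one training iteration on a GPU the user already has, scale each measurement to the target GPU, and sum the results. All names, timings, and scaling factors here are hypothetical illustrations; this is not Surfer's actual API, and the crude compute/bandwidth ratios stand in for the paper's wave scaling and MLP predictors.

```python
# Minimal sketch (assumed names, not Surfer's API): predict target-GPU iteration
# time by scaling per-operation runtimes measured on the GPU the user already has.
from dataclasses import dataclass


@dataclass
class OpMeasurement:
    name: str            # e.g. "conv2d", "batch_norm", "linear"
    runtime_ms: float    # measured on the source GPU for one training iteration
    compute_bound: bool  # rough classification used by this toy scaler


def predict_iteration_time(ops, compute_ratio, bandwidth_ratio):
    """Scale each op's runtime from the source GPU to the target GPU.

    compute_ratio / bandwidth_ratio are source-to-target runtime ratios derived
    from peak throughput -- a crude stand-in for wave scaling or MLP models.
    """
    total_ms = 0.0
    for op in ops:
        ratio = compute_ratio if op.compute_bound else bandwidth_ratio
        total_ms += op.runtime_ms * ratio
    return total_ms


if __name__ == "__main__":
    # Toy per-op timings for one training iteration on the source GPU.
    ops = [
        OpMeasurement("conv2d", 4.2, compute_bound=True),
        OpMeasurement("batch_norm", 0.9, compute_bound=False),
        OpMeasurement("linear", 1.3, compute_bound=True),
    ]
    # Example: the target GPU has roughly 2x the compute throughput and
    # 1.5x the memory bandwidth, so per-op runtimes shrink accordingly.
    predicted = predict_iteration_time(ops, compute_ratio=1 / 2.0,
                                       bandwidth_ratio=1 / 1.5)
    print(f"Predicted iteration time on target GPU: {predicted:.2f} ms")
```

Because a training iteration repeats essentially the same set of operations, a single predicted iteration time like this can be multiplied out to estimate the cost of an entire training run on each candidate GPU.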

Updated: 2021-02-02