Hierarchical Roofline Performance Analysis for Deep Learning Applications
arXiv - CS - Performance. Pub Date: 2020-09-11, DOI: arxiv-2009.05257
Yunsong Wang, Charlene Yang, Steven Farrell, Thorsten Kurth, Samuel Williams

This paper presents a practical methodology for collecting the performance data needed to conduct hierarchical Roofline analysis on NVIDIA GPUs. It discusses the extension of the Empirical Roofline Toolkit to support a broader range of data precisions and Tensor Cores, and introduces an Nsight Compute-based method for accurately collecting application performance information. The methodology enables automated machine characterization and application characterization for Roofline analysis across the entire memory hierarchy on NVIDIA GPUs, and it is validated on a complex deep learning application used for climate image segmentation. We use two versions of the code, one in TensorFlow and one in PyTorch, to demonstrate the use and effectiveness of this methodology. We highlight how the application utilizes the compute and memory capabilities of the GPU and how the implementation and performance differ between the two deep learning frameworks.
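
To make the idea of hierarchical Roofline analysis concrete, the following is a minimal sketch of the standard Roofline bound, min(peak FLOP/s, arithmetic intensity × bandwidth), evaluated once per memory level. It is not the paper's tooling: the names `Kernel`, `arithmetic_intensity`, `roofline_bound`, and `analyze` are hypothetical, and the peak, bandwidth, and kernel numbers are placeholders rather than measurements. In practice the machine ceilings would come from a benchmark such as the Empirical Roofline Toolkit and the kernel counters from a profiler such as Nsight Compute.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    """Per-kernel counters (hypothetical container, placeholder values)."""
    flops: float        # floating-point operations executed
    bytes_moved: dict   # bytes moved at each memory level, keyed by level name
    runtime_s: float    # kernel run time in seconds


def arithmetic_intensity(kernel):
    """FLOPs per byte at each memory level: one Roofline x-coordinate per level."""
    return {level: kernel.flops / nbytes
            for level, nbytes in kernel.bytes_moved.items()}


def roofline_bound(ai, peak_flops, bandwidth):
    """Attainable FLOP/s for a given arithmetic intensity and one bandwidth ceiling."""
    return min(peak_flops, ai * bandwidth)


def analyze(kernel, peak_flops, bandwidths):
    """Return achieved FLOP/s and the Roofline bound at each memory level."""
    achieved = kernel.flops / kernel.runtime_s
    ai = arithmetic_intensity(kernel)
    bounds = {level: roofline_bound(ai[level], peak_flops, bandwidths[level])
              for level in bandwidths}
    return achieved, ai, bounds


# Placeholder machine ceilings (FLOP/s and bytes/s), not real GPU numbers.
PEAK = 100.0
BANDWIDTHS = {"L1": 50.0, "L2": 20.0, "HBM": 10.0}

# Placeholder kernel counters.
kernel = Kernel(flops=1000.0,
                bytes_moved={"L1": 1000.0, "L2": 500.0, "HBM": 50.0},
                runtime_s=50.0)

achieved, ai, bounds = analyze(kernel, PEAK, BANDWIDTHS)
# The level with the lowest bound is the one most likely limiting the kernel.
limiting_level = min(bounds, key=bounds.get)
```

With these placeholder numbers the lowest ceiling is at L2, so under the model's assumptions that level would be the first place to look for a bottleneck; a kernel whose bound equals the compute peak at every level would instead be classed as compute bound.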

Updated: 2020-09-24