Time-Based Roofline for Deep Learning Performance Analysis

Wang, Yunsong; Yang, Charlene; Farrell, Steven; Zhang, Yan; Kurth, Thorsten; Williams, Samuel

arXiv:2009.04598 (cs)

[Submitted on 9 Sep 2020 (v1), last revised 22 Sep 2020 (this version, v3)]

Abstract: Deep learning applications are usually very compute-intensive and require long run times for training and inference. Researchers have tackled this on both the hardware and software sides; in this paper, we propose a Roofline-based approach to performance analysis to facilitate the optimization of these applications. This approach extends the Roofline model widely used in traditional high-performance computing applications, incorporating both compute/bandwidth complexity and run time in its formulae to provide insights into deep learning-specific characteristics. We use two sets of representative kernels, 2D convolution and long short-term memory (LSTM), to validate and demonstrate this new approach, and investigate how arithmetic intensity, cache locality, auto-tuning, kernel launch overhead, and Tensor Core usage affect performance. Compared to the common ad-hoc approach, this study helps form a more systematic way to analyze code performance and identify optimization opportunities for deep learning applications.
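
The abstract does not spell out the paper's formulae, so the following is only a minimal sketch of the general idea of relating complexity to run time. It assumes the classic Roofline bound, in which a kernel's run time is at least the larger of its compute time (FLOPs divided by peak FLOP/s) and its memory time (bytes moved divided by peak bandwidth). The names `Machine`, `roofline_time_bound`, and `arithmetic_intensity`, along with all numeric values, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical machine description; values are placeholders, not measurements."""
    peak_flops: float      # peak compute throughput, in FLOP/s
    peak_bandwidth: float  # peak memory bandwidth, in bytes/s


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved."""
    return flops / bytes_moved


def roofline_time_bound(flops: float, bytes_moved: float, machine: Machine):
    """Return (lower bound on run time in seconds, limiting resource).

    Assumes the classic Roofline model: run time cannot be shorter than
    either the compute time or the memory time.
    """
    compute_time = flops / machine.peak_flops
    memory_time = bytes_moved / machine.peak_bandwidth
    limiter = "compute" if compute_time >= memory_time else "memory"
    return max(compute_time, memory_time), limiter


if __name__ == "__main__":
    # Placeholder machine: 1 TFLOP/s peak compute, 100 GB/s peak bandwidth.
    machine = Machine(peak_flops=1e12, peak_bandwidth=1e11)
    # Placeholder kernel: 2 GFLOPs of work over 1 GB of data movement.
    bound, limiter = roofline_time_bound(2e9, 1e9, machine)
    print(f"intensity = {arithmetic_intensity(2e9, 1e9):.1f} FLOP/byte")
    print(f"run time >= {bound:.4f} s ({limiter}-bound)")
```

Comparing a kernel's measured run time against such a bound indicates how far it is from the machine's limits and which resource constrains it. Effects the abstract lists, such as kernel launch overhead, are not captured by this simplified bound.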

Comments:	9 pages
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR); Machine Learning (cs.LG); Performance (cs.PF)
Cite as:	arXiv:2009.04598 [cs.DC]
	(or arXiv:2009.04598v3 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2009.04598

Submission history

From: Charlene Yang
[v1] Wed, 9 Sep 2020 23:29:04 UTC (644 KB)
[v2] Wed, 16 Sep 2020 07:11:36 UTC (646 KB)
[v3] Tue, 22 Sep 2020 21:51:45 UTC (646 KB)
