Abstract
With the rise of graphics processing units (GPUs), the parallel computing community needs better tools to productively extract performance from the GPU. While modern compilers provide flags to activate different optimizations to improve performance, the effectiveness of such automated optimization has been limited at best. As a consequence, extracting the best performance from an algorithm on a GPU requires significant expertise and manual effort to exploit both spatial and temporal sharing of computing resources. In particular, maximizing the performance of an algorithm on a GPU requires extensive hyperparameter (e.g., thread-block size) selection and tuning. Given the myriad hyperparameter dimensions to optimize across, the search space of optimizations is extremely large, making it infeasible to evaluate exhaustively. This paper proposes an approach that uses statistical analysis with iterative machine learning (IterML) to prune and tune hyperparameters to achieve better performance. During each iteration, we leverage machine-learning models to guide the pruning and tuning for subsequent iterations. We evaluate our IterML approach on GPU thread-block size across many benchmarks running on an NVIDIA P100 or V100 GPU. Our experimental results show that our automated IterML approach reduces search effort by 40% to 80% compared to traditional (non-iterative) ML, and that the performance of our (unmodified) GPU applications can improve significantly (between 67% and 95%) simply by changing the thread-block size.
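To make the iterative prune-and-tune idea concrete, the following is a minimal sketch of an IterML-style search loop over GPU thread-block sizes: measure a few configurations, fit a surrogate model on the measurements, prune the candidates the model predicts to be slow, and repeat. The synthetic `runtime()` function, the 1-nearest-neighbor surrogate, and the halving pruning policy are illustrative assumptions for this sketch, not the paper's exact method.

```python
# Sketch of an iterative ML (IterML) search over thread-block sizes.
# Assumptions: runtime() stands in for timing a real kernel launch, and a
# 1-nearest-neighbor surrogate stands in for the paper's ML models.
import random

def runtime(block_size):
    # Synthetic stand-in for timing a kernel: fastest near 256 threads.
    return abs(block_size - 256) / 256.0 + 0.1

def predict(block_size, measured):
    # 1-nearest-neighbor surrogate: predicted runtime is that of the
    # closest configuration measured so far.
    nearest = min(measured, key=lambda m: abs(m - block_size))
    return measured[nearest]

def iter_ml_search(candidates, iters=3, samples_per_iter=4, keep_frac=0.5, seed=0):
    rng = random.Random(seed)
    measured = {}                 # block size -> observed runtime
    pool = list(candidates)
    for _ in range(iters):
        # 1. Measure a few not-yet-measured configurations from the pool.
        fresh = [c for c in pool if c not in measured]
        for c in rng.sample(fresh, min(samples_per_iter, len(fresh))):
            measured[c] = runtime(c)
        # 2. Prune: keep only the fraction the surrogate predicts fastest.
        ranked = sorted(pool, key=lambda c: predict(c, measured))
        pool = ranked[:max(1, int(len(ranked) * keep_frac))]
    best = min(measured, key=measured.get)
    return best, len(measured)

# Thread-block sizes in multiples of the 32-thread warp, a typical CUDA
# tuning dimension; far fewer than the 32 candidates are ever timed.
best, evaluated = iter_ml_search(list(range(32, 1025, 32)))
```

With three iterations of four measurements each, at most 12 of the 32 candidate block sizes are ever timed, which mirrors the abstract's claim of reduced search effort relative to exhaustive evaluation.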
Acknowledgments
This work was supported in part by the Air Force Office of Scientific Research (AFOSR) Computational Mathematics Program via AFOSR Grant No. FA9550-17-1-0205, as well as by Virginia Tech's Advanced Research Computing (ARC) via access to their high-performance computing resources with graphics processing units (GPUs).
Cite this article
Cui, X., Feng, Wc. IterML: Iterative Machine Learning for Intelligent Parameter Pruning and Tuning in Graphics Processing Units. J Sign Process Syst 93, 391–403 (2021). https://doi.org/10.1007/s11265-020-01604-4