
DLOOPT: An Optimization Assistant on AutoTVM for Deep Learning Operators

Journal of Signal Processing Systems

Abstract

With the rapid growth of deep learning models and deep learning-based applications, how to accelerate the inference of deep neural networks, and especially of neural network operators, has become an increasingly important research area. As a bridge between a front-end deep learning framework and a back-end hardware platform, deep learning compilers aim to optimize various deep learning models for a range of hardware platforms with model- and hardware-specific optimizations. Apache TVM (or TVM for short), a well-known open-source deep learning compiler, uses a customized domain-specific language, called the Tensor Expression Language, to define hardware-specific optimizations for neural network operators. TVM also allows users to write tensor expressions to design customized optimizations for specific operators. However, TVM does not provide users with supporting information, such as what computations are performed within an operator, or with tools for optimizing the operators in a deep learning model. In addition, tensor expressions have an entirely different syntax from imperative languages and are not easy to get started with. Furthermore, although TVM comes with an auto-tuning module, called AutoTVM, that facilitates the tuning of optimization configurations (e.g., tiling size and loop order), AutoTVM takes quite a long time to search for the optimal configurations of a set of optimizations. In this paper, we present DLOOPT, an optimization assistant that helps optimization developers design effective optimizations for neural network operators and/or obtain optimal optimization configurations in a timely manner. DLOOPT specifically addresses three key aspects: (1) it lets developers focus solely on designing optimizations by offering sufficient information about the operators of a given model and an easier way to write optimizations, (2) it minimizes the number of optimizations that developers need to design by allowing optimizations to be reused, and (3) it greatly simplifies the tuning process by implementing a set of tuning strategies in AutoTVM. The evaluation results showed that DLOOPT reduced the time needed to develop adequate optimizations for the operators in a model by more than 99%. We believe that DLOOPT is friendly to optimization developers and allows them to quickly develop effective optimizations for neural network operators.
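
For readers unfamiliar with the tensor expressions and tuning templates mentioned above, the following is a minimal sketch, not taken from the paper, of how an operator is typically expressed and tuned with TVM's Tensor Expression language and AutoTVM. The computation is an illustrative matrix multiplication, the template name "example/matmul" is a hypothetical label, and the schedule exposes tiling size and loop order as tunable knobs via the public tvm.te and tvm.autotvm APIs.

    import tvm
    from tvm import te, autotvm

    # "example/matmul" is a hypothetical template name used only for illustration.
    @autotvm.template("example/matmul")
    def matmul_template(N, L, M, dtype="float32"):
        # Declare the computation as tensor expressions.
        A = te.placeholder((N, L), name="A", dtype=dtype)
        B = te.placeholder((L, M), name="B", dtype=dtype)
        k = te.reduce_axis((0, L), name="k")
        C = te.compute((N, M),
                       lambda i, j: te.sum(A[i, k] * B[k, j], axis=k),
                       name="C")

        # Create a schedule and expose its parameters as tunable knobs.
        s = te.create_schedule(C.op)
        y, x = s[C].op.axis
        (k,) = s[C].op.reduce_axis

        cfg = autotvm.get_config()
        cfg.define_split("tile_y", y, num_outputs=2)  # tiling size along y
        cfg.define_split("tile_x", x, num_outputs=2)  # tiling size along x

        # Apply the chosen tiling and fix a loop order for the tiled loops.
        yo, yi = cfg["tile_y"].apply(s, C, y)
        xo, xi = cfg["tile_x"].apply(s, C, x)
        s[C].reorder(yo, xo, k, yi, xi)

        return s, [A, B, C]

AutoTVM can then explore the search space defined by these knobs, for example by creating a tuning task with autotvm.task.create and running a tuner such as autotvm.tuner.XGBTuner, measuring candidate configurations on the target hardware; it is this search that the tuning strategies described in the paper aim to shorten.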



Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.


Funding

This study was partially supported by the Ministry of Science and Technology of Taiwan [grant number MOST 110-2221-E-A49-030-MY3].

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Yu-Sheng Hsieh. The first draft of the manuscript was written by Yu-Sheng Hsieh and Yi-Ping You, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Yi-Ping You.

Ethics declarations

Competing Interest

The authors have no competing interests to declare that are relevant to the content of this article.

Ethics Approval

This research involves no human participants or animals.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Hsieh, YS., You, YP. DLOOPT: An Optimization Assistant on AutoTVM for Deep Learning Operators. J Sign Process Syst 95, 585–607 (2023). https://doi.org/10.1007/s11265-022-01804-0


