
DLOOPT: An Optimization Assistant on AutoTVM for Deep Learning Operators

Journal of Signal Processing Systems

Abstract

With the rapid growth of deep learning models and deep learning-based applications, how to accelerate the inference of deep neural networks, and especially of neural network operators, has become an increasingly important research area. As a bridge between a front-end deep learning framework and a back-end hardware platform, deep learning compilers aim to optimize various deep learning models for a range of hardware platforms with model- and hardware-specific optimizations. Apache TVM (or TVM for short), a well-known open-source deep learning compiler, uses a customized domain-specific language, called the Tensor Expression Language, to define hardware-specific optimizations for neural network operators. TVM also allows users to write tensor expressions to design customized optimizations for specific operators. However, TVM does not provide users with supporting information, such as what computations are performed within an operator, or with tools for optimizing the operators in a deep learning model. In addition, tensor expressions have an entirely different syntax from imperative languages and are not easy to get started with. Furthermore, although TVM comes with an auto-tuning module, called AutoTVM, that facilitates the tuning of optimization configurations (e.g., tiling size and loop order), AutoTVM takes quite a long time to search for the optimal configurations of a set of optimizations. In this paper, we present DLOOPT, an optimization assistant that helps optimization developers design effective optimizations for neural network operators and/or obtain optimal optimization configurations in a timely manner. DLOOPT specifically addresses three key aspects: (1) it lets developers focus solely on designing optimizations by offering sufficient information about the operators of a given model and an easier way to write optimizations, (2) it minimizes the number of optimizations that developers need to design by allowing optimizations to be reused, and (3) it greatly simplifies the tuning process by implementing a set of tuning strategies in AutoTVM. The evaluation results showed that DLOOPT reduced the time needed to develop adequate optimizations for the operators in a model by more than 99%. We believe that DLOOPT is friendly to optimization developers and allows them to quickly develop effective optimizations for neural network operators.
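
For readers unfamiliar with the tensor expressions and tuning templates mentioned above, the following is a minimal sketch, not taken from the paper, of how an operator is typically expressed and tuned with TVM's Tensor Expression language and AutoTVM. The computation is an illustrative matrix multiplication, the template name "example/matmul" is a hypothetical label, and the schedule exposes tiling size and loop order as tunable knobs via the public tvm.te and tvm.autotvm APIs.

    import tvm
    from tvm import te, autotvm

    # "example/matmul" is a hypothetical template name used only for illustration.
    @autotvm.template("example/matmul")
    def matmul_template(N, L, M, dtype="float32"):
        # Declare the computation as tensor expressions.
        A = te.placeholder((N, L), name="A", dtype=dtype)
        B = te.placeholder((L, M), name="B", dtype=dtype)
        k = te.reduce_axis((0, L), name="k")
        C = te.compute((N, M),
                       lambda i, j: te.sum(A[i, k] * B[k, j], axis=k),
                       name="C")

        # Create a schedule and expose its parameters as tunable knobs.
        s = te.create_schedule(C.op)
        y, x = s[C].op.axis
        (k,) = s[C].op.reduce_axis

        cfg = autotvm.get_config()
        cfg.define_split("tile_y", y, num_outputs=2)  # tiling size along y
        cfg.define_split("tile_x", x, num_outputs=2)  # tiling size along x

        # Apply the chosen tiling and fix a loop order for the tiled loops.
        yo, yi = cfg["tile_y"].apply(s, C, y)
        xo, xi = cfg["tile_x"].apply(s, C, x)
        s[C].reorder(yo, xo, k, yi, xi)

        return s, [A, B, C]

AutoTVM can then explore the search space defined by these knobs, for example by creating a tuning task with autotvm.task.create and running a tuner such as autotvm.tuner.XGBTuner, measuring candidate configurations on the target hardware; it is this search that the tuning strategies described in the paper aim to shorten.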



Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.


Funding

This study was partially supported by the Ministry of Science and Technology of Taiwan [grant number MOST 110-2221-E-A49-030-MY3].

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Yu-Sheng Hsieh. The first draft of the manuscript was written by Yu-Sheng Hsieh and Yi-Ping You, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Yi-Ping You.

Ethics declarations

Competing Interest

The authors have no competing interests to declare that are relevant to the content of this article.

Ethics Approval

This research involves no human participants or animals.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Hsieh, YS., You, YP. DLOOPT: An Optimization Assistant on AutoTVM for Deep Learning Operators. J Sign Process Syst 95, 585–607 (2023). https://doi.org/10.1007/s11265-022-01804-0


