Abstract
Deep learning is changing the way mobile and embedded devices see, hear, and understand the world. When deep learning models are deployed on such devices, they must perform inference tasks in a timely and energy-efficient manner. Much research has aimed at taming deep learning for resource-constrained devices, either by compressing deep learning models or by devising hardware accelerators; however, these approaches provide only best-effort performance. In this paper, we present the design and implementation of DeepRT, a novel deep learning inference runtime. Unlike previous approaches, DeepRT focuses on supporting predictable temporal and spatial inference performance when deep learning models are used in unpredictable, resource-constrained environments. In particular, DeepRT applies formal control theory to Quality-of-Service (QoS) management, dynamically minimizing the tardiness of inference tasks at runtime while achieving high energy efficiency. Further, DeepRT selects an appropriate compression level for deep learning models at runtime according to memory availability and users' QoS requirements, striking a proper trade-off between memory savings and loss of inference accuracy. We evaluate DeepRT on a wide range of deep learning models under various conditions. The experimental results show that DeepRT supports the timeliness of inference tasks in a robust and energy-efficient manner.
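To give a flavor of the control-theoretic QoS management described above, the following is a minimal, hypothetical sketch of a discrete-time PI feedback controller that drives measured inference tardiness toward a set-point by adjusting a normalized processor-speed level. It is not DeepRT's actual controller: the class names, gains, saturation bounds, and the toy linear plant model are all illustrative assumptions.

```python
class PIController:
    """Discrete-time PI controller with actuator saturation.

    (Anti-windup is omitted for brevity; a real runtime would need it.)
    """

    def __init__(self, kp, ki, u_min=0.1, u_max=1.0):
        self.kp, self.ki = kp, ki
        self.integral = 0.0
        self.u_min, self.u_max = u_min, u_max

    def update(self, setpoint, measured):
        error = measured - setpoint          # positive when tasks are too tardy
        self.integral += error
        u = self.kp * error + self.ki * self.integral
        return max(self.u_min, min(self.u_max, u))  # clamp to the actuator range


def simulate(steps=100, target=0.10):
    """Toy plant: average tardiness falls linearly as the speed level u rises
    (tardiness = 0.3 - 0.25 * u, so the ideal operating point is u = 0.8)."""
    ctl = PIController(kp=1.0, ki=0.5)
    u, history = 0.5, []
    for _ in range(steps):
        tardiness = 0.3 - 0.25 * u           # measure the plant output
        u = ctl.update(target, tardiness)    # adjust the speed level
        history.append(tardiness)
    return history


if __name__ == "__main__":
    h = simulate()
    print(f"final tardiness: {h[-1]:.3f} (target 0.10)")
```

The integral term is what removes steady-state error here: it accumulates until the speed level settles at the operating point where measured tardiness equals the set-point, which is the behavior a tardiness-tracking QoS controller needs.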
Acknowledgements
This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education under Grant NRF-2016R1D1A1B03934266.
Cite this article
Kang, W., Chung, J. DeepRT: predictable deep learning inference for cyber-physical systems. Real-Time Syst 55, 106–135 (2019). https://doi.org/10.1007/s11241-018-9314-y