
DeepRT: predictable deep learning inference for cyber-physical systems

Published in: Real-Time Systems

Abstract

Deep learning is changing the way mobile and embedded devices see, hear, and understand the world. When deep learning is deployed to such systems, they are expected to perform inference tasks in a timely and energy-efficient manner. Much research has focused on taming deep learning for resource-constrained devices, either by compressing deep learning models or by devising hardware accelerators; however, these approaches provide only best-effort performance. In this paper, we present the design and implementation of DeepRT, a novel deep learning inference runtime. Unlike previous approaches, DeepRT focuses on supporting predictable temporal and spatial inference performance when deep learning models are used in unpredictable and resource-constrained environments. In particular, DeepRT applies formal control theory to support Quality-of-Service (QoS) management that dynamically minimizes the tardiness of inference tasks at runtime while achieving high energy efficiency. Further, DeepRT determines a proper level of compression of deep learning models at runtime according to memory availability and users' QoS requirements, striking a trade-off between memory savings and loss of inference accuracy. We evaluate DeepRT on a wide range of deep learning models under various conditions. The experimental results show that DeepRT supports the timeliness of inference tasks in a robust and energy-efficient manner.
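To give a concrete flavor of the control-theoretic QoS management the abstract describes, the sketch below shows a minimal discrete-time PI controller that maps measured inference tardiness to a speed knob (e.g., a DVFS level): higher tardiness requests more speed at the cost of more power. This is an illustrative sketch only; the function names, gains, and the toy linear plant model are assumptions, not DeepRT's actual design.

```python
# Sketch of feedback-controlled tardiness management (assumed names/gains,
# not DeepRT's actual implementation). A PI controller runs once per
# sampling period: the proportional term reacts to current tardiness,
# the integral term removes steady-state error.

def make_pi_controller(kp: float, ki: float, setpoint: float):
    """Return a PI control step: measured tardiness -> speed-knob command."""
    integral = 0.0

    def step(measured: float) -> float:
        nonlocal integral
        error = measured - setpoint   # positive error: tasks are too tardy
        integral += error             # accumulate for the integral term
        return kp * error + ki * integral

    return step


def simulate(periods: int = 50) -> list:
    """Closed loop against a toy linear plant: tardiness = 1 - 0.1 * speed."""
    controller = make_pi_controller(kp=1.0, ki=1.0, setpoint=0.1)
    tardiness, history = 1.0, []
    for _ in range(periods):
        # Clamp the command to the actuator's range (e.g., available DVFS levels).
        speed = min(max(controller(tardiness), 0.0), 10.0)
        # Assumed plant: more speed linearly reduces tardiness, floored at zero.
        tardiness = max(0.0, 1.0 - 0.1 * speed)
        history.append(tardiness)
    return history
```

Under this assumed plant, the loop drives tardiness from 1.0 down toward the 0.1 setpoint over the simulated periods; in a real system the plant model would be identified from measurements rather than assumed.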


[Figures 1–16 omitted]


Notes

  1. The details of the experimental setting are discussed in Sect. 4.1. The system power is measured using an external power meter. Only the result for the CaffeNet model is shown, since other models, e.g., LeNet, exhibit very similar behavior.

  2. CaffeNet is a variant of AlexNet (Krizhevsky et al. 2012).

References

  • Abe Y, Sasaki H, Kato S, Inoue K, Edahiro M, Peres M (2014) Power and performance characterization and modeling of GPU-accelerated systems. In: 2014 IEEE 28th international parallel and distributed processing symposium, pp 113–122. https://doi.org/10.1109/IPDPS.2014.23

  • Amert T, Otterness N, Yang M, Anderson JH, Smith FD (2017) GPU scheduling on the NVIDIA TX2: hidden details revealed. In: 2017 IEEE real-time systems symposium (RTSS), pp 104–115

  • Caffe Model Zoo (2018) https://github.com/bvlc/caffe/wiki/model-zoo

  • Chen JJ, Kuo CF (2007) Energy-efficient scheduling for real-time systems on dynamic voltage scaling (DVS) platforms. In: 13th IEEE international conference on embedded and real-time computing systems and applications, pp 28–38

  • Chen W, Wilson J, Tyree S, Weinberger K, Chen Y (2015) Compressing neural networks with the hashing trick. In: International conference on machine learning, pp 2285–2294

  • Chen YH, Krishna T, Emer JS, Sze V (2017) Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE J Solid State Circuits 52(1):127–138

  • Chung J, Shin T (2016) Simplifying deep neural networks for neuromorphic architectures. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), pp 1–6. https://doi.org/10.1145/2897937.2898092

  • Deng L, Yu D (2014) Deep learning: methods and applications. Technical report. https://www.microsoft.com/en-us/research/publication/deep-learning-methods-and-applications/

  • Denton EL, Zaremba W, Bruna J, LeCun Y, Fergus R (2014) Exploiting linear structure within convolutional networks for efficient evaluation. In: Advances in neural information processing systems, pp 1269–1277

  • Du Z, Fasthuber R, Chen T, Ienne P, Li L, Luo T, Feng X, Chen Y, Temam O (2015) ShiDianNao: shifting vision processing closer to the sensor. In: Proceedings of the 2015 ACM/IEEE 42nd annual international symposium on computer architecture (ISCA). IEEE, New York, pp 92–104

  • Falcini F, Lami G, Costanza AM (2017) Deep learning in automotive software. IEEE Softw 34(3):56–63

  • Fu X, Wang X (2011) Utilization-controlled task consolidation for power optimization in multi-core real-time systems. In: Proceedings of the 2011 IEEE 17th international conference on embedded and real-time computing systems and applications (RTCSA), vol 1, pp 73–82

  • Fu Y, Kottenstette N, Lu C, Koutsoukos XD (2012) Feedback thermal control of real-time systems on multicore processors. In: Proceedings of the tenth ACM international conference on embedded software, EMSOFT ’12. ACM, New York, pp 113–122

  • Gong Y, Liu L, Yang M, Bourdev L (2014) Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115

  • Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge. http://www.deeplearningbook.org

  • Han S, Liu X, Mao H, Pu J, Pedram A, Horowitz MA, Dally WJ (2016) EIE: efficient inference engine on compressed deep neural network. In: Proceedings of the 43rd international symposium on computer architecture, ISCA ’16, pp 243–254

  • Han S, Mao H, Dally WJ (2015) Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. CoRR abs/1510.00149. http://arxiv.org/abs/1510.00149

  • Han S, Pool J, Tran J, Dally WJ (2015) Learning both weights and connections for efficient neural networks. In: Proceedings of the 28th international conference on neural information processing systems, NIPS’15. MIT Press, Cambridge, pp 1135–1143. http://dl.acm.org/citation.cfm?id=2969239.2969366

  • Hellerstein JL, Diao Y, Parekh S, Tilbury DM (2004) Feedback control of computing systems. Wiley-IEEE Press, Hoboken

  • Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

  • Ishihara T, Yasuura H (1998) Voltage scheduling problem for dynamically variable voltage processors. In: Proceedings, 1998 international symposium on low power electronics and design. IEEE, New York, pp 197–202

  • Jaderberg M, Vedaldi A, Zisserman A (2014) Speeding up convolutional neural networks with low rank expansions. In: Proceedings of the British machine vision conference. BMVA Press

  • Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093

  • Kang W, Chung J (2017) Energy-efficient response time management for embedded databases. Real Time Syst 53(2):228–253. https://doi.org/10.1007/s11241-016-9264-1

  • Kang W, Son SH, Stankovic JA (2012) Design, implementation, and evaluation of a QoS-aware real-time embedded database. IEEE Trans Comput 61(1):45–59

  • Kim DHK, Imes C, Hoffmann H (2015) Racing and pacing to idle: theoretical and empirical analysis of energy optimization heuristics. In: 2015 IEEE 3rd international conference on cyber-physical systems, networks, and applications, pp 78–85. https://doi.org/10.1109/CPSNA.2015.23

  • Kim Y, Park E, Yoo S, Choi T, Yang L, Shin D (2015) Compression of deep convolutional neural networks for fast and low power mobile applications. CoRR abs/1511.06530. http://arxiv.org/abs/1511.06530

  • Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105

  • Lane ND, Bhattacharya S, Georgiev P, Forlivesi C, Jiao L, Qendro L, Kawsar F (2016) DeepX: a software accelerator for low-power deep learning inference on mobile devices. In: 2016 15th ACM/IEEE international conference on information processing in sensor networks (IPSN). IEEE, New York, pp 1–12

  • LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

  • Liu CL, Layland JW (1973) Scheduling algorithms for multiprogramming in a hard-real-time environment. J ACM 20(1):46–61. https://doi.org/10.1145/321738.321743

  • Ljung L (1999) System identification: theory for the user, 2nd edn. Prentice Hall PTR, Upper Saddle River

  • Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440

  • Lu C, Abdelzaher TF, Stankovic JA, Son SH (2001) A feedback control approach for guaranteeing relative delays in web servers. In: RTAS ’01: Proceedings of the seventh real-time technology and applications symposium (RTAS ’01)

  • Lu C, Stankovic JA, Son SH, Tao G (2002) Feedback control real-time scheduling: framework, modeling, and algorithms. Real Time Syst 23(1–2):85–126

  • Lu C, Wang X, Gill C (2003) Feedback control real-time scheduling in ORB middleware. In: RTAS ’03: Proceedings of the 9th IEEE real-time and embedded technology and applications symposium. IEEE Computer Society, Washington, DC, p 37

  • Lu Y, Abdelzaher TF, Saxena A (2004) Design, implementation, and evaluation of differentiated caching services. IEEE Trans Parallel Distrib Syst 15(5):440–452

  • Mei X, Wang Q, Chu X (2017) A survey and measurement study of GPU DVFS on energy conservation. Digital Commun Netw 3(2):89–100

  • Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533

  • NVIDIA TensorRT (2017) https://developer.nvidia.com/tensorrt

  • Ovtcharov K, Ruwase O, Kim JY, Fowers J, Strauss K, Chung ES (2015) Toward accelerating deep learning at scale using specialized hardware in the datacenter. In: Hot chips 27 symposium (HCS). IEEE, New York, pp 1–38

  • Pallipadi V, Starikovskiy A (2006) The ondemand governor. Proc Linux Symp 2:215–230

  • Parekh S, Gandhi N, Hellerstein J, Tilbury D, Jayram T, Bigus J (2002) Using control theory to achieve service level objectives in performance management. Real Time Syst 23(1–2):127–141

  • Park S, Humphrey MA (2011) Predictable high-performance computing using feedback control and admission control. IEEE Trans Parallel Distrib Syst 22(3):396–411. https://doi.org/10.1109/TPDS.2010.100

  • Paszke A, Chaurasia A, Kim S, Culurciello E (2016) ENet: a deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147

  • Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252. https://doi.org/10.1007/s11263-015-0816-y

  • Stewart J (2018) Self-driving cars use crazy amounts of power, and it’s becoming a problem. Wired. https://www.wired.com/story/self-driving-cars-power-consumption-nvidia-chip/

  • Strang G (2016) Introduction to linear algebra, 5th edn. Wellesley-Cambridge Press, Wellesley

  • Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9

  • Xue J, Li J, Gong Y (2013) Restructuring of deep neural network acoustic models with singular value decomposition. In: Interspeech, pp 2365–2369

  • Yao F, Demers A, Shenker S (1995) A scheduling model for reduced CPU energy. In: Proceedings of the 36th annual symposium on foundations of computer science, pp 374–382


Acknowledgements

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education under Grant NRF-2016R1D1A1B03934266.

Author information

Corresponding author

Correspondence to Jaeyong Chung.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Kang, W., Chung, J. DeepRT: predictable deep learning inference for cyber-physical systems. Real-Time Syst 55, 106–135 (2019). https://doi.org/10.1007/s11241-018-9314-y
