Abstract
Deep learning is changing the way mobile and embedded devices see, hear, and understand the world. When deep learning models are deployed on such devices, they must perform inference tasks in a timely and energy-efficient manner. Much research has aimed at taming deep learning for resource-constrained devices, either by compressing deep learning models or by devising hardware accelerators; however, these approaches provide only best-effort performance. In this paper, we present the design and implementation of DeepRT, a novel deep learning inference runtime. Unlike previous approaches, DeepRT focuses on supporting predictable temporal and spatial inference performance when deep learning models are used in unpredictable, resource-constrained environments. In particular, DeepRT applies formal control theory to Quality-of-Service (QoS) management, dynamically minimizing the tardiness of inference tasks at runtime while achieving high energy efficiency. Further, DeepRT selects an appropriate compression level for deep learning models at runtime according to memory availability and users' QoS requirements, striking a proper trade-off between memory savings and loss of inference accuracy. We evaluate DeepRT on a wide range of deep learning models under various conditions. The experimental results show that DeepRT supports the timeliness of inference tasks in a robust and energy-efficient manner.
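To give a flavor of the control-theoretic QoS management described above, the following is a minimal, hypothetical sketch of a discrete-time PI feedback controller that drives measured inference tardiness toward a set-point by adjusting a normalized processor-speed level. It is not DeepRT's actual controller: the class names, gains, saturation bounds, and the toy linear plant model are all illustrative assumptions.

```python
class PIController:
    """Discrete-time PI controller with actuator saturation.

    (Anti-windup is omitted for brevity; a real runtime would need it.)
    """

    def __init__(self, kp, ki, u_min=0.1, u_max=1.0):
        self.kp, self.ki = kp, ki
        self.integral = 0.0
        self.u_min, self.u_max = u_min, u_max

    def update(self, setpoint, measured):
        error = measured - setpoint          # positive when tasks are too tardy
        self.integral += error
        u = self.kp * error + self.ki * self.integral
        return max(self.u_min, min(self.u_max, u))  # clamp to the actuator range


def simulate(steps=100, target=0.10):
    """Toy plant: average tardiness falls linearly as the speed level u rises
    (tardiness = 0.3 - 0.25 * u, so the ideal operating point is u = 0.8)."""
    ctl = PIController(kp=1.0, ki=0.5)
    u, history = 0.5, []
    for _ in range(steps):
        tardiness = 0.3 - 0.25 * u           # measure the plant output
        u = ctl.update(target, tardiness)    # adjust the speed level
        history.append(tardiness)
    return history


if __name__ == "__main__":
    h = simulate()
    print(f"final tardiness: {h[-1]:.3f} (target 0.10)")
```

The integral term is what removes steady-state error here: it accumulates until the speed level settles at the operating point where measured tardiness equals the set-point, which is the behavior a tardiness-tracking QoS controller needs.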
Acknowledgements
This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education under Grant NRF-2016R1D1A1B03934266.
Cite this article
Kang, W., Chung, J. DeepRT: predictable deep learning inference for cyber-physical systems. Real-Time Syst 55, 106–135 (2019). https://doi.org/10.1007/s11241-018-9314-y