research-article
Open Access

SLO-Aware Inference Scheduler for Heterogeneous Processors in Edge Platforms

Published: 17 July 2021

Abstract

With the proliferation of machine learning (ML) applications, edge platforms have become increasingly important for processing streaming sensor data locally without resorting to remote servers. Such edge platforms are commonly equipped with heterogeneous computing processors such as GPUs, DSPs, and other accelerators, but their computational and energy budgets are severely constrained compared to data center servers. Because an edge platform must process multiple machine learning models concurrently for multimodal sensor data, its scheduling problem poses a new challenge: mapping heterogeneous machine learning computation to heterogeneous computing processors. Furthermore, the processing of each input must provide a bounded response latency, making the scheduling decision critical for the edge platform. This article proposes a set of new heterogeneity-aware ML inference scheduling policies for edge platforms. Based on the regularity of computation in common ML tasks, the scheduler uses the pre-profiled behavior of each ML model and routes requests to the most appropriate processors. It also aims to satisfy the service-level objective (SLO) requirement while reducing the energy consumption for each request. For such SLO support, the challenge of ML computation on GPUs and DSPs is their inflexible preemption capability. To avoid the delay caused by a long task, the proposed scheduler decomposes a large ML task into sub-tasks by layer in the DNN model.
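
The routing idea described in the abstract can be illustrated with a minimal sketch. It is not the article's actual policy: the `Profile` type, the function names, the processor names, and all latency and energy numbers are hypothetical, chosen only to show how pre-profiled per-layer latency and per-request energy might drive a decision. The sketch assumes the scheduler picks the lowest-energy processor whose estimated response time still meets the SLO, and that running a model layer by layer bounds how long a new request can be blocked on a non-preemptible processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Pre-profiled cost of one model on one processor (hypothetical numbers)."""
    layer_ms: tuple    # profiled latency of each layer, in milliseconds
    energy_mj: float   # profiled energy per request, in millijoules

    @property
    def latency_ms(self):
        # Whole-model latency is the sum of the per-layer latencies.
        return sum(self.layer_ms)


def pick_processor(profiles, queue_ms, slo_ms):
    """Pick the lowest-energy processor whose estimated response time
    (work already queued + profiled latency) meets the SLO.
    If no processor can meet it, fall back to the earliest finisher."""
    finish = {p: queue_ms.get(p, 0.0) + prof.latency_ms
              for p, prof in profiles.items()}
    feasible = [p for p in profiles if finish[p] <= slo_ms]
    if feasible:
        return min(feasible, key=lambda p: (profiles[p].energy_mj, finish[p]))
    return min(finish, key=finish.get)


def worst_case_blocking_ms(running, layer_granular):
    """Longest wait behind a task that cannot be preempted mid-kernel.
    With layer-granular sub-tasks, a new request waits for at most one layer."""
    return max(running.layer_ms) if layer_granular else running.latency_ms


# Hypothetical profiles for a single model on two processors.
profiles = {
    "gpu": Profile(layer_ms=(4.0, 6.0, 5.0), energy_mj=90.0),
    "dsp": Profile(layer_ms=(10.0, 14.0, 12.0), energy_mj=30.0),
}
idle = {"gpu": 0.0, "dsp": 0.0}

loose = pick_processor(profiles, idle, slo_ms=40.0)   # both feasible, DSP cheaper
tight = pick_processor(profiles, idle, slo_ms=20.0)   # only the GPU is feasible
whole = worst_case_blocking_ms(profiles["dsp"], layer_granular=False)
split = worst_case_blocking_ms(profiles["dsp"], layer_granular=True)
```

With these made-up numbers, a loose SLO sends the request to the slower but more energy-efficient DSP, while a tight SLO forces it onto the GPU. Splitting the DSP task by layer reduces the worst-case blocking time from the whole-model latency to the latency of its longest layer.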

        • Published in

          ACM Transactions on Architecture and Code Optimization, Volume 18, Issue 4
          December 2021
          497 pages
          ISSN: 1544-3566
          EISSN: 1544-3973
          DOI: 10.1145/3476575

          Copyright © 2021 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 17 July 2021
          • Accepted: 1 April 2021
          • Revised: 1 March 2021
          • Received: 1 June 2020

          Qualifiers

          • research-article
          • Research
          • Refereed
