SLO-Aware Inference Scheduler for Heterogeneous Processors in Edge Platforms

Authors:
Wonik Seo, KAIST, Republic of Korea
Sanghoon Cha, Samsung Advanced Institute of Technology, Republic of Korea
Yeonjae Kim, KAIST, Republic of Korea
Jaehyuk Huh, KAIST, Republic of Korea
Jongse Park, KAIST, Republic of Korea

ACM Transactions on Architecture and Code Optimization, Volume 18, Issue 4, Article No. 43, pp. 1–26. https://doi.org/10.1145/3460352

Published: 17 July 2021

Abstract

With the proliferation of applications that use machine learning (ML), edge platforms have become increasingly important for processing streaming sensor data locally without resorting to remote servers. Such edge platforms are commonly equipped with heterogeneous processors such as GPUs, DSPs, and other accelerators, but their computational and energy budgets are severely constrained compared to those of data center servers. Because an edge platform must process multiple ML models concurrently for multimodal sensor data, its scheduling problem poses a new challenge: mapping heterogeneous ML computation onto heterogeneous processors. Furthermore, the processing of each input must complete within a bounded response latency, which makes the scheduling decision critical for the edge platform. This article proposes a set of new heterogeneity-aware ML inference scheduling policies for edge platforms. Exploiting the regularity of computation in common ML tasks, the scheduler uses the pre-profiled behavior of each ML model and routes each request to the most appropriate processor. It also aims to satisfy the service-level objective (SLO) requirement while reducing the energy consumed by each request. A challenge for such SLO support is that GPUs and DSPs offer only inflexible preemption of ML computation. To avoid the delay caused by a long-running task, the proposed scheduler decomposes a large ML task into sub-tasks along the layers of the DNN model.
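
The routing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the profile table, its latency and energy numbers, the model and processor names, and the functions `route` and `worst_case_wait_ms` are all hypothetical. The sketch assumes each (model, processor) pair has a pre-profiled latency and energy cost, picks the lowest-energy processor that can still finish within the SLO, and falls back to the earliest finish time when no processor can. The last function shows, under the assumption that the scheduler may switch requests only at sub-task boundaries, why splitting a task by layer shortens the worst-case wait behind it.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    name: str
    busy_until_ms: float = 0.0  # time at which already queued work finishes


# Hypothetical pre-profiled table: (model, processor) -> (latency_ms, energy_mj).
# All numbers are made up for illustration.
PROFILE = {
    ("detector", "gpu"): (20.0, 90.0),
    ("detector", "dsp"): (45.0, 40.0),
    ("detector", "cpu"): (120.0, 150.0),
    ("keyword", "gpu"): (5.0, 25.0),
    ("keyword", "dsp"): (8.0, 6.0),
    ("keyword", "cpu"): (15.0, 20.0),
}


def route(model, now_ms, slo_ms, processors):
    """Return (processor name, finish time) for one request.

    Prefers the lowest-energy processor whose predicted response time
    (queueing delay plus profiled latency) meets the SLO. If none can
    meet it, picks the processor with the earliest finish time.
    """
    candidates = []
    for proc in processors:
        if (model, proc.name) not in PROFILE:
            continue  # model was not profiled on this processor
        latency_ms, energy_mj = PROFILE[(model, proc.name)]
        finish_ms = max(now_ms, proc.busy_until_ms) + latency_ms
        candidates.append((finish_ms, energy_mj, proc))
    if not candidates:
        raise ValueError(f"no profiled processor for model {model!r}")

    feasible = [c for c in candidates if c[0] - now_ms <= slo_ms]
    if feasible:
        finish_ms, _, chosen = min(feasible, key=lambda c: (c[1], c[0]))
    else:
        finish_ms, _, chosen = min(candidates, key=lambda c: c[0])
    chosen.busy_until_ms = finish_ms  # reserve the processor
    return chosen.name, finish_ms


def worst_case_wait_ms(layer_latencies_ms, split_by_layer):
    """Longest time a new request may wait behind a running task.

    Without splitting, the whole task must finish first. With per-layer
    sub-tasks, only the longest single layer blocks the new request.
    """
    if split_by_layer:
        return max(layer_latencies_ms)
    return sum(layer_latencies_ms)


if __name__ == "__main__":
    procs = [Processor("gpu"), Processor("dsp"), Processor("cpu")]
    print(route("detector", 0.0, 50.0, procs))
    print(route("detector", 0.0, 50.0, procs))
    print(route("keyword", 0.0, 10.0, procs))
    print(worst_case_wait_ms([4.0, 6.0, 10.0], split_by_layer=True))
```

In this sketch the first detector request goes to the DSP because it meets the 50 ms SLO at the lowest energy, while a second identical request is pushed to the GPU because the DSP queue would now violate the SLO.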

Published in
ACM Transactions on Architecture and Code Optimization, Volume 18, Issue 4
December 2021
497 pages
ISSN: 1544-3566
EISSN: 1544-3973
DOI: 10.1145/3476575
Editor: David Kaeli, Northeastern University, USA
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher: Association for Computing Machinery, New York, NY, United States
Publication History
- Received: 1 June 2020
- Revised: 1 March 2021
- Accepted: 1 April 2021
- Published: 17 July 2021

Author Tags
Edge computing
heterogeneous computing
inference
machine learning
task scheduling
Qualifiers
- research-article
- Research
- Refereed