Abstract
With the proliferation of machine learning (ML) applications, edge platforms have become increasingly important for processing streaming sensor data locally without resorting to remote servers. Such edge platforms are commonly equipped with heterogeneous computing processors such as GPUs, DSPs, and other accelerators, but their computational and energy budgets are severely constrained compared to data center servers. Because an edge platform must process multiple ML models concurrently for multimodal sensor data, its scheduling problem poses a new challenge: mapping heterogeneous ML computation to heterogeneous computing processors. Furthermore, the processing of each input must provide a bounded response latency, making the scheduling decision critical for the edge platform. This article proposes a set of new heterogeneity-aware ML inference scheduling policies for edge platforms. Based on the regularity of computation in common ML tasks, the scheduler uses the pre-profiled behavior of each ML model and routes requests to the most appropriate processors. It also aims to satisfy the service-level objective (SLO) requirement while reducing the energy consumption for each request. For such SLO support, the challenge of ML computation on GPUs and DSPs is their inflexible preemption capability. To avoid the delay caused by a long task, the proposed scheduler decomposes a large ML task into sub-tasks by layer in the DNN model.
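The two ideas in the abstract, profile-driven routing under an SLO and layer-wise task decomposition, can be illustrated with a minimal sketch. This is not the article's algorithm: the profile table, the processor and model names, all latency and energy numbers, and the helper functions `pick_processor` and `split_by_layer` are hypothetical and chosen only for illustration. The routing rule assumed here is "among processors whose predicted completion time meets the SLO, pick the one with the lowest profiled energy; otherwise fall back to the earliest finisher".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    latency_ms: float  # pre-profiled execution time on this processor
    energy_mj: float   # pre-profiled energy per request


# Hypothetical offline profiles: (model, processor) -> Profile.
PROFILES = {
    ("detector", "GPU"): Profile(20.0, 300.0),
    ("detector", "DSP"): Profile(45.0, 120.0),
    ("detector", "CPU"): Profile(90.0, 400.0),
    ("keyword", "GPU"): Profile(5.0, 80.0),
    ("keyword", "DSP"): Profile(8.0, 20.0),
    ("keyword", "CPU"): Profile(12.0, 50.0),
}


def pick_processor(model, slo_ms, busy_until_ms, now_ms=0.0):
    """Pick the lowest-energy processor whose predicted finish meets the SLO.

    busy_until_ms maps a processor name to the time at which its queue drains.
    If no processor can meet the SLO, return the one that finishes earliest.
    """
    candidates = []
    for (m, proc), prof in PROFILES.items():
        if m != model:
            continue
        start = max(now_ms, busy_until_ms.get(proc, 0.0))
        finish = start + prof.latency_ms
        candidates.append((proc, finish, prof.energy_mj))
    feasible = [c for c in candidates if c[1] - now_ms <= slo_ms]
    if feasible:
        return min(feasible, key=lambda c: c[2])[0]
    return min(candidates, key=lambda c: c[1])[0]


def split_by_layer(layer_latencies_ms, max_chunk_ms):
    """Group consecutive layers into sub-tasks bounded by max_chunk_ms.

    A single layer longer than max_chunk_ms still forms its own sub-task,
    since a layer is the smallest unit assumed here.
    """
    chunks, current, total = [], [], 0.0
    for index, latency in enumerate(layer_latencies_ms):
        if current and total + latency > max_chunk_ms:
            chunks.append(current)
            current, total = [], 0.0
        current.append(index)
        total += latency
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(pick_processor("detector", slo_ms=50.0, busy_until_ms={}))
    print(pick_processor("detector", slo_ms=50.0, busy_until_ms={"DSP": 10.0}))
    print(split_by_layer([4.0, 3.0, 6.0, 2.0, 5.0], max_chunk_ms=8.0))
```

In this sketch, an idle platform sends the detector to the DSP because it is the cheapest option that still fits a 50 ms budget, while a short backlog on the DSP pushes the same request to the GPU. Bounding each sub-task by a time limit caps how long a newly arrived request waits behind a non-preemptible task, which is the motivation the abstract gives for decomposing by layer.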
Index Terms
- SLO-Aware Inference Scheduler for Heterogeneous Processors in Edge Platforms