

Task Mapping and Scheduling for OpenVX Applications on Heterogeneous Multi/Many-Core Architectures
IEEE Transactions on Computers (IF 3.7), Pub Date: 2021-02-16, DOI: 10.1109/tc.2021.3059528
Francesco Lumpp, Stefano Aldegheri, Hiren D. Patel, Nicola Bombieri

Computer vision applications have stringent performance constraints that must be satisfied when they run at the edge on programmable low-power embedded devices. OpenVX has emerged as the de facto reference standard for developing such applications. OpenVX uses a primitive-based programming model that yields a directed acyclic graph (DAG) representation of the application, which can then be used for automatic system-level optimization and synthesis onto heterogeneous multi- and many-core platforms. Although OpenVX has been standardized, its state-of-the-art algorithm for task mapping and scheduling does not deliver the performance such applications need when deployed on heterogeneous multi-/many-core platforms. This article addresses this challenge with three main contributions. First, we implemented a static task scheduling and mapping approach for OpenVX using the heterogeneous earliest finish time (HEFT) heuristic. We show that HEFT improves system performance by up to 70 percent on one of the most widespread smart systems for computer vision and intelligent video analytics at the edge (i.e., NVIDIA VisionWorks on NVIDIA Jetson TX2). Second, we show that in a vision application for edge computing, where some primitives may have multiple implementations (e.g., for CPU and GPU), HEFT can lead to load imbalance among heterogeneous computing elements (CEs) and thus to degraded performance. Third, we present an algorithm called exclusive earliest finish time (XEFT) that introduces the notion of exclusive overlap between single-implementation primitives to improve load balancing. We show that XEFT can further improve system performance by up to 33 percent over HEFT and by up to 82 percent over the native OpenVX scheduler. We present results on a large set of benchmarks, including a real-world localization and mapping application (ORB-SLAM) combined with an NVIDIA inference application based on convolutional neural networks (CNNs) for object detection.
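To make the HEFT step concrete, the following is a minimal sketch of the general HEFT heuristic applied to a small DAG of primitives. It is not the authors' implementation and it does not cover XEFT. It assumes zero data-transfer cost between CEs, appends each task after the last task already placed on a CE (no insertion into idle gaps), and requires strictly positive execution times. The function name `heft`, the task names, and all timing values are hypothetical, chosen only for illustration. A primitive with a single implementation is modelled by listing only one CE in its cost table.

```python
from functools import lru_cache


def heft(costs, succs):
    """Simplified HEFT sketch (assumptions: no communication cost, no gap insertion).

    costs: {task: {ce: execution_time}}, listing only the CEs that implement the task.
    succs: {task: [successor tasks]} describing the DAG edges.
    Returns (mapping, finish): the chosen CE and the finish time of every task.
    """
    # Build predecessor lists from the successor lists.
    preds = {t: [] for t in costs}
    for t, ss in succs.items():
        for s in ss:
            preds[s].append(t)

    @lru_cache(maxsize=None)
    def rank(t):
        # Upward rank: average execution time plus the longest path to an exit task.
        avg = sum(costs[t].values()) / len(costs[t])
        return avg + max((rank(s) for s in succs.get(t, ())), default=0.0)

    ce_free = {}   # CE -> time at which it becomes idle
    finish = {}    # task -> finish time
    mapping = {}   # task -> chosen CE

    # Decreasing upward rank is a valid topological order for positive costs.
    for t in sorted(costs, key=rank, reverse=True):
        ready = max((finish[p] for p in preds[t]), default=0.0)
        best_ce, best_eft = None, float("inf")
        for ce, c in costs[t].items():
            eft = max(ready, ce_free.get(ce, 0.0)) + c  # earliest finish time on this CE
            if eft < best_eft:
                best_ce, best_eft = ce, eft
        mapping[t] = best_ce
        finish[t] = best_eft
        ce_free[best_ce] = best_eft
    return mapping, finish


# Hypothetical four-primitive graph: a -> b, a -> c, b -> d, c -> d.
# "b" exists only for the CPU and "d" only for the GPU.
costs = {
    "a": {"cpu": 2.0, "gpu": 1.0},
    "b": {"cpu": 3.0},
    "c": {"cpu": 5.0, "gpu": 2.0},
    "d": {"gpu": 1.0},
}
succs = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}

mapping, finish = heft(costs, succs)
print(mapping, max(finish.values()))
```

In this toy graph the heuristic places every primitive except the CPU-only one on the GPU, because each decision looks only at that task's own earliest finish time. That greedy behaviour is how a load imbalance of the kind described in the abstract can arise.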


Updated: 2021-02-16
