Efficiency and productivity for decision making on low-power heterogeneous CPU+GPU SoCs, The Journal of Supercomputing


Efficiency and productivity for decision making on low-power heterogeneous CPU+GPU SoCs
The Journal of Supercomputing (IF 2.5), Pub Date: 2020-03-23, DOI: 10.1007/s11227-020-03257-3
Denisa-Andreea Constantinescu, Angeles Navarro, Francisco Corbera, Juan-Antonio Fernández-Madrigal, Rafael Asenjo

Markov decision processes provide a formal framework for a computer to make decisions autonomously and intelligently when the effects of its actions are not deterministic. This formalism has had tremendous success in many disciplines; however, its implementation on platforms with scarce computing capabilities and power, as is the case in robotics or autonomous driving, is still limited. To solve this computationally complex problem efficiently under these constraints, high-performance accelerator hardware and parallelized software come to the rescue. In particular, in this work, we evaluate off-line-tuned static and dynamic versus adaptive heterogeneous scheduling strategies for executing value iteration (a core procedure in many decision-making methods, such as reinforcement learning and task planning) on a low-power heterogeneous CPU+GPU SoC that uses only 10–15 W. Our experimental results show that CPU+GPU heterogeneous strategies considerably reduce the computation time and energy required: they can be up to 54% (61%) faster and 57% (65%) more energy-efficient than the multicore TBB (GPU-only OpenCL) implementation. Additionally, we explore the impact of raising the abstraction level of the programming model to ease the programming effort. To that end, we compare the TBB+OpenCL and TBB+oneAPI implementations of our heterogeneous schedulers, observing that the oneAPI versions require up to 5× less programming effort and incur only 3–8% overhead if the scheduling strategy is selected carefully.
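For readers unfamiliar with value iteration, the following is a minimal sequential sketch in Python. It is not the authors' TBB, OpenCL, or oneAPI implementation; the toy two-state MDP, the `value_iteration` function, and its parameters (`gamma`, `tol`, `max_sweeps`) are assumptions made purely for illustration. It shows the Bellman update applied in synchronous sweeps, where each state's new value depends only on the previous sweep's values.

```python
from typing import List, Tuple

# Hypothetical representation: P[s][a] is a list of
# (probability, next_state, reward) outcomes for action a in state s.
Transition = Tuple[float, int, float]


def value_iteration(P: List[List[List[Transition]]], gamma: float = 0.9,
                    tol: float = 1e-8, max_sweeps: int = 10_000) -> List[float]:
    """Return approximate optimal state values for a small tabular MDP."""
    n = len(P)
    V = [0.0] * n
    for _ in range(max_sweeps):
        # Synchronous sweep: every new value reads only the old V, so the
        # per-state updates are independent of one another.
        V_new = [
            max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in range(len(P[s]))
            )
            for s in range(n)
        ]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:  # stop once the largest change is negligible
            break
    return V


# Toy MDP (an assumption for illustration): state 1 is absorbing, zero reward.
P = [
    [  # state 0
        [(1.0, 0, 0.0)],                 # action 0: stay put, no reward
        [(0.5, 1, 1.0), (0.5, 0, 0.0)],  # action 1: reach state 1 half the time
    ],
    [  # state 1
        [(1.0, 1, 0.0)],
    ],
]
V = value_iteration(P)
```

Because the updates within one sweep do not depend on each other, the range of states can in principle be partitioned into chunks and processed by different devices, which is the kind of work division a heterogeneous scheduler decides on.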


Updated: 2020-03-23
