Hardware Acceleration for Postdecision State Reinforcement Learning in IoT Systems
IEEE Internet of Things Journal ( IF 10.6 ) Pub Date : 2022-03-30 , DOI: 10.1109/jiot.2022.3163364
Jianchi Sun 1 , Nikhilesh Sharma 2 , Jacob Chakareski 3 , Nicholas Mastronarde 2 , Yingjie Lao 1

Reinforcement learning (RL) is increasingly being used to optimize resource-constrained wireless Internet of Things (IoT) devices. However, existing RL algorithms that are lightweight enough to be implemented on these devices, such as $Q$-learning, converge too slowly to effectively adapt to the experienced information source and channel dynamics, while deep RL algorithms are too complex to be implemented on these devices. By integrating basic models of the IoT system into the learning process, the so-called postdecision state (PDS)-based RL can achieve faster convergence than these alternative approaches at lower complexity than deep RL; however, its complexity may still hinder real-time and energy-efficient operation on IoT devices. In this article, we develop efficient hardware accelerators for PDS-based RL. We first develop an arithmetic hardware acceleration architecture and then propose a stochastic computing (SC)-based reconfigurable hardware architecture. By using simple bitwise computations enabled by SC, we eliminate the costly multiplications involved in PDS learning, which simultaneously reduces hardware area and power consumption. We show that computational efficiency can be further improved by using extremely short stochastic representations without sacrificing learning performance. We demonstrate our proposed approach on a simulated wireless IoT sensor that must transmit delay-sensitive data over a fading channel while minimizing its energy consumption. Our experimental results show that our arithmetic accelerator is $5.3\times$ faster than $Q$-learning and $2.6\times$ faster than a baseline hardware architecture, while the proposed SC-based architecture further reduces the critical path of the arithmetic accelerator by 87.9%.
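The abstract's key hardware idea is that stochastic computing replaces multipliers with single logic gates: a value in [0, 1] is encoded as a random bitstream whose fraction of 1s equals the value, and the product of two independently encoded streams is just their bitwise AND. The sketch below illustrates this unipolar SC multiplication in software; the stream length, encoding, and helper names are illustrative assumptions, not the paper's actual architecture.

```python
import random

def to_stochastic(p, length, rng):
    """Encode a probability p in [0, 1] as a unipolar stochastic
    bitstream: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def sc_multiply(stream_a, stream_b):
    """SC multiplication: the bitwise AND of two independent unipolar
    streams has 1-probability p_a * p_b, so a single AND gate per bit
    replaces a full multiplier in hardware."""
    return [a & b for a, b in zip(stream_a, stream_b)]

def decode(stream):
    """Recover the encoded value as the fraction of 1s in the stream."""
    return sum(stream) / len(stream)

rng = random.Random(0)
N = 1024          # stream length; longer streams reduce sampling noise
a, b = 0.75, 0.5  # example operands (e.g., a learning rate and a value)

sa = to_stochastic(a, N, rng)
sb = to_stochastic(b, N, rng)
product_estimate = decode(sc_multiply(sa, sb))
# product_estimate approximates a * b = 0.375, up to sampling noise
```

The estimate's standard error scales as roughly $\sqrt{p(1-p)/N}$, which is why the paper's observation that very short streams suffice matters: shorter streams trade a little accuracy for proportionally less latency and energy per update.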
