Accelerated Structure-Aware Reinforcement Learning for Delay-Sensitive Energy Harvesting Wireless Sensors
IEEE Transactions on Signal Processing (IF 4.6), Pub Date: 2020-01-01, DOI: 10.1109/tsp.2020.2973125
Nikhilesh Sharma, Nicholas Mastronarde, Jacob Chakareski

We consider a time-slotted energy-harvesting wireless sensor transmitting delay-sensitive data over a fading channel. The sensor injects captured data packets into its transmission queue and relies on ambient energy harvested from the environment to transmit them. We aim to find the optimal scheduling policy that decides how many packets to transmit in each time slot so as to minimize the expected queuing delay. No prior knowledge of the stochastic processes that govern the channel, captured data, and harvested energy dynamics is assumed, thereby necessitating online learning to optimize the scheduling policy. We formulate this problem as a Markov decision process (MDP) whose state space spans the sensor's buffer, battery, and channel states, and show that its optimal value function is non-decreasing with increasing differences in the buffer state, and non-increasing with increasing differences in the battery state. We exploit this knowledge of the value function's structure to formulate a novel accelerated reinforcement learning (RL) algorithm based on value function approximation that can solve the scheduling problem online with controlled approximation error, while incurring limited computational and memory complexity. We rigorously characterize the trade-off between approximation accuracy and the computational/memory complexity savings of our approach. Our simulations demonstrate that the proposed algorithm closely approximates the optimal offline solution, which requires complete knowledge of the system state dynamics. Simultaneously, our approach achieves competitive performance relative to a state-of-the-art RL algorithm at orders of magnitude lower complexity. Moreover, considerable performance gains are demonstrated over the widely popular Q-learning RL technique.
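The core idea of structure-aware RL in this setting can be illustrated with a toy sketch: run tabular Q-learning on a simplified buffer/battery/channel MDP and periodically project the learned values onto the set of functions that are non-decreasing in the buffer index and non-increasing in the battery index, the structure the abstract establishes for the optimal value function. All dimensions, dynamics, and the cost model below are hypothetical placeholders, not the paper's actual system model or algorithm.

```python
import numpy as np

# Hypothetical toy sizes: buffer levels, battery levels, channel states, actions
# (number of packets to transmit). The paper's actual model and dynamics differ.
B, E, H, A = 6, 5, 2, 3
rng = np.random.default_rng(0)

Q = np.zeros((B, E, H, A))  # cost-to-go estimates (we minimize delay)

def enforce_structure(Q):
    """One-sided projection onto the known monotone structure:
    non-decreasing in the buffer axis, non-increasing in the battery axis."""
    Q = np.maximum.accumulate(Q, axis=0)  # running max -> non-decreasing in buffer
    Q = np.minimum.accumulate(Q, axis=1)  # running min -> non-increasing in battery
    return Q

alpha, gamma, eps = 0.1, 0.95, 0.1
b, e, h = 0, E - 1, 0
for t in range(20000):
    # Epsilon-greedy action; greedy = argmin since Q stores costs.
    a = int(rng.integers(A)) if rng.random() < eps else int(np.argmin(Q[b, e, h]))
    tx = min(a, b, e)                          # cannot send more than queued/energy
    cost = b - tx                              # residual backlog as a delay proxy
    b2 = min(b - tx + int(rng.integers(2)), B - 1)  # random packet arrivals
    e2 = min(e - tx + int(rng.integers(2)), E - 1)  # random energy harvest
    h2 = int(rng.integers(H))                  # i.i.d. channel for simplicity
    target = cost + gamma * Q[b2, e2, h2].min()
    Q[b, e, h, a] += alpha * (target - Q[b, e, h, a])
    if t % 500 == 0:
        Q = enforce_structure(Q)               # structure-aware projection step
    b, e, h = b2, e2, h2

Q = enforce_structure(Q)
V = Q.min(axis=3)  # greedy value estimate: monotone in buffer and battery
```

The projection step is what "structure-aware" buys: by folding the known monotonicity back into the estimates, states visited rarely still inherit sensible values from their neighbors, which is one intuition for the accelerated convergence and reduced memory footprint the abstract claims.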

Updated: 2020-01-01