SREC: Proactive Self-Remedy of Energy-Constrained UAV-Based Networks via Deep Reinforcement Learning
arXiv - CS - Systems and Control Pub Date : 2020-09-17 , DOI: arxiv-2009.08528 Ran Zhang, Miao Wang, and Lin X. Cai
Energy-aware control of multiple unmanned aerial vehicles (UAVs) is one of
the major research interests in UAV-based networking. Yet few existing works
have focused on how the network should react around the time when the UAV
lineup changes. In this work, we study proactive self-remedy of
energy-constrained UAV networks when one or more UAVs are short of energy and
about to quit for charging. We target an energy-aware optimal UAV control
policy that proactively relocates the UAVs when any UAV is about to quit the
network, rather than passively dispatching the remaining UAVs after the quit.
Specifically, a deep reinforcement learning (DRL)-based self-remedy approach,
named SREC-DRL, is proposed to maximize the accumulated user satisfaction
scores over a period within which at least one UAV will quit the
network. To handle the continuous state and action spaces of the problem,
deep deterministic policy gradient (DDPG), a state-of-the-art actor-critic
DRL algorithm, is applied for its favorable convergence stability. Numerical
results demonstrate that, compared with the passive reaction method, the
proposed SREC-DRL approach achieves a $12.12\%$ gain in cumulative user
satisfaction score during the remedy period.
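The core mechanism named in the abstract is DDPG's deterministic policy-gradient update: a critic estimates $Q(s, a)$ over continuous states and actions, and the actor is adjusted in the direction that raises the critic's value estimate. The following is a minimal sketch of that update, not the authors' implementation; the linear actor and critic, the state dimension (UAV positions/energy), and the action dimension (a continuous relocation vector) are all illustrative assumptions.

```python
import numpy as np

# Hypothetical toy dimensions: state = UAV positions + residual energy,
# action = continuous relocation vector for the remaining UAVs.
state_dim, action_dim = 4, 2

rng = np.random.default_rng(0)
W_actor = rng.normal(scale=0.1, size=(action_dim, state_dim))   # deterministic policy mu(s)
w_critic = rng.normal(scale=0.1, size=state_dim + action_dim)   # linear critic Q(s,a) = w . [s, a]

def actor(s):
    """Deterministic continuous action mu(s) = W_actor @ s."""
    return W_actor @ s

def critic(s, a):
    """Scalar action-value estimate Q(s, a)."""
    return w_critic @ np.concatenate([s, a])

# One deterministic policy-gradient step at a sampled state:
# grad_W Q(s, mu(s)) = (dQ/da) outer s  for this linear parameterization.
s = rng.normal(size=state_dim)
q_before = critic(s, actor(s))

dQ_da = w_critic[state_dim:]           # gradient of the linear critic w.r.t. the action
lr = 0.05
W_actor += lr * np.outer(dQ_da, s)     # ascend the critic's value estimate

q_after = critic(s, actor(s))          # Q at this state can only go up (or stay equal)
```

For this linear case the improvement is exact: the action changes by `lr * dQ_da * (s @ s)`, so `q_after - q_before = lr * (s @ s) * ||dQ_da||**2 >= 0`. In full DDPG the same step is taken with neural-network approximators, a replay buffer, and target networks, which is where the convergence-stability benefit the abstract mentions comes from.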
Updated: 2020-09-21