Learning to Be Proactive: Self-Regulation of UAV Based Networks With UAV and User Dynamics
IEEE Transactions on Wireless Communications (IF 10.4), Pub Date: 2021-02-19, DOI: 10.1109/twc.2021.3058533
Ran Zhang, Miao Wang, Lin X. Cai, Xuemin Shen

Multi-Unmanned Aerial Vehicle (UAV) control is one of the major research interests in UAV-based networks. Yet few existing works focus on how the network should optimally react when the UAV lineup and user distribution change. In this work, proactive self-regulation (PSR) of UAV-based networks is investigated for the case where one or more UAVs are about to quit or join the network, while accounting for a dynamic user distribution. We target an optimal UAV trajectory control policy that proactively relocates the UAVs whenever the UAV lineup is about to change, rather than passively dispatching the UAVs after the change. Specifically, a deep reinforcement learning (DRL)-based self-regulation approach is developed to maximize the accumulated user satisfaction (US) score over a certain period within which at least one UAV will quit or join the network. To handle the change in the dimension of the state-action space before and after the lineup changes, the state transition is deliberately designed. To accommodate continuous state and action spaces, an actor-critic based DRL algorithm, i.e., deep deterministic policy gradient (DDPG), is applied for better convergence stability. To effectively promote learning exploration around the timing of the lineup change, an asynchronous parallel computing (APC) learning structure is proposed. Referred to as PSR-APC, the developed approach is then extended to the case of dynamic user distribution by incorporating time as one of the agent states. Finally, numerical results are presented to demonstrate the convergence of PSR-APC, its superiority over a passive reaction method, and its capability to jointly handle the dynamics of both the UAV lineup and the user distribution.
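
As a rough illustration of the ingredients named in the abstract, the sketch below shows two standard DDPG building blocks (the critic's bootstrapped target and the soft target-network update) and one possible way to append time to the agent state. It is not the authors' implementation: the function names, the flat list of weights, the (x, y) position layout, and the time normalization are all assumptions made for this example, and the abstract does not specify any of them.

```python
from typing import List, Sequence, Tuple


def td_target(reward: float, next_q: float, gamma: float, done: bool) -> float:
    """Critic regression target y = r + gamma * Q'(s', mu'(s')).

    next_q stands for the target critic evaluated at the target actor's
    action; no bootstrapping is applied at the end of an episode.
    """
    return reward if done else reward + gamma * next_q


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Soft target update: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online, target)]


def build_state(uav_positions: Sequence[Tuple[float, float]],
                t: int, horizon: int) -> List[float]:
    """Flatten UAV (x, y) positions and append normalized time.

    Hypothetical layout: the paper only says that time is incorporated
    as one of the agent states, not how it is encoded.
    """
    state = [coord for pos in uav_positions for coord in pos]
    state.append(t / horizon)
    return state


if __name__ == "__main__":
    print(td_target(reward=1.0, next_q=2.0, gamma=0.5, done=False))
    print(soft_update(target=[0.0, 1.0], online=[1.0, 1.0], tau=0.1))
    print(build_state([(1.0, 2.0), (3.0, 4.0)], t=5, horizon=10))
```

A small `tau` makes the target networks track the online networks slowly, which is the usual source of the stability that DDPG-style training relies on.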

Updated: 2021-02-19