Three-Dimension Trajectory Design for Multi-UAV Wireless Network With Deep Reinforcement Learning
IEEE Transactions on Vehicular Technology (IF 6.8), Pub Date: 2020-12-29, DOI: 10.1109/tvt.2020.3047800
Wenqi Zhang, Qiang Wang, Xiao Liu, Yuanwei Liu, Yue Chen

The trajectory design of multiple unmanned aerial vehicles (UAVs) is investigated with the goal of improving the capacity of the communication system. The aim is to maximize real-time downlink capacity under a coverage constraint by exploiting the mobility of the UAVs. The three-dimensional (3D) dynamic movement of UAVs under the coverage constraint is formulated as a Constrained Markov Decision Process (CMDP), and a constrained Deep Q-Network (cDQN) algorithm is proposed to solve it. In the proposed cDQN model, each UAV acts as an agent that explores and learns its 3D deployment policy. The model seeks the maximum capacity while attempting to guarantee that all ground terminals (GTs) are covered. To satisfy the coverage constraint, a primal-dual method is adopted that trains the primal variable and the dual variable (the Lagrangian multiplier) in turn. Furthermore, to reduce the action space of the cDQN algorithm, an action filter uses prior information to eliminate invalid actions. Experimental results demonstrate that the cDQN algorithm converges after some training steps. Additionally, with the 3D deployment policy derived from the proposed cDQN algorithm, the UAVs are capable of adapting to the movement of GTs under the coverage constraint.
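
The abstract names two mechanisms without giving their details: a primal-dual update of a Lagrangian multiplier and an action filter that removes invalid actions. The Python sketch below illustrates the general shape of both under stated assumptions; it is not the authors' implementation. The function names, the step size, the seven-move action set, and the bounded 3D grid are all hypothetical choices made for illustration only.

```python
import numpy as np

# Assumed action set (not from the paper): +/-x, +/-y, +/-z, and hover.
MOVES = np.array([
    [1, 0, 0], [-1, 0, 0],
    [0, 1, 0], [0, -1, 0],
    [0, 0, 1], [0, 0, -1],
    [0, 0, 0],
])


def lagrangian_reward(capacity, coverage_violation, lam):
    """Penalized reward for the primal step: capacity minus lam times the constraint cost."""
    return capacity - lam * coverage_violation


def dual_update(lam, avg_violation, threshold=0.0, step=0.05):
    """Projected dual ascent: lam grows while the average violation exceeds
    the threshold and is clipped at zero otherwise."""
    return max(0.0, lam + step * (avg_violation - threshold))


def valid_moves(position, grid_shape):
    """Hypothetical action filter: mark moves that keep the UAV inside a bounded 3D grid."""
    nxt = np.asarray(position) + MOVES
    return np.all((nxt >= 0) & (nxt < np.asarray(grid_shape)), axis=1)


def filtered_greedy_action(q_values, valid_mask):
    """Greedy action restricted to the actions the filter marks as valid."""
    masked = np.where(valid_mask, q_values, -np.inf)
    return int(np.argmax(masked))
```

In a training loop of this kind, the Q-network (the primal variable) would be trained on `lagrangian_reward` with the multiplier held fixed, and `dual_update` would then adjust the multiplier from the observed coverage violation, alternating between the two. Masking Q-values with `-inf` is one common way to apply such a filter; the paper may realize it differently.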

Updated: 2021-02-16