UAV Path Planning for Wireless Data Harvesting: A Deep Reinforcement Learning Approach
arXiv - CS - Robotics, Pub Date: 2020-07-01, DOI: arxiv-2007.00544
Harald Bayerlein, Mirco Theile, Marco Caccamo, David Gesbert

Autonomous deployment of unmanned aerial vehicles (UAVs) supporting next-generation communication networks requires efficient trajectory planning methods. We propose a new end-to-end reinforcement learning (RL) approach to UAV-enabled data collection from Internet of Things (IoT) devices in an urban environment. An autonomous drone is tasked with gathering data from distributed sensor nodes subject to limited flying time and obstacle avoidance. Previous approaches, both learning and non-learning based, must perform expensive recomputations or relearn a behavior whenever important scenario parameters change, such as the number of sensors, sensor positions, or maximum flying time. In contrast, we train a double deep Q-network (DDQN) with combined experience replay to learn a UAV control policy that generalizes over changing scenario parameters. By feeding a multi-layer map of the environment to the agent through convolutional network layers, we show that our proposed network architecture enables the agent to make movement decisions for a variety of scenario parameters, balancing the data collection goal against flight time efficiency and safety constraints. We also illustrate considerable gains in learning efficiency from using a map centered on the UAV's position rather than a non-centered map.
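The abstract names three mechanisms without showing them: the double-DQN target, combined experience replay, and the UAV-centered map. Below is a minimal pure-Python sketch of each, under stated assumptions; it is not the authors' implementation. All names (`center_map`, `CombinedReplayBuffer`, `ddqn_target`) are hypothetical, padding cells outside the map with `1` is an assumption meant to mark them as no-fly, and a real agent would produce the Q-values with convolutional networks rather than receive them as plain lists.

```python
import random
from collections import deque
from typing import Any, List, Sequence, Tuple

# Hypothetical transition layout: (state, action, reward, next_state, done)
Transition = Tuple[Any, int, float, Any, bool]


def center_map(grid: List[List[int]], pos: Tuple[int, int], pad: int = 1) -> List[List[int]]:
    """Shift one map layer so the UAV's cell lands at the center.

    The output is (2H-1) x (2W-1), so every original cell stays visible
    wherever the UAV is. Cells outside the original map get `pad`
    (assumed here to mean "obstacle / no-fly").
    """
    h, w = len(grid), len(grid[0])
    r0, c0 = pos
    out = []
    for i in range(2 * h - 1):
        row = []
        for j in range(2 * w - 1):
            # Map output cell (i, j) back to original map coordinates.
            r, c = r0 + i - (h - 1), c0 + j - (w - 1)
            row.append(grid[r][c] if 0 <= r < h and 0 <= c < w else pad)
        out.append(row)
    return out


class CombinedReplayBuffer:
    """Uniform replay whose minibatches always include the newest transition."""

    def __init__(self, capacity: int) -> None:
        self.buffer: deque = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Draw up to batch_size - 1 older transitions uniformly,
        # then append the most recent one (combined experience replay).
        older = list(self.buffer)[:-1]
        batch = random.sample(older, min(batch_size - 1, len(older)))
        batch.append(self.buffer[-1])
        return batch


def ddqn_target(reward: float, done: bool, gamma: float,
                q_online_next: Sequence[float],
                q_target_next: Sequence[float]) -> float:
    """Double DQN target: the online net picks a', the target net evaluates it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]


if __name__ == "__main__":
    layer = [[0, 0, 1],
             [0, 0, 0]]
    for row in center_map(layer, pos=(0, 0)):
        print(row)
    print(ddqn_target(1.0, False, 0.5, [0.2, 0.8, 0.5], [1.0, 2.0, 3.0]))
```

Decoupling action selection (online net) from evaluation (target net) is what distinguishes the double-DQN target from the plain DQN one, which takes the max over the target net directly. Centering trades a larger input (roughly four times the cells) for a view in which the UAV's neighborhood always appears at the same input location.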

Updated: 2020-10-27