Efficient UAV Trajectory-Planning using Economic Reinforcement Learning
arXiv - CS - Artificial Intelligence. Pub Date: 2021-03-03, DOI: arxiv-2103.02676
Alvi Ataur Khalil, Alexander J Byrne, Mohammad Ashiqur Rahman, Mohammad Hossein Manshaei

Advances in unmanned aerial vehicle (UAV) design have opened up applications as varied as surveillance, firefighting, cellular networks, and delivery. Additionally, as costs have fallen, systems employing fleets of UAVs have become popular. The distinctive characteristics of UAVs in such systems create a novel set of trajectory (path) planning and coordination problems. Environments contain many more points of interest (POIs) than UAVs, along with obstacles and no-fly zones. We introduce REPlanner, a novel multi-agent reinforcement learning algorithm inspired by economic transactions to distribute tasks among UAVs. The system is built around economic theory, in particular an auction mechanism in which UAVs trade their assigned POIs. We formulate the path planning problem as a multi-agent economic game in which agents can cooperate and compete for resources. We then translate the problem into a partially observable Markov decision process (POMDP), which is solved using a reinforcement learning (RL) model deployed on each agent. Because the system computes task distributions through UAV cooperation, it is highly resilient to changes in swarm size. Our proposed network and economic game architecture can effectively coordinate the swarm as an emergent phenomenon while maintaining the swarm's operation. Evaluation results show that REPlanner outperforms conventional RL-based trajectory search.
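To make the idea of auction-based task distribution concrete, the following is a minimal sketch of a greedy sequential auction. It is not REPlanner's actual mechanism, which the abstract does not specify. It assumes that a bid is simply the straight-line distance from the end of a UAV's current route to a POI; the function name `auction_pois`, the UAV labels, and the coordinates are all hypothetical. Obstacles, no-fly zones, and the learned RL policy are left out.

```python
import math


def auction_pois(uav_positions, pois):
    """Assign every POI to a UAV with a greedy sequential auction.

    Assumption (not from the paper): a UAV's bid for a POI is the
    Euclidean distance from the end of its current route to that POI.
    """
    routes = {uav: [] for uav in uav_positions}
    route_ends = dict(uav_positions)  # where each UAV's route currently ends
    remaining = list(pois)

    while remaining:
        # Every UAV bids on every unassigned POI; the lowest bid wins.
        # Ties are broken by UAV name, then by POI coordinates.
        _, winner, poi = min(
            (math.dist(route_ends[uav], p), uav, p)
            for uav in route_ends
            for p in remaining
        )
        routes[winner].append(poi)
        route_ends[winner] = poi
        remaining.remove(poi)

    return routes


if __name__ == "__main__":
    fleet = {"uav_a": (0, 0), "uav_b": (10, 0)}
    points = [(1, 0), (9, 0), (2, 0)]
    print(auction_pois(fleet, points))
```

Because the assignment is recomputed from the current set of bidders, rerunning the auction after a UAV joins or leaves yields a new distribution without any change to the procedure, which is one simple way to read the abstract's claim about resilience to swarm size.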

Updated: 2021-03-05