Deep Reinforcement Learning of Collision-Free Flocking Policies for Multiple Fixed-Wing UAVs Using Local Situation Maps
IEEE Transactions on Industrial Informatics (IF 11.7), Pub Date: 2021-07-01, DOI: 10.1109/tii.2021.3094207
Chao Yan, Chang Wang, Xiaojia Xiang, Zhen Lan, Yuna Jiang

The evolution of artificial intelligence and the Internet of Things (IoT) envisions a highly integrated artificial IoT (AIoT) network. Flocking and cooperation of multiple unmanned aerial vehicles (UAVs) are expected to play a vital role in industrial AIoT networks. In this article, we formulate the collision-free flocking problem of fixed-wing UAVs as a Markov decision process and solve it within the deep reinforcement learning (DRL) framework. Our method can handle a variable number of followers by encoding the dynamic environmental state into a fixed-length embedding tensor. Specifically, each follower constructs a fixed-size local situation map that describes the collision risks with other nearby followers. The local situation maps are used by the proposed DRL algorithm, MA2D3QN, to learn collision-free flocking behavior. To further improve learning efficiency, we design a reference-point-based action selection strategy and an adaptive mechanism. We compare MA2D3QN with several benchmark DRL algorithms through numerical simulation and verify its advantages in learning efficiency and performance. Finally, we demonstrate the scalability and adaptability of MA2D3QN in a semiphysical simulation experiment.
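The abstract's key idea is that a variable number of neighbors can always be encoded as a fixed-length input by rasterizing nearby followers into a fixed-size local situation map of collision-risk values. The sketch below illustrates this idea only; the grid resolution, sensing range, and distance-based risk function are assumptions for illustration, not the paper's exact construction.

```python
# Illustrative sketch: encode a variable number of nearby followers into a
# fixed-size "local situation map" of collision-risk values, so the input to a
# DQN-style network has a fixed shape regardless of neighbor count.
# map_size, sensing_range, and the linear risk function are assumed values.
import numpy as np

def build_local_situation_map(ego_pos, neighbor_positions,
                              map_size=32, sensing_range=200.0):
    """Rasterize neighbors within sensing_range (m) of the ego UAV into a
    map_size x map_size grid centered on the ego UAV; each occupied cell
    stores a risk value that grows as the neighbor gets closer."""
    grid = np.zeros((map_size, map_size), dtype=np.float32)
    cell = 2.0 * sensing_range / map_size               # meters per grid cell
    ego = np.asarray(ego_pos, dtype=np.float32)
    for pos in neighbor_positions:
        rel = np.asarray(pos, dtype=np.float32) - ego   # neighbor position relative to ego
        dist = float(np.linalg.norm(rel))
        if dist >= sensing_range:
            continue                                    # outside the local map
        col = int(np.clip((rel[0] + sensing_range) / cell, 0, map_size - 1))
        row = int(np.clip((rel[1] + sensing_range) / cell, 0, map_size - 1))
        risk = 1.0 - dist / sensing_range               # assumed risk: 1 at contact, 0 at range edge
        grid[row, col] = max(grid[row, col], risk)      # keep the highest risk per cell
    return grid

# Example: three neighbors (one out of range) still yield one fixed-size 32x32 tensor.
situation_map = build_local_situation_map((0.0, 0.0),
                                          [(30.0, -10.0), (150.0, 80.0), (400.0, 0.0)])
print(situation_map.shape, situation_map.max())
```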
