A Deep Multi-Agent Reinforcement Learning Approach to Autonomous Separation Assurance,arXiv - CS - Multiagent Systems



arXiv - CS - Multiagent Systems Pub Date: 2020-03-17, DOI: arxiv-2003.08353
Marc Brittain, Xuxi Yang, Peng Wei

A novel deep multi-agent reinforcement learning framework is proposed to identify and resolve conflicts among a variable number of aircraft in a high-density, stochastic, and dynamic sector. Currently, sector capacity is constrained by the cognitive limitations of human air traffic controllers. We investigate the feasibility of a new concept (autonomous separation assurance) and a new approach to push sector capacity above that human cognitive limit. We propose using distributed vehicle autonomy to ensure separation, instead of a centralized sector air traffic controller. Our proposed framework uses Proximal Policy Optimization (PPO), which we modify to incorporate an attention network. This gives the agents access to information on a variable number of aircraft in the sector in a scalable, efficient way, achieving high traffic throughput under uncertainty. Agents are trained using a centralized learning, decentralized execution scheme in which one neural network is learned and shared by all agents. The proposed framework is validated on three challenging case studies in the BlueSky air traffic control environment. Numerical results show that the proposed framework significantly reduces offline training time, increases performance, and results in a more efficient policy.
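The abstract does not spell out the attention network, so the following is only a minimal sketch of the general idea it points to: attention lets an agent compress the states of however many other aircraft are present into one fixed-size vector that a policy network could consume. Everything here is an assumption for illustration: the names `attention_pool`, `own_state`, and `intruders` are hypothetical, the raw state vectors stand in for the learned query, key, and value projections a real network would use, and plain scaled dot-product attention is used rather than whatever variant the authors implemented.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    own_state: Sequence[float],
    intruders: Sequence[Sequence[float]],
) -> List[float]:
    """Summarise any number of intruder states as one fixed-size vector.

    Hypothetical simplification: the ownship state acts as the query and
    each intruder state acts as both key and value (no learned weights).
    """
    dim = len(own_state)
    if not intruders:
        # No other aircraft in the sector: return a neutral context.
        return [0.0] * dim
    scale = math.sqrt(dim)
    scores = [dot(own_state, s) / scale for s in intruders]
    weights = softmax(scores)
    # Weighted sum of intruder states; output size does not depend on
    # how many intruders were passed in.
    return [
        sum(w * s[i] for w, s in zip(weights, intruders))
        for i in range(dim)
    ]


# The same function (standing in for the shared network) serves every agent.
own = [1.0, 0.0]
few = attention_pool(own, [[0.5, 0.2]])
many = attention_pool(own, [[0.5, 0.2], [0.1, 0.9], [0.3, 0.3]])
```

Because the output length is fixed regardless of the number of intruders, a single set of parameters can be shared by all agents, which fits the centralized learning, decentralized execution scheme the abstract describes. The weighted sum is also independent of the order in which intruders are listed.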


Updated: 2020-08-28
