Improving Maneuver Strategy in Air Combat by Alternate Freeze Games with a Deep Reinforcement Learning Algorithm
Mathematical Problems in Engineering, Pub Date: 2020-06-30, DOI: 10.1155/2020/7180639
Zhuang Wang 1 , Hui Li 1, 2 , Haolin Wu 1 , Zhaoxin Wu 2

In a one-on-one air combat game, the opponent's maneuver strategy is usually not deterministic, so a maneuver strategy must be designed with a variety of opponent strategies in mind. This paper proposes an alternate freeze game framework based on deep reinforcement learning to generate the maneuver strategy in an air combat pursuit. Maneuver strategy agents that guide the aircraft of both sides are designed for a one-on-one air combat scenario at a flight level with fixed velocity. Middleware that connects the agents to the air combat simulation software is developed to provide a reinforcement learning environment for agent training. A reward shaping approach is used, which increases the training speed and improves the performance of the generated trajectories. To deal with nonstationarity, the agents are trained by alternate freeze games with a deep reinforcement learning algorithm. A league system is adopted to avoid the Red Queen effect in games where both sides use adaptive strategies. Simulation results show that the proposed approach can be applied to maneuver guidance in air combat, and that the deep reinforcement learning agents can learn typical angle fight tactics. When training against an opponent with an adaptive strategy, the winning rate can reach more than 50%, and the losing rate can be reduced to less than 15%. In a competition with all opponents, the strategic agent selected by the league system has a winning rate of more than 44%, and its probability of not losing is about 75%.
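
The alternate freeze schedule and the league selection described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `alternate_freeze` and `select_by_league` are hypothetical, and the toy "policy" is just an integer skill level standing in for a trained deep reinforcement learning agent. The scoring rule (+1 win, 0 draw, -1 loss, averaged over all opponents) is also an assumption, since the abstract does not state how the league ranks agents.

```python
from typing import Callable, List, Tuple, TypeVar

Policy = TypeVar("Policy")


def alternate_freeze(
    red: Policy,
    blue: Policy,
    train: Callable[[Policy, Policy], Policy],
    rounds: int,
) -> Tuple[List[Policy], List[Policy]]:
    """Alternately train one side against a frozen copy of the other.

    Returns every snapshot of each side; each list acts as that side's league.
    """
    red_league, blue_league = [red], [blue]
    for r in range(rounds):
        if r % 2 == 0:
            # Blue is frozen, so red faces a stationary opponent this round.
            red = train(red, blue)
            red_league.append(red)
        else:
            # Red is frozen while blue adapts to red's latest policy.
            blue = train(blue, red)
            blue_league.append(blue)
    return red_league, blue_league


def select_by_league(
    candidates: List[Policy],
    opponents: List[Policy],
    play: Callable[[Policy, Policy], int],
) -> Policy:
    """Pick the candidate with the best mean result against all opponents."""
    def mean_score(candidate: Policy) -> float:
        return sum(play(candidate, o) for o in opponents) / len(opponents)

    return max(candidates, key=mean_score)


# Toy stand-ins (assumptions for illustration only).
def toy_train(learner: int, frozen: int) -> int:
    # Pretend training always finds a policy one step better than the frozen one.
    return frozen + 1


def toy_play(a: int, b: int) -> int:
    # +1 if a wins, -1 if a loses, 0 for a draw.
    return (a > b) - (a < b)


red_league, blue_league = alternate_freeze(0, 0, toy_train, rounds=4)
best_red = select_by_league(red_league, blue_league, toy_play)
```

In a real setup, `train` would run a deep reinforcement learning update loop in the simulation environment, and `play` would aggregate the outcomes of many simulated engagements. Evaluating each snapshot against the whole opposing league, rather than only the latest opponent, is what guards against selecting an agent that merely exploits its most recent adversary.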

Updated: 2020-06-30