Defender-Aware Attacking Guidance Policy for the Target–Attacker–Defender Differential Game
Journal of Aerospace Information Systems (IF 1.3), Pub Date: 2021-03-01, DOI: 10.2514/1.i010877
Jacob T. English, Jay P. Wilhelm

Deep reinforcement learning was used to train an agent within the framework of a Markov decision process (MDP) to pursue a target while avoiding a defender in the target–attacker–defender (TAD) differential game of pursuit and evasion. The aim of this work was to explore the games in which the previous attacking guidance methods found in the literature failed to capture the target. The MDP reward function presented in this work allowed an attacking agent, trained with the twin delayed deep deterministic policy gradient (TD3) algorithm, to learn a policy that expanded the set of cases in which the target is captured beyond the former limit of success. The strategy developed using artificial intelligence extends target-capture guidance to enable the attacker to avoid the defender in states where the two agents are in close proximity. Initial target positions within a limited set were considered, with fixed agent velocities and fixed attacker and defender initial positions, to evaluate the attacker's learned behavior against the optimal point-capture guidance laws for target capture in the TAD game.
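The abstract names the twin delayed deep deterministic policy gradient (TD3) algorithm and an MDP reward function, but gives the details of neither. As an illustration only, the sketch below shows the kind of shaped attacker reward such an MDP might use: a terminal capture bonus, a terminal interception penalty, and dense terms that make the learned policy defender-aware. Every identifier, radius, and weight here is a hypothetical assumption, not a value from the paper.

import numpy as np

# Hypothetical reward shaping for the attacker in the TAD game.
# All radii and weights are illustrative assumptions, not values
# taken from the paper.
CAPTURE_RADIUS = 0.1  # attacker captures the target inside this distance
DEFENSE_RADIUS = 0.1  # defender intercepts the attacker inside this distance

def tad_attacker_reward(attacker, target, defender):
    """Per-step reward for the attacking agent.

    attacker, target, defender: 2-D positions, np.ndarray of shape (2,).
    Returns (reward, episode_done).
    """
    d_at = np.linalg.norm(attacker - target)    # attacker-target distance
    d_ad = np.linalg.norm(attacker - defender)  # attacker-defender distance

    if d_at <= CAPTURE_RADIUS:   # terminal: target captured
        return 100.0, True
    if d_ad <= DEFENSE_RADIUS:   # terminal: attacker intercepted
        return -100.0, True

    # Dense shaping: reward closing on the target, penalize proximity
    # to the defender, and charge a small per-step cost so shorter
    # engagements score higher.
    return -0.1 * d_at + 0.05 * min(d_ad, 1.0) - 0.01, False

# Example: attacker within capture range of the target, clear of the defender.
r, done = tad_attacker_reward(np.array([0.0, 0.0]),
                              np.array([0.05, 0.0]),
                              np.array([2.0, 2.0]))  # r == 100.0, done is True

A reward of this shape would be computed inside the environment's step function; the attacker's policy would then be trained against it with any standard TD3 implementation.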



Updated: 2021-03-02