Anti-Intelligent UAV Jamming Strategy via Deep Q-Networks, IEEE Transactions on Communications

Anti-Intelligent UAV Jamming Strategy via Deep Q-Networks
IEEE Transactions on Communications (IF 8.3), Pub Date: 2020-01-01, DOI: 10.1109/tcomm.2019.2947918
Ning Gao, Zhijin Qin, Xiaojun Jing, Qiang Ni, Shi Jin

Downlink communications are vulnerable to jamming attacks from intelligent unmanned aerial vehicles (UAVs). In this paper, we propose a novel anti-intelligent UAV jamming strategy in which the ground users learn the optimal trajectory to elude such jamming. The problem is formulated as a Stackelberg dynamic game, where the UAV jammer acts as the leader and the ground users act as followers. First, because the UAV jammer has only incomplete channel state information (CSI) about the ground users, we make a first attempt to model the leader sub-game as a partially observable Markov decision process (POMDP), and we obtain the optimal jamming trajectory in three-dimensional space via the developed deep recurrent Q-networks (DRQN). Next, we model the followers' sub-game as a Markov decision process (MDP) and obtain the optimal communication trajectory in two-dimensional space via the developed deep Q-networks (DQN). We prove the existence of the Stackelberg equilibrium and derive its closed-form expression in a special case. Moreover, we obtain some insightful remarks and analyze the time complexity of the proposed defense strategy. Simulations show that the proposed defense strategy outperforms the benchmark strategies.
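To make the followers' sub-game more concrete, the following is a minimal sketch of a ground user learning a two-dimensional trajectory that avoids a jammer. It is not the authors' method: it swaps in tabular Q-learning for the DQN, keeps the jammer at a fixed position instead of letting it learn with a DRQN, and uses a toy rate reward. The grid size, node positions, power constants, and all function names (`rate_reward`, `step`, `train`, `greedy_path`) are assumptions made only for illustration.

```python
import numpy as np

GRID = 5                       # hypothetical 5x5 area for the ground user
JAMMER = (2, 2)                # assumed fixed ground projection of the jammer
BASE_STATION = (4, 4)          # assumed serving transmitter location
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
NOISE, JAM_POWER = 0.1, 2.0    # illustrative constants, not from the paper


def rate_reward(pos):
    """Toy rate reward log2(1 + SINR) with inverse-square path loss."""
    d_bs = (pos[0] - BASE_STATION[0]) ** 2 + (pos[1] - BASE_STATION[1]) ** 2
    d_j = (pos[0] - JAMMER[0]) ** 2 + (pos[1] - JAMMER[1]) ** 2
    signal = 1.0 / (1.0 + d_bs)
    jamming = JAM_POWER / (1.0 + d_j)
    return float(np.log2(1.0 + signal / (NOISE + jamming)))


def step(pos, action):
    """Move one cell in the chosen direction, clipping at the grid boundary."""
    r = min(max(pos[0] + ACTIONS[action][0], 0), GRID - 1)
    c = min(max(pos[1] + ACTIONS[action][1], 0), GRID - 1)
    return (r, c)


def train(episodes=500, horizon=20, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (stand-in for DQN)."""
    rng = np.random.default_rng(seed)
    q = np.zeros((GRID, GRID, len(ACTIONS)))
    for _episode in range(episodes):
        pos = (0, 0)
        for _t in range(horizon):
            if rng.random() < eps:
                a = int(rng.integers(len(ACTIONS)))   # explore
            else:
                a = int(np.argmax(q[pos]))            # exploit
            nxt = step(pos, a)
            # Standard Q-learning target: reward plus discounted best next value.
            target = rate_reward(nxt) + gamma * np.max(q[nxt])
            q[pos + (a,)] += alpha * (target - q[pos + (a,)])
            pos = nxt
    return q


def greedy_path(q, horizon=20):
    """Roll out the greedy policy from the start cell and return visited cells."""
    pos, path = (0, 0), [(0, 0)]
    for _t in range(horizon):
        pos = step(pos, int(np.argmax(q[pos])))
        path.append(pos)
    return path


if __name__ == "__main__":
    q_table = train()
    print(greedy_path(q_table))
```

In this toy setting the reward is low near the assumed jammer cell and high near the assumed transmitter, so the learned path would be expected to skirt the jammer. In the paper's formulation the jammer is itself a learning leader, so the follower's environment would change with the leader's trajectory.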


Updated: 2020-01-01
