Deep Reinforcement Learning for Dynamic Spectrum Sensing and Aggregation in Multi-Channel Wireless Networks
IEEE Transactions on Cognitive Communications and Networking (IF 8.6), Pub Date: 2020-06-01, DOI: 10.1109/tccn.2020.2982895
Yunzeng Li, Wensheng Zhang, Cheng-Xiang Wang, Jian Sun, Yu Liu

This paper investigates dynamic spectrum sensing and aggregation in a wireless network containing $N$ correlated channels, each of which is occupied or vacant according to an unknown joint 2-state Markov model. At each time slot, a single cognitive user with a given bandwidth requirement either stays idle or selects a segment of $C$ ($C < N$) contiguous channels to sense. The vacant channels in the selected segment are then aggregated to satisfy the user's requirement. After each transmission, the user receives a binary feedback signal (i.e., an ACK signal) indicating whether the transmission succeeded, and makes its next decision based on the sensed channel states. The aim is to find a policy that maximizes the number of successful transmissions without interrupting the primary users (PUs). Because the user cannot fully observe the system environment, the problem can be formulated as a partially observable Markov decision process (POMDP). We implement a Deep Q-Network (DQN) to address the challenges of unknown system dynamics and computational expense. The performance of DQN, Q-Learning, and the Improvident Policy with known system dynamics is evaluated through simulations. The simulation results show that DQN can achieve near-optimal performance across different system scenarios based only on partial observations and ACK signals.
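To make the decision loop in the abstract concrete, here is a minimal Python sketch of one time slot: the user either idles or senses a segment of $C$ contiguous channels, aggregates the vacant ones, and receives an ACK-style reward. It is a toy illustration and not the authors' simulator. Several pieces are assumptions that the abstract does not give: the class name `SpectrumEnv`, the `demand` threshold standing in for the bandwidth requirement, the reward values (+1, -1, 0), and, most importantly, the use of independent per-channel Markov chains in place of the paper's unknown joint (correlated) model.

```python
import random


class SpectrumEnv:
    """Toy multi-channel environment (hypothetical; not the paper's simulator).

    Assumption: each channel follows its own independent 2-state Markov chain.
    The paper instead assumes an unknown joint model over correlated channels.
    """

    def __init__(self, n_channels=8, segment_len=3, demand=2,
                 p_stay_vacant=0.8, p_stay_occupied=0.7, seed=0):
        self.n = n_channels          # N in the abstract
        self.c = segment_len         # C in the abstract (C < N)
        self.demand = demand         # assumed: vacant channels needed for success
        self.p_vv = p_stay_vacant    # assumed P(vacant -> vacant)
        self.p_oo = p_stay_occupied  # assumed P(occupied -> occupied)
        self.rng = random.Random(seed)
        # True = vacant, False = occupied by a primary user
        self.state = [self.rng.random() < 0.5 for _ in range(self.n)]

    @property
    def n_actions(self):
        # Action 0 = stay idle; action k >= 1 = sense the segment
        # starting at channel index k - 1.
        return 1 + (self.n - self.c + 1)

    def _advance(self):
        """Move every channel one step along its 2-state Markov chain."""
        nxt = []
        for vacant in self.state:
            stay = self.p_vv if vacant else self.p_oo
            nxt.append(vacant if self.rng.random() < stay else not vacant)
        self.state = nxt

    def step(self, action):
        """Return (observation, reward) for one time slot."""
        if not 0 <= action < self.n_actions:
            raise ValueError("invalid action")
        if action == 0:
            observation, reward = None, 0   # idle: nothing sensed, no ACK
        else:
            start = action - 1
            observation = tuple(self.state[start:start + self.c])
            # Aggregate vacant channels; ACK if they cover the assumed demand.
            ack = sum(observation) >= self.demand
            reward = 1 if ack else -1       # assumed reward values
        self._advance()
        return observation, reward


if __name__ == "__main__":
    env = SpectrumEnv()
    policy_rng = random.Random(1)
    total = 0
    for _ in range(1000):
        # Random policy as a placeholder for a learned DQN policy.
        _, r = env.step(policy_rng.randrange(env.n_actions))
        total += r
    print("cumulative reward of a random policy:", total)
```

The agent only ever sees the sensed segment and the resulting reward, never the full channel vector, which is what makes the problem partially observable. A learning agent such as a DQN would replace the random policy in the loop, typically taking a history of past actions and observations as input.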

Updated: 2020-06-01