Cooperative Multi-Agent Reinforcement Learning Based Distributed Dynamic Spectrum Access in Cognitive Radio Networks
arXiv - CS - Computer Science and Game Theory. Pub Date: 2021-06-17, DOI: arxiv-2106.09274
Xiang Tan, Li Zhou, Haijun Wang, Yuli Sun, Haitao Zhao, Boon-Chong Seet, Jibo Wei, Victor C. M. Leung

With the development of 5G and the Internet of Things, large numbers of wireless devices need to share limited spectrum resources. Dynamic spectrum access (DSA) is a promising paradigm for remedying the inefficient spectrum utilization caused by the historical command-and-control approach to spectrum allocation. In this paper, we investigate the distributed DSA problem for multiple users in a typical multi-channel cognitive radio network. The problem is formulated as a decentralized partially observable Markov decision process (Dec-POMDP), and we propose a framework with centralized off-line training and distributed on-line execution based on cooperative multi-agent reinforcement learning (MARL). We employ the deep recurrent Q-network (DRQN) to address the partial observability of the state for each cognitive user. The ultimate goal is to learn a cooperative strategy that maximizes the sum throughput of the cognitive radio network in a distributed fashion, without the exchange of coordination information between cognitive users. Finally, we validate the proposed algorithm in various settings through extensive experiments. The simulation results show that the proposed algorithm converges quickly and achieves near-optimal performance.
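
To make the sum-throughput objective concrete, the following is a minimal sketch of a per-slot team reward under a toy collision model. The model is an assumption for illustration only and is not taken from the paper: each cognitive user either stays idle (`None`) or picks one channel index, a transmission succeeds only if the channel is free of licensed users and no other cognitive user picked it, and each success contributes one unit of throughput. The name `sum_throughput` and its parameters are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Set


def sum_throughput(actions: Sequence[Optional[int]], busy_channels: Set[int]) -> int:
    """Count successful transmissions in one time slot (toy collision model).

    actions[i] is the channel chosen by cognitive user i, or None if idle.
    busy_channels holds channels occupied by licensed users in this slot.
    Both the unit reward and the collision rule are illustrative assumptions.
    """
    # How many cognitive users picked each channel in this slot.
    picks = Counter(a for a in actions if a is not None)
    # A user succeeds only if it is alone on a channel that is not busy.
    return sum(
        1
        for a in actions
        if a is not None and a not in busy_channels and picks[a] == 1
    )


# Three cognitive users; channel 0 is occupied by a licensed user.
print(sum_throughput([1, 2, 2], busy_channels={0}))     # two users collide on channel 2
print(sum_throughput([1, 2, None], busy_channels={0}))  # one user stays idle
```

In a cooperative setting, every agent would receive this shared value as its reward, so a user that backs off to avoid a collision is credited with the improvement in the team total.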

Updated: 2021-06-18