A Deep Reinforcement Learning Framework for Contention-Based Spectrum Sharing
IEEE Journal on Selected Areas in Communications (IF 16.4), Pub Date: 2021-06-07, DOI: 10.1109/jsac.2021.3087254
Akash Doshi, Srinivas Yerramalli, Lorenzo Ferrari, Taesang Yoo, Jeffrey G. Andrews

The increasing number of wireless devices operating in unlicensed spectrum motivates the development of intelligent adaptive approaches to spectrum access. We consider decentralized contention-based medium access for base stations (BSs) operating on unlicensed shared spectrum, where each BS autonomously decides whether or not to transmit on a given resource. The contention decision attempts to maximize not its own downlink throughput, but rather a network-wide objective. We formulate this problem as a decentralized partially observable Markov decision process with a novel reward structure that provides long-term proportional fairness in terms of throughput. We then introduce a two-stage Markov decision process in each time slot that uses information from spectrum sensing and reception quality to make a medium access decision. Finally, we incorporate these features into a distributed reinforcement learning framework for contention-based spectrum access. Our formulation provides decentralized inference, online adaptability and also caters to partial observability of the environment through recurrent Q-learning. Empirically, we find its maximization of the proportional fairness metric to be competitive with a genie-aided adaptive energy detection threshold, while being robust to channel fading and small contention windows.
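For reference, the long-term proportional fairness objective mentioned in the abstract is conventionally written as the sum of logarithms of each BS's long-run average throughput. The abstract does not give the paper's exact reward shaping, so the following is the standard textbook form, with the running average tracked by an exponential moving average:

```latex
% Standard proportional-fairness objective (assumed form; the paper's
% exact reward structure is not stated in the abstract).
% \bar{T}_i is the long-term average downlink throughput of BS i,
% R_i(t) the throughput it delivers in slot t, and \tau the averaging window.
\max \sum_{i=1}^{N} \log \bar{T}_i,
\qquad
\bar{T}_i(t) = \Bigl(1 - \tfrac{1}{\tau}\Bigr)\,\bar{T}_i(t-1) + \tfrac{1}{\tau}\,R_i(t)
```

Maximizing the sum of logarithms, rather than the sum of throughputs, penalizes starving any single BS, which is why the contention decision targets a network-wide objective rather than each BS's own rate.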

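The recurrent Q-learning component lends itself to a short illustration. The PyTorch sketch below shows how an LSTM can carry a belief over the partially observed environment so that the per-slot transmit/defer decision conditions on history, not just the current observation. The feature choices (sensed energy, reception quality, previous action), layer sizes, and names are illustrative assumptions, not the authors' implementation:

```python
# Minimal sketch (not the authors' code) of a recurrent Q-network for the
# binary contention decision described in the abstract. Observation features
# and dimensions are assumptions for illustration.
import torch
import torch.nn as nn

class DRQN(nn.Module):
    def __init__(self, obs_dim: int = 3, hidden_dim: int = 64, n_actions: int = 2):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        # The LSTM summarizes the observation history, compensating for
        # partial observability of the shared-spectrum environment.
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, n_actions)  # Q(defer), Q(transmit)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim) sequence of per-slot observations
        x = torch.relu(self.encoder(obs_seq))
        x, hidden = self.lstm(x, hidden)
        return self.q_head(x), hidden

# One decision step for a single BS: greedy action from the latest Q-values.
net = DRQN()
obs = torch.tensor([[[0.7, 0.4, 1.0]]])  # [sensed energy, reception quality, last action]
q_values, h = net(obs)
action = q_values[:, -1].argmax(dim=-1)  # 0 = defer, 1 = transmit
```

In a decentralized deployment, each BS would run its own copy of such a network and update it online from local observations, matching the paper's emphasis on decentralized inference and online adaptability.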
Updated: 2021-07-16