Deep reinforcement learning approach to MIMO precoding problem: Optimality and Robustness
arXiv - CS - Information Theory | Pub Date: 2020-06-30 | DOI: arxiv-2006.16646
Heunchul Lee, Maksym Girnyk and Jaeseong Jeong

In this paper, we propose a deep reinforcement learning (RL)-based precoding framework that can be used to learn an optimal precoding policy for complex multiple-input multiple-output (MIMO) precoding problems. We model the precoding problem for a single-user MIMO system as an RL problem in which a learning agent sequentially selects the precoders to serve the environment of the MIMO system based on contextual information about the environmental conditions, while simultaneously adapting the precoder selection policy based on the reward feedback from the environment to maximize a numerical reward signal. We develop the RL agent with two canonical deep RL (DRL) algorithms, namely deep Q-network (DQN) and deep deterministic policy gradient (DDPG). To demonstrate the optimality of the proposed DRL-based precoding framework, we explicitly consider a simple MIMO environment for which the optimal solution can be obtained analytically, and show that DQN- and DDPG-based agents can learn a near-optimal policy mapping the environment state of the MIMO system to a precoder that maximizes the reward function, in the codebook-based and non-codebook-based MIMO precoding systems, respectively. Furthermore, to investigate the robustness of the DRL-based precoding framework, we examine the performance of the two DRL algorithms in a complex MIMO environment for which the optimal solution is not known. The numerical results confirm the effectiveness of the DRL-based precoding framework and show that the proposed DRL-based framework can outperform the conventional approximation algorithm in the complex MIMO environment.
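
To make the setup concrete, the following is a minimal sketch, not taken from the paper, of how the codebook-based branch of such a framework could be cast as a DQN problem: the channel matrix acts as the environment state, the action is an index into a precoder codebook, and the reward is the achievable rate of the precoded channel. The 2x2 MIMO dimensions, the randomly generated stand-in codebook, the reward definition, and the small PyTorch Q-network are all illustrative assumptions rather than the authors' exact configuration; the non-codebook-based case would replace the discrete argmax selection with a DDPG actor outputting a continuous precoding matrix.

# Illustrative sketch (assumed setup, not the paper's implementation).
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

Nt, Nr, SNR = 2, 2, 10.0              # antenna counts and linear SNR (assumed values)
# Stand-in "codebook": four random unitary precoders (hypothetical, for illustration only).
codebook = [np.linalg.qr(np.random.randn(Nt, Nt) + 1j * np.random.randn(Nt, Nt))[0]
            for _ in range(4)]

def sample_channel():
    # Rayleigh-fading channel draw representing the MIMO environment state.
    return (np.random.randn(Nr, Nt) + 1j * np.random.randn(Nr, Nt)) / np.sqrt(2)

def state_of(H):
    # Contextual information fed to the agent: real/imag parts of the channel matrix.
    return torch.tensor(np.concatenate([H.real.ravel(), H.imag.ravel()]), dtype=torch.float32)

def reward_of(H, F):
    # Assumed reward signal: achievable rate of the precoded channel H F.
    HF = H @ F
    return float(np.log2(np.linalg.det(np.eye(Nr) + (SNR / Nt) * HF @ HF.conj().T).real))

# Small Q-network mapping the channel state to a Q-value per codebook entry.
q_net = nn.Sequential(nn.Linear(2 * Nr * Nt, 64), nn.ReLU(), nn.Linear(64, len(codebook)))
opt = optim.Adam(q_net.parameters(), lr=1e-3)
eps = 0.1                              # epsilon-greedy exploration rate

for step in range(2000):
    H = sample_channel()               # environment presents a new channel realization
    s = state_of(H)
    q = q_net(s)
    a = np.random.randint(len(codebook)) if np.random.rand() < eps else int(q.argmax())
    r = reward_of(H, codebook[a])      # scalar reward fed back by the environment
    # One-step regression of Q(s, a) toward the observed reward; a simplified
    # stand-in for the full DQN target with replay buffer and target network.
    loss = (q[a] - r) ** 2
    opt.zero_grad(); loss.backward(); opt.step()

In this simplified form, each precoder choice only affects the immediate reward, so the update reduces to a contextual-bandit-style regression; the paper's full DQN and DDPG agents would additionally use experience replay and target networks in the usual way.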

Updated: 2020-07-01