Joint User Scheduling, Phase Shift Control, and Beamforming Optimization in Intelligent Reflecting Surface-Aided Systems
IEEE Transactions on Wireless Communications (IF 10.4), Pub Date: 2022-03-21, DOI: 10.1109/twc.2022.3159187
Rui Huang, Vincent W. S. Wong

In this paper, we formulate a joint uplink scheduling, phase shift control, and beamforming optimization problem in intelligent reflecting surface (IRS)-aided systems. We consider two objectives: maximizing the aggregate throughput and achieving proportional fairness. We propose a deep reinforcement learning-based user scheduling, phase shift control, beamforming optimization (DUPB) algorithm to solve the joint problem. The proposed DUPB algorithm applies the neural combinatorial optimization (NCO) technique to solve the user scheduling subproblem, in which a stochastic user scheduling policy is learned by deep neural networks with an attention mechanism. Curriculum learning with deep deterministic policy gradient (CL-DDPG) is used in the proposed DUPB algorithm to jointly optimize the phase shift control and beamforming vectors. Knowledge of the hidden convexity of the joint problem is exploited to facilitate policy learning in CL-DDPG. Simulation results show that, with the maximum aggregate throughput as the objective, the proposed DUPB algorithm achieves a higher aggregate throughput than the alternating optimization (AO)-based algorithms. Moreover, the throughput fairness among the users is improved when proportional fairness is used as the objective. The proposed DUPB algorithm also outperforms the AO-based algorithms in terms of runtime when the number of reflecting elements is large.
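
The two objectives named in the abstract can be contrasted with a small sketch. It assumes the common textbook form of proportional fairness, the sum of the logarithms of the per-user throughputs. The abstract does not state the exact utility used in the paper, and the function names and rate values below are hypothetical, chosen only for illustration.

```python
import math


def aggregate_throughput(rates):
    """Sum of per-user throughputs (e.g. in bit/s/Hz)."""
    return sum(rates)


def proportional_fairness_utility(rates):
    """Sum of log-throughputs (assumed textbook form of proportional fairness).

    Every rate must be strictly positive, otherwise the logarithm is undefined.
    """
    if any(r <= 0 for r in rates):
        raise ValueError("all rates must be strictly positive")
    return sum(math.log(r) for r in rates)


# Two hypothetical allocations with the same aggregate throughput.
skewed = [7.0, 0.5, 0.5]
balanced = [3.0, 2.5, 2.5]

if __name__ == "__main__":
    for name, rates in (("skewed", skewed), ("balanced", balanced)):
        print(name,
              aggregate_throughput(rates),
              proportional_fairness_utility(rates))
```

Both allocations give the same aggregate throughput, so the first objective cannot tell them apart, while the log-sum utility is larger for the balanced one. This is the sense in which a proportional fairness objective favors throughput fairness among users.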

Updated: 2022-03-21