Towards User Scheduling for 6G: A Fairness-Oriented Scheduler Using Multi-Agent Reinforcement Learning,arXiv - CS - Operating Systems


arXiv - CS - Operating Systems. Pub Date: 2020-12-30. DOI: arxiv-2012.15081
Mingqi Yuan, Qi Cao, Man-on Pun, Yi Chen

User scheduling is a classical problem and a key technology in wireless communication, and it will continue to play an important role in the prospective 6G. Many sophisticated schedulers are widely deployed in base stations, such as Proportional Fairness (PF) and Round-Robin Fashion (RRF). Opportunistic (OP) scheduling is known to be the optimal scheduler for maximizing the average user data rate (AUDR) under full-buffer traffic. However, the optimal strategy for achieving the highest fairness remains largely unknown under both full-buffer and bursty traffic. In this work, we investigate the problem of fairness-oriented user scheduling, especially for resource block group (RBG) allocation. We build a user scheduler using Multi-Agent Reinforcement Learning (MARL), which conducts distributional optimization to maximize the fairness of the communication system. The agents take cross-layer information (e.g., RSRP and buffer size) as the state and the RBG allocation result as the action, then explore the optimal solution following a well-defined reward function designed to maximize fairness. Furthermore, we take the 5%-tile user data rate (5TUDR) as the key performance indicator (KPI) of fairness and compare the performance of MARL scheduling with PF scheduling and RRF scheduling through extensive simulations. The simulation results show that the proposed MARL scheduling outperforms the traditional schedulers.
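The baseline schedulers and the fairness KPI named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' MARL scheduler or their simulator. It assumes a single resource per time slot (rather than per-RBG allocation), full-buffer traffic, and a hypothetical channel in which each user's achievable rate is drawn from an exponential distribution. The names `make_trace`, `simulate`, and `percentile` are invented here for illustration.

```python
import math
import random


def percentile(values, p):
    """Nearest-rank p-th percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def make_trace(num_users, num_slots, seed=0):
    """Hypothetical channel (an assumption, not from the paper):
    user k's achievable rate per slot is exponential with mean k + 1."""
    rng = random.Random(seed)
    return [[rng.expovariate(1.0 / (k + 1)) for k in range(num_users)]
            for _ in range(num_slots)]


def simulate(trace, policy, window=100.0):
    """Serve one user per slot and return each user's average data rate.

    policy: "rr" (round-robin), "op" (opportunistic), or "pf"
    (proportional fairness with an exponentially averaged throughput).
    """
    num_users = len(trace[0])
    served = [0.0] * num_users
    avg = [1e-6] * num_users  # running throughput estimate used by PF
    for t, rates in enumerate(trace):
        if policy == "rr":
            chosen = t % num_users
        elif policy == "op":
            chosen = max(range(num_users), key=lambda k: rates[k])
        elif policy == "pf":
            chosen = max(range(num_users), key=lambda k: rates[k] / avg[k])
        else:
            raise ValueError(f"unknown policy: {policy}")
        served[chosen] += rates[chosen]
        for k in range(num_users):
            got = rates[k] if k == chosen else 0.0
            avg[k] += (got - avg[k]) / window
    return [s / len(trace) for s in served]


if __name__ == "__main__":
    trace = make_trace(num_users=20, num_slots=5000, seed=0)
    for policy in ("rr", "pf", "op"):
        user_rates = simulate(trace, policy)
        audr = sum(user_rates) / len(user_rates)
        print(policy, "AUDR:", round(audr, 3),
              "5TUDR:", round(percentile(user_rates, 5), 3))
```

Because all three policies see the same rate trace, the opportunistic policy always attains the highest AUDR in this toy setting, while the 5%-tile figure shows how the worst-served users fare under each policy.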


Updated: 2021-01-01
