Function Approximation Based Reinforcement Learning for Edge Caching in Massive MIMO Networks
IEEE Transactions on Communications (IF 8.3), Pub Date: 2020-12-28, DOI: 10.1109/tcomm.2020.3047658
Navneet Garg¹, Mathini Sellathurai², Vimal Bhatia³, Tharmalingam Ratnarajah¹

Caching popular contents in advance is an important technique to achieve low latency and reduced backhaul congestion in future wireless communication systems. In this article, a multi-cell massive multiple-input multiple-output (MIMO) system is considered, where the locations of base stations are distributed as a Poisson point process. Assuming probabilistic caching, the average success probability (ASP) of the system is derived for a known content popularity (CP) profile, which in practice is time-varying and unknown in advance. Further, modeling CP variations across time as a Markov process, reinforcement $Q$-learning is employed to learn the optimal content placement strategy that optimizes the long-term discounted ASP and average cache refresh rate. In $Q$-learning, the number of $Q$-updates is large and proportional to the number of states and actions. To reduce the space complexity and update requirements toward scalable $Q$-learning, two novel (linear and non-linear) function-approximation-based $Q$-learning approaches are proposed, where only a constant number of variables (4 and 3, respectively) needs updating, irrespective of the number of states and actions. Convergence of these approximation-based approaches is analyzed. Simulations verify that these approaches converge and successfully learn a similar best content placement, which demonstrates the applicability and scalability of the proposed approximate $Q$-learning schemes.
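To make the idea of function-approximation-based $Q$-learning concrete, the following is a minimal sketch of the linear variant, where $Q(s,a) \approx w^\top \phi(s,a)$ and only the small weight vector $w$ is updated, regardless of how many states and actions exist. The feature map, reward, and transitions below are illustrative placeholders and not the paper's caching/ASP model or its specific 4-parameter approximation.

```python
import numpy as np

# Minimal sketch of linear function-approximation Q-learning.
# NOTE: phi, the reward, and the transition model are hypothetical stand-ins,
# not the edge-caching formulation of the paper.

rng = np.random.default_rng(0)

n_states, n_actions, n_features = 10, 4, 4   # small toy problem
alpha, gamma, epsilon = 0.1, 0.9, 0.1        # step size, discount, exploration

# Hypothetical feature map phi(s, a) with a fixed, small number of features,
# so the number of learned weights stays constant irrespective of |S| x |A|.
phi_table = rng.normal(size=(n_states, n_actions, n_features))

def phi(s, a):
    return phi_table[s, a]

def q_value(w, s, a):
    return w @ phi(s, a)

w = np.zeros(n_features)                     # the only learned variables

for episode in range(200):
    s = rng.integers(n_states)
    for _ in range(50):
        # epsilon-greedy action selection on the approximated Q-values
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmax([q_value(w, s, b) for b in range(n_actions)]))

        # placeholder environment: random next state, synthetic reward
        s_next = int(rng.integers(n_states))
        r = float(phi(s, a).sum()) + rng.normal(scale=0.1)

        # TD target uses the greedy value at the next state
        td_target = r + gamma * max(q_value(w, s_next, b) for b in range(n_actions))
        td_error = td_target - q_value(w, s, a)

        # gradient step on the weight vector only (constant-size update)
        w += alpha * td_error * phi(s, a)
        s = s_next

print("learned weights:", w)
```

The point of the sketch is the update rule: each step touches only the constant-size vector $w$, mirroring how the proposed schemes avoid storing and updating a full $Q$-table over all state-action pairs.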
