Collaborative Multi-Agent Multi-Armed Bandit Learning for Small-Cell Caching, IEEE Transactions on Wireless Communications



IEEE Transactions on Wireless Communications (IF 10.4), Pub Date: 2020-04-01, DOI: 10.1109/twc.2020.2966599
Xianzhe Xu, Meixia Tao, Cong Shen

This paper investigates learning-based caching in small-cell networks (SCNs) when user preference is unknown. The goal is to optimize the cache placement in each small base station (SBS) so as to minimize the system's long-term transmission delay. We model this sequential multi-agent decision-making problem from a multi-agent multi-armed bandit (MAMAB) perspective. Rather than estimating user preference first and then optimizing the cache strategy, we propose several MAMAB-based algorithms that learn the cache strategy directly online in both stationary and non-stationary environments. In the stationary environment, we first propose two high-complexity agent-based collaborative MAMAB algorithms with performance guarantees. We then propose a low-complexity distributed MAMAB algorithm that ignores SBS coordination. To achieve a better balance between SBS coordination gain and computational complexity, we develop an edge-based collaborative MAMAB algorithm that uses a coordination-graph edge-based reward assignment method. In the non-stationary environment, we modify the MAMAB-based algorithms proposed for the stationary environment by introducing a practical initialization method and designing new perturbed terms to adapt to the dynamic environment. Simulation results validate the effectiveness of the proposed algorithms, and the effects of different parameters on caching performance are also discussed.
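To make the "low-complexity distributed MAMAB which ignores the SBS coordination" idea more concrete, the following is a minimal sketch and not the authors' algorithm. The abstract gives no index formula or reward definition, so this sketch assumes a standard UCB1-style index, treats each file as an arm at a single SBS, and uses a Bernoulli cache-hit reward as a stand-in for the delay objective. The class name `DistributedUcbCache`, the function `simulate`, and the popularity values are all hypothetical.

```python
import math
import random


class DistributedUcbCache:
    """One SBS acting as an independent agent; each file is an arm (assumption)."""

    def __init__(self, num_files, cache_size):
        self.num_files = num_files
        self.cache_size = cache_size
        self.counts = [0] * num_files    # number of rounds each file was cached
        self.means = [0.0] * num_files   # empirical mean reward per file
        self.t = 0                       # rounds played so far

    def select_cache(self):
        """Return the cache_size files with the largest UCB index."""
        self.t += 1

        def index(f):
            if self.counts[f] == 0:
                return float("inf")      # force every file to be tried once
            bonus = math.sqrt(2.0 * math.log(self.t) / self.counts[f])
            return self.means[f] + bonus

        ranked = sorted(range(self.num_files), key=index, reverse=True)
        return ranked[: self.cache_size]

    def update(self, cached, rewards):
        """Update running means, but only for files cached this round."""
        for f in cached:
            self.counts[f] += 1
            self.means[f] += (rewards[f] - self.means[f]) / self.counts[f]


def simulate(rounds=3000, seed=0):
    """Run one agent against a fixed (stationary) request distribution."""
    rng = random.Random(seed)
    # Hypothetical request probabilities; the agent never sees these directly.
    popularity = [0.9, 0.8, 0.1, 0.1, 0.05, 0.05]
    agent = DistributedUcbCache(num_files=len(popularity), cache_size=2)
    for _ in range(rounds):
        cached = agent.select_cache()
        # Assumed reward: 1 if the cached file was requested this round.
        rewards = {f: 1.0 if rng.random() < popularity[f] else 0.0 for f in cached}
        agent.update(cached, rewards)
    return agent
```

In a multi-SBS setting, one such agent would run per SBS with no message exchange. That lack of exchange is what makes the scheme cheap, and it is also why it cannot capture the coordination gain that the collaborative variants in the abstract target.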


Updated: 2020-04-01
