Decentralized Multi-Agent Multi-Armed Bandit Learning With Calibration for Multi-Cell Caching, IEEE Transactions on Communications

IEEE Transactions on Communications (IF 7.2), Pub Date: 2020-12-15, DOI: 10.1109/tcomm.2020.3045050
Xianzhe Xu, Meixia Tao

This paper investigates online decentralized cache strategy design in multi-cell networks without knowledge of user preferences. The goal is to minimize the cumulative transmission delay of the multi-cell network over a finite time interval. Without a central controller, each small base station (SBS) decides its own cache action autonomously, based on its past observations and on limited information received from other SBSs. To coordinate the cache actions of different SBSs in a decentralized manner, we first propose an $\epsilon$-calibration learning algorithm that lets each SBS predict the cache strategies of the other SBSs in real time and progressively improves the accuracy of these predictions. We then develop a decentralized multi-agent multi-armed bandit (MAMAB) algorithm with which each SBS decides its own cache strategy based jointly on its past observations and the estimated upcoming cache actions of the other SBSs. This decentralized MAMAB algorithm with $\epsilon$-calibration enables multiple SBSs to converge to a reasonable joint cache action and to make cooperative cache decisions in a decentralized manner with limited information exchange. Simulation results demonstrate that the proposed decentralized caching algorithm outperforms other decentralized caching algorithms and rapidly approaches the centralized caching solution.
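The abstract gives no algorithmic detail, so the following is only a toy sketch of the general idea (one SBS that combines a forecast of a neighbor's cache action with a bandit-style cache choice), not the authors' method. Everything in it is an assumption made for illustration: the class name `ToySBS`, the delay constants, the UCB1-style exploration bonus, the rule that requests are observed only for cached files, and in particular the neighbor forecast, which is a plain empirical frequency standing in for the paper's $\epsilon$-calibration forecaster.

```python
import math
import random


class ToySBS:
    """Toy small base station; all names and constants here are hypothetical."""

    def __init__(self, n_files, capacity, d_local=1.0, d_neighbor=2.0, d_core=5.0):
        self.n_files = n_files
        self.capacity = capacity
        # Assumed per-request delays: local hit, fetch from neighbor, fetch from core.
        self.d_local, self.d_neighbor, self.d_core = d_local, d_neighbor, d_core
        self.t = 0                        # rounds observed so far
        self.req_sum = [0.0] * n_files    # requests seen while the file was cached
        self.pulls = [0] * n_files        # rounds in which the file was cached
        self.nbr_count = [0] * n_files    # rounds in which the neighbor cached the file

    def predict_neighbor(self, f):
        # Empirical-frequency forecast (a stand-in, NOT epsilon-calibration).
        return self.nbr_count[f] / self.t if self.t else 0.5

    def choose_cache(self):
        scores = []
        for f in range(self.n_files):
            if self.pulls[f] == 0:
                scores.append((math.inf, f))  # try every file at least once
                continue
            mean_req = self.req_sum[f] / self.pulls[f]
            bonus = math.sqrt(2 * math.log(self.t + 1) / self.pulls[f])
            p = self.predict_neighbor(f)
            # Expected delay of a miss, given the forecast of the neighbor's cache.
            miss_delay = p * self.d_neighbor + (1 - p) * self.d_core
            saving = miss_delay - self.d_local
            scores.append(((mean_req + bonus) * saving, f))
        scores.sort(reverse=True)
        return sorted(f for _, f in scores[: self.capacity])

    def update(self, cached, requests, neighbor_cached):
        self.t += 1
        for f in cached:                  # feedback only for files we cached
            self.pulls[f] += 1
            self.req_sum[f] += requests[f]
        for f in neighbor_cached:
            self.nbr_count[f] += 1


def simulate(rounds=2000, seed=0):
    rng = random.Random(seed)
    popularity = [0.9, 0.7, 0.5, 0.2, 0.1]  # hidden from the SBS
    sbs = ToySBS(n_files=5, capacity=2)
    for _ in range(rounds):
        cached = sbs.choose_cache()
        requests = [1.0 if rng.random() < p else 0.0 for p in popularity]
        sbs.update(cached, requests, neighbor_cached=[0])  # neighbor always caches file 0
    return sbs
```

In this toy setup the neighbor always caches the most popular file, so the forecast for that file tends to 1 and the estimated saving from caching it locally shrinks. The SBS is then expected to spend most rounds caching the next most popular files instead, which is the kind of implicit coordination the abstract describes, here in a deliberately simplified form.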

Updated: 2020-12-15
