Reinforcement Learning Based Optimal Computing and Caching in Mobile Edge Network, IEEE Journal on Selected Areas in Communications



Reinforcement Learning Based Optimal Computing and Caching in Mobile Edge Network
IEEE Journal on Selected Areas in Communications (IF 13.8), Pub Date: 2020-10-01, DOI: 10.1109/jsac.2020.3000396
Yichen Qian, Rui Wang, Jun Wu, Bin Tan, Haoqi Ren

Joint pushing and caching are commonly considered an effective way to adapt to tidal effects in networks. However, the problem of how to precisely predict users’ future requests and push or cache the proper content remains to be solved. In this paper, we investigate a joint pushing and caching policy in a general mobile edge computing (MEC) network with multiuser and multicast data. We formulate the joint pushing and caching problem as an infinite-horizon average-cost Markov decision process (MDP). Our aim is not only to maximize bandwidth utilization but also to decrease the total quantity of data transmitted. Then, a joint pushing and caching policy based on hierarchical reinforcement learning (HRL) is proposed, which considers both long-term file popularity and short-term temporal correlations of user requests to fully utilize bandwidth. To address the curse of dimensionality, we apply a divide-and-conquer strategy to decompose the joint base station and user cache optimization problem into two subproblems: the user cache optimization subproblem and the base station cache optimization subproblem. We apply value function approximation Q-learning and a deep Q-network (DQN) to solve these two subproblems. Furthermore, we provide some insights into the design of deep reinforcement learning in network caching. The simulation results show that the proposed policy can learn content popularity very well and predict users’ future demands precisely. Our approach outperforms existing schemes on various parameters including the base station cache size, the number of users and the total number of files in multiple scenarios.
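The abstract names value function approximation Q-learning as one of the solvers but gives no details, so the following is only a minimal sketch of that general technique on a toy single-cache problem, not the authors' method. Everything in it is an assumption made for illustration: the function names (zipf_popularity, q_value, feasible_caches, train), the Zipf request model with independent requests (the paper also exploits short-term temporal correlations, which this toy ignores), the hit reward of 1, the per-file linear features, and the use of a discounted return in place of the paper's average-cost criterion.

```python
import random


def zipf_popularity(n_files, exponent=1.0):
    """Assumed request model: Zipf-like popularity, file 0 is the most popular."""
    raw = [1.0 / rank ** exponent for rank in range(1, n_files + 1)]
    total = sum(raw)
    return [x / total for x in raw]


def q_value(weights, cache):
    """Linear approximation: Q(s, a) is the sum of per-file weights over the
    cache contents that action a would produce."""
    return sum(weights[f] for f in cache)


def feasible_caches(cache, request):
    """Actions: keep the cache, or (on a miss) swap one cached file for the request."""
    options = [cache]
    if request not in cache:
        options += [(cache - {f}) | {request} for f in cache]
    return options


def train(n_files=6, cache_size=2, steps=30000,
          alpha=0.02, gamma=0.5, epsilon=0.1, seed=0):
    """Semi-gradient Q-learning with epsilon-greedy exploration (toy setting)."""
    rng = random.Random(seed)
    popularity = zipf_popularity(n_files)
    files = range(n_files)
    weights = [0.0] * n_files
    # Start from the least popular files so that learning has something to fix.
    cache = frozenset(range(n_files - cache_size, n_files))
    request = rng.choices(files, weights=popularity)[0]
    for _ in range(steps):
        options = feasible_caches(cache, request)
        if rng.random() < epsilon:
            next_cache = rng.choice(options)  # explore
        else:
            next_cache = max(options, key=lambda c: q_value(weights, c))  # exploit
        next_request = rng.choices(files, weights=popularity)[0]
        # Assumed reward: 1 if the next request is served from the cache.
        reward = 1.0 if next_request in next_cache else 0.0
        best_next = max(q_value(weights, c)
                        for c in feasible_caches(next_cache, next_request))
        td_error = reward + gamma * best_next - q_value(weights, next_cache)
        # For a linear Q the gradient is 1 for every file in the chosen cache.
        for f in next_cache:
            weights[f] += alpha * td_error
        cache, request = next_cache, next_request
    return weights, cache


if __name__ == "__main__":
    learned_weights, final_cache = train()
    print("per-file weights:", [round(w, 3) for w in learned_weights])
    print("final cache:", sorted(final_cache))
```

The point of the linear form is that the learner stores one weight per file rather than one value per (cache, request) pair, which is one simple way to sidestep the state-space growth the abstract calls the curse of dimensionality. In this toy setting one would expect frequently requested files to end up with larger weights, but the exact values depend on the seed and the hyperparameters chosen above.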


Updated: 2020-10-01
