Deep reinforcement learning-based incentive mechanism design for short video sharing through D2D communication


Peer-to-Peer Networking and Applications (IF 4.2), Pub Date: 2021-05-06, DOI: 10.1007/s12083-021-01146-x
Zhuo Li, Wentao Dong, Xin Chen

With the development of fifth-generation (5G) wireless communication networks and the popularity of short video applications, short video traffic in cellular networks has increased rapidly. Short video sharing based on device-to-device (D2D) communication is considered an effective way to offload traffic from cellular networks. Because mobile user equipment (MUEs) are selfish by nature, the key challenge is how to dynamically motivate MUEs to engage in short video sharing while ensuring Quality of Service, which makes the design of an appropriate incentive mechanism critical. In this paper, we first analyze the rationale for dynamically setting rewards and penalties and then define the rewards and penalties setting dynamically for maximizing the utility of the mobile edge computing server (RPSDMU) problem. The problem is proved to be NP-hard. Furthermore, we formulate the dynamic incentive process as a Markov Decision Process. Considering the complexity and dynamics of the problem, we design a Dynamic Incentive Mechanism algorithm for D2D-based short video sharing based on Asynchronous Advantage Actor-Critic (DIM-A3C) to solve it. Simulation results show that the proposed dynamic incentive mechanism increases the utility of the mobile edge computing server by an average of 22% and 16% compared with the existing proportional incentive mechanism (PIM) and scoring-based incentive mechanism (SIM), respectively. Meanwhile, DIM-A3C achieves a higher degree of satisfaction than PIM and SIM.


Updated: 2021-05-07
