Adaptive Video Streaming for Massive MIMO Networks via Approximate MDP and Reinforcement Learning
IEEE Transactions on Wireless Communications (IF 10.4). Pub Date: 2020-09-01. DOI: 10.1109/twc.2020.2995944
Qiao Lan, Bojie Lv, Rui Wang, Kaibin Huang, Yi Gong

This paper considers the scheduling of downlink video streaming in a massive multiple-input multiple-output (MIMO) network, where active users arrive randomly to request video contents of finite playback duration from their serving base stations (BSs). Each video content consists of a sequence of segments that can be transmitted to the requesting user at variable video bitrates. We formulate the joint control of the number of transmitted segments, the frame allocation, and the segment bitrate over all super frames (each comprising multiple frames) as an infinite-horizon Markov decision process (MDP), whose objective is to maximize a discounted measure of the average Quality of Experience (QoE). Since the existing literature offers no efficient scheduling-design method for systems with random user arrivals and departures, we propose a novel approximate MDP method that yields a low-complexity scheduling policy, and we derive a lower bound on its performance. Specifically, we first introduce a baseline policy and derive its asymptotic value function. One-step policy iteration is then applied to this value function, yielding the low-complexity policy. Finally, we propose a novel and efficient reinforcement learning (RL) algorithm to evaluate the value function when prior knowledge of the user arrival intensity is absent.
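The core approximate-MDP idea in the abstract — take a baseline policy, compute its value function, then apply one step of policy iteration to obtain an improved low-complexity policy — can be sketched on a toy discounted MDP. Everything below (the state/action sizes, the random transition and reward tensors, and the choice of "always take action 0" as the baseline) is a hypothetical stand-in for the paper's scheduling problem, not the authors' actual model.

```python
import numpy as np

# Hypothetical toy MDP standing in for the scheduling problem:
# n_states states, n_actions actions, discount factor gamma.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 3, 0.9

P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)       # row-stochastic transitions p(s'|s,a)
R = rng.random((n_actions, n_states))   # rewards r(s, a)

def baseline_value():
    """Value function of a fixed baseline policy (here: always action 0),
    playing the role of the analytically derived asymptotic value function.
    Solves the linear Bellman equation V = R0 + gamma * P0 @ V."""
    P0, R0 = P[0], R[0]
    return np.linalg.solve(np.eye(n_states) - gamma * P0, R0)

def one_step_improved_policy(V):
    """One step of policy iteration: act greedily w.r.t. the baseline value,
    Q(s, a) = r(s, a) + gamma * sum_s' p(s'|s, a) V(s')."""
    Q = R + gamma * (P @ V)             # shape (n_actions, n_states)
    return Q.argmax(axis=0)             # improved action for each state

V_base = baseline_value()
policy = one_step_improved_policy(V_base)
print(policy)
```

By the policy improvement theorem, the greedy policy is guaranteed to perform at least as well as the baseline, which is what makes the one-step construction a principled low-complexity approximation; the paper's RL algorithm then replaces the exact `baseline_value` computation when the arrival statistics are unknown.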

Updated: 2020-09-01