PnP-DRL: A Plug-and-Play Deep Reinforcement Learning Approach for Experience-Driven Networking
IEEE Journal on Selected Areas in Communications (IF 13.8) · Pub Date: 2021-06-14 · DOI: 10.1109/jsac.2021.3087270
Zhiyuan Xu, Kun Wu, Weiyi Zhang, Jian Tang, Yanzhi Wang, Guoliang Xue

While Deep Reinforcement Learning (DRL) has emerged as the de facto approach to many complex experience-driven networking problems, it remains challenging to deploy DRL in real systems. Because of random exploration and half-trained deep neural networks during the online training process, a DRL agent may make unexpected decisions that degrade system performance or even crash the system. In this paper, we propose PnP-DRL, an offline-trained, plug-and-play DRL solution that leverages batch reinforcement learning to learn the best control policy from pre-collected transition samples without interacting with the system. Once trained without system interaction, our plug-and-play DRL agent works seamlessly from the start, without additional exploration or possible disruption of the running system. We implement and evaluate PnP-DRL on a prevalent experience-driven networking problem, Dynamic Adaptive Streaming over HTTP (DASH). Extensive experimental results show that 1) the existing batch reinforcement learning method has its limits; 2) PnP-DRL significantly outperforms classical adaptive bitrate algorithms in average user Quality of Experience (QoE); and 3) unlike state-of-the-art online DRL methods, PnP-DRL can be off and running without learning gaps while achieving comparable performance.
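To make the batch-learning idea concrete, below is a minimal sketch of offline (batch) reinforcement learning in the spirit described by the abstract: fitted Q-iteration over a fixed set of pre-collected transitions, with no interaction with the running system. This is not the paper's actual architecture; the state and action shapes, the synthetic transition log, the QoE-style reward stand-in, and the linear Q-function are all illustrative assumptions.

```python
# Hedged sketch: generic fitted Q-iteration on a static transition dataset.
# All shapes, data, and the linear Q-function are assumptions for illustration;
# PnP-DRL itself uses deep networks and its own batch-RL variant.
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 5        # e.g., candidate video bitrates in a DASH/ABR setting
STATE_DIM = 6        # e.g., throughput history, buffer level, last bitrate
GAMMA = 0.95         # discount factor
N_ITER = 50          # fitted Q-iteration sweeps

# Pre-collected transition samples (s, a, r, s'), e.g., logged by an
# existing ABR controller. Synthetic placeholders here.
n = 10_000
S = rng.normal(size=(n, STATE_DIM))
A = rng.integers(0, N_ACTIONS, size=n)
R = rng.normal(size=n)                     # stand-in for a QoE reward
S2 = rng.normal(size=(n, STATE_DIM))

# Linear Q-function: Q(s, a) = s @ W[:, a] + b[a].
W = np.zeros((STATE_DIM, N_ACTIONS))
b = np.zeros(N_ACTIONS)

def q_values(states):
    return states @ W + b

for _ in range(N_ITER):
    # Bellman targets computed only from the fixed batch, never from
    # new rollouts against the live system.
    targets = R + GAMMA * q_values(S2).max(axis=1)
    # Per-action least-squares regression toward the targets.
    for a in range(N_ACTIONS):
        mask = A == a
        if not mask.any():
            continue
        X = np.hstack([S[mask], np.ones((mask.sum(), 1))])
        sol, *_ = np.linalg.lstsq(X, targets[mask], rcond=None)
        W[:, a], b[a] = sol[:-1], sol[-1]

# The resulting greedy policy can be "plugged in" without further exploration.
def act(state):
    return int(np.argmax(q_values(state[None, :])[0]))

print("chosen bitrate index:", act(rng.normal(size=STATE_DIM)))
```

The key property the sketch shares with the plug-and-play deployment model is that every update draws only on the logged batch, so the learned policy can be dropped into a running system without the exploratory actions that make online DRL risky.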

Updated: 2021-06-14