Dynamic Topology Design of NFV-Enabled Services Using Deep Reinforcement Learning
IEEE Transactions on Cognitive Communications and Networking (IF 8.6), Pub Date: 2021-12-31, DOI: 10.1109/tccn.2021.3139632
Omar Alhussein, Weihua Zhuang

Next-generation networks gain enhanced capabilities from software-defined networking and network function virtualization (NFV). There is a radical shift from device-centric to experience-driven environments, in which data is the primary driver. In this paper, we consider joint topology design, traffic routing, and network function (NF) placement for unicast NFV-enabled services. We develop an end-to-end, model-free deep reinforcement learning (RL) framework that dynamically allocates processing and transmission resources while accounting for time-varying network traffic patterns. First, we provide a flexible pre-processing technique that represents and reduces the state space and action space of the joint problem for the deep RL algorithm. Second, we present a deep deterministic policy gradient (DDPG) algorithm enhanced with a model-assisted exploration procedure. Because the problem involves multiple resource types with strongly adverse effects, the vanilla DDPG algorithm cannot achieve consistent performance. The model-assisted exploration procedure, which uses a perturbed, step-wise, sub-optimal integer linear program, bootstraps and stabilizes the vanilla DDPG algorithm and finds optimal solutions efficiently.
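
The abstract does not spell out how the model-assisted exploration is wired into DDPG, so the following is only a minimal sketch of one plausible reading: during training, the agent sometimes follows a perturbed action proposed by a model-based guide (standing in for the perturbed step-wise integer linear program) instead of the actor's own noisy action. All names here (`select_action`, `epsilon_schedule`, `guide`, `actor`, the linear decay, and the [0, 1] action range) are assumptions made for illustration and are not taken from the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple

Action = List[float]
Policy = Callable[[Sequence[float]], Sequence[float]]


def clip01(x: float) -> float:
    """Clip a scalar to the assumed normalized action range [0, 1]."""
    return max(0.0, min(1.0, x))


def perturb(action: Sequence[float], scale: float, rng: random.Random) -> Action:
    """Add zero-mean Gaussian noise to each component, then clip to [0, 1]."""
    return [clip01(a + rng.gauss(0.0, scale)) for a in action]


def epsilon_schedule(step: int, start: float = 1.0, end: float = 0.05,
                     decay_steps: int = 1000) -> float:
    """Linearly decay the probability of following the guide (assumed schedule)."""
    frac = min(1.0, step / decay_steps)
    return start + frac * (end - start)


def select_action(state: Sequence[float], actor: Policy, guide: Policy,
                  epsilon: float, noise_scale: float,
                  rng: random.Random) -> Tuple[Action, str]:
    """With probability epsilon use the perturbed guide, otherwise the noisy actor.

    `guide` is a hypothetical stand-in for a step-wise model-based solver;
    `actor` is a hypothetical stand-in for the DDPG policy network.
    """
    if rng.random() < epsilon:
        return perturb(guide(state), noise_scale, rng), "guide"
    return perturb(actor(state), noise_scale, rng), "actor"


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy policies: the guide and the actor each return a fixed action.
    guide = lambda s: [0.2, 0.8]
    actor = lambda s: [0.5, 0.5]
    for step in (0, 500, 1000):
        eps = epsilon_schedule(step)
        action, source = select_action([0.0], actor, guide, eps, 0.1, rng)
        print(step, round(eps, 3), source, [round(a, 3) for a in action])
```

In a sketch like this, early transitions in the replay buffer come mostly from the guide, which is one way a model could bootstrap an otherwise unstable learner; as `epsilon` decays, the actor's own exploration takes over.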

Updated: 2021-12-31