Dynamic Topology Design of NFV-Enabled Services Using Deep Reinforcement Learning, IEEE Transactions on Cognitive Communications and Networking

IEEE Transactions on Cognitive Communications and Networking (IF 8.6), Pub Date: 2021-12-31, DOI: 10.1109/tccn.2021.3139632
Omar Alhussein, Weihua Zhuang

Next-generation networks are endowed with enhanced capabilities thanks to software-defined networking and network function virtualization (NFV). There is a radical shift from device-centric to experience-driven environments, in which data is the primary driver behind the running engines. In this paper, we consider joint topology design, traffic routing, and network function (NF) placement for unicast NFV-enabled services. We develop an end-to-end model-free deep reinforcement learning (RL) framework to dynamically allocate processing and transmission resources while accounting for time-varying network traffic patterns. First, we provide a flexible pre-processing technique that represents and reduces the state space and action space of the considered joint problem for the deep RL algorithm. Second, we present a deep deterministic policy gradient (DDPG) algorithm that is enhanced with a model-assisted exploration procedure. Because the problem involves multiple resource types with strongly adverse effects, the existing vanilla DDPG algorithm cannot achieve consistent performance. The model-assisted exploration procedure, which utilizes a perturbed step-wise sub-optimal integer linear program, bootstraps and stabilizes the vanilla DDPG algorithm and finds optimal solutions efficiently.
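
The idea of model-assisted exploration can be illustrated with a minimal sketch. It is not the authors' implementation: the names `select_action`, `perturb`, `toy_actor`, and `toy_model_solver` are hypothetical, the model-based solver is a stand-in for the step-wise integer linear program mentioned in the abstract, and the assumptions that actions lie in [0, 1] and that the model is consulted with a fixed probability `epsilon` are made only for illustration.

```python
import random
from typing import Callable, List, Sequence

Action = List[float]


def perturb(action: Sequence[float], scale: float, rng: random.Random) -> Action:
    """Add zero-mean Gaussian noise to each component and clip to [0, 1]."""
    return [min(1.0, max(0.0, a + rng.gauss(0.0, scale))) for a in action]


def select_action(
    state: Sequence[float],
    actor: Callable[[Sequence[float]], Action],
    model_solver: Callable[[Sequence[float]], Action],
    epsilon: float,
    noise_scale: float,
    rng: random.Random,
) -> Action:
    """Pick an exploratory action.

    With probability `epsilon`, start from the model-based (sub-optimal)
    solution; otherwise start from the actor's deterministic output.
    Either way, the chosen action is perturbed before being returned.
    """
    if rng.random() < epsilon:
        base = model_solver(state)  # assumption: stand-in for an ILP step
    else:
        base = actor(state)  # deterministic policy output, as in DDPG
    return perturb(base, noise_scale, rng)


# Hypothetical stand-ins so that the sketch runs on its own.
def toy_actor(state: Sequence[float]) -> Action:
    return [0.5 for _ in state]


def toy_model_solver(state: Sequence[float]) -> Action:
    return [0.2 for _ in state]


if __name__ == "__main__":
    rng = random.Random(0)
    state = [0.1, 0.9, 0.4]
    print(select_action(state, toy_actor, toy_model_solver, 0.3, 0.05, rng))
```

In a full training loop, transitions gathered this way would be stored in the replay buffer alongside ordinary actor-driven transitions; `epsilon` would typically be decayed over time so that the learned policy gradually takes over, though the schedule used in the paper is not stated in the abstract.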

Updated: 2021-12-31
