Reinforcement Learning-based Admission Control in Delay-sensitive Service Systems

arXiv - CS - Performance, Pub Date: 2020-08-21, DOI: arxiv-2008.09590
Majid Raeis, Ali Tizghadam and Alberto Leon-Garcia

Ensuring quality of service (QoS) guarantees in service systems is a challenging task, particularly when the system is composed of more fine-grained services, such as service function chains. An important QoS metric in service systems is the end-to-end delay, which becomes even more important in delay-sensitive applications, where jobs must be completed within a deadline. Admission control is one way of providing end-to-end delay guarantees: the controller accepts a job only if it has a high probability of meeting its deadline. In this paper, we propose a reinforcement learning-based admission controller that guarantees a probabilistic upper bound on the end-to-end delay of the service system while minimizing the probability of unnecessary rejections. Our controller uses only the queue length information of the network and requires no knowledge of the network topology or system parameters. Since long-term performance metrics are of great importance in service systems, we take an average-reward reinforcement learning approach, which is well suited to infinite-horizon problems. Our evaluations verify that the proposed RL-based admission controller is capable of providing probabilistic bounds on the end-to-end delay of the network without using system model information.
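
To make the idea in the abstract concrete, here is a minimal sketch of an average-reward admission controller on a toy single-server queue. It is not the authors' algorithm or reward design: the tabular differential Q-learning update, the reward values (+1 for a met deadline, `-miss_penalty` for a miss, 0 for a rejection), the M/M/1 simulator, and every function and parameter name are assumptions chosen for illustration. As in the abstract, the controller observes only the queue length.

```python
import random


def step_reward(admitted, delay, deadline, miss_penalty):
    """Hypothetical reward: +1 if an admitted job meets the deadline,
    -miss_penalty if it misses, 0 if the job is rejected."""
    if not admitted:
        return 0.0
    return 1.0 if delay <= deadline else -miss_penalty


def differential_q_update(q, avg_reward, s, a, r, s_next, alpha, eta):
    """One tabular average-reward (differential) Q-learning step.
    Updates q in place and returns the new average-reward estimate."""
    delta = r - avg_reward + max(q[s_next]) - q[s][a]
    q[s][a] += alpha * delta
    return avg_reward + eta * alpha * delta


def train(steps=50_000, lam=0.9, mu=1.0, deadline=5.0, miss_penalty=5.0,
          max_queue=10, alpha=0.05, eta=0.1, epsilon=0.1, seed=0):
    """Learn an admit/reject policy indexed by queue length at arrival."""
    rng = random.Random(seed)
    # q[s][0] = value of rejecting, q[s][1] = value of admitting
    q = [[0.0, 0.0] for _ in range(max_queue + 1)]
    avg_reward, n = 0.0, 0  # n = jobs currently in the system
    for _ in range(steps):
        s = min(n, max_queue)  # aggregate very long queues into one state
        # Epsilon-greedy choice between reject (0) and admit (1).
        if rng.random() < epsilon:
            a = rng.randrange(2)
        else:
            a = 1 if q[s][1] >= q[s][0] else 0
        # Toy simplification: the admitted job's delay is drawn as the sum
        # of n + 1 exponential service times, independently of the queue
        # evolution simulated below.
        delay = sum(rng.expovariate(mu) for _ in range(n + 1)) if a else 0.0
        r = step_reward(bool(a), delay, deadline, miss_penalty)
        n += a
        # Departures that complete before the next arrival.
        t = rng.expovariate(lam)
        while n > 0:
            t -= rng.expovariate(mu)
            if t < 0:
                break
            n -= 1
        s_next = min(n, max_queue)
        avg_reward = differential_q_update(
            q, avg_reward, s, a, r, s_next, alpha, eta)
    # Greedy policy; states never visited default to reject.
    policy = [1 if row[1] > row[0] else 0 for row in q]
    return policy, avg_reward
```

Calling `train()` returns a list `policy` where `policy[k] == 1` means "admit when `k` jobs are in the system", together with the learned average-reward estimate. In this sketch a larger `miss_penalty` makes the controller more conservative, which is one plausible way to trade deadline misses against unnecessary rejections; how the paper enforces its probabilistic bound is not specified in the abstract.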

Updated: 2020-08-24
