Dynamic Beam Hopping Method Based on Multi-Objective Deep Reinforcement Learning for Next Generation Satellite Broadband Systems
IEEE Transactions on Broadcasting (IF 3.2), Pub Date: 2020-09-01, DOI: 10.1109/tbc.2019.2960940
Xin Hu, Yuchen Zhang, Xianglai Liao, Zhijun Liu, Weidong Wang, Fadhel M. Ghannouchi

Given the inherent uncertainty of differentiated service requirements and the non-uniform spatial distribution of capacity requests, it is essential to flexibly adjust satellite resources to satisfy varying conditions. Matching system capacity to demand while using beams efficiently is a new challenge. Conventional beam hopping methods ignore the intrinsic correlation between decisions, do not consider the long-term reward, and achieve an optimal solution only at the current time. Therefore, system complexity increases significantly as the demand for differentiated services or the number of beams grows. This paper investigates the optimal policy for beam hopping in a DVB-S2X satellite with three objectives: ensuring fairness of service across beams, minimizing the transmission delay of real-time services, and maximizing the transmission throughput of non-instant services. Because wireless channel conditions and the arrival rates of differentiated services are stochastic, and the dynamics of the multi-beam satellite environment are unknown, a model-free multi-objective deep reinforcement learning approach is used to learn the optimal policy through interactions with the environment. To address the curse of dimensionality in the action space, a novel multi-action selection method based on Double-Loop Learning (DLL) is proposed. Moreover, the multi-dimensional state is reformulated and obtained by a deep neural network. Evaluation results obtained under realistic conditions demonstrate that the proposed method can pursue multiple objectives simultaneously and can allocate resources intelligently, adapting to user requirements and channel conditions.
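
The abstract names three objectives (fairness across beams, low delay for real-time services, high throughput for non-instant services) but does not state how they are combined into a reward. The sketch below shows one common way such objectives could be folded into a single scalar reward, purely as an illustration. Everything in it is an assumption and not taken from the paper: the use of Jain's fairness index, the weighted-sum scalarization, the normalization bounds, and the names `jain_fairness` and `scalarized_reward`.

```python
from typing import Sequence, Tuple


def jain_fairness(served: Sequence[float]) -> float:
    """Jain's fairness index: 1.0 when all beams are served equally,
    1/n when a single beam receives everything."""
    total = sum(served)
    squares = sum(x * x for x in served)
    if squares == 0:
        return 1.0  # nothing served anywhere: treat as trivially fair
    return total * total / (len(served) * squares)


def scalarized_reward(
    served: Sequence[float],        # traffic served per beam in this slot
    mean_delay: float,              # mean delay of real-time traffic
    throughput: float,              # throughput of non-instant traffic
    max_delay: float,               # assumed normalization bound for delay
    max_throughput: float,          # assumed normalization bound for throughput
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """Hypothetical weighted-sum reward over the three objectives.

    Each term is mapped to [0, 1] so the weights are comparable.
    """
    w_fair, w_delay, w_tput = weights
    fairness_term = jain_fairness(served)
    # Lower delay is better, so invert the normalized delay.
    delay_term = 1.0 - min(mean_delay / max_delay, 1.0)
    tput_term = min(throughput / max_throughput, 1.0)
    return w_fair * fairness_term + w_delay * delay_term + w_tput * tput_term
```

A weighted sum is only one possible scalarization; it is simple to tune, but it can miss trade-offs that other multi-objective formulations capture, and the paper's actual reward design may differ.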

Updated: 2020-09-01