An Actor-Critic Deep Reinforcement Learning Approach for Transmission Scheduling in Cognitive Internet of Things Systems
IEEE Systems Journal (IF 4.4), Pub Date: 2019-03-18, DOI: 10.1109/jsyst.2019.2891520
Helin Yang, Xianzhong Xie

The cognitive Internet of Things (CIoT) has recently attracted much interest in wireless networks because of its wide applications in smart cities, intelligent transportation systems, and smart metering networks. However, scheduling packet transmission intelligently in CIoT systems remains a key challenge: how can a smart agent be designed to achieve intelligent decision making and effective interoperability? In this paper, we model the system state transitions as a Markov decision process and propose an actor-critic deep reinforcement learning algorithm based on a fuzzy normalized radial basis function neural network (called AC-FNRBF) to efficiently solve the intelligent transmission scheduling problem in CIoT systems with high-dimensional variables. The proposed AC-FNRBF algorithm can better approximate both the action function of the actor and the state-action value function of the critic without requiring prior knowledge of the system. A new reward function is established to maximize the system benefit; it jointly takes the transmission packet rate, the system throughput, the power consumption, and the transmission delay into account. Moreover, AC-FNRBF can adjust its learning structure and parameters in dynamic environments. Simulation results verify that the proposed algorithm achieves a higher transmission packet rate and system throughput with lower power consumption and transmission delay than other existing reinforcement learning algorithms.
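
The abstract names the building blocks but gives no formulas, so the following is only a minimal sketch of how an actor-critic learner with normalized radial basis function features could be wired together. Everything specific in it is an assumption made for illustration and is not taken from the paper: the Gaussian basis with a shared width, the linear weighted reward in `reward`, the softmax actor over discrete actions, the step sizes, and the class name `ActorCritic`. It also simplifies in two ways: the critic learns a state value rather than the state-action value mentioned in the abstract, and the fuzzy layer and the structural adaptation of AC-FNRBF are omitted.

```python
import math
import random


def normalized_rbf(state, centers, width):
    """Gaussian RBF activations, normalized so that they sum to 1."""
    acts = [
        math.exp(-sum((s - c) ** 2 for s, c in zip(state, ctr)) / (2 * width ** 2))
        for ctr in centers
    ]
    total = sum(acts)
    if total == 0.0:  # all activations underflowed: fall back to uniform
        return [1.0 / len(acts)] * len(acts)
    return [a / total for a in acts]


def reward(packet_rate, throughput, power, delay, w=(1.0, 1.0, 1.0, 1.0)):
    """Hypothetical reward: weighted benefits minus weighted costs."""
    return w[0] * packet_rate + w[1] * throughput - w[2] * power - w[3] * delay


def softmax(prefs):
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


class ActorCritic:
    """One-step actor-critic with linear function approximation (assumed design)."""

    def __init__(self, centers, width, n_actions,
                 alpha_actor=0.05, alpha_critic=0.1, gamma=0.9):
        self.centers, self.width = centers, width
        self.n_actions = n_actions
        self.alpha_actor, self.alpha_critic, self.gamma = alpha_actor, alpha_critic, gamma
        k = len(centers)
        self.theta = [[0.0] * k for _ in range(n_actions)]  # actor weights
        self.v = [0.0] * k                                  # critic weights

    def features(self, state):
        return normalized_rbf(state, self.centers, self.width)

    def policy(self, phi):
        prefs = [sum(t * f for t, f in zip(row, phi)) for row in self.theta]
        return softmax(prefs)

    def value(self, phi):
        return sum(w * f for w, f in zip(self.v, phi))

    def act(self, state, rng=random):
        probs = self.policy(self.features(state))
        return rng.choices(range(self.n_actions), weights=probs)[0]

    def update(self, state, action, r, next_state):
        phi, phi_next = self.features(state), self.features(next_state)
        # Temporal-difference error computed by the critic.
        td = r + self.gamma * self.value(phi_next) - self.value(phi)
        probs = self.policy(phi)
        # Critic: move the value estimate toward the TD target.
        for i, f in enumerate(phi):
            self.v[i] += self.alpha_critic * td * f
        # Actor: policy-gradient step, grad log pi = (1[b == a] - pi(b)) * phi.
        for b in range(self.n_actions):
            g = (1.0 if b == action else 0.0) - probs[b]
            for i, f in enumerate(phi):
                self.theta[b][i] += self.alpha_actor * td * g * f
        return td
```

In use, a scheduler would call `act` on the observed state, apply the chosen transmission action, compute `reward` from the measured packet rate, throughput, power, and delay, and then call `update` with the resulting transition. The reward weights would need tuning for any real system.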

Updated: 2020-04-22