Model-Free Control in Wireless Cyber–Physical System With Communication Latency: A DRL Method With Improved Experience Replay
IEEE Transactions on Cybernetics (IF 9.4), Pub Date: 2023-05-22, DOI: 10.1109/tcyb.2023.3275150
Yifei Qiu, Shaohua Wu, Jian Jiao, Ning Zhang, Qinyu Zhang

This article explores the model-free remote control problem in a wireless networked cyber–physical system (CPS) composed of spatially distributed sensors, controllers, and actuators. The sensors sample the states of the controlled system, from which the remote controller generates control instructions, while the actuators maintain the system's stability by executing those commands. Because no system model is available, the controller adopts the deep deterministic policy gradient (DDPG) algorithm to achieve model-free control. Unlike the traditional DDPG algorithm, which takes only the system state as input, this article additionally feeds historical action information into the networks, extracting more information and achieving precise control under communication latency. Additionally, in the experience replay mechanism of the DDPG algorithm, the reward is incorporated into the prioritized experience replay (PER) approach. Simulation results show that the proposed sampling policy improves the convergence rate by determining the sampling probability of transitions based on the joint consideration of the temporal difference (TD) error and the reward.
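To make the first idea concrete, the sketch below shows one plausible way to augment the DDPG input with a window of recent actions so the policy can account for commands still "in flight" during the communication delay. The abstract only states that historical actions are used as input; the class name, the window length `delay_steps`, and the concatenation scheme are illustrative assumptions, not the authors' exact design.

```python
import numpy as np
from collections import deque

class HistoryAugmentedState:
    """Hypothetical input builder: concatenates the latest sampled state
    with the K most recent actions, [s_t, a_{t-1}, ..., a_{t-K}], so the
    actor/critic can compensate for actions not yet applied due to latency.
    `delay_steps` (K) is an assumed tuning parameter."""

    def __init__(self, action_dim, delay_steps=3):
        # Initialize the history with zero actions before any step is taken.
        self.action_hist = deque(
            [np.zeros(action_dim)] * delay_steps, maxlen=delay_steps
        )

    def augment(self, state):
        # Network input: observed state followed by the action history.
        return np.concatenate([np.asarray(state), *self.action_hist])

    def record(self, action):
        # Newest action goes to the front; the oldest falls off the end.
        self.action_hist.appendleft(np.asarray(action))
```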
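The second idea is the reward-aware PER sampling rule. The following is a minimal sketch of a priority that jointly weighs TD error and reward; the mixing weight `lam`, the min-max normalization, and the exponent `alpha` follow common PER conventions and are assumptions, since the abstract does not give the exact formula.

```python
import numpy as np

def sampling_probabilities(td_errors, rewards, alpha=0.6, lam=0.5, eps=1e-6):
    """Sketch of a joint TD-error/reward priority for PER.
    `lam` trades off TD error against reward; `alpha` is the usual PER
    prioritization exponent. Both values here are assumed, not the paper's."""
    td = np.abs(np.asarray(td_errors, dtype=float))
    r = np.asarray(rewards, dtype=float)
    # Min-max normalize both terms so they are on a comparable [0, 1] scale.
    td_n = (td - td.min()) / (td.max() - td.min() + eps)
    r_n = (r - r.min()) / (r.max() - r.min() + eps)
    priority = (lam * td_n + (1.0 - lam) * r_n + eps) ** alpha
    return priority / priority.sum()

# Example usage: draw a minibatch of transition indices from the buffer.
probs = sampling_probabilities(td_errors=[0.9, 0.1, 0.4], rewards=[1.0, -0.5, 2.0])
batch_idx = np.random.choice(len(probs), size=2, p=probs)
```

Transitions with either a large TD error or a high reward are replayed more often, which is one way a joint criterion could speed convergence relative to TD error alone.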
