A Reinforcement Learning-Based Pantograph Control Strategy for Improving Current Collection Quality in High-Speed Railways
IEEE Transactions on Neural Networks and Learning Systems (IF 10.2). Pub Date: 2022-11-14. DOI: 10.1109/tnnls.2022.3219814
Hui Wang, Zhiwei Han, Wenqiang Liu, Yanbo Wu

In high-speed railways, the pantograph-catenary system (PCS) is a critical subsystem of the train power supply system. In particular, when the double-PCS (DPCS) is in operation, the passage of the leading pantograph (LP) causes the contact force of the trailing pantograph (TP) to fluctuate violently, degrading the current collection quality of the electric multiple units (EMUs). Active pantograph control is the most promising technique for suppressing the pantograph-catenary contact force (PCCF) fluctuation and improving the current collection quality. Based on the Nash equilibrium framework, this study proposes a multiagent reinforcement learning (MARL) algorithm for active pantograph control called cooperative proximal policy optimization (Coo-PPO). In the implementation, each heterogeneous agent plays a unique role in a cooperative environment guided by a global value function. A novel reward propagation channel is then proposed to reveal implicit associations between agents, and a curriculum learning approach is adopted to strike a balance between reward maximization and rational movement patterns. To validate the proposed control strategy, it is compared with an existing MARL algorithm and a traditional control strategy in the same scenario. The experimental results show that Coo-PPO obtains higher rewards, significantly suppresses the PCCF fluctuation (by up to 41.55%), and markedly decreases the TP's offline rate (by up to 10.77%). This study is the first to apply MARL to the coordinated control of double pantographs in the DPCS.
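The abstract names the ingredients of Coo-PPO but gives no implementation detail. The Python sketch below is a minimal illustration, not the authors' implementation, of how those ingredients could fit together: two pantograph agents (LP and TP) trained with the standard PPO clipped surrogate, advantages computed against one shared global value function, a hypothetical reward-propagation term coupling the two agents' reward signals, and a curriculum weight trading off reward maximization against a movement regularizer. Every name, coefficient, and schedule here (propagate_rewards, coef, curriculum_weight, movement_penalty) is an assumption introduced for illustration.

# Illustrative sketch only: simplified Coo-PPO-style joint loss for two agents.
# Assumed quantities are marked as such; rewards and ratios are random stand-ins
# for values that would come from a pantograph-catenary simulation.
import numpy as np

rng = np.random.default_rng(0)

def clipped_ppo_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate: -mean(min(r*A, clip(r, 1-eps, 1+eps)*A))."""
    return -np.minimum(ratio * advantage,
                       np.clip(ratio, 1 - eps, 1 + eps) * advantage).mean()

def propagate_rewards(r_lp, r_tp, coef=0.3):
    """Hypothetical reward-propagation channel: each agent receives a share of
    its partner's reward, so the implicit LP->TP coupling enters both signals."""
    return r_lp + coef * r_tp, r_tp + coef * r_lp

# Toy rollout of length T for the two agents.
T = 128
r_lp = rng.normal(size=T)        # per-step reward, e.g. -|PCCF deviation| (assumed)
r_tp = rng.normal(size=T)
v_global = rng.normal(size=T)    # shared global value function estimates

r_lp, r_tp = propagate_rewards(r_lp, r_tp)

# Advantages against the single global critic (cooperative setting).
adv_lp = r_lp - v_global
adv_tp = r_tp - v_global

# Probability ratios pi_new/pi_old from each agent's own policy head.
ratio_lp = np.exp(rng.normal(scale=0.05, size=T))
ratio_tp = np.exp(rng.normal(scale=0.05, size=T))

# Curriculum weight: early training favors "rational movement" regularization,
# later training favors raw reward maximization (the schedule is hypothetical).
curriculum_weight = 0.5
movement_penalty = 0.01          # stand-in for an action-smoothness term

loss = (clipped_ppo_loss(ratio_lp, adv_lp)
        + clipped_ppo_loss(ratio_tp, adv_tp)
        + curriculum_weight * movement_penalty)
print(f"joint Coo-PPO-style loss: {loss:.4f}")

Sharing one global critic between both agents, rather than giving each agent its own, is the design choice suggested by the abstract's cooperative environment "guided by the global value function"; in practice the per-step rewards and ratios would come from the PCS simulation and the agents' policy networks rather than random noise.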

Updated: 2024-08-26