Physics-informed continuous-time reinforcement learning with data-driven approach for robotic arm manipulation
Journal of Industrial Information Integration (IF 11.6) Pub Date: 2025-11-12, DOI: 10.1016/j.jii.2025.101008
Jin-Qiang Wang, Lirong Song, Jun Shen, Binbin Yong, Xiaoteng Han, Yuanbo Jiang, Mona Raoufi, Qingguo Zhou

Deep reinforcement learning (DRL) plays a crucial role in complex sequential decision-making tasks. However, existing data-driven DRL methods rely primarily on an empirical risk minimization (ERM) strategy to fit optimal value function models. This approach often neglects the dynamical-system properties of the environment, which in turn leads to inadequate consideration of the structural risk minimization (SRM) strategy. To address this limitation, this paper proposes a physics-informed continuous-time reinforcement learning (PICRL) method and validates model effectiveness from both the ERM and SRM perspectives. Specifically, we begin by theoretically analyzing the mechanism of SRM in reinforcement learning models. Physics information is then integrated into both discrete-time and continuous-time reinforcement learning algorithms for comparative experiments. Finally, we systematically examine the effects of various physics-informed and boundary constraints on these two learning frameworks. Experimental results on the PandaGym benchmark demonstrate that the proposed method achieves comparable or superior performance in both discrete and continuous-time reinforcement learning frameworks, providing strong evidence of its advantages in learning control policies for dynamical systems with small time intervals.
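
The abstract does not specify how the physics information enters the learning objective. One common way to combine ERM with a physics-informed regularizer in continuous-time RL is to penalize the residual of the continuous-time Bellman (Hamilton-Jacobi-Bellman) equation alongside the usual TD objective, in the style of physics-informed neural networks. The PyTorch sketch below illustrates that idea only; the names (ValueNet, hjb_residual, picrl_style_loss), the discount rate rho, the weight lambda_phys, and the finite-difference dynamics estimate are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class ValueNet(nn.Module):
        """Small MLP value function V(s)."""
        def __init__(self, state_dim: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, hidden), nn.Tanh(),
                nn.Linear(hidden, 1),
            )

        def forward(self, s: torch.Tensor) -> torch.Tensor:
            return self.net(s).squeeze(-1)

    def hjb_residual(V, s, r_rate, s_dot, rho=0.05):
        """Residual of the continuous-time Bellman (HJB) equation
        rho * V(s) = r(s) + dV/ds . f(s), with f(s) approximated by s_dot."""
        s = s.requires_grad_(True)
        v = V(s)
        dv_ds = torch.autograd.grad(v.sum(), s, create_graph=True)[0]
        return rho * v - r_rate - (dv_ds * s_dot).sum(-1)

    def picrl_style_loss(V, batch, gamma=0.99, dt=0.02, lambda_phys=0.1):
        """ERM term (TD error) plus a physics-consistency term (SRM-like)."""
        s, r, s_next, done = batch  # transitions sampled at small time step dt
        with torch.no_grad():
            target = r + gamma * (1.0 - done) * V(s_next)  # data-driven TD target
        td_loss = ((V(s) - target) ** 2).mean()
        s_dot = (s_next - s) / dt          # finite-difference dynamics estimate
        phys_loss = hjb_residual(V, s.clone(), r / dt, s_dot).pow(2).mean()
        return td_loss + lambda_phys * phys_loss

In a scheme of this kind, lambda_phys trades off the data fit (ERM) against the physics-consistency regularizer, and the finite-difference dynamics estimate is most accurate precisely when the control time step dt is small, which is consistent with the abstract's claim that the advantages show up for dynamical systems with small time intervals.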

Updated: 2025-11-12