Adaptive dynamic programming for model‐free tracking of trajectories with time‐varying parameters
International Journal of Adaptive Control and Signal Processing (IF 3.9), Pub Date: 2020-03-03, DOI: 10.1002/acs.3106
Florian Köpf 1, Simon Ramsteiner 1, Luca Puccetti 1,2, Michael Flad 1, Sören Hohmann 1

Adaptive Dynamic Programming (ADP) is well suited to learning autonomously how to control unknown systems optimally with respect to an objective function, because it adapts controllers based on experience gained from interaction with the system. In recent years, many researchers have focused on the tracking case, where the aim is to follow a desired trajectory. So far, ADP tracking controllers have assumed that the reference trajectory follows time-invariant exo-system dynamics, an assumption that does not hold in many applications. To overcome this limitation, we propose a new Q-function that explicitly incorporates a parametrized approximation of the reference trajectory. This allows a general class of trajectories to be tracked by means of ADP. Once our Q-function has been learned, the associated controller copes with time-varying reference trajectories without further training and independently of exo-system dynamics. After proposing our general model-free off-policy tracking method, we analyze the important special case of linear quadratic tracking. We conclude with an example demonstrating that our new method successfully learns the optimal tracking controller and outperforms existing approaches in terms of tracking error and cost.
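To make the idea of a "parametrized approximation of the reference trajectory" more concrete, the following minimal Python sketch fits a small parameter vector to a window of reference samples and stacks it with the state and action as one possible Q-function input. The polynomial basis, the function names (`fit_reference_parameters`, `evaluate_reference`, `augmented_q_input`) and the stacking order are assumptions made purely for illustration; the abstract does not specify which parametrization or Q-function structure the authors use.

```python
import numpy as np


def fit_reference_parameters(ref_samples, degree=2):
    """Least-squares fit of polynomial coefficients to reference samples.

    ASSUMPTION: a polynomial basis 1, i, i^2, ... is used here only as an
    example of a parametrized reference; it is not taken from the paper.
    ref_samples holds the reference values at steps i = 0, 1, ..., N-1.
    """
    ref_samples = np.asarray(ref_samples, dtype=float)
    steps = np.arange(len(ref_samples), dtype=float)
    # Basis matrix whose columns are 1, i, i^2, ..., i^degree.
    basis = np.vander(steps, degree + 1, increasing=True)
    params, *_ = np.linalg.lstsq(basis, ref_samples, rcond=None)
    return params


def evaluate_reference(params, steps):
    """Evaluate the parametrized reference at the given step indices."""
    steps = np.asarray(steps, dtype=float)
    basis = np.vander(steps, len(params), increasing=True)
    return basis @ params


def augmented_q_input(state, params, action):
    """Stack state, reference parameters and action into one vector.

    ASSUMPTION: a Q-function approximator could take this stacked vector
    as input, so that the reference enters through its parameters rather
    than through an exo-system state.
    """
    return np.concatenate(
        [np.atleast_1d(state), np.atleast_1d(params), np.atleast_1d(action)]
    ).astype(float)


if __name__ == "__main__":
    # Hypothetical reference window: r(i) = 1 + 0.5 i - 0.1 i^2.
    window = [1 + 0.5 * i - 0.1 * i**2 for i in range(10)]
    p = fit_reference_parameters(window, degree=2)
    print("fitted parameters:", p)
    print("Q input:", augmented_q_input([0.0, 0.0], p, 0.0))
```

Because the reference is summarized by its parameters, a new or time-varying reference only changes the parameter vector fed to the learned function, which is one way to read the abstract's claim that no further training is needed.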

Updated: 2020-03-03