Convergence results for an averaged LQR problem with applications to reinforcement learning
Mathematics of Control, Signals, and Systems (IF 1.8). Pub Date: 2021-07-08. DOI: 10.1007/s00498-021-00294-y
Andrea Pesare, Maurizio Falcone, Michele Palladino

In this paper, we will deal with a linear quadratic optimal control problem with unknown dynamics. As a modeling assumption, we will suppose that the knowledge that an agent has of the current system is represented by a probability distribution \(\pi\) on the space of matrices. Furthermore, we will assume that such a probability measure is suitably updated to take into account the increased experience that the agent obtains while exploring the environment, approximating the underlying dynamics with increasing accuracy. Under these assumptions, we will show that the optimal control obtained by solving the "average" linear quadratic optimal control problem with respect to a certain \(\pi\) converges to the optimal control of the linear quadratic optimal control problem governed by the actual, underlying dynamics. This approach is closely related to model-based reinforcement learning algorithms, where prior and posterior probability distributions describing the knowledge of the uncertain system are recursively updated. In the last section, we will show a numerical test that confirms the theoretical results.
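The abstract does not spell out the problem formulation; the following is only an illustrative sketch, assuming finite-horizon linear dynamics \(\dot{x} = Ax + Bu\), uncertainty concentrated in the matrix \(A\), and quadratic weights \(Q\), \(R\), \(Q_f\) (all of this notation is assumed here, not taken from the paper). In such a reading, the "average" problem would amount to minimizing the expected cost over \(\pi\):

\[
J_\pi(u) \;=\; \mathbb{E}_{A \sim \pi}\!\left[ \int_0^T \big( x_A(t)^\top Q\, x_A(t) + u(t)^\top R\, u(t) \big)\, dt \;+\; x_A(T)^\top Q_f\, x_A(T) \right],
\]

where \(x_A\) solves \(\dot{x}_A = A x_A + B u\), \(x_A(0) = x_0\). The convergence result would then say that if a sequence of measures \(\pi_n\) concentrates on the true matrix \(A^*\) (for instance \(\pi_n \rightharpoonup \delta_{A^*}\)), the minimizers of \(J_{\pi_n}\) converge to the optimal control of the classical LQR problem with dynamics \(\dot{x} = A^* x + B u\).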


