Primal-dual Learning for the Model-free Risk-constrained Linear Quadratic Regulator
arXiv - CS - Systems and Control. Pub Date: 2020-11-22, arXiv:2011.10931
Feiran Zhao, Keyou You

Risk-aware control, though it promises to handle unexpected events, requires a known, exact dynamical model. In this work, we propose a model-free framework to learn a risk-aware controller, with a focus on linear systems. We formulate the problem as a discrete-time infinite-horizon LQR problem with a constraint on the predictive variance of the state. To solve it, we parameterize the policy by a feedback gain pair and optimize it with primal-dual methods using only data. We first study the optimization landscape of the Lagrangian function and establish strong duality despite its non-convexity. We also show that the Lagrangian function enjoys an important local gradient dominance property, which we exploit to develop a convergent random search algorithm for learning the dual function. Furthermore, we propose a primal-dual algorithm with global convergence to learn the optimal policy-multiplier pair. Finally, we validate our results via simulations.
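To make the primal-dual idea concrete, the following is a minimal sketch on a toy scalar system: the policy is a feedback gain optimized by a zeroth-order (two-point random search) estimate of the Lagrangian's gradient, while the multiplier is updated by projected dual ascent on the constraint violation. The dynamics, cost weights, variance surrogate, step sizes, and the use of common random numbers across the two perturbed rollouts are all illustrative assumptions, not the paper's exact formulation; the simulator's parameters appear only inside the rollout, so the learner itself touches nothing but sampled costs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy simulator: x_{k+1} = a x_k + b u_k + w_k. The learner never reads
# (a, b) directly -- it only observes rollout costs, i.e. it is model-free.
a, b = 0.9, 0.5
Q, R = 1.0, 0.1          # LQR stage-cost weights (illustrative)
c = 2.0                  # bound on the state-variance surrogate
T, rho = 50, 0.9         # rollout horizon and discount factor

def rollout_costs(k, noise):
    """Discounted quadratic cost and a crude state-variance surrogate
    for the linear feedback policy u = -k x, under a fixed noise path."""
    x, J, V = 1.0, 0.0, 0.0
    for t in range(T):
        u = -k * x
        J += rho**t * (Q * x**2 + R * u**2)
        V += rho**t * x**2          # stand-in for predictive variance
        x = a * x + b * u + noise[t]
    return J, V

def lagrangian(k, lam, noise):
    J, V = rollout_costs(k, noise)
    return J + lam * (V - c)

k, lam = 0.5, 0.0                   # initial (stabilizing) gain, multiplier
r, eta_k, eta_lam = 0.05, 0.02, 0.05
for _ in range(200):
    noise = 0.1 * rng.standard_normal(T)   # common random numbers for both rollouts
    u = rng.choice([-1.0, 1.0])            # random direction on the 1-D sphere
    # Two-point zeroth-order estimate of dL/dk at the current (k, lam).
    g = (lagrangian(k + r * u, lam, noise)
         - lagrangian(k - r * u, lam, noise)) / (2 * r) * u
    k -= eta_k * g                         # primal gradient step
    _, V = rollout_costs(k, noise)
    lam = max(0.0, lam + eta_lam * (V - c))  # projected dual ascent
```

Sharing the same noise sequence across the two perturbed rollouts is a variance-reduction trick; the paper's random search operates on matrix gain pairs rather than this scalar simplification.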

Updated: 2020-11-25