Policy Optimization for Markovian Jump Linear Quadratic Control: Gradient-Based Methods and Global Convergence,arXiv - CS - Systems and Control

arXiv - CS - Systems and Control, Pub Date: 2020-11-24, DOI: arxiv-2011.11852
Joao Paulo Jansch-Porto, Bin Hu, Geir Dullerud

Recently, policy optimization for control purposes has received renewed attention due to the increasing interest in reinforcement learning. In this paper, we investigate the global convergence of gradient-based policy optimization methods for quadratic optimal control of discrete-time Markovian jump linear systems (MJLS). First, we study the optimization landscape of direct policy optimization for MJLS, with static state feedback controllers and quadratic performance costs. Despite the non-convexity of the resultant problem, we are still able to identify several useful properties such as coercivity, gradient dominance, and almost smoothness. Based on these properties, we show global convergence of three types of policy optimization methods: the gradient descent method, the Gauss-Newton method, and the natural policy gradient method. We prove that all three methods converge to the optimal state feedback controller for MJLS at a linear rate if initialized at a controller that is mean-square stabilizing. Some numerical examples are presented to support the theory. This work brings new insights into the performance of policy gradient methods on the Markovian jump linear quadratic control problem.
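For reference, the problem and the three update rules named above can be written out as follows. This is a sketch under our own notation assumptions (policy u_t = -K_{ω(t)} x_t, transition probabilities p_ij, initial correlations X_i(0)); the gradient and update formulas follow the usual LQR policy-gradient derivation extended to jump systems and should be checked against the paper before relying on them.

```latex
\begin{aligned}
&x_{t+1} = A_{\omega(t)} x_t + B_{\omega(t)} u_t, \qquad u_t = -K_{\omega(t)} x_t, \qquad
 p_{ij} = \Pr\big(\omega(t+1) = j \mid \omega(t) = i\big) \\
&C(K) = \mathbb{E} \sum_{t=0}^{\infty} \big( x_t^\top Q_{\omega(t)} x_t + u_t^\top R_{\omega(t)} u_t \big)
      = \sum_i \operatorname{tr}\big( P_i^K X_i(0) \big), \qquad
 X_i(0) = \mathbb{E}\big[ x_0 x_0^\top \mathbf{1}\{\omega(0) = i\} \big] \\
&P_i^K = Q_i + K_i^\top R_i K_i + (A_i - B_i K_i)^\top \mathcal{E}_i(P^K) (A_i - B_i K_i), \qquad
 \mathcal{E}_i(P^K) = \sum_j p_{ij} P_j^K \\
&\chi_j^K = X_j(0) + \sum_i p_{ij} (A_i - B_i K_i) \chi_i^K (A_i - B_i K_i)^\top \\
&\nabla_{K_i} C(K) = 2 L_i^K \chi_i^K, \qquad
 L_i^K = \big( R_i + B_i^\top \mathcal{E}_i(P^K) B_i \big) K_i - B_i^\top \mathcal{E}_i(P^K) A_i \\
&\text{GD: } K_i' = K_i - \eta \nabla_{K_i} C(K), \qquad
 \text{NPG: } K_i' = K_i - 2\eta L_i^K, \qquad
 \text{GN: } K_i' = K_i - 2\eta \big( R_i + B_i^\top \mathcal{E}_i(P^K) B_i \big)^{-1} L_i^K
\end{aligned}
```

With η = 1/2 the Gauss-Newton update reduces to K_i' = (R_i + B_i^T E_i(P^K) B_i)^{-1} B_i^T E_i(P^K) A_i, i.e. a policy-iteration step.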
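The three methods can be tried numerically with a small NumPy sketch. It assumes the standard MJLS setup (u = -K_i x in mode i, Markov transition matrix `trans[i, j]` = p_ij, initial correlations `X0[i]` = E[x0 x0^T 1{ω(0)=i}]); the function names `evaluate`, `gradient_terms`, `policy_step`, `run` and the two-mode example data are hypothetical and not taken from the paper. The coupled Lyapunov equations are solved exactly by vectorization, which is fine for tiny systems but scales as (N n²)³.

```python
import numpy as np


def evaluate(A, B, Q, R, trans, X0, K):
    """Cost, value matrices P_i and state correlations chi_i for gains K_i (u = -K_i x).

    P_i   = Q_i + K_i^T R_i K_i + Acl_i^T (sum_j p_ij P_j) Acl_i
    chi_j = X0_j + sum_i p_ij Acl_i chi_i Acl_i^T,  with Acl_i = A_i - B_i K_i.
    Both coupled Lyapunov systems are solved exactly via row-major vectorization.
    """
    N, n = len(A), A[0].shape[0]
    d = n * n
    Acl = [A[i] - B[i] @ K[i] for i in range(N)]
    # T maps the stacked chi_i to the stacked sum_i p_ij Acl_i chi_i Acl_i^T.
    T = np.zeros((N * d, N * d))
    for i in range(N):
        for j in range(N):
            T[j * d:(j + 1) * d, i * d:(i + 1) * d] = trans[i, j] * np.kron(Acl[i], Acl[i])
    # Mean-square stability <=> spectral radius of T is below one.
    if np.max(np.abs(np.linalg.eigvals(T))) >= 1.0:
        raise ValueError("gains are not mean-square stabilizing")
    Id = np.eye(N * d)
    q = np.concatenate([(Q[i] + K[i].T @ R[i] @ K[i]).ravel() for i in range(N)])
    x0 = np.concatenate([X.ravel() for X in X0])
    P = np.linalg.solve(Id - T.T, q).reshape(N, n, n)  # the P-equation uses the adjoint T^T
    chi = np.linalg.solve(Id - T, x0).reshape(N, n, n)
    cost = sum(np.trace(P[i] @ X0[i]) for i in range(N))
    return cost, P, chi


def gradient_terms(A, B, Q, R, trans, X0, K):
    """Return the cost, gradients 2 L_i chi_i, and the terms L_i and H_i = R_i + B_i^T E_i(P) B_i."""
    cost, P, chi = evaluate(A, B, Q, R, trans, X0, K)
    L, H = [], []
    for i in range(len(A)):
        EP = sum(trans[i, j] * P[j] for j in range(len(A)))  # E_i(P) = sum_j p_ij P_j
        H.append(R[i] + B[i].T @ EP @ B[i])
        L.append(H[i] @ K[i] - B[i].T @ EP @ A[i])
    grads = [2 * L[i] @ chi[i] for i in range(len(A))]
    return cost, grads, L, H


def policy_step(method, eta, A, B, Q, R, trans, X0, K):
    """One update of gradient descent ('gd'), natural policy gradient ('npg') or Gauss-Newton ('gn')."""
    _, grads, L, H = gradient_terms(A, B, Q, R, trans, X0, K)
    if method == "gd":
        return [K[i] - eta * grads[i] for i in range(len(K))]
    if method == "npg":  # grad_i chi_i^{-1} = 2 L_i
        return [K[i] - 2 * eta * L[i] for i in range(len(K))]
    if method == "gn":  # H_i^{-1} grad_i chi_i^{-1} = 2 H_i^{-1} L_i; eta = 1/2 is policy iteration
        return [K[i] - 2 * eta * np.linalg.solve(H[i], L[i]) for i in range(len(K))]
    raise ValueError(f"unknown method {method!r}")


def run(method, eta, iters, A, B, Q, R, trans, X0, K0):
    """Iterate one method from the gains K0 and return the final gains and cost."""
    K = [k.copy() for k in K0]
    for _ in range(iters):
        K = policy_step(method, eta, A, B, Q, R, trans, X0, K)
    return K, evaluate(A, B, Q, R, trans, X0, K)[0]


# Hypothetical two-mode example (n = 2 states, m = 1 input); not taken from the paper.
A = [np.array([[0.5, 0.2], [0.0, 0.4]]), np.array([[0.3, 0.0], [0.1, 0.6]])]
B = [np.array([[1.0], [0.5]]), np.array([[0.0], [1.0]])]
Q = [np.eye(2), np.eye(2)]
R = [np.eye(1), np.eye(1)]
trans = np.array([[0.7, 0.3], [0.4, 0.6]])  # p_ij = Prob(w(t+1) = j | w(t) = i)
X0 = [0.5 * np.eye(2), 0.5 * np.eye(2)]      # E[x0 x0^T 1{w(0) = i}]
K0 = [np.zeros((1, 2)), np.zeros((1, 2))]    # K = 0 is mean-square stabilizing for this data

results = {m: run(m, eta, it, A, B, Q, R, trans, X0, K0)
           for m, eta, it in [("gd", 0.02, 2000), ("npg", 0.1, 300), ("gn", 0.5, 30)]}
for m, (K, c) in results.items():
    print(m, round(float(c), 6))
```

The step sizes above are hand-picked for this toy data (the natural policy gradient one is kept below 1/(2‖H_i‖)); all three runs start from the same mean-square stabilizing gains and should reach the same cost.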

Updated: 2020-11-25