A Convex Optimization Approach to Dynamic Programming in Continuous State and Action Spaces, Journal of Optimization Theory and Applications


A Convex Optimization Approach to Dynamic Programming in Continuous State and Action Spaces
Journal of Optimization Theory and Applications (IF 1.9), Pub Date: 2020-09-14, DOI: 10.1007/s10957-020-01747-1
Insoon Yang

In this paper, a convex optimization-based method is proposed for numerically solving dynamic programs in continuous state and action spaces. The key idea is to approximate the output of the Bellman operator at a particular state by the optimal value of a convex program. The approximate Bellman operator has a computational advantage because it involves a convex optimization problem in the case of control-affine systems and convex costs. Using this feature, we propose a simple dynamic programming algorithm that evaluates the approximate value function at pre-specified grid points by solving convex optimization problems in each iteration. We show that the proposed method approximates the optimal value function with a uniform convergence property in the case of convex optimal value functions. We also propose an interpolation-free design method for a control policy whose performance converges uniformly to the optimum as the grid resolution becomes finer. When a nonlinear control-affine system is considered, the convex optimization approach provides an approximate policy with a provable suboptimality bound. For general cases, the proposed convex formulation of dynamic programming operators can be reformulated as a nonconvex bi-level program, in which the inner problem is a linear program, without losing uniform convergence properties.
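
The grid-based iteration described in the abstract can be illustrated with a small toy problem. The sketch below is not the paper's algorithm or formulation; it is a one-dimensional, deterministic illustration under assumptions that do not come from the source: linear dynamics x' = x + u, stage cost x^2 + u^2, discount factor 0.9, and a uniform grid on [-1, 1]. All names (`interp`, `bellman_at`, `value_iteration`) are hypothetical. The successor state's value is taken as a convex combination of the values at neighbouring grid points, which in one dimension with convex grid values is piecewise-linear interpolation, and a plain ternary search stands in for a convex solver.

```python
# Illustrative sketch only: a 1D toy problem, not the paper's formulation.
# Dynamics, cost, discount factor, and grid are assumptions for the example.

ALPHA = 0.9                  # discount factor (assumed)
LO, HI, N = -1.0, 1.0, 21    # uniform state grid on [LO, HI] with N points (assumed)
U_MAX = 1.0                  # control bound |u| <= U_MAX (assumed)


def grid():
    """Return the pre-specified grid points."""
    return [LO + (HI - LO) * i / (N - 1) for i in range(N)]


def interp(v, y):
    """Value at y as a convex combination of the two neighbouring grid values."""
    h = (HI - LO) / (N - 1)
    y = min(max(y, LO), HI)               # guard against round-off outside the grid
    k = min(int((y - LO) / h), N - 2)     # index of the left neighbour
    t = (y - (LO + k * h)) / h            # weight on the right neighbour
    return (1.0 - t) * v[k] + t * v[k + 1]


def bellman_at(v, x):
    """Approximate Bellman update at grid point x: a 1D convex minimization in u."""
    def objective(u):
        # Stage cost plus discounted value of the successor state x + u.
        return x * x + u * u + ALPHA * interp(v, x + u)

    # Feasible controls keep the successor state inside the grid's range.
    a = max(-U_MAX, LO - x)
    b = min(U_MAX, HI - x)
    for _ in range(60):                   # ternary search; valid for convex objectives
        m1 = a + (b - a) / 3
        m2 = b - (b - a) / 3
        if objective(m1) < objective(m2):
            b = m2
        else:
            a = m1
    return objective(0.5 * (a + b))


def value_iteration(iters=100):
    """Repeat the approximate Bellman update at every grid point."""
    xs = grid()
    v = [0.0] * N
    for _ in range(iters):
        v = [bellman_at(v, x) for x in xs]
    return v


if __name__ == "__main__":
    values = value_iteration()
    for x, val in zip(grid(), values):
        print(f"x = {x:+.1f}  v = {val:.4f}")
```

In higher dimensions or with a stochastic model, the interpolation step would presumably be replaced by explicit convex-combination weights over grid points inside a convex program handed to a dedicated solver; the ternary search here is only a stand-in that keeps the example dependency-free.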


Updated: 2020-09-14
