Policy Decomposition: Approximate Optimal Control with Suboptimality Estimates,arXiv - CS - Robotics



Policy Decomposition: Approximate Optimal Control with Suboptimality Estimates
arXiv - CS - Robotics, Pub Date: 2021-03-03, DOI: arxiv-2103.02716
Ashwin Khadke, Hartmut Geyer

Numerically computing global policies to optimal control problems for complex dynamical systems is mostly intractable. In consequence, a number of approximation methods have been developed. However, none of the current methods can quantify by how much the resulting control underperforms the elusive globally optimal solution. Here we propose policy decomposition, an approximation method with explicit suboptimality estimates. Our method decomposes the optimal control problem into lower-dimensional subproblems, whose optimal solutions are recombined to build a control policy for the entire system. Many such combinations exist, and we introduce the value error and its LQR and DDP estimates to predict the suboptimality of possible combinations and prioritize the ones that minimize it. Using a cart-pole, a 3-link balancing biped and N-link planar manipulators as example systems, we find that the estimates correctly identify the best combinations, yielding control policies in a fraction of the time it takes to compute the optimal control without a notable sacrifice in closed-loop performance. While more research will be needed to find ways of dealing with the combinatorics of policy decomposition, the results suggest this method could be an effective alternative for approximating optimal control in intractable systems.
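
To make the idea of an LQR-based suboptimality estimate concrete, here is a minimal sketch on a toy linear system. It is not the authors' implementation, and the abstract does not give the formal definition of the value error. The sketch assumes the value error is the gap between the cost of the recombined policy and the optimal cost from a given start state. The two-state system, the cost weights, and all function names (`dare_iterate`, `policy_cost`, `decoupled_gain`, `value_error`, `toy_system`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def dare_iterate(A, B, Q, R, iters=2000):
    """Solve the discrete-time Riccati equation by fixed-point iteration."""
    P = Q.copy()
    for _ in range(iters):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # gain for current P
        P = Q + A.T @ P @ (A - B @ G)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K


def policy_cost(A, B, Q, R, K, iters=2000):
    """Cost matrix of the fixed policy u = -K x (K is assumed stabilising)."""
    A_cl = A - B @ K
    stage = Q + K.T @ R @ K
    P = stage.copy()
    for _ in range(iters):
        P = stage + A_cl.T @ P @ A_cl
    return P


def decoupled_gain(A, B, Q, R):
    """Solve one scalar LQR per state/input pair, ignoring all coupling."""
    n = A.shape[0]
    K = np.zeros((n, n))
    for i in range(n):
        s = slice(i, i + 1)
        _, k = dare_iterate(A[s, s], B[s, s], Q[s, s], R[s, s])
        K[i, i] = k[0, 0]
    return K


def value_error(A, B, Q, R, x0):
    """Assumed definition: cost of the recombined policy minus optimal cost."""
    P_opt, _ = dare_iterate(A, B, Q, R)
    P_dec = policy_cost(A, B, Q, R, decoupled_gain(A, B, Q, R))
    return float(x0 @ (P_dec - P_opt) @ x0)


def toy_system(coupling):
    """Hypothetical 2-state system whose off-diagonal terms couple the states."""
    A = np.array([[1.0, coupling], [coupling, 1.0]])
    return A, np.eye(2), np.eye(2), np.eye(2)


x0 = np.array([1.0, -1.0])
err_none = value_error(*toy_system(0.0), x0)
err_weak = value_error(*toy_system(0.05), x0)
err_strong = value_error(*toy_system(0.4), x0)
print(err_none, err_weak, err_strong)
```

In this toy setting the estimate is zero when the subsystems are truly independent and grows with the coupling that the decomposition ignores. That is the kind of signal one could use to rank candidate decompositions before solving any of them on the full nonlinear system.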


Updated: 2021-03-05
