Random-Sampling Monte-Carlo Tree Search Methods for Cost Approximation in Long-Horizon Optimal Control, arXiv - CS - Systems and Control


arXiv - CS - Systems and Control. Pub Date: 2020-09-15, DOI: arxiv-2009.07354
Shankarachary Ragi and Hans D. Mittelmann

In this paper, we develop Monte-Carlo-based heuristic approaches to approximate the objective function in long-horizon optimal control problems. To approximate the expectation operator in the objective function, these approaches evolve the system state into the future over multiple trajectories, sampling the noise disturbances at each time step, and take the average (or weighted average) of the costs along all the trajectories. We call these methods random sampling - multipath hypothesis propagation (RS-MHP). These methods (or variants of them) already exist in the literature; however, the literature lacks results on how well these approximation strategies converge. This paper fills this knowledge gap to a certain extent. We derive convergence results for the cost approximation error of the RS-MHP methods and discuss their convergence (in probability) as the sample size increases. We consider two case studies to demonstrate the effectiveness of our methods: (a) a linear quadratic control problem and (b) a UAV path optimization problem.
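The sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the scalar linear system, the quadratic stage cost, the fixed open-loop control sequence, and every name and parameter value below (`rollout_cost`, `rs_mhp_estimate`, `exact_expected_cost`, `a`, `b`, `q`, `r`, `sigma`) are assumptions chosen for illustration. The sketch uses the plain (equal-weight) average of trajectory costs and compares it with the closed-form expected cost, which is available for this toy linear quadratic setting.

```python
import random


def rollout_cost(x0, controls, a, b, q, r, sigma, rng):
    """Cost along one sampled trajectory of x' = a*x + b*u + w (hypothetical model)."""
    x, cost = x0, 0.0
    for u in controls:
        cost += q * x * x + r * u * u   # quadratic stage cost
        w = rng.gauss(0.0, sigma)       # sample the noise disturbance at this step
        x = a * x + b * u + w           # evolve the state one step
    return cost + q * x * x             # add a quadratic terminal cost


def rs_mhp_estimate(x0, controls, n_paths,
                    a=0.9, b=0.5, q=1.0, r=0.1, sigma=0.2, seed=0):
    """Equal-weight average of the costs over n_paths sampled trajectories."""
    rng = random.Random(seed)
    total = sum(rollout_cost(x0, controls, a, b, q, r, sigma, rng)
                for _ in range(n_paths))
    return total / n_paths


def exact_expected_cost(x0, controls, a=0.9, b=0.5, q=1.0, r=0.1, sigma=0.2):
    """Closed-form expected cost for the toy model, using E[x^2] = mean^2 + var."""
    mean, var, cost = x0, 0.0, 0.0
    for u in controls:
        cost += q * (mean * mean + var) + r * u * u
        mean = a * mean + b * u
        var = a * a * var + sigma * sigma
    return cost + q * (mean * mean + var)


if __name__ == "__main__":
    controls = [-0.1] * 20              # arbitrary open-loop control sequence
    exact = exact_expected_cost(1.0, controls)
    for n in (10, 100, 1000, 10000):
        est = rs_mhp_estimate(1.0, controls, n)
        print(n, est, abs(est - exact))
```

Running the script prints the estimate and its absolute error for increasing sample sizes. In this toy setting the error typically shrinks as the number of trajectories grows, which is the kind of behavior the paper's convergence-in-probability results address; the weighted-average variant mentioned in the abstract is not shown here.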


Updated: 2020-09-17
