Leveraging experience in lazy search
Autonomous Robots (IF 3.7) Pub Date: 2021-11-03, DOI: 10.1007/s10514-021-10018-5
Mohak Bhardwaj, Byron Boots, Siddhartha Srinivasa, Sanjiban Choudhury

Lazy graph search algorithms are efficient at solving motion planning problems where edge evaluation is the computational bottleneck. These algorithms work by lazily computing the shortest potentially feasible path, evaluating edges along that path, and repeating until a feasible path is found. The order in which edges are selected is critical to minimizing the total number of edge evaluations: a good edge selector chooses edges that are not only likely to be invalid but that also eliminate future paths from consideration. We wish to learn such a selector by leveraging prior experience. We formulate this problem as a Markov Decision Process (MDP) on the state of the search problem. While solving this large MDP is generally intractable, we show that we can compute oracular selectors that can solve the MDP during training. With access to such oracles, we use imitation learning to find effective policies. If new search problems are sufficiently similar to problems solved during training, the learned policy will choose a good edge evaluation ordering and solve the motion planning problem quickly. We evaluate our algorithms on a wide range of 2D and 7D problems and show that the learned selector outperforms commonly used baseline heuristics. We further provide a novel theoretical analysis of lazy search in a Bayesian framework, as well as regret guarantees on our imitation-learning-based approach to motion planning.
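To make the plan-evaluate-repeat loop in the abstract concrete, here is a minimal Python sketch of a lazy shortest-path search with a pluggable edge selector. This is an illustration of the general technique, not the paper's implementation; all identifiers (`lazy_search`, `shortest_path`, `evaluate`, `select`) are assumed names, and the graph is a simple adjacency-dict with optimistic edge weights.

```python
# Illustrative sketch of lazy shortest-path search (not the authors' code).
# Unevaluated edges are optimistically assumed free; only edges proven
# invalid by the expensive `evaluate` check are excluded from planning.
import heapq
from typing import Callable, Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]
Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: optimistic weight}

def shortest_path(graph: Graph, invalid: Set[Edge],
                  start: str, goal: str) -> Optional[List[str]]:
    """Dijkstra over all edges not yet proven invalid (the lazy relaxation)."""
    dist = {start: 0.0}
    prev: Dict[str, str] = {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:  # reconstruct the candidate path
            path = [goal]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, {}).items():
            if (u, v) in invalid or (v, u) in invalid:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    return None

def lazy_search(graph: Graph, start: str, goal: str,
                evaluate: Callable[[Edge], bool],
                select: Callable[[List[Edge]], Edge]):
    """Repeat: plan optimistically, have `select` pick one unevaluated edge
    on the candidate path, evaluate it, until a fully valid path is found."""
    valid: Set[Edge] = set()
    invalid: Set[Edge] = set()
    evals = 0
    while True:
        path = shortest_path(graph, invalid, start, goal)
        if path is None:
            return None, evals  # no potentially feasible path remains
        unevaluated = [(u, v) for u, v in zip(path, path[1:])
                       if (u, v) not in valid and (v, u) not in valid]
        if not unevaluated:
            return path, evals  # every edge on the path checked: done
        e = select(unevaluated)  # the selector is where learning enters
        evals += 1
        (valid if evaluate(e) else invalid).add(e)
```

A commonly used baseline selector is "forward" selection, `select=lambda es: es[0]`, which checks edges in path order; the approach described in the abstract would instead replace `select` with a policy trained by imitating an oracle that, during training, can see which edges are actually invalid.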



Last updated: 2021-11-03