
Leveraging experience in lazy search

Published in Autonomous Robots.

Abstract

Lazy graph search algorithms are efficient at solving motion planning problems where edge evaluation is the computational bottleneck. These algorithms work by lazily computing the shortest potentially feasible path, evaluating edges along that path, and repeating until a feasible path is found. The order in which edges are selected is critical to minimizing the total number of edge evaluations: a good edge selector chooses edges that are not only likely to be invalid, but that also eliminate future paths from consideration. We wish to learn such a selector by leveraging prior experience. We formulate this problem as a Markov Decision Process (MDP) on the state of the search problem. While solving this large MDP is generally intractable, we show that we can compute oracular selectors that solve the MDP during training. With access to such oracles, we use imitation learning to find effective policies. If new search problems are sufficiently similar to problems solved during training, the learned policy will choose a good edge evaluation ordering and solve the motion planning problem quickly. We evaluate our algorithms on a wide range of 2D and 7D problems and show that the learned selector outperforms commonly used baseline heuristics. We further provide a novel theoretical analysis of lazy search in a Bayesian framework, as well as regret guarantees on our imitation learning based approach to motion planning.
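The loop described above — compute the shortest potentially feasible path, let a selector pick an edge on it to evaluate, repeat until a fully validated path emerges — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the graph representation, collision checker, and selector signature here are assumptions, and the learned policy would plug in where `selector` is called.

```python
# Minimal sketch of the lazy search loop: repeatedly find the shortest
# path over edges not yet proven invalid, evaluate one unevaluated edge
# on that path (chosen by the selector), and stop when every edge on
# the current shortest path is known to be valid.
import heapq

def shortest_path(graph, source, target, invalid):
    # Dijkstra over edges not yet proven invalid.
    # graph: dict mapping node -> list of (neighbor, weight).
    dist, prev = {source: 0.0}, {}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, []):
            if (u, v) in invalid:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    if target not in prev and target != source:
        return None
    # Reconstruct the path as a list of edges.
    path, node = [], target
    while node != source:
        u = prev[node]
        path.append((u, node))
        node = u
    return list(reversed(path))

def lazy_search(graph, source, target, is_edge_valid, selector):
    valid, invalid = set(), set()
    while True:
        path = shortest_path(graph, source, target, invalid)
        if path is None:
            return None  # no potentially feasible path remains
        unevaluated = [e for e in path if e not in valid]
        if not unevaluated:
            return path  # every edge on the path is known valid
        edge = selector(unevaluated)  # the learned policy slots in here
        (valid if is_edge_valid(edge) else invalid).add(edge)
```

With a naive selector such as `lambda edges: edges[0]`, this evaluates edges front-to-back along each candidate path; the paper's point is that a smarter, learned selector can invalidate paths with far fewer evaluations.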


Figures 1–10 omitted from this version.


Notes

  1. The framework can be extended to handle non-uniform evaluation costs as well.

  2. We can handle a varying graph by adding it to the state space.

  3. In contrast to the optimal decision tree (ODT) problem, where the true hypothesis must be identified.

  4. If we are only interested in minimizing the number of tests, then \(c(t) = 1\) for all \(t\).
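The cost model in Notes 3–4 can be made concrete with a tiny sketch: a common greedy rule for this kind of test selection picks the test with the highest chance of pruning hypotheses per unit cost, and under the uniform-cost assumption \(c(t) = 1\) it reduces to ordering tests by failure probability. The function names and probability model below are illustrative assumptions, not the paper's selector.

```python
# Illustrative greedy rule: pick the test (here, an edge to evaluate)
# with the best probability-of-invalidation per unit evaluation cost.
# With c(t) = 1 for all t, this is pure probability ordering.
def select_test(tests, p_invalid, cost=lambda t: 1.0):
    # p_invalid(t): prior probability that test t fails
    # (e.g., that the edge is in collision).
    return max(tests, key=lambda t: p_invalid(t) / cost(t))
```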


Author information


Corresponding author

Correspondence to Mohak Bhardwaj.

Ethics declarations

Conflict of interest

The current manuscript was developed while Mohak Bhardwaj, Byron Boots and Siddhartha Srinivasa were affiliated with University of Washington, and Sanjiban Choudhury with Aurora Innovation, Inc. The initial RSS 2019 submission, which the current work builds upon, was done in affiliation with Georgia Institute of Technology (Mohak Bhardwaj and Byron Boots) and University of Washington (Sanjiban Choudhury and Siddhartha Srinivasa). Siddhartha Srinivasa and Byron Boots also hold positions at Amazon Robotics and NVIDIA respectively.

Additional information

This is one of several papers published in Autonomous Robots comprising the Special Issue on Robotics: Science and Systems 2019.

This work was partially funded by the National Institutes of Health R01 (#R01EB019335), National Science Foundation CPS (#1544797), National Science Foundation NRI (#1637748), National Science Foundation CAREER (#1750483), the Office of Naval Research, the RCTA, Amazon, and Honda Research Institute USA.


About this article


Cite this article

Bhardwaj, M., Choudhury, S., Boots, B. et al. Leveraging experience in lazy search. Auton Robot 45, 979–996 (2021). https://doi.org/10.1007/s10514-021-10018-5
