
Leveraging experience in lazy search

Published in Autonomous Robots.

Abstract

Lazy graph search algorithms are efficient at solving motion planning problems where edge evaluation is the computational bottleneck. These algorithms work by lazily computing the shortest potentially feasible path, evaluating edges along that path, and repeating until a feasible path is found. The order in which edges are selected is critical to minimizing the total number of edge evaluations: a good edge selector chooses edges that are not only likely to be invalid, but that also eliminate future paths from consideration. We wish to learn such a selector by leveraging prior experience. We formulate this problem as a Markov Decision Process (MDP) on the state of the search problem. While solving this large MDP is generally intractable, we show that we can compute oracular selectors that solve the MDP during training. With access to such oracles, we use imitation learning to find effective policies. If new search problems are sufficiently similar to problems solved during training, the learned policy will choose a good edge evaluation ordering and solve the motion planning problem quickly. We evaluate our algorithms on a wide range of 2D and 7D problems and show that the learned selector outperforms commonly used baseline heuristics. We further provide a novel theoretical analysis of lazy search in a Bayesian framework, as well as regret guarantees on our imitation learning based approach to motion planning.
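The loop described above — compute the shortest potentially feasible path, let a selector pick an edge on it to evaluate, repeat until a fully validated path emerges — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the graph representation, collision checker, and selector signature here are assumptions, and the learned policy would plug in where `selector` is called.

```python
# Minimal sketch of the lazy search loop: repeatedly find the shortest
# path over edges not yet proven invalid, evaluate one unevaluated edge
# on that path (chosen by the selector), and stop when every edge on
# the current shortest path is known to be valid.
import heapq

def shortest_path(graph, source, target, invalid):
    # Dijkstra over edges not yet proven invalid.
    # graph: dict mapping node -> list of (neighbor, weight).
    dist, prev = {source: 0.0}, {}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, []):
            if (u, v) in invalid:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    if target not in prev and target != source:
        return None
    # Reconstruct the path as a list of edges.
    path, node = [], target
    while node != source:
        u = prev[node]
        path.append((u, node))
        node = u
    return list(reversed(path))

def lazy_search(graph, source, target, is_edge_valid, selector):
    valid, invalid = set(), set()
    while True:
        path = shortest_path(graph, source, target, invalid)
        if path is None:
            return None  # no potentially feasible path remains
        unevaluated = [e for e in path if e not in valid]
        if not unevaluated:
            return path  # every edge on the path is known valid
        edge = selector(unevaluated)  # the learned policy slots in here
        (valid if is_edge_valid(edge) else invalid).add(edge)
```

With a naive selector such as `lambda edges: edges[0]`, this evaluates edges front-to-back along each candidate path; the paper's point is that a smarter, learned selector can invalidate paths with far fewer evaluations.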


Figures 1–10 omitted from this version.


Notes

  1. The framework can be extended to handle non-uniform evaluation costs as well.

  2. We can handle a varying graph by adding it to the state space.

  3. In contrast to the optimal decision tree (ODT) problem, where the true hypothesis must be identified.

  4. If we are only interested in minimizing the number of tests, then \(c(t) = 1\) for all \(t\).
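The cost model in Notes 3–4 can be made concrete with a tiny sketch: a common greedy rule for this kind of test selection picks the test with the highest chance of pruning hypotheses per unit cost, and under the uniform-cost assumption \(c(t) = 1\) it reduces to ordering tests by failure probability. The function names and probability model below are illustrative assumptions, not the paper's selector.

```python
# Illustrative greedy rule: pick the test (here, an edge to evaluate)
# with the best probability-of-invalidation per unit evaluation cost.
# With c(t) = 1 for all t, this is pure probability ordering.
def select_test(tests, p_invalid, cost=lambda t: 1.0):
    # p_invalid(t): prior probability that test t fails
    # (e.g., that the edge is in collision).
    return max(tests, key=lambda t: p_invalid(t) / cost(t))
```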


Author information


Corresponding author

Correspondence to Mohak Bhardwaj.

Ethics declarations

Conflict of interest

The current manuscript was developed while Mohak Bhardwaj, Byron Boots and Siddhartha Srinivasa were affiliated with University of Washington, and Sanjiban Choudhury with Aurora Innovation, Inc. The initial RSS 2019 submission, which the current work builds upon, was done in affiliation with Georgia Institute of Technology (Mohak Bhardwaj and Byron Boots) and University of Washington (Sanjiban Choudhury and Siddhartha Srinivasa). Siddhartha Srinivasa and Byron Boots also hold positions at Amazon Robotics and NVIDIA respectively.

Additional information

This is one of several papers published in Autonomous Robots comprising the Special Issue on Robotics: Science and Systems 2019.

This work was partially funded by the National Institutes of Health R01 (#R01EB019335), National Science Foundation CPS (#1544797), National Science Foundation NRI (#1637748), National Science Foundation CAREER (#1750483), the Office of Naval Research, the RCTA, Amazon, and Honda Research Institute USA.


About this article


Cite this article

Bhardwaj, M., Choudhury, S., Boots, B. et al. Leveraging experience in lazy search. Auton Robot 45, 979–996 (2021). https://doi.org/10.1007/s10514-021-10018-5
