
Learning temporal logic formulas from suboptimal demonstrations: theory and experiments

Published in: Autonomous Robots

Abstract

We present a method for learning multi-stage tasks from demonstrations by learning the logical structure and atomic propositions of a consistent linear temporal logic (LTL) formula. The learner is given successful but potentially suboptimal demonstrations, where the demonstrator is optimizing a cost function while satisfying the LTL formula, and the cost function is uncertain to the learner. Our algorithm uses the Karush-Kuhn-Tucker (KKT) optimality conditions of the demonstrations together with a counterexample-guided falsification strategy to learn the atomic proposition parameters and logical structure of the LTL formula, respectively. We provide theoretical guarantees on the conservativeness of the recovered atomic proposition sets, as well as completeness in the search for finding an LTL formula consistent with the demonstrations. We evaluate our method on high-dimensional nonlinear systems by learning LTL formulas explaining multi-stage tasks on a simulated 7-DOF arm and a quadrotor, and show that it outperforms competing methods for learning LTL formulas from positive examples. Finally, we demonstrate that our approach can learn a real-world multi-stage tabletop manipulation task on a physical 7-DOF Kuka iiwa arm.
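The KKT-based idea in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm, only a minimal 1-D illustration under invented assumptions: a demonstrator minimizes the cost c(x) = (x - goal)^2 subject to a single parametric atomic-proposition constraint g(x, theta) = x - theta <= 0, and the learner scores candidate values of theta by how badly the observed demonstration x_star violates the KKT conditions (primal feasibility, dual feasibility, stationarity, complementary slackness). The names `goal`, `x_star`, and `kkt_residual` are all hypothetical.

```python
import math

# Toy setup (illustrative only): the demo stopped at x = 2 while the cost
# pulls it toward goal = 5, so the active constraint boundary should be ~2.
goal, x_star = 5.0, 2.0

def kkt_residual(theta):
    """Score a candidate theta by the demo's violation of the KKT conditions."""
    if x_star - theta > 1e-9:            # primal infeasibility: g(x*, theta) > 0
        return math.inf
    # Stationarity: 2*(x_star - goal) + lam * dg/dx = 0 with dg/dx = 1,
    # so lam = 2*(goal - x_star).
    lam = 2.0 * (goal - x_star)
    if lam < 0:                          # dual feasibility violated
        return math.inf
    return abs(lam * (x_star - theta))   # complementary-slackness residual

candidates = [i * 0.01 for i in range(501)]   # grid over [0, 5]
feasible = [t for t in candidates if math.isfinite(kkt_residual(t))]
best = min(feasible, key=kkt_residual)
print(round(best, 2))   # -> 2.0
```

The residual is zero exactly when the demonstration is feasible and KKT-stationary for the candidate parameter, which here recovers theta = 2, the boundary the demonstrator pressed against. The paper's method generalizes this idea to trajectories, multiple propositions, and uncertain cost functions.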

[Figures 1–20 appear in the full article; they are not reproduced in this preview.]

Notes

  1. This problem can also be represented and solved with satisfiability modulo theories (SMT) solvers.

  2. Provided that the remaining higher-cost demonstrations are feasible, they can still be used in the learning process: we can require that they remain feasible under any candidate LTL formula.
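The consistency requirement in Note 2 — every demonstration, including higher-cost ones, must remain feasible for a candidate formula — can be sketched with a finite-trace check. This is a hypothetical stand-in, not the paper's implementation: `eventually`, `always`, `consistent`, and the propositions `reach_goal` and `avoid_obstacle` are invented names, and states are abstracted to 1-D positions.

```python
# Finite-trace semantics for two common LTL operators (illustrative sketch).
def eventually(trace, prop):
    return any(prop(s) for s in trace)

def always(trace, prop):
    return all(prop(s) for s in trace)

def consistent(formula, demos):
    """A candidate formula is kept only if every demonstration satisfies it."""
    return all(formula(d) for d in demos)

# Hypothetical atomic propositions over 1-D states.
reach_goal = lambda s: s >= 4
avoid_obstacle = lambda s: s != 2

# Candidate formula: "eventually reach_goal AND always avoid_obstacle".
candidate = lambda tr: eventually(tr, reach_goal) and always(tr, avoid_obstacle)

demos = [[0, 1, 3, 4], [0, 3, 4, 5]]     # a lowest-cost and a higher-cost demo
counterexample = [0, 1, 2, 3, 4]         # passes through the obstacle at s = 2

print(consistent(candidate, demos))                     # -> True
print(consistent(candidate, demos + [counterexample]))  # -> False
```

A trace that violates the candidate formula, as the counterexample does, rules that formula out; this is the role counterexamples play in the paper's falsification loop for learning logical structure.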


Acknowledgements

The authors thank Daniel Neider for insightful discussions and members of the Autonomous Robotic Manipulation (ARM) Lab for assistance and advice on the physical experiment. This research was supported in part by an NDSEG fellowship, NSF Grants IIS-1750489 and ECCS-1553873, and ONR Grants N00014-17-1-2050, N00014-18-1-2501, and N00014-21-1-2118.

Author information


Corresponding author

Correspondence to Glen Chou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This is one of several papers published in Autonomous Robots comprising the Special Issue on Robotics: Science and Systems 2020.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 460118 KB)


About this article


Cite this article

Chou, G., Ozay, N. & Berenson, D. Learning temporal logic formulas from suboptimal demonstrations: theory and experiments. Auton Robot 46, 149–174 (2022). https://doi.org/10.1007/s10514-021-10004-x

