
Learning temporal logic formulas from suboptimal demonstrations: theory and experiments

Published in: Autonomous Robots

Abstract

We present a method for learning multi-stage tasks from demonstrations by learning the logical structure and atomic propositions of a consistent linear temporal logic (LTL) formula. The learner is given successful but potentially suboptimal demonstrations, where the demonstrator is optimizing a cost function while satisfying the LTL formula, and the cost function is uncertain to the learner. Our algorithm uses the Karush-Kuhn-Tucker (KKT) optimality conditions of the demonstrations together with a counterexample-guided falsification strategy to learn the atomic proposition parameters and logical structure of the LTL formula, respectively. We provide theoretical guarantees on the conservativeness of the recovered atomic proposition sets, as well as completeness in the search for finding an LTL formula consistent with the demonstrations. We evaluate our method on high-dimensional nonlinear systems by learning LTL formulas explaining multi-stage tasks on a simulated 7-DOF arm and a quadrotor, and show that it outperforms competing methods for learning LTL formulas from positive examples. Finally, we demonstrate that our approach can learn a real-world multi-stage tabletop manipulation task on a physical 7-DOF Kuka iiwa arm.
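The KKT-based idea in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm, only a minimal 1-D illustration under invented assumptions: a demonstrator minimizes the cost c(x) = (x - goal)^2 subject to a single parametric atomic-proposition constraint g(x, theta) = x - theta <= 0, and the learner scores candidate values of theta by how badly the observed demonstration x_star violates the KKT conditions (primal feasibility, dual feasibility, stationarity, complementary slackness). The names `goal`, `x_star`, and `kkt_residual` are all hypothetical.

```python
import math

# Toy setup (illustrative only): the demo stopped at x = 2 while the cost
# pulls it toward goal = 5, so the active constraint boundary should be ~2.
goal, x_star = 5.0, 2.0

def kkt_residual(theta):
    """Score a candidate theta by the demo's violation of the KKT conditions."""
    if x_star - theta > 1e-9:            # primal infeasibility: g(x*, theta) > 0
        return math.inf
    # Stationarity: 2*(x_star - goal) + lam * dg/dx = 0 with dg/dx = 1,
    # so lam = 2*(goal - x_star).
    lam = 2.0 * (goal - x_star)
    if lam < 0:                          # dual feasibility violated
        return math.inf
    return abs(lam * (x_star - theta))   # complementary-slackness residual

candidates = [i * 0.01 for i in range(501)]   # grid over [0, 5]
feasible = [t for t in candidates if math.isfinite(kkt_residual(t))]
best = min(feasible, key=kkt_residual)
print(round(best, 2))   # -> 2.0
```

The residual is zero exactly when the demonstration is feasible and KKT-stationary for the candidate parameter, which here recovers theta = 2, the boundary the demonstrator pressed against. The paper's method generalizes this idea to trajectories, multiple propositions, and uncertain cost functions.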

[Figures 1–20 appear in the full article; they are not reproduced in this preview.]

Notes

  1. This problem can also be represented and solved with satisfiability modulo theories (SMT) solvers.

  2. Provided that the remaining higher-cost demonstrations are feasible, they can still be used in the learning process: we can require that they remain feasible under any candidate LTL formula.
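The consistency requirement in Note 2 — every demonstration, including higher-cost ones, must remain feasible for a candidate formula — can be sketched with a finite-trace check. This is a hypothetical stand-in, not the paper's implementation: `eventually`, `always`, `consistent`, and the propositions `reach_goal` and `avoid_obstacle` are invented names, and states are abstracted to 1-D positions.

```python
# Finite-trace semantics for two common LTL operators (illustrative sketch).
def eventually(trace, prop):
    return any(prop(s) for s in trace)

def always(trace, prop):
    return all(prop(s) for s in trace)

def consistent(formula, demos):
    """A candidate formula is kept only if every demonstration satisfies it."""
    return all(formula(d) for d in demos)

# Hypothetical atomic propositions over 1-D states.
reach_goal = lambda s: s >= 4
avoid_obstacle = lambda s: s != 2

# Candidate formula: "eventually reach_goal AND always avoid_obstacle".
candidate = lambda tr: eventually(tr, reach_goal) and always(tr, avoid_obstacle)

demos = [[0, 1, 3, 4], [0, 3, 4, 5]]     # a lowest-cost and a higher-cost demo
counterexample = [0, 1, 2, 3, 4]         # passes through the obstacle at s = 2

print(consistent(candidate, demos))                     # -> True
print(consistent(candidate, demos + [counterexample]))  # -> False
```

A trace that violates the candidate formula, as the counterexample does, rules that formula out; this is the role counterexamples play in the paper's falsification loop for learning logical structure.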


Acknowledgements

The authors thank Daniel Neider for insightful discussions and members of the Autonomous Robotic Manipulation (ARM) Lab for assistance and advice on the physical experiment. This research was supported in part by an NDSEG fellowship, NSF Grants IIS-1750489 and ECCS-1553873, and ONR Grants N00014-17-1-2050, N00014-18-1-2501, and N00014-21-1-2118.

Author information


Corresponding author

Correspondence to Glen Chou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This is one of several papers published in Autonomous Robots comprising the Special Issue on Robotics: Science and Systems 2020.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 460118 KB)


About this article


Cite this article

Chou, G., Ozay, N. & Berenson, D. Learning temporal logic formulas from suboptimal demonstrations: theory and experiments. Auton Robot 46, 149–174 (2022). https://doi.org/10.1007/s10514-021-10004-x

