Adding Knowledge to Unsupervised Algorithms for the Recognition of Intent

International Journal of Computer Vision

Abstract

The performance of computer vision algorithms is now near or superior to that of humans on visual problems including object recognition (especially of fine-grained categories), segmentation, and 3D object reconstruction from 2D views. Humans are, however, capable of higher-level image analyses. A clear example, involving theory of mind, is our ability to determine whether a perceived behavior or action was performed intentionally or not. In this paper, we derive an algorithm that can infer whether the behavior of an agent in a scene is intentional or unintentional based on its 3D kinematics, using knowledge of self-propelled motion, Newtonian motion, and the relationship between the two. We show how the addition of this basic knowledge leads to a simple, unsupervised algorithm. To test the derived algorithm, we constructed three dedicated datasets, ranging from abstract geometric animations to realistic videos of agents performing intentional and non-intentional actions. Experiments on these datasets show that our algorithm can recognize whether an action is intentional or not, even without training data. Quantitatively, its performance is comparable to that of various supervised baselines; qualitatively, it produces sensible intentionality segmentations.
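The core inference described in the abstract can be illustrated as a physics-consistency check: motion that deviates from what passive Newtonian dynamics (e.g., free fall under gravity) would predict is evidence of self-propulsion, which in turn is a cue to intentional behavior. The sketch below is a hypothetical simplification, not the paper's actual algorithm; the function name, gravity direction, sampling rate, and tolerance threshold are all assumptions made for illustration.

```python
import numpy as np

GRAVITY = np.array([0.0, -9.81, 0.0])  # m/s^2, y-up world frame (assumption)

def self_propelled_mask(positions, dt, tol=1.0):
    """Flag time steps whose acceleration deviates from passive
    Newtonian (free-fall) motion; such deviations suggest
    self-propulsion, a kinematic cue to intentionality.

    positions: (T, 3) array of 3D positions of a tracked point.
    dt: sampling interval in seconds.
    tol: deviation threshold in m/s^2 (hypothetical value).
    """
    # Two numerical derivatives give per-frame acceleration.
    acc = np.gradient(np.gradient(positions, dt, axis=0), dt, axis=0)
    # Residual acceleration not explained by gravity alone.
    residual = np.linalg.norm(acc - GRAVITY, axis=1)
    return residual > tol

# Toy example: a point in free fall vs. one accelerating horizontally.
t = np.arange(0.0, 1.0, 0.01)[:, None]
free_fall = np.hstack([np.zeros_like(t), -0.5 * 9.81 * t**2, np.zeros_like(t)])
powered = np.hstack([2.0 * t**2, np.zeros_like(t), np.zeros_like(t)])

print(self_propelled_mask(free_fall, 0.01).mean())  # near 0: Newtonian
print(self_propelled_mask(powered, 0.01).mean())    # 1.0: self-propelled
```

For an articulated agent, the same test would be applied per joint of the recovered 3D pose rather than to a single point; the free-fall trajectory is classified as passive (up to finite-difference error at the sequence boundaries), while the powered trajectory is flagged throughout.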



Notes

  1. Standalone means that this concept focuses only on the movement at a specific time point rather than on the relationship between actions.

  2. Here we are using the computational, algorithmic, and implementational levels from David Marr (Marr, 1982). The implementational level is not discussed, since our work does not contribute to that level.

  3. https://www.mixamo.com/.

  4. However, one should also note that acting to appear non-intentional does not mean that the agent's action and kinematics lack the characteristics of genuine non-intentional movement.

  5. This experiment was added during the revision phase of this paper.

References

  • Aditya, S., Yang, Y., Baral, C., Fermuller, C., & Aloimonos, Y. (2015) Visual commonsense for scene understanding using perception, semantic parsing and reasoning. In 2015 AAAI spring symposium series.

  • Aristotle. (1926). The art of rhetoric (Vol. 2). Cambridge, MA: Harvard University Press.

  • Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

  • Cao, Z., Hidalgo, G., Simon, T., Wei, S.E., & Sheikh, Y. (2018). OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. arXiv preprint arXiv:1812.08008.

  • Chambon, V., Domenech, P., Jacquet, P. O., Barbalat, G., Bouton, S., Pacherie, E., et al. (2017). Neural coding of prior expectations in hierarchical intention inference. Scientific Reports, 7(1), 1278.

  • Chambon, V., Domenech, P., Pacherie, E., Koechlin, E., Baraduc, P., & Farrer, C. (2011). What are they up to? The role of sensory evidence and prior knowledge in action understanding. PloS One, 6(2), e17133.

  • Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., & Ouyang, W., et al. (2019). Hybrid task cascade for instance segmentation. arXiv preprint arXiv:1901.07518.

  • Del Rincón, J. M., Santofimia, M. J., & Nebel, J. C. (2013). Common-sense reasoning for human action recognition. Pattern Recognition Letters, 34(15), 1849–1860.


  • Descartes, R., & Lafleur, L. J. (1960). Meditations on first philosophy. New York: Bobbs-Merrill.


  • Epstein, D., Chen, B., & Vondrick, C. (2020). Oops! Predicting unintentional action in video. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 919–929).

  • Fang, Z., & López, A. M. (2019). Intention recognition of pedestrians and cyclists by 2D pose estimation. IEEE Transactions on Intelligent Transportation Systems.

  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed., pp. 37–38). New York: Springer.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).

  • Heider, F., & Simmel, M. (1944). An experimental study of apparent behavior. The American Journal of Psychology, 57, 243–259.


  • Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.


  • Luo, Y., & Baillargeon, R. (2005). Can a self-propelled box have a goal? Psychological reasoning in 5-month-old infants. Psychological Science, 16(8), 601–608.

  • Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. USA: Henry Holt and Co., Inc.

  • Martinez, J., Hossain, R., Romero, J., & Little, J. J. (2017). A simple yet effective baseline for 3D human pose estimation. In Proceedings of the IEEE international conference on computer vision (pp. 2640–2649).

  • Miller, G. (1998). WordNet: An electronic lexical database. Cambridge: MIT press.


  • Ravichandar, H. C., & Dani, A. P. (2017). Human intention inference using expectation-maximization algorithm with online model learning. IEEE Transactions on Automation Science and Engineering, 14(2), 855–868.


  • Rudenko, A., Palmieri, L., Herman, M., Kitani, K.M., Gavrila, D.M., & Arras, K.O. (2019). Human motion trajectory prediction: A survey. arXiv preprint arXiv:1905.06113.

  • Sartori, L., Becchio, C., & Castiello, U. (2011). Cues to intention: The role of movement information. Cognition, 119(2), 242–252.


  • Speer, R., Chin, J., & Havasi, C. (2017). ConceptNet 5.5: An open multilingual graph of general knowledge. In Thirty-first AAAI conference on artificial intelligence.

  • Tozeren, A. (2000). Human body dynamics: Classical mechanics and human movement. New York: Springer Publishing.


  • Tran, D., Wang, H., Torresani, L., Ray, J., LeCun, Y., & Paluri, M. (2018). A closer look at spatiotemporal convolutions for action recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6450–6459).

  • Ullman, T., Baker, C., Macindoe, O., Evans, O., Goodman, N., & Tenenbaum, J. B. (2009). Help or hinder: Bayesian models of social goal inference. In Advances in neural information processing systems (pp. 1874–1882).

  • Varytimidis, D., Alonso-Fernandez, F., Duran, B., & Englund, C. (2018). Action and intention recognition of pedestrians in urban traffic. In 2018 14th International conference on signal-image technology & internet-based systems (SITIS) (pp. 676–682). IEEE.

  • Vondrick, C., Oktay, D., Pirsiavash, H., & Torralba, A. (2016). Predicting motivations of actions by leveraging text. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2997–3005).

  • Wei, P., Liu, Y., Shu, T., Zheng, N., & Zhu, S. C. (2018). Where and why are they looking? Jointly inferring human attention and intentions in complex tasks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6801–6809).

  • Wilson, G., & Shpall, S. (2016). Action. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Winter 2016 ed.). Metaphysics Research Lab, Stanford University.

  • Yeung, S., Russakovsky, O., Jin, N., Andriluka, M., Mori, G., & Fei-Fei, L. (2018). Every moment counts: Dense detailed labeling of actions in complex videos. International Journal of Computer Vision, 126(2–4), 375–389.


  • You, D., Hamsici, O. C., & Martinez, A. M. (2011). Kernel optimization in discriminant analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(3), 631–638.


  • Zellers, R., Bisk, Y., Farhadi, A., & Choi, Y. (2018). From recognition to cognition: Visual commonsense reasoning. arXiv preprint arXiv:1811.10830.


Acknowledgements

This research was supported by the National Institutes of Health (NIH), Grants R01-DC-014498 and R01-EY-020834, the Human Frontier Science Program (HFSP), Grant RGP0036/2016, and a grant from Ohio State’s Center for Cognitive and Brain Sciences.

Author information

Corresponding author

Correspondence to Qianli Feng.

Additional information

Communicated by Deva Ramanan.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Synakowski, S., Feng, Q. & Martinez, A. Adding Knowledge to Unsupervised Algorithms for the Recognition of Intent. Int J Comput Vis 129, 942–959 (2021). https://doi.org/10.1007/s11263-020-01404-0

