Abstract
Zero-shot learning (ZSL) is a transfer learning paradigm that aims to recognize unseen categories solely from a high-level description of them. While deep learning has greatly pushed the limits of ZSL for object classification, ZSL for gesture recognition (ZSGL) remains largely unexplored. Previous attempts to address ZSGL focused on the creation of gesture attributes and on algorithmic improvements; there is little or no research concerned with feature selection for ZSGL. Deep learning has largely obviated the need for feature engineering on problems with large datasets, but when data are scarce, it is critical to leverage domain knowledge to create discriminative input features. The main goal of this work is to study the effect of three different feature extraction techniques (velocity, heuristical and latent features) on the performance of ZSGL. In addition, we propose a bilinear auto-encoder approach, referred to as the Joint Semantic Encoder (JSE), for ZSGL that jointly minimizes the reconstruction, semantic and classification losses. We conducted extensive experiments to compare and contrast the feature extraction techniques and to evaluate the performance of JSE with respect to existing ZSL methods. In the attribute-based classification scenario, irrespective of the feature type, JSE outperformed the other approaches by 5% (p<0.01). When trained with heuristical features in the across-category condition, JSE again significantly outperformed the other methods by 5% (p<0.01).
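The abstract describes JSE as a bilinear auto-encoder that jointly minimizes three losses. The following is a minimal numpy sketch of such a three-term objective, not the paper's exact formulation: the dimensions, the tied-weight decoder, and the softmax classifier over class attribute prototypes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, n, c = 16, 5, 8, 3                # feature dim, attribute dim, batch size, classes
X = rng.normal(size=(n, d))             # gesture feature vectors (hypothetical)
S = rng.normal(size=(n, k))             # per-sample semantic (attribute) targets
y = rng.integers(0, c, size=n)          # class labels
P = rng.normal(size=(c, k))             # class attribute prototypes

W = rng.normal(size=(d, k)) * 0.1       # bilinear encoder: feature space -> semantic space


def joint_loss(W, lam_sem=1.0, lam_cls=1.0):
    """Sum of reconstruction, semantic, and classification losses."""
    Z = X @ W                                      # encode into semantic space
    X_hat = Z @ W.T                                # decode with tied weights (assumption)
    rec = np.mean((X - X_hat) ** 2)                # reconstruction loss
    sem = np.mean((Z - S) ** 2)                    # semantic regression loss
    logits = Z @ P.T                               # similarity to class prototypes
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    cls = -np.mean(np.log(probs[np.arange(n), y]))  # cross-entropy classification loss
    return rec + lam_sem * sem + lam_cls * cls


print(joint_loss(W))
```

At test time, a zero-shot prediction would assign an unseen gesture to the nearest unseen-class attribute prototype in the semantic space; the weights `lam_sem` and `lam_cls` trading off the three terms are hyperparameters in this sketch.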
Funding
This work is supported by the Agency for Healthcare Research and Quality (AHRQ), National Institutes of Health (NIH), under Project No. 1R18HS024887-01. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the NIH.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
As this article does not involve human participants, informed consent was not required.
About this article
Cite this article
Madapana, N., Wachs, J. JSE: Joint Semantic Encoder for zero-shot gesture learning. Pattern Anal Applic 25, 679–692 (2022). https://doi.org/10.1007/s10044-021-00992-y