A novel shape matching descriptor for real-time static hand gesture recognition
Introduction
Hand gestures are a fundamental aspect of human-to-human interaction, especially in non-verbal communication (Rautaray and Agrawal, 2015). It is therefore no surprise that research involving hand gestures is growing in the field of human–computer interaction. Many applications can benefit from communication through hand gestures, such as telerobotics (Zhong et al., 2013), virtual and augmented reality, gaming (Zhu and Yuan, 2014) and sign language translation (Parton, 2006).
Classic computer vision methods, along with machine learning techniques, have been proposed over the years to facilitate human–computer interaction by extracting semantic information from hand gestures. The main areas of research on hand gestures include hand gesture detection, segmentation and recognition (Ibraheem and Khan, 2012). In recent years the dominant line of research for computer vision problems has been machine learning, and specifically deep learning (Krizhevsky et al., 2012).
However, the bottleneck of deep learning is its reliance on vast amounts of data, which are not always available in real-world scenarios, making such approaches impractical to train. A representative example is a dataset with a large number of classes (e.g. 300) and only one example per class.
In this work we turn our attention to a real-world research problem with practical applications: real-time hand gesture recognition. Real-time hand gesture recognition has the potential to be used in real-time virtual reality applications, telerobotics and video games. The problem we address can be defined as follows: given a predefined dataset of hand gestures and a query hand gesture image, identify the most perceptually similar hand gesture in the dataset. Matching perceptually similar hand gestures is an efficient way to retrieve further information about the query gesture. For example, each hand gesture in the dataset can have additional features associated with it that are calculated offline, such as its skeleton information and its 3D shape. Through the matched dataset gesture, these features can be associated with the query hand gesture and retrieved in real time. This eliminates the need to compute them from scratch, which is of great interest for real-time applications that require fast processing on limited computational resources, such as mobile phones.
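The retrieval scheme described above can be sketched as a nearest-neighbour search over precomputed descriptors. The `GestureEntry` fields, descriptor layout and Euclidean distance below are illustrative assumptions for the sketch, not the authors' actual descriptor or matching cost.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical precomputed dataset entry: a fixed-length shape descriptor
// plus metadata computed offline (e.g. skeleton or 3D information) that is
// returned for free once the query is matched.
struct GestureEntry {
    std::string label;               // offline-associated metadata
    std::vector<double> descriptor;  // fixed-length shape descriptor
};

// Euclidean distance between two equal-length descriptors.
double descriptorDistance(const std::vector<double>& a,
                          const std::vector<double>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Linear scan over the dataset: index of the most similar entry.
std::size_t matchGesture(const std::vector<double>& query,
                         const std::vector<GestureEntry>& dataset) {
    std::size_t best = 0;
    double bestDist = descriptorDistance(query, dataset[0].descriptor);
    for (std::size_t i = 1; i < dataset.size(); ++i) {
        const double d = descriptorDistance(query, dataset[i].descriptor);
        if (d < bestDist) { bestDist = d; best = i; }
    }
    return best;
}
```

For a dataset of a few hundred gestures and a compact descriptor, such a linear scan fits comfortably within real-time budgets, and the matched entry's offline metadata is retrieved with no extra computation.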
Hand gesture matching is one stage of the complete real-time hand gesture recognition pipeline, following the hand gesture segmentation step (Azad et al., 2014) and an alignment step in which every hand gesture is rotated by its orientation angle. Hand gesture segmentation techniques cannot provide 100% accuracy because (a) human hands are highly articulated and therefore difficult to segment, and (b) complex backgrounds and varying lighting conditions make the segmentation process challenging (Stergiopoulou et al., 2014).
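The alignment step can be illustrated with standard image moments: the orientation angle of a binary shape follows from its second-order central moments as θ = ½·atan2(2μ₁₁, μ₂₀ − μ₀₂). The following is a generic moments-based sketch over a list of foreground pixel coordinates (plain C++, no OpenCV), not necessarily how the authors compute the angle.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Pt { double x, y; };

// Orientation of a binary shape, given as its foreground pixel coordinates,
// from second-order central moments: theta = 0.5 * atan2(2*mu11, mu20 - mu02).
double orientationAngle(const std::vector<Pt>& pts) {
    double m00 = static_cast<double>(pts.size()), m10 = 0, m01 = 0;
    for (const Pt& p : pts) { m10 += p.x; m01 += p.y; }
    const double cx = m10 / m00, cy = m01 / m00;  // centre of mass
    double mu20 = 0, mu02 = 0, mu11 = 0;          // central moments
    for (const Pt& p : pts) {
        const double dx = p.x - cx, dy = p.y - cy;
        mu20 += dx * dx; mu02 += dy * dy; mu11 += dx * dy;
    }
    return 0.5 * std::atan2(2.0 * mu11, mu20 - mu02);  // radians
}
```

Rotating each segmented gesture by the negative of this angle about its centroid brings all gestures into a canonical orientation before matching.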
To address the problem of imperfect segmentation we turn our attention to a class of algorithms that have lost research interest over the years in favour of deep learning, namely shape matching algorithms. We hypothesize that a good shape matching algorithm provides a robust way to handle shape distortions and other deformations while identifying similarities between shapes. Nevertheless, the most accurate shape matching algorithms in the literature tend to require substantial computational resources, with non-linear matching complexities. For example, the widely used shape context method has a super-linear average matching complexity, as its bipartite point-assignment step is cubic in the number of sampled points (Belongie et al., 2002).
In this work we propose a novel shape matching algorithm that outperforms some of the most well-known shape matching algorithms. Extensive experiments were carried out on three datasets: our collected segmented hand gesture dataset, a novel version of the MPEG-7 shape matching dataset (Ralph, 1999) that is widely used for comparing shape matching algorithms, and a camel silhouette dataset consisting of all the binary images from the camel class of MPEG-7. The MPEG-7 and camel datasets were used to investigate the transferability of our method to different shape matching settings.
To summarize, our contributions are as follows:
- We present a novel shape matching problem formulation inspired by a real-world practical application, namely hand gesture matching for real-time applications.
- We propose a novel shape matching algorithm that provides a good trade-off between accuracy and computational efficiency for real-time matching of highly articulated shapes such as hand gestures.
- We propose a novel hand gesture dataset that can be used to compare different shape matching algorithms with respect to accuracy and computational complexity.
Hand gesture recognition
The literature on hand gesture recognition is divided between dynamic and static hand gesture recognition. Dynamic hand gestures are moving gestures represented by a sequence of images, while static hand gestures are represented by a single image per gesture. Because dynamic gestures require processing multiple frames, they are very challenging for real-time applications, especially on limited computational resources such as mobile phones.
Methodology
In this section we (a) state our problem specification and (b) present our proposed novel shape matching algorithm.
Test environment
All algorithms and experiments were implemented in C++ using the OpenCV library. Implementations of the baseline methods were based on available OpenCV functions; for example, the shape context algorithm implemented in OpenCV was used. All experiments were executed on an Acer Aspire V15 workstation with an Intel Core i7-4510U CPU at 2.0 GHz. For the visualization in Fig. 10 we used the official MATLAB code provided by the authors of the paper.
Datasets
We conduct
Optimization of ARB
The ablation study in Section 4.5 identifies the variables that most affect the performance of our method. It can be seen from Fig. 5 that translation and scaling invariance can be achieved with great success by using image moments to calculate the centre of mass and the total mass of the shape. Regarding the concentric circles optimization in Fig. 5, it can be seen that the more concentric circles are used, the more rotation-tolerant the ARB descriptor becomes. This can be
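A minimal sketch of the annular-binning idea discussed in this section, under the assumption that shape points are histogrammed into K concentric rings around the moments-derived centre of mass and normalized by the total mass; the authors' exact ARB construction may differ.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Pt2 { double x, y; };

// Histogram of shape points over `rings` concentric rings around the centroid.
// Using the centroid as origin gives translation invariance; normalizing radii
// by the maximum radius gives scale invariance; counting per ring while
// ignoring angle makes the descriptor tolerant to rotation.
std::vector<double> annularDescriptor(const std::vector<Pt2>& pts, int rings) {
    double cx = 0, cy = 0;
    for (const Pt2& p : pts) { cx += p.x; cy += p.y; }
    cx /= pts.size(); cy /= pts.size();

    std::vector<double> radii;
    double rmax = 0;
    for (const Pt2& p : pts) {
        const double r = std::hypot(p.x - cx, p.y - cy);
        radii.push_back(r);
        if (r > rmax) rmax = r;
    }

    std::vector<double> hist(rings, 0.0);
    for (double r : radii) {
        int bin = (rmax > 0) ? static_cast<int>(r / rmax * rings) : 0;
        if (bin == rings) bin = rings - 1;  // point at rmax falls in last ring
        hist[bin] += 1.0;
    }
    for (double& h : hist) h /= pts.size();  // normalize by total mass
    return hist;
}
```

More rings give finer radial resolution, which is consistent with the ablation's observation that additional concentric circles improve rotation tolerance, at the cost of a larger descriptor.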
Conclusion
In this work we proposed a novel shape matching method for real-time hand gesture recognition. We created a unique dataset of 200 segmented hand gesture images as a benchmark and compared our method against various state-of-the-art shape matching methods in terms of accuracy and computational efficiency. Our experimental results show that our method significantly outperforms the state-of-the-art methods and provides the optimal trade-off in terms of computational efficiency and accuracy for
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (51)
- Curvature scale space image in shape similarity retrieval. Multimedia Syst. (1999)
- Real-time and robust method for hand gesture recognition system based on cross-correlation coefficient (2014)
- Surf: Speeded up robust features
- Shape indexing using approximate nearest-neighbour search in high-dimensional spaces
- Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. (2002)
- Boukhayma, A., Bem, R.d., Torr, P.H., 2019. 3D hand shape and pose from images in the wild. In: The IEEE Conference on...
- Matching point sets with respect to the earth mover's distance. Comput. Geom. (2008)
- A comparative study of three moment-based shape descriptors
- A convolutional neural network with feature fusion for real-time hand posture recognition. Appl. Soft Comput. (2018)
- Wavelet descriptor of planar curves: Theory and applications. IEEE Trans. Image Process. (1996)
- Finding color and shape patterns in images
- Comparison of Fourier descriptors and Hu moments for hand posture recognition
- Real-time hand gesture recognition by shape context based matching and cost matrix. JNW
- Vision-based hand pose estimation: A review. Comput. Vis. Image Underst.
- Hand gesture recognition using Fourier descriptors
- Fourier preprocessing for hand print character recognition. IEEE Trans. Comput.
- Visual pattern recognition by moment invariants. IRE Trans. Inf. Theory
- Survey on various gesture recognition technologies and techniques. Int. J. Comput. Appl.
- ImageNet classification with deep convolutional neural networks
- The Hungarian method for the assignment problem. Nav. Res. Logist.
- Weakly-supervised mesh-convolutional hand reconstruction in the wild
- Hierarchical elastic graph matching for hand gesture recognition
- Shape classification using the inner-distance. IEEE Trans. Pattern Anal. Mach. Intell.
- Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis.