A novel shape matching descriptor for real-time static hand gesture recognition

https://doi.org/10.1016/j.cviu.2021.103241

Highlights

  • Novel shape matching method for real-time hand gesture recognition.

  • Novel shape matching problem formulation.

  • Comparison of several shape matching methods.

  • Novel static hand gesture matching benchmark dataset.

Abstract

The current state-of-the-art hand gesture recognition methodologies rely heavily on machine learning. However, there are scenarios in which machine learning cannot be applied successfully, for example when data is scarce. This is the case when one-to-one matching is required between a query and a dataset of hand gestures in which each gesture represents a unique class. In situations where learning algorithms cannot be trained, classic computer vision techniques such as feature extraction can be used to identify similarities between objects. Shape is one of the most important features that can be extracted from images; however, the most accurate shape matching algorithms tend to be too computationally inefficient for real-time applications. In this work we present a novel shape matching methodology for real-time hand gesture recognition. Extensive experiments were carried out comparing our method with other shape matching methods with respect to accuracy and computational complexity. Our method outperforms the other methods and provides a good combination of accuracy and computational efficiency for real-time applications.

Introduction

Hand gestures are a fundamental aspect of human-to-human interaction, especially in non-verbal communication (Rautaray and Agrawal, 2015). Thus it comes as no surprise that research involving hand gestures is growing in the field of human–computer interaction. Many applications can benefit from communication through hand gestures, such as telerobotics (Zhong et al., 2013), virtual reality and augmented reality applications, gaming (Zhu and Yuan, 2014) and sign language translation (Parton, 2006).

Classic computer vision methods along with machine learning techniques have been proposed over the years to facilitate human–computer interaction by extracting semantic information from hand gestures. Some of the main areas of research regarding hand gestures include hand gesture detection, segmentation and recognition (Ibraheem and Khan, 2012). In recent years the line of research that has dominated the community for computer vision problems is machine learning, specifically deep learning (Krizhevsky et al., 2012).

However, the bottleneck of deep learning is its reliance on vast amounts of data, which are not always available in real-world scenarios, making machine learning approaches impossible to train. A representative example is a dataset consisting of a large number of classes (e.g. 300) with only one example per class.

In this work we turn our attention to a real-world research problem that can be used in practical applications, real-time hand gesture recognition. Real-time hand gesture recognition has the potential to be used in real-time virtual reality applications, telerobotics and video games. The problem we are addressing can be defined as follows: given a predefined dataset of N hand gestures and a query hand gesture image, identify the most perceptually similar hand gesture from the dataset. Matching perceptually similar hand gestures can be used as an efficient way to retrieve further information regarding the hand gesture. For example, each hand gesture in the dataset can have additional features associated with it that are calculated offline, such as its skeleton information and its 3D shape. Using the matched hand gesture from the dataset, these features can be associated with the query hand gesture and retrieved in real-time. This process eliminates the requirement to calculate these details from scratch and is of great interest in real-time applications that require fast processing yet have only a limited amount of computational resources, such as mobile phones.
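The retrieval scheme described above amounts to a nearest-neighbour search over precomputed shape descriptors. A minimal C++ sketch under stated assumptions (the function name `matchGesture` and the squared Euclidean distance are illustrative choices, not the paper's descriptor or matcher):

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Illustrative one-to-one matching step: each dataset entry is a
// precomputed shape descriptor, and the query is matched to the entry
// with the smallest squared Euclidean distance. The returned index can
// then be used to look up the offline features (skeleton, 3D shape, ...)
// associated with that gesture.
std::size_t matchGesture(const std::vector<std::vector<double>>& dataset,
                         const std::vector<double>& query) {
    std::size_t best = 0;
    double bestDist = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < dataset.size(); ++i) {
        double d = 0.0;
        for (std::size_t k = 0; k < query.size(); ++k) {
            const double diff = dataset[i][k] - query[k];
            d += diff * diff;
        }
        if (d < bestDist) {
            bestDist = d;
            best = i;
        }
    }
    return best;  // index of the most perceptually similar gesture
}
```

Because the dataset descriptors are computed offline, only the query descriptor and N distance evaluations are needed at run time, which is what makes this retrieval step cheap on limited hardware.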

The process of hand gesture matching is part of the complete pipeline of real-time hand gesture recognition, following the hand gesture segmentation procedure (Azad et al., 2014) and an alignment procedure in which every hand gesture is rotated by its orientation angle, θ°. Hand gesture segmentation techniques cannot provide 100% accuracy because (a) human hands are highly articulated and difficult to segment, and (b) complex backgrounds and varying lighting conditions make the segmentation process challenging (Stergiopoulou et al., 2014).
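The alignment step above can be sketched with a standard moment-based estimate of the orientation angle θ (an assumption of a common technique; the text does not state the paper's exact formula):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Orientation of a binary silhouette from its second-order central
// moments: theta = 0.5 * atan2(2 * mu11, mu20 - mu02).
// The angle is returned in radians, in image coordinates (y axis down).
double orientationAngle(const std::vector<std::vector<int>>& img) {
    double m00 = 0.0, m10 = 0.0, m01 = 0.0;
    for (std::size_t y = 0; y < img.size(); ++y)
        for (std::size_t x = 0; x < img[y].size(); ++x)
            if (img[y][x]) { m00 += 1.0; m10 += x; m01 += y; }
    const double cx = m10 / m00, cy = m01 / m00;  // centre of mass
    double mu11 = 0.0, mu20 = 0.0, mu02 = 0.0;
    for (std::size_t y = 0; y < img.size(); ++y)
        for (std::size_t x = 0; x < img[y].size(); ++x)
            if (img[y][x]) {
                mu11 += (x - cx) * (y - cy);
                mu20 += (x - cx) * (x - cx);
                mu02 += (y - cy) * (y - cy);
            }
    return 0.5 * std::atan2(2.0 * mu11, mu20 - mu02);
}
```

Rotating each segmented silhouette by −θ before matching removes the gross in-plane rotation, so the matcher only has to tolerate residual misalignment.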

In order to address the problem of imperfect segmentation we turn our attention to a class of algorithms that has lost research interest over the years to deep learning, namely shape matching algorithms. We hypothesize that a good shape matching algorithm can provide a robust way to handle shape distortions and other deformations while identifying similarities between shapes. Nevertheless, the most accurate shape matching algorithms currently found in the literature tend to require high computational resources, with non-linear matching complexities. For example, the widely used shape context method has an average matching time complexity of O(n³) (Belongie et al., 2002).

In this work we propose a novel shape matching algorithm that outperforms some of the most well-known shape matching algorithms. Extensive experiments were carried out on three datasets: our collected segmented hand gesture dataset, a novel version of the MPEG-7 shape matching dataset (Ralph, 1999), which is widely used for comparing shape matching algorithms, and a camel silhouette dataset consisting of all the binary images from the camel class of the MPEG-7 dataset. The MPEG-7 and camel datasets were used to investigate the transferability of our method to different shape matching settings.

To summarize, our contributions are as follows:

  • We present a novel shape matching problem formulation inspired by a real-world practical application, namely hand gesture matching for real-time applications.

  • We propose a novel shape matching algorithm that provides a good trade-off with respect to accuracy and computational efficiency for real-time shape matching of highly articulated shapes such as hand gestures.

  • We propose a novel hand gesture dataset that can be used to compare different shape matching algorithms with respect to accuracy and computational complexity.

Section snippets

Hand gesture recognition

The literature on hand gesture recognition is divided between dynamic and static hand gesture recognition. Dynamic hand gestures are moving gestures represented by a sequence of images, while static hand gestures are represented by one image per gesture. Because dynamic gestures require processing multiple frames, they are very challenging for real-time applications, especially on devices with limited computational resources such as mobile phones.

Methodology

In this section we will (a) describe our problem specification and (b) describe our proposed novel shape matching algorithm.

Test environment

The implementations of all the algorithms and all the experiments were written in C++ using the OpenCV library. Implementations of the baseline methods were based on available OpenCV functions; for example, the shape context algorithm implemented in OpenCV was used. All experiments were executed on an Acer Aspire V15 with an Intel Core i7-4510U CPU at 2.0 GHz. For the visualization of Fig. 10 we used the official MATLAB code provided by the authors of the paper.

Datasets

We conduct

Optimization of ARB

The ablation study in Section 4.5 identifies the variables that most affect the performance of our method. Fig. 5 shows that translation and scaling invariance can be achieved with great success by using image moments to calculate the centre of mass and the total mass of the shape. Regarding the concentric circles optimization in Fig. 5, the more concentric circles used, the more rotation-tolerant the ARB descriptor becomes. This can be
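The moment-based normalization described above can be sketched as follows; `ShapeFrame` is a hypothetical helper, and using √m00 as the scale factor is our assumption rather than the paper's stated implementation:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper holding the normalization parameters.
struct ShapeFrame {
    double cx, cy;   // centre of mass (fixes translation)
    double scale;    // sqrt of total mass (fixes scale)
};

// The raw moments m00, m10, m01 of a binary silhouette give its total
// mass and centre of mass; centring the concentric sampling circles on
// (cx, cy) with radii proportional to sqrt(m00) makes the resulting
// descriptor tolerant to translation and scaling.
ShapeFrame shapeFrame(const std::vector<std::vector<int>>& img) {
    double m00 = 0.0, m10 = 0.0, m01 = 0.0;
    for (std::size_t y = 0; y < img.size(); ++y)
        for (std::size_t x = 0; x < img[y].size(); ++x)
            if (img[y][x]) { m00 += 1.0; m10 += x; m01 += y; }
    return {m10 / m00, m01 / m00, std::sqrt(m00)};
}
```

Normalizing the sampling geometry this way is cheap (one pass over the silhouette), which matters for the real-time budget discussed above.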

Conclusion

In this work we proposed a novel shape matching method for real-time hand gesture recognition. We created a unique benchmark dataset of 200 segmented hand gesture images and compared our method against various state-of-the-art shape matching methods in terms of accuracy and computational efficiency. Our experimental results show that our method significantly outperforms the state-of-the-art methods and provides the optimal trade-off in terms of computational efficiency and accuracy for

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (51)

  • Abbasi, S., et al. Curvature scale space image in shape similarity retrieval. Multimedia Syst. (1999)
  • Azad, R., et al. Real-time and robust method for hand gesture recognition system based on cross-correlation coefficient (2014)
  • Bay, H., et al. SURF: Speeded up robust features
  • Beis, J.S., et al. Shape indexing using approximate nearest-neighbour search in high-dimensional spaces
  • Belongie, S., et al. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. (2002)
  • Boukhayma, A., Bem, R.d., Torr, P.H., 2019. 3D hand shape and pose from images in the wild. In: The IEEE Conference on...
  • Cabello, S., et al. Matching point sets with respect to the earth mover’s distance. Comput. Geom. (2008)
  • Celebi, M.E., et al. A comparative study of three moment-based shape descriptors
  • Chevtchenko, S.F., et al. A convolutional neural network with feature fusion for real-time hand posture recognition. Appl. Soft Comput. (2018)
  • Chuang, G.-H., et al. Wavelet descriptor of planar curves: Theory and applications. IEEE Trans. Image Process. (1996)
  • Cohen, S. Finding Color and Shape Patterns in Images (1999)
  • Conseil, S., et al. Comparison of Fourier descriptors and Hu moments for hand posture recognition
  • Deng, L.Y., et al. Real-time hand gesture recognition by shape context based matching and cost matrix. JNW (2011)
  • Erol, A., et al. Vision-based hand pose estimation: A review. Comput. Vis. Image Underst. (2007)
  • Freeman, W.T., Roth, M., 1995. Orientation histograms for hand gesture recognition. In: International Workshop on...
  • Gamal, H.M., et al. Hand gesture recognition using Fourier descriptors
  • Granlund, G.H. Fourier preprocessing for hand print character recognition. IEEE Trans. Comput. (1972)
  • Hu, M.-K. Visual pattern recognition by moment invariants. IRE Trans. Inf. Theory (1962)
  • Ibraheem, N.A., et al. Survey on various gesture recognition technologies and techniques. Int. J. Comput. Appl. (2012)
  • Krizhevsky, A., et al. ImageNet classification with deep convolutional neural networks
  • Kuhn, H.W. The Hungarian method for the assignment problem. Nav. Res. Logist. (1955)
  • Kulon, D., et al. Weakly-supervised mesh-convolutional hand reconstruction in the wild (2020)
  • Li, Y.-T., et al. Hierarchical elastic graph matching for hand gesture recognition
  • Ling, H., et al. Shape classification using the inner-distance. IEEE Trans. Pattern Anal. Mach. Intell. (2007)
  • Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. (2004)