A novel shape matching descriptor for real-time static hand gesture recognition
Introduction
Hand gestures are a fundamental aspect of human-to-human interaction, especially in non-verbal communication (Rautaray and Agrawal, 2015). It is therefore no surprise that research involving hand gestures is growing in the field of human–computer interaction. Many applications can benefit from communication through hand gestures, such as telerobotics (Zhong et al., 2013), virtual and augmented reality, gaming (Zhu and Yuan, 2014) and sign language translation (Parton, 2006).
Classic computer vision methods, along with machine learning techniques, have been proposed over the years to facilitate human–computer interaction by extracting semantic information from hand gestures. The main areas of research on hand gestures include hand gesture detection, segmentation and recognition (Ibraheem and Khan, 2012). In recent years the dominant line of research for computer vision problems has been machine learning, and specifically deep learning (Krizhevsky et al., 2012).
However, the bottleneck of deep learning is its reliance on vast amounts of data, which are not always available in real-world scenarios, making such approaches impractical to train. A representative example is a dataset with a large number of classes (e.g. 300) and only one example per class.
In this work we turn our attention to a real-world research problem with practical applications: real-time hand gesture recognition. Real-time hand gesture recognition has the potential to be used in real-time virtual reality applications, telerobotics and video games. The problem we address can be defined as follows: given a predefined dataset of hand gestures and a query hand gesture image, identify the most perceptually similar hand gesture in the dataset. Matching perceptually similar hand gestures is an efficient way to retrieve further information about the query gesture. For example, each hand gesture in the dataset can have additional features associated with it that are calculated offline, such as its skeleton information and its 3D shape. Through the matched dataset gesture, these features can be associated with the query hand gesture and retrieved in real time. This eliminates the need to compute them from scratch, which is of great interest for real-time applications that require fast processing on limited computational resources, such as mobile phones.
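The retrieval scheme described above can be sketched as a nearest-neighbour search over precomputed descriptors. The `GestureEntry` fields, descriptor layout and Euclidean distance below are illustrative assumptions for the sketch, not the authors' actual descriptor or matching cost.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical precomputed dataset entry: a fixed-length shape descriptor
// plus metadata computed offline (e.g. skeleton or 3D information) that is
// returned for free once the query is matched.
struct GestureEntry {
    std::string label;               // offline-associated metadata
    std::vector<double> descriptor;  // fixed-length shape descriptor
};

// Euclidean distance between two equal-length descriptors.
double descriptorDistance(const std::vector<double>& a,
                          const std::vector<double>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Linear scan over the dataset: index of the most similar entry.
std::size_t matchGesture(const std::vector<double>& query,
                         const std::vector<GestureEntry>& dataset) {
    std::size_t best = 0;
    double bestDist = descriptorDistance(query, dataset[0].descriptor);
    for (std::size_t i = 1; i < dataset.size(); ++i) {
        const double d = descriptorDistance(query, dataset[i].descriptor);
        if (d < bestDist) { bestDist = d; best = i; }
    }
    return best;
}
```

For a dataset of a few hundred gestures and a compact descriptor, such a linear scan fits comfortably within real-time budgets, and the matched entry's offline metadata is retrieved with no extra computation.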
Hand gesture matching is one stage of the complete real-time hand gesture recognition pipeline, following the hand gesture segmentation step (Azad et al., 2014) and an alignment step in which every hand gesture is rotated by its orientation angle. Hand gesture segmentation techniques cannot provide 100% accuracy because (a) human hands are highly articulated and therefore difficult to segment, and (b) complex backgrounds and varying lighting conditions make the segmentation process challenging (Stergiopoulou et al., 2014).
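The alignment step can be illustrated with standard image moments: the orientation angle of a binary shape follows from its second-order central moments as θ = ½·atan2(2μ₁₁, μ₂₀ − μ₀₂). The following is a generic moments-based sketch over a list of foreground pixel coordinates (plain C++, no OpenCV), not necessarily how the authors compute the angle.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Pt { double x, y; };

// Orientation of a binary shape, given as its foreground pixel coordinates,
// from second-order central moments: theta = 0.5 * atan2(2*mu11, mu20 - mu02).
double orientationAngle(const std::vector<Pt>& pts) {
    double m00 = static_cast<double>(pts.size()), m10 = 0, m01 = 0;
    for (const Pt& p : pts) { m10 += p.x; m01 += p.y; }
    const double cx = m10 / m00, cy = m01 / m00;  // centre of mass
    double mu20 = 0, mu02 = 0, mu11 = 0;          // central moments
    for (const Pt& p : pts) {
        const double dx = p.x - cx, dy = p.y - cy;
        mu20 += dx * dx; mu02 += dy * dy; mu11 += dx * dy;
    }
    return 0.5 * std::atan2(2.0 * mu11, mu20 - mu02);  // radians
}
```

Rotating each segmented gesture by the negative of this angle about its centroid brings all gestures into a canonical orientation before matching.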
To address the problem of imperfect segmentation we turn our attention to a class of algorithms that have lost research interest over the years in favour of deep learning, namely shape matching algorithms. We hypothesize that a good shape matching algorithm provides a robust way to handle shape distortions and other deformations while identifying similarities between shapes. Nevertheless, the most accurate shape matching algorithms in the literature tend to require substantial computational resources, with non-linear matching complexities. For example, the widely used shape context method has a super-linear average matching complexity, as its bipartite point-assignment step is cubic in the number of sampled points (Belongie et al., 2002).
In this work we propose a novel shape matching algorithm that outperforms some of the most well-known shape matching algorithms. Extensive experiments were carried out on three datasets: our collected segmented hand gesture dataset, a novel version of the MPEG-7 shape matching dataset (Ralph, 1999) that is widely used for comparing shape matching algorithms, and a camel silhouette dataset consisting of all the binary images from the camel class of MPEG-7. The MPEG-7 and camel datasets were used to investigate the transferability of our method to different shape matching settings.
To summarize, our contributions are as follows:
- We present a novel shape matching problem formulation inspired by a real-world practical application, namely hand gesture matching for real-time applications.
- We propose a novel shape matching algorithm that provides a good trade-off between accuracy and computational efficiency for real-time matching of highly articulated shapes such as hand gestures.
- We propose a novel hand gesture dataset that can be used to compare different shape matching algorithms with respect to accuracy and computational complexity.
Hand gesture recognition
The literature on hand gesture recognition is divided between dynamic and static hand gesture recognition. Dynamic hand gestures are moving gestures represented by a sequence of images, while static hand gestures are represented by a single image per gesture. Because dynamic gestures require processing multiple frames, they are very challenging for real-time applications, especially on limited computational resources such as mobile phones.
Methodology
In this section we (a) state our problem specification and (b) present our proposed novel shape matching algorithm.
Test environment
All algorithms and experiments were implemented in C++ using the OpenCV library. Implementations of the baseline methods were based on available OpenCV functions; for example, the shape context algorithm implemented in OpenCV was used. All experiments were executed on an Acer Aspire V15 workstation with an Intel Core i7-4510U CPU at 2.0 GHz. For the visualization in Fig. 10 we used the official MATLAB code provided by the authors of the paper.
Datasets
We conduct
Optimization of ARB
The ablation study in Section 4.5 identifies the variables that most affect the performance of our method. It can be seen from Fig. 5 that translation and scaling invariance can be achieved with great success by using image moments to calculate the centre of mass and the total mass of the shape. Regarding the concentric circles optimization in Fig. 5, it can be seen that the more concentric circles are used, the more rotation-tolerant the ARB descriptor becomes. This can be
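A minimal sketch of the annular-binning idea discussed in this section, under the assumption that shape points are histogrammed into K concentric rings around the moments-derived centre of mass and normalized by the total mass; the authors' exact ARB construction may differ.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Pt2 { double x, y; };

// Histogram of shape points over `rings` concentric rings around the centroid.
// Using the centroid as origin gives translation invariance; normalizing radii
// by the maximum radius gives scale invariance; counting per ring while
// ignoring angle makes the descriptor tolerant to rotation.
std::vector<double> annularDescriptor(const std::vector<Pt2>& pts, int rings) {
    double cx = 0, cy = 0;
    for (const Pt2& p : pts) { cx += p.x; cy += p.y; }
    cx /= pts.size(); cy /= pts.size();

    std::vector<double> radii;
    double rmax = 0;
    for (const Pt2& p : pts) {
        const double r = std::hypot(p.x - cx, p.y - cy);
        radii.push_back(r);
        if (r > rmax) rmax = r;
    }

    std::vector<double> hist(rings, 0.0);
    for (double r : radii) {
        int bin = (rmax > 0) ? static_cast<int>(r / rmax * rings) : 0;
        if (bin == rings) bin = rings - 1;  // point at rmax falls in last ring
        hist[bin] += 1.0;
    }
    for (double& h : hist) h /= pts.size();  // normalize by total mass
    return hist;
}
```

More rings give finer radial resolution, which is consistent with the ablation's observation that additional concentric circles improve rotation tolerance, at the cost of a larger descriptor.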
Conclusion
In this work we proposed a novel shape matching method for real-time hand gesture recognition. We created a unique dataset of 200 segmented hand gesture images as a benchmark and compared our method against various state-of-the-art shape matching methods in terms of accuracy and computational efficiency. Our experimental results show that our method significantly outperforms the state-of-the-art methods and provides the optimal trade-off in terms of computational efficiency and accuracy for
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (51)
- Curvature scale space image in shape similarity retrieval. Multimedia Syst. (1999)
- Real-time and robust method for hand gesture recognition system based on cross-correlation coefficient (2014)
- Surf: Speeded up robust features
- Shape indexing using approximate nearest-neighbour search in high-dimensional spaces
- Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. (2002)
- Boukhayma, A., Bem, R.d., Torr, P.H., 2019. 3D hand shape and pose from images in the wild. In: The IEEE Conference on...
- Matching point sets with respect to the earth mover's distance. Comput. Geom. (2008)
- A comparative study of three moment-based shape descriptors
- A convolutional neural network with feature fusion for real-time hand posture recognition. Appl. Soft Comput. (2018)
- Wavelet descriptor of planar curves: Theory and applications. IEEE Trans. Image Process. (1996)
- Finding color and shape patterns in images
- Comparison of Fourier descriptors and Hu moments for hand posture recognition
- Real-time hand gesture recognition by shape context based matching and cost matrix. JNW
- Vision-based hand pose estimation: A review. Comput. Vis. Image Underst.
- Hand gesture recognition using Fourier descriptors
- Fourier preprocessing for hand print character recognition. IEEE Trans. Comput.
- Visual pattern recognition by moment invariants. IRE Trans. Inf. Theory
- Survey on various gesture recognition technologies and techniques. Int. J. Comput. Appl.
- ImageNet classification with deep convolutional neural networks
- The Hungarian method for the assignment problem. Nav. Res. Logist.
- Weakly-supervised mesh-convolutional hand reconstruction in the wild
- Hierarchical elastic graph matching for hand gesture recognition
- Shape classification using the inner-distance. IEEE Trans. Pattern Anal. Mach. Intell.
- Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis.