Hand Gesture Recognition via Enhanced Densely Connected Convolutional Neural Network
Expert Systems with Applications (IF 8.5), Pub Date: 2021-03-04, DOI: 10.1016/j.eswa.2021.114797
Yong Soon Tan , Kian Ming Lim , Chin Poo Lee

Hand Gesture Recognition (HGR) serves as a fundamental means of communication and interaction for human beings. While HGR can be applied in Human-Computer Interaction (HCI) to facilitate user interaction, it can also be used to bridge language barriers. For instance, HGR can be used to recognize sign language, a visual language represented by hand gestures and used by deaf and mute people all over the world as a primary means of communication. Hand-crafted approaches to vision-based HGR typically involve multiple stages of specialized processing, such as hand-crafted feature extraction methods, which are usually designed to deal with particular challenges. Hence, the effectiveness of the system and its ability to deal with varied challenges across multiple datasets rely heavily on the methods used. In contrast, deep learning approaches such as the convolutional neural network (CNN) adapt to varied challenges via supervised learning. However, attaining satisfactory generalization on unseen data depends not only on the architecture of the CNN but also on the quantity and variety of the training data. Therefore, a customized network architecture dubbed the enhanced densely connected convolutional neural network (EDenseNet) is proposed for vision-based hand gesture recognition. The modified transition layer in EDenseNet further strengthens feature propagation by using a bottleneck layer to propagate the reused features to all the feature maps in a bottleneck manner, while the following Conv layer smooths out unwanted features. The differences between EDenseNet and DenseNet are identified, and the performance gains are scrutinized in an ablation study. Furthermore, numerous data augmentation techniques are used to attenuate the effect of data scarcity by increasing the quantity of training data and enriching its variety to further improve generalization.
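The augmentation idea in the last sentence can be illustrated with a minimal sketch. The abstract does not say which augmentation techniques the authors used, so the two transforms below (a small pixel shift and a brightness scaling) are assumptions chosen only for illustration, and the names `shift_image`, `scale_brightness`, and `augment` are hypothetical. The sketch uses a tiny grayscale image stored as a list of rows and only the Python standard library.

```python
import random


def shift_image(img, dx, dy, fill=0):
    """Translate a 2D grayscale image by (dx, dy) pixels, padding with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def scale_brightness(img, factor, max_value=255):
    """Multiply every pixel by `factor`, clamping to [0, max_value]."""
    return [[min(max_value, max(0, round(p * factor))) for p in row]
            for row in img]


def augment(img, n_copies, rng):
    """Return the original image plus `n_copies` randomly perturbed variants.

    The perturbation ranges are illustrative assumptions, not values
    taken from the paper.
    """
    samples = [img]
    for _ in range(n_copies):
        dx, dy = rng.randint(-1, 1), rng.randint(-1, 1)
        factor = rng.uniform(0.8, 1.2)
        samples.append(scale_brightness(shift_image(img, dx, dy), factor))
    return samples


image = [[0, 50, 0],
         [50, 200, 50],
         [0, 50, 0]]
augmented = augment(image, n_copies=4, rng=random.Random(0))
print(len(augmented))  # 5: the original plus four variants
```

Each variant keeps the label of the image it was derived from, which is how augmentation raises both the quantity and the variety of the training data without new annotation.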
Experiments were carried out on multiple datasets, namely one NUS hand gesture dataset and two American Sign Language (ASL) datasets. The proposed EDenseNet obtains 98.50% average accuracy without augmented data and 99.64% average accuracy with augmented data, outperforming other deep learning-driven approaches in both settings.
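To make the feature-reuse idea behind densely connected networks concrete, the sketch below tracks channel counts through one dense block followed by a transition made of a bottleneck layer and a Conv layer, as the abstract describes. The dense-block arithmetic follows the standard DenseNet rule that each layer receives the concatenation of all earlier feature maps and adds a fixed number of new ones (the growth rate). Everything else is an assumption for illustration: the abstract gives no layer counts, growth rate, or channel widths, and the function names are hypothetical.

```python
def dense_block_channels(in_channels, num_layers, growth_rate):
    """Channel bookkeeping for one dense block.

    Layer i sees the block input concatenated with the outputs of the
    i earlier layers, each of which contributes `growth_rate` maps.
    Returns (input channels seen by each layer, block output channels).
    """
    per_layer_inputs = [in_channels + i * growth_rate
                        for i in range(num_layers)]
    out_channels = in_channels + num_layers * growth_rate
    return per_layer_inputs, out_channels


def transition_channels(in_channels, bottleneck_channels, out_channels):
    """Hypothetical transition: a bottleneck layer, then a Conv layer.

    Returns (stage name, input channels, output channels) per stage.
    The widths are free parameters here, not values from the paper.
    """
    return [("bottleneck", in_channels, bottleneck_channels),
            ("conv", bottleneck_channels, out_channels)]


# Illustrative sizes only.
inputs, block_out = dense_block_channels(in_channels=16, num_layers=4,
                                         growth_rate=12)
print(inputs)     # [16, 28, 40, 52]
print(block_out)  # 64
print(transition_channels(block_out, bottleneck_channels=32,
                          out_channels=32))
```

The growing input widths show why dense connectivity reuses features, and why a bottleneck in the transition is a natural place to compress the accumulated maps before the next block.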




Updated: 2021-03-04