Affective state recognition from hand gestures and facial expressions using Grassmann manifolds
Multimedia Tools and Applications (IF 3.6) Pub Date: 2021-01-20, DOI: 10.1007/s11042-020-10341-6
Bindu Verma, Ayesha Choudhary

A person's emotional state is central to understanding their affective state. Affective states are an important aspect of our being "human". Therefore, for man-machine interaction to be natural and for machines to understand people, it is becoming necessary to understand a person's emotional state. Non-verbal behavioral cues such as facial expressions and hand gestures provide a firm basis for understanding the affective state of a person. In this paper, we propose a novel, real-time framework that extracts dynamic information from videos across multiple modalities to recognize a person's affective state. In the first step, we detect the face and hands of the person in the video and create motion history images (MHIs) of both the face and the gesturing hands to encode the temporal dynamics of these two modalities. In the second step, features are extracted from both the face and hand MHIs using the deep residual network ResNet-101 and concatenated into a single feature vector for recognition. We use these integrated features to create subspaces that lie on a Grassmann manifold. We then use the Geodesic Flow Kernel (GFK) of this Grassmann manifold for domain adaptation and apply this GFK-adapted GGDA to robustly recognize a person's affective state from multiple modalities. An accuracy of 93.4% on the FABO (Gunes and Piccardi 19) dataset and 92.7% on our own dataset shows that the integrated face and hand modalities outperform state-of-the-art methods for affective state recognition.
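The motion history image mentioned in the first step has a simple recursive definition: pixels that move in the current frame are set to a maximum duration tau, while static pixels decay toward zero, so recent motion appears brighter than older motion. The sketch below is a minimal NumPy illustration of that update rule using thresholded frame differencing as the motion mask; it is not the authors' implementation, and the threshold and duration values are illustrative assumptions.

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau):
    """One MHI update step: moving pixels are set to tau,
    static pixels decay by 1 (floored at 0)."""
    decayed = np.maximum(mhi - 1, 0)
    return np.where(motion_mask, tau, decayed)

def mhi_from_frames(frames, diff_thresh=30, tau=15):
    """Build a motion history image from a sequence of grayscale
    uint8 frames, using thresholded frame differencing to detect
    motion. diff_thresh and tau are illustrative parameters."""
    mhi = np.zeros(frames[0].shape, dtype=np.int32)
    prev = frames[0].astype(np.int16)
    for frame in frames[1:]:
        cur = frame.astype(np.int16)
        mask = np.abs(cur - prev) > diff_thresh
        mhi = update_mhi(mhi, mask, tau)
        prev = cur
    return mhi
```

The resulting single-channel image summarizes where and how recently motion occurred over the clip, which is what allows a static CNN such as ResNet-101 to pick up temporal dynamics from a single input image.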




Updated: 2021-01-21