An automated integrated speech and face image analysis system for the identification of human emotions
Speech Communication ( IF 3.2 ) Pub Date : 2021-04-08 , DOI: 10.1016/j.specom.2021.04.001
Christos P. Loizou

Objective

Human interaction relies on both speech and facial characteristics. It has been suggested that speech signals and/or images of facial expressions can reveal human emotions, and that the two modalities complement each other in verifying a person's identity. The present study proposes and evaluates an automated, integrated speech signal and facial image analysis system for the identification of seven human emotions: Normal (N), Happy (H), Sad (S), Disgust (D), Fear (F), Anger (A), and Surprise (Su).

Methods

Speech recordings and face images from 7,441 subjects aged 20 to 74 years were collected, normalized and filtered. From these recordings, 55 speech signal features and 61 face image texture features were extracted. Statistical and multi-class model analyses were performed to select the features that could statistically significantly distinguish between the seven aforementioned human emotions (N, H, S, D, F, A and Su). The selected features, alone or combined with the age and gender of the investigated sample, were used to build two learning-based classifiers, and the classifiers' accuracy was computed.
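
The paper does not disclose its implementation; the following is a minimal illustrative sketch of such a pipeline in Python, assuming scikit-learn/SciPy and hypothetical, precomputed feature matrices (names such as X and y are placeholders, and the classifier type is an assumption):

    # Illustrative sketch only -- not the authors' code.
    # Assumes a feature matrix X (n_subjects x 116: 55 speech + 61 face-texture
    # features, optionally with age and gender appended) and emotion labels y
    # (N, H, S, D, F, A, Su) as a NumPy array.
    import numpy as np
    from scipy.stats import f_oneway
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def select_significant(X, y, alpha=0.05):
        # Keep features whose values differ significantly across the seven
        # emotion classes (one-way ANOVA used here as a stand-in for the
        # paper's statistical feature selection).
        keep = []
        for j in range(X.shape[1]):
            groups = [X[y == c, j] for c in np.unique(y)]
            _, p = f_oneway(*groups)
            if p < alpha:
                keep.append(j)
        return X[:, keep]

    # X_sel = select_significant(X, y)
    # clf = make_pipeline(StandardScaler(), SVC())        # one learning-based classifier (type assumed)
    # acc = cross_val_score(clf, X_sel, y, cv=10).mean()  # proportion correct, i.e. %CC / 100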

Results

For each of the human emotions listed above, statistically significantly different speech and face image features were identified that can be used to distinguish between the groups (N vs H, N vs S, N vs D, N vs F, N vs A, N vs Su). Using solely these statistically significant speech and image features, an overall percentage of correct classification (%CC) of 93% was achieved.
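
As a point of reference, the percentage of correct classification is simply the share of correctly labelled samples; an illustrative computation follows (the counts are placeholders, not the study's actual numbers):

    correct, total = 93, 100          # placeholder counts, not reported data
    cc = 100.0 * correct / total      # %CC = 93.0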

Conclusions

A significant number of speech and face image features were derived from continuous speech and face images, and features were identified that can distinguish between seven different human emotional states. This study lays the basis for the development of an integrated system that identifies emotional states through the automatic analysis of free speech and face images. Future work will investigate the development of the proposed method and its integration into a mobile device.




Updated: 2021-04-12