Emotion recognition using speech and neural structured learning to facilitate edge intelligence
Engineering Applications of Artificial Intelligence (IF 7.5) Pub Date: 2020-06-24, DOI: 10.1016/j.engappai.2020.103775
Md. Zia Uddin , Erik G. Nilsson

Emotions are quite important in our daily communications, and recent years have witnessed many research efforts to develop reliable emotion recognition systems based on various types of data sources such as audio and video. Since audio carries no visual information about human faces, emotion analysis based on audio data alone is a very challenging task. In this work, a novel emotion recognition approach is proposed based on robust features and machine learning from audio speech. For a person-independent emotion recognition system, audio data is used as input, from which Mel Frequency Cepstrum Coefficients (MFCC) are calculated as features. The MFCC features are then processed with discriminant analysis to minimize the within-class scatter while maximizing the between-class scatter. The robust discriminant features are then fed to Neural Structured Learning (NSL), an efficient and fast deep learning approach, for emotion training and recognition. The proposed combination of MFCC, discriminant analysis, and NSL achieved superior recognition rates compared to traditional approaches such as MFCC-DBN, MFCC-CNN, and MFCC-RNN in experiments on an audio speech emotion dataset. The system can be adopted in smart environments such as homes or clinics to provide affective healthcare. Since NSL is fast and easy to implement, it can be deployed on edge devices with limited datasets collected from edge sensors. Hence, the decision-making step can be pushed towards where the data resides, rather than conventionally processing data and making decisions far away from its sources. The proposed approach can be applied in practical applications such as understanding people's emotions in daily life, or detecting stress from the voices of pilots or air traffic controllers in air traffic management systems.
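The discriminant-analysis step described above (minimizing within-class scatter while maximizing between-class scatter) corresponds to classical Fisher linear discriminant analysis. The following is a minimal numpy sketch of that step only, using randomly generated toy vectors in place of real MFCC features; the MFCC extraction and the NSL classifier from the paper are not shown, and all names here are illustrative, not from the paper's code.

```python
import numpy as np

def fisher_discriminant(X, y):
    """Return projection directions that maximize between-class scatter
    relative to within-class scatter (Fisher LDA)."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Directions are eigenvectors of Sw^{-1} Sb; at most (n_classes - 1) of them.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs.real[:, order[: len(classes) - 1]]

# Toy stand-in for MFCC feature vectors: two emotion classes in 4 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 4)),
               rng.normal(3.0, 1.0, (20, 4))])
y = np.array([0] * 20 + [1] * 20)

W = fisher_discriminant(X, y)   # shape (4, 1) for two classes
Z = X @ W                       # discriminant features fed to the classifier
```

In the pipeline the paper outlines, `Z` would be the compact, class-separating representation handed to the NSL model for training and recognition.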


