Cross lingual speech emotion recognition via triple attentive asymmetric convolutional neural network
International Journal of Intelligent Systems (IF 7), Pub Date: 2020-09-23, DOI: 10.1002/int.22291
Elias N. N. Ocquaye, Qirong Mao, Yanfei Xue, Heping Song

Cross-corpus speech emotion recognition (SER) via domain adaptation has gained wide acknowledgment as a way to build robust emotion recognition systems from different corpora or datasets. However, the cross-lingual case, in which training and testing use different languages, remains a challenge in SER and needs more attention. In this paper, we propose a triple attentive asymmetric convolutional neural network to recognize emotions in cross-lingual and cross-corpus speech in an unsupervised manner. The proposed method adopts the joint supervision of softmax loss and center loss to learn highly discriminative feature representations for the target domain using high-quality pseudo-labels. The proposed model uses three attentive convolutional neural networks asymmetrically: two of the networks are trained on labeled source samples, and their predictions are used to assign pseudo-labels to unlabeled target samples, while the third network learns salient, discriminative target features from the pseudo-labeled target samples. We evaluate the proposed method on data sets in three languages (English, German, and Italian). The experimental results indicate that the proposed method achieves higher prediction accuracy than other state-of-the-art methods.
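
The two ingredients named in the abstract, joint softmax and center loss supervision and pseudo-labeling by two networks, can be illustrated with a minimal sketch. This is not the authors' implementation. The weighting `lam`, the confidence `threshold`, and the rule "both labeling networks agree and at least one is confident" are assumptions made for illustration, since the abstract does not specify them, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_loss(logits, features, labels, centers, lam=0.1):
    """Mean softmax (cross-entropy) loss plus lam times the mean center loss.

    lam is a hypothetical weighting; the abstract gives no value.
    centers[y] is the current feature center of class y.
    """
    cross_entropy = 0.0
    center_term = 0.0
    for z, f, y in zip(logits, features, labels):
        # Floor the probability to avoid log(0) on extreme logits.
        cross_entropy -= math.log(max(softmax(z)[y], 1e-12))
        # Half the squared distance between a feature and its class center.
        center_term += 0.5 * sum((fi - ci) ** 2 for fi, ci in zip(f, centers[y]))
    return (cross_entropy + lam * center_term) / len(labels)


def select_pseudo_labels(probs_a, probs_b, threshold=0.9):
    """Assumed selection rule: keep a target sample when both labeling
    networks predict the same class and at least one is confident.

    Returns a list of (sample_index, pseudo_label) pairs.
    """
    selected = []
    for i, (pa, pb) in enumerate(zip(probs_a, probs_b)):
        ya = max(range(len(pa)), key=pa.__getitem__)
        yb = max(range(len(pb)), key=pb.__getitem__)
        if ya == yb and max(pa[ya], pb[yb]) >= threshold:
            selected.append((i, ya))
    return selected


# Toy class probabilities from the two labeling networks on three target samples.
probs_a = [[0.95, 0.05], [0.60, 0.40], [0.20, 0.80]]
probs_b = [[0.90, 0.10], [0.30, 0.70], [0.05, 0.95]]
picked = select_pseudo_labels(probs_a, probs_b)
```

In this toy run the second sample is dropped because the two networks disagree on its class; the remaining pairs would then serve as training labels for the third, target-specific network under `joint_loss`.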

Updated: 2020-09-23