An Adversarial Training Based Speech Emotion Classifier With Isolated Gaussian Regularization,IEEE Transactions on Affective Computing


IEEE Transactions on Affective Computing (IF 9.6), Pub Date: 2022-04-21, DOI: 10.1109/taffc.2022.3169091
Changzeng Fu ₁ , Chaoran Liu ₂ , Carlos Toshinori Ishi ₂ , Hiroshi Ishiguro ₁


Speaker-specific bias may cause emotion-related features to form clusters with irregular borders (non-Gaussian distributions). This makes the model sensitive to local irregularities in the pattern distributions and leads it to overfit the in-domain dataset. The problem may lower validation scores in cross-domain (i.e., speaker-independent, channel-variant) settings. To mitigate it, we propose an adversarial training-based classifier that regularizes the distribution of latent representations to further smooth the boundaries among different categories. In the regularization phase, the representations are mapped onto Gaussian distributions in an unsupervised manner to improve the discriminative ability of the latent representations. Our previous study used a single Gaussian distribution for this mapping; in the present work, we adopt a mixture of isolated Gaussian distributions. Moreover, we adopt multi-instance learning, dividing speech into a bag of segments to capture the most salient parts expressing an emotion. The model was evaluated on the IEMOCAP and MELD datasets in in-corpus speaker-independent settings. In addition, we investigated accuracy in cross-corpus settings, which simulate speaker-independent and channel-variant conditions. In the experiments, the proposed model was compared not only with baseline models but also with different configurations of our own model. The results show that the proposed model is competitive with the baselines, as demonstrated by both in-corpus and cross-corpus validation.
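The abstract names two building blocks without giving their details: a prior made of isolated Gaussians, and a bag of speech segments for multi-instance learning. The NumPy sketch below illustrates both under stated assumptions and is not the authors' implementation. The function names, the circular placement of component means, the `radius` and `std` values, and the segment length and hop are all hypothetical choices made for illustration. In an adversarial setup of this kind, a discriminator would typically be trained to tell encoder outputs apart from samples like these, but that part is omitted here.

```python
import numpy as np


def sample_isolated_gaussian_prior(n_samples, n_components, latent_dim,
                                   radius=10.0, std=1.0, seed=0):
    """Draw samples from a mixture of well-separated isotropic Gaussians.

    Assumption: component means sit evenly on a circle in the first two
    latent dimensions; the paper's actual layout is not given in the abstract.
    """
    assert latent_dim >= 2, "this sketch places the means in a 2-D plane"
    rng = np.random.default_rng(seed)
    angles = 2.0 * np.pi * np.arange(n_components) / n_components
    means = np.zeros((n_components, latent_dim))
    means[:, 0] = radius * np.cos(angles)
    means[:, 1] = radius * np.sin(angles)
    # Pick a component for every sample, then add isotropic noise around its mean.
    labels = rng.integers(0, n_components, size=n_samples)
    samples = means[labels] + std * rng.standard_normal((n_samples, latent_dim))
    return samples, labels, means


def split_into_bag(features, segment_len, hop):
    """Split a (frames, feat_dim) array into a bag of fixed-length segments."""
    n_frames = features.shape[0]
    if n_frames < segment_len:
        # Zero-pad short utterances so that they yield exactly one segment.
        pad = np.zeros((segment_len - n_frames, features.shape[1]),
                       dtype=features.dtype)
        features = np.concatenate([features, pad], axis=0)
        n_frames = segment_len
    starts = range(0, n_frames - segment_len + 1, hop)
    return np.stack([features[s:s + segment_len] for s in starts])


if __name__ == "__main__":
    samples, labels, means = sample_isolated_gaussian_prior(1000, 4, 8)
    bag = split_into_bag(np.zeros((100, 40)), segment_len=30, hop=10)
    print(samples.shape, bag.shape)
```

With a radius much larger than the standard deviation, the components barely overlap, which is one plausible reading of "isolated"; other layouts would serve the same purpose.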


Updated: 2024-08-28
