Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition,IEEE Transactions on Affective Computing


IEEE Transactions on Affective Computing (IF 11.2), Pub Date: 2022-04-13, DOI: 10.1109/taffc.2022.3167013
Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Björn Schuller


Despite recent advances in speech emotion recognition (SER) within a single-corpus setting, the performance of SER systems degrades significantly in cross-corpus and cross-language scenarios. The key reason is that SER systems generalise poorly to unseen conditions. To address this issue, recent studies use adversarial methods to learn domain-generalised representations for cross-corpus and cross-language SER. However, many of these methods focus only on cross-corpus SER and do not address the performance degradation in cross-language SER, where the domain gap between source-language and target-language data is larger. This contribution proposes an adversarial dual discriminator (ADDi) network that uses a three-player adversarial game to learn generalised representations without requiring any target data labels. We also introduce a self-supervised ADDi (sADDi) network that utilises self-supervised pre-training with unlabelled data. We propose synthetic data generation as a pretext task in sADDi, which enables the network to produce emotionally discriminative and domain-invariant representations and provides complementary synthetic data to augment the system. The proposed model is rigorously evaluated on five publicly available datasets in three languages and compared with multiple studies on cross-corpus and cross-language SER. Experimental results demonstrate that the proposed model achieves improved performance compared to state-of-the-art methods.
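
The abstract does not give the ADDi losses, so the following is only a rough sketch of what a three-player game with two discriminators can look like. It assumes a Dual Discriminator GAN (D2GAN)-style objective, in which one discriminator rewards source-domain features, the other rewards target-domain features, and a shared encoder tries to minimise the value that both maximise. The function names, the linear encoder, the random stand-in features, and the weights `alpha` and `beta` are all hypothetical and are not taken from the paper. The emotion classifier and the self-supervised pre-training stage are omitted.

```python
import numpy as np


def softplus(x):
    """Numerically stable log(1 + exp(x)); keeps discriminator scores positive."""
    return np.logaddexp(0.0, x)


def dual_discriminator_value(d1_src, d1_tgt, d2_src, d2_tgt, alpha=1.0, beta=1.0):
    """D2GAN-style game value (an assumption, not the paper's stated loss).

    d1_* are positive scores from discriminator 1 (rewards source features),
    d2_* are positive scores from discriminator 2 (rewards target features).
    Both discriminators maximise this value; the encoder minimises it.
    No emotion labels or target labels are used anywhere in this term.
    """
    d1_src, d1_tgt, d2_src, d2_tgt = (
        np.asarray(a, dtype=float) for a in (d1_src, d1_tgt, d2_src, d2_tgt)
    )
    return float(
        alpha * np.log(d1_src).mean()
        - d1_tgt.mean()
        - d2_src.mean()
        + beta * np.log(d2_tgt).mean()
    )


rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(64, 8))  # stand-in for source-corpus features
tgt = rng.normal(0.5, 1.0, size=(64, 8))  # stand-in for unlabelled target features

W_enc = rng.normal(scale=0.1, size=(8, 4))  # hypothetical linear encoder
w_d1 = rng.normal(scale=0.1, size=4)        # hypothetical discriminator 1
w_d2 = rng.normal(scale=0.1, size=4)        # hypothetical discriminator 2

z_src, z_tgt = src @ W_enc, tgt @ W_enc     # shared representations
value = dual_discriminator_value(
    softplus(z_src @ w_d1), softplus(z_tgt @ w_d1),
    softplus(z_src @ w_d2), softplus(z_tgt @ w_d2),
)
print(value)
```

In a training loop, the two discriminators would take gradient steps that increase `value` while the encoder takes steps that decrease it. Under this assumed formulation with `alpha = beta = 1`, both discriminators outputting 1 for every sample (that is, being unable to separate the domains) gives a value of -2.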


Updated: 2022-04-13
