Fixed-MAML for Few Shot Classification in Multilingual Speech Emotion Recognition
arXiv - CS - Sound Pub Date : 2021-01-05 , DOI: arxiv-2101.01356
Anugunj Naman, Liliana Mancini

In this paper, we analyze the feasibility of applying few-shot learning to the speech emotion recognition (SER) task. Current speech emotion recognition models work exceptionally well but fail when the input is multilingual. Moreover, such models perform adequately only when the training corpus is vast. The need for a large training corpus is a significant problem for languages that are less widely spoken or obscure. We attempt to address both challenges, multilingualism and the lack of available data, by recasting the task as a few-shot learning problem. We suggest relaxing the assumption that all N classes in an N-way K-shot problem be new, and define an N+F way problem where N and F are the number of emotion classes and predefined fixed classes, respectively. We propose this modification to the Model-Agnostic Meta-Learning (MAML) algorithm to solve the problem and call the new model F-MAML. This modification performs better than the original MAML and outperforms it on the EmoFilm dataset.
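
To make the N+F way setup concrete, the following is a minimal sketch of how one training episode might be drawn. It is not taken from the paper: it assumes one possible reading of the abstract, in which every episode contains N classes sampled from a pool plus F classes that are fixed in advance. The function name `sample_episode`, its parameters, and the toy label list are all hypothetical.

```python
import random
from collections import defaultdict


def sample_episode(labels, fixed_classes, n_way, k_shot, q_queries, rng):
    """Draw one N+F way K-shot episode (hypothetical helper, not from the paper).

    labels        -- list with the class label of every example in the corpus
    fixed_classes -- the F classes assumed to appear in every episode
    Returns (support, query), each a list of (example_index, label) pairs.
    """
    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Assumption: the N sampled classes come from outside the fixed set.
    candidates = sorted(c for c in by_class if c not in fixed_classes)
    episode_classes = rng.sample(candidates, n_way) + sorted(fixed_classes)

    support, query = [], []
    for cls in episode_classes:
        # Draw disjoint support and query examples for this class.
        picked = rng.sample(by_class[cls], k_shot + q_queries)
        support += [(i, cls) for i in picked[:k_shot]]
        query += [(i, cls) for i in picked[k_shot:]]
    return support, query


# Toy corpus: 5 emotion labels with 10 examples each (illustrative only).
toy_labels = [c for c in ["anger", "fear", "joy", "neutral", "sadness"]
              for _ in range(10)]
support, query = sample_episode(
    toy_labels, fixed_classes={"neutral"}, n_way=2, k_shot=3, q_queries=2,
    rng=random.Random(0),
)
print(len(support), len(query))
```

With N = 2, F = 1, K = 3 and two query examples per class, each episode holds (N + F) x K = 9 support examples and 6 query examples. The support set would drive the inner-loop adaptation of a MAML-style learner, and the query set the outer-loop update.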

Updated: 2021-01-06