A novel policy for pre-trained Deep Reinforcement Learning for Speech Emotion Recognition
arXiv - CS - Sound. Pub Date: 2021-01-04, DOI: arxiv-2101.00738
Thejan Rajapakshe, Rajib Rana, Sara Khalifa

Reinforcement Learning (RL) is a semi-supervised learning paradigm in which an agent learns by interacting with an environment. Combining deep learning with RL yields an efficient method for learning how to interact with the environment, known as Deep Reinforcement Learning (deep RL). Deep RL has achieved tremendous success in gaming, such as AlphaGo, but its potential has rarely been explored for challenging tasks like Speech Emotion Recognition (SER). Applying deep RL to SER could improve the performance of an automated call-centre agent by dynamically learning emotion-aware responses to customer queries. While the policy employed by the RL agent plays a major role in action selection, there is currently no RL policy tailored for SER. In addition, an extended learning period is a general challenge for deep RL, which can slow learning for SER. In this paper, we therefore introduce a novel policy, the "Zeta policy", which is tailored for SER, and apply pre-training in deep RL to achieve a faster learning rate. Cross-dataset pre-training was also studied to assess the feasibility of pre-training the RL agent with a similar dataset in scenarios where no real environmental data is available. The IEMOCAP and SAVEE datasets were used for evaluation, with the task being to recognize four emotions (happy, sad, angry, and neutral) in the provided utterances. Experimental results show that the proposed Zeta policy performs better than existing policies. The results also show that pre-training can reduce training time by shortening the warm-up period, and that it is robust to the cross-corpus scenario.
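The abstract does not specify the Zeta policy's functional form, but the role a policy plays in a deep RL agent can be illustrated concretely. The Python sketch below shows a minimal action-selection interface of the kind common deep RL toolkits use: an epsilon-greedy baseline (one of the "existing policies" such an agent would typically compare against) next to a hypothetical custom-policy slot where a rule tailored for SER, like the Zeta policy, would plug in. The class names, the annealing schedule, and the warm-up parameter are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class EpsGreedyPolicy:
    """Standard epsilon-greedy action selection (a common baseline policy)."""
    def __init__(self, eps=0.1):
        self.eps = eps

    def select_action(self, q_values):
        # With probability eps, explore uniformly; otherwise exploit argmax Q.
        if np.random.rand() < self.eps:
            return np.random.randint(len(q_values))
        return int(np.argmax(q_values))

class CustomSERPolicy:
    """Hypothetical slot for a tailored policy such as the paper's Zeta policy.

    The abstract does not define the Zeta policy's rule; as a stand-in, this
    sketch anneals exploration over a warm-up period, reflecting the paper's
    point that pre-training allows the warm-up period to be shortened.
    """
    def __init__(self, eps_start=1.0, eps_end=0.05, warmup_steps=10_000):
        self.eps_start, self.eps_end = eps_start, eps_end
        self.warmup_steps = warmup_steps
        self.step = 0

    def select_action(self, q_values):
        # Linearly anneal eps from eps_start to eps_end over the warm-up.
        frac = min(self.step / self.warmup_steps, 1.0)
        eps = self.eps_start + frac * (self.eps_end - self.eps_start)
        self.step += 1
        if np.random.rand() < eps:
            return np.random.randint(len(q_values))
        return int(np.argmax(q_values))

# Usage: the agent's Q-network scores the four emotion classes
# (happy, sad, angry, neutral) for an utterance; the policy picks one.
q_values = np.array([0.1, 0.7, 0.15, 0.05])  # illustrative Q estimates
policy = CustomSERPolicy(warmup_steps=1_000)
action = policy.select_action(q_values)
print(["happy", "sad", "angry", "neutral"][action])
```

A shorter warm-up, as studied in the paper via pre-training, corresponds here to a smaller `warmup_steps`: the agent spends fewer interactions exploring before it starts exploiting what it has learned.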

Updated: 2021-01-05