

Building Naturalistic Emotionally Balanced Speech Corpus by Retrieving Emotional Speech From Existing Podcast Recordings
IEEE Transactions on Affective Computing (IF 9.6), Pub Date: 2019-10-01, DOI: 10.1109/taffc.2017.2736999
Reza Lotfian, Carlos Busso

The lack of a large, natural emotional database is one of the key barriers to translating results on speech emotion recognition in controlled conditions into real-life applications. Collecting emotional databases is expensive and time-consuming, which limits the size of existing corpora. Current approaches used to collect spontaneous databases tend to provide unbalanced emotional content, which is dictated by the given recording protocol (e.g., positive for colloquial conversations, negative for discussions or debates). The size and speaker diversity are also limited. This paper proposes a novel approach to effectively build a large, naturalistic emotional database with balanced emotional content, reduced cost and reduced manual labor. It relies on existing spontaneous recordings obtained from audio-sharing websites. The proposed approach combines machine learning algorithms to retrieve recordings conveying balanced emotional content with a cost-effective annotation process using crowdsourcing, which makes it possible to build a large-scale speech emotional database. This approach provides natural emotional renditions from multiple speakers, recorded under different channel conditions and conveying balanced emotional content, which are difficult to obtain with alternative data collection protocols.
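The retrieval idea in the abstract can be illustrated with a minimal sketch. It assumes that some emotion classifier has already assigned each candidate segment a predicted label and a confidence score; the `select_balanced` function, the field names, and the toy pool are all hypothetical and are not taken from the paper. The sketch picks the same number of segments per predicted emotion so that the set sent to crowdsourced annotators is not dominated by one class.

```python
from collections import defaultdict


def select_balanced(segments, per_class):
    """Pick up to `per_class` segments for each predicted emotion label.

    Within each label, segments with the highest (hypothetical) classifier
    score are preferred. This is an illustrative selection rule, not the
    authors' method.
    """
    by_label = defaultdict(list)
    for seg in segments:
        by_label[seg["label"]].append(seg)

    selected = []
    for label in sorted(by_label):
        ranked = sorted(by_label[label], key=lambda s: s["score"], reverse=True)
        selected.extend(ranked[:per_class])
    return selected


# Toy pool deliberately skewed toward one class to mimic an unbalanced source.
pool = [
    {"id": "n1", "label": "neutral", "score": 0.95},
    {"id": "n2", "label": "neutral", "score": 0.60},
    {"id": "n3", "label": "neutral", "score": 0.80},
    {"id": "n4", "label": "neutral", "score": 0.55},
    {"id": "h1", "label": "happy", "score": 0.70},
    {"id": "h2", "label": "happy", "score": 0.65},
    {"id": "a1", "label": "angry", "score": 0.50},
    {"id": "a2", "label": "angry", "score": 0.85},
    {"id": "a3", "label": "angry", "score": 0.75},
]

selected = select_balanced(pool, per_class=2)
for seg in selected:
    print(seg["id"], seg["label"], seg["score"])
```

In this sketch the selected segments would then go to human annotators, whose labels, rather than the classifier's predictions, would define the final corpus.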


Updated: 2019-10-01
