CREMA-D: Crowd-Sourced Emotional Multimodal Actors Dataset
IEEE Transactions on Affective Computing (IF 11.2). Pub Date: 2014-10-01. DOI: 10.1109/taffc.2014.2336244
Houwei Cao, David G. Cooper, Michael K. Keutmann, Ruben C. Gur, Ani Nenkova, Ragini Verma

People convey their emotional state in their face and voice. We present an audio-visual dataset uniquely suited for the study of multimodal emotion expression and perception. The dataset consists of facial and vocal emotional expressions in sentences spoken in a range of basic emotional states (happy, sad, anger, fear, disgust, and neutral). A total of 7,442 clips of 91 actors with diverse ethnic backgrounds were rated by multiple raters in three modalities: audio, visual, and audio-visual. Categorical emotion labels and real-valued intensity ratings for the perceived emotion were collected through crowd-sourcing from 2,443 raters. Human recognition rates of the intended emotion for the audio-only, visual-only, and audio-visual data are 40.9, 58.2, and 63.6 percent, respectively. Recognition rates are highest for neutral, followed by happy, anger, disgust, fear, and sad. Average intensity levels of emotion are rated highest for visual-only perception. The accurate recognition of disgust and fear requires simultaneous audio-visual cues, while anger and happiness can be recognized well from evidence in a single modality. The large dataset we introduce can be used to probe further questions concerning the audio-visual perception of emotion.
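To make the rating structure concrete, the following is a minimal Python sketch, not the authors' code, of how per-modality recognition rates like the 40.9, 58.2, and 63.6 percent figures above could be tabulated from crowd-sourced categorical ratings. The Rating fields and the sample values are hypothetical illustrations; the actual CREMA-D release format may differ.

```python
# A minimal sketch, not the authors' code: tabulating per-modality human
# recognition rates from crowd-sourced categorical ratings.
# The Rating fields and sample data are hypothetical; the actual
# CREMA-D release format may differ.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Rating:
    clip_id: str      # which of the 7,442 clips was rated
    modality: str     # "audio", "visual", or "audio-visual"
    perceived: str    # categorical emotion label chosen by the rater
    intensity: float  # real-valued perceived-intensity rating
    intended: str     # emotion the actor was asked to portray

def recognition_rates(ratings):
    """Fraction of ratings whose perceived label matches the intended
    emotion, broken down by presentation modality."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in ratings:
        totals[r.modality] += 1
        hits[r.modality] += r.perceived == r.intended
    return {m: hits[m] / totals[m] for m in totals}

# Hypothetical example: three raters judging the same clip, one per modality.
sample = [
    Rating("clip_0001", "audio", "sad", 0.4, "fear"),
    Rating("clip_0001", "visual", "fear", 0.7, "fear"),
    Rating("clip_0001", "audio-visual", "fear", 0.8, "fear"),
]
print(recognition_rates(sample))  # {'audio': 0.0, 'visual': 1.0, 'audio-visual': 1.0}
```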
