Acoustic and Lexical Representations for Affect Prediction in Spontaneous Conversations.
Computer Speech & Language (IF 3.1), Pub Date: 2014-11-11, DOI: 10.1016/j.csl.2014.04.002
Houwei Cao, Arman Savran, Ragini Verma, Ani Nenkova

In this article we investigate which representations of acoustics and word usage are most suitable for predicting dimensions of affect (AROUSAL, VALENCE, POWER and EXPECTANCY) in spontaneous interactions. Our experiments are based on the AVEC 2012 challenge dataset. For lexical representations, we compare corpus-independent features based on psychological word norms of emotional dimensions with corpus-dependent representations. We find that a corpus-dependent bag-of-words approach using mutual information between words and emotion dimensions is by far the best representation. For the analysis of acoustics, we zero in on the question of granularity. We confirm on our corpus that utterance-level features are more predictive than word-level features. Further, we study more detailed representations in which the utterance is divided into regions of interest (ROI), each with a separate representation. We introduce two ROI representations, which significantly outperform less informed approaches. In addition, we show that acoustic models of emotion can be improved considerably by taking annotator agreement into account and training the model on a smaller but more reliable dataset. Finally, we discuss the potential for improving prediction by combining the lexical and acoustic modalities. Simple fusion methods do not lead to consistent improvements over the lexical classifiers alone, but they do improve over the acoustic models.
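To illustrate the kind of corpus-dependent lexical scoring the abstract refers to, the sketch below computes mutual information between a word's presence and a binarized emotion dimension (e.g., high vs. low arousal). The data, the binarization, and the function names are hypothetical and only show the general technique; they are not taken from the paper or the AVEC 2012 setup.

```python
# Hypothetical sketch: rank words by mutual information (MI) with a
# binarized emotion dimension. Toy data below is illustrative only.
import math
from collections import Counter

def mutual_information(utterances, labels, word):
    """MI (in bits) between presence of `word` and a binary emotion label."""
    n = len(utterances)
    joint = Counter()                              # counts of (word_present, label)
    for text, y in zip(utterances, labels):
        joint[(word in text.split(), y)] += 1
    mi = 0.0
    for (w, y), c in joint.items():
        p_xy = c / n
        p_x = sum(v for (wx, _), v in joint.items() if wx == w) / n
        p_y = sum(v for (_, yy), v in joint.items() if yy == y) / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi

# Toy usage: score candidate words against a high/low arousal label.
utts = ["that is amazing", "i am so tired", "amazing news today", "just tired"]
arousal = [1, 0, 1, 0]                             # 1 = high arousal, 0 = low
scores = {w: mutual_information(utts, arousal, w) for w in {"amazing", "tired", "is"}}
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

In a bag-of-words representation of this kind, words with high MI scores would be retained as features, while uninformative words (like "is" above) would be discarded or down-weighted.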
