Use of affect context in dyadic interactions for continuous emotion recognition
Speech Communication (IF 2.4) Pub Date: 2021-05-25, DOI: 10.1016/j.specom.2021.05.010
Syeda Narjis Fatima, Engin Erzin

Emotional dependencies play a crucial role in understanding the complexities of dyadic interactions. Recent studies have shown that affect recognition tasks can benefit from the incorporation of a particular interaction's context; however, investigating affect context in dyadic settings with neural network frameworks remains a complex and open problem. In this paper, we formulate the concept of dyadic affect context (DAC) and propose convolutional neural network (CNN) based architectures to model and incorporate DAC to improve continuous emotion recognition (CER) in dyadic scenarios. We begin by defining a CNN architecture for single-subject CER based on speech and body motion data. We then introduce dyadic CER as a two-stage regression framework. Specifically, we propose two dyadic CNN architectures in which the cross-speaker affect contribution to the CER task is achieved by (i) the fusion of cross-subject affect (FoA) or (ii) the fusion of cross-subject feature maps (FoM). Building on these dyadic models, we finally propose a new Convolutional LSTM (ConvLSTM) model for dyadic CER. The ConvLSTM architecture captures local spectro-temporal correlations in speech and body motion as well as the long-term affect inter-dependencies between subjects. Our multimodal analysis demonstrates that modeling and incorporating the DAC in the proposed CER models provides significant performance improvements on the USC CreativeIT database, and the achieved results compare favorably to the state of the art.
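To make the two-stage FoA idea concrete, the following is a minimal PyTorch sketch: a stage-1 CNN regresses each subject's affect from that subject's own speech and body-motion features, and a stage-2 network refines the target subject's estimate by fusing in the interacting partner's stage-1 affect trajectory. All layer sizes, the 64-dimensional input features, the two-dimensional (activation/valence) per-frame output, and the shared stage-1 weights are assumptions made for this sketch, not details taken from the paper.

# Minimal sketch of an FoA-style two-stage dyadic CER model.
# Feature dimensions, layer sizes, and weight sharing are illustrative
# assumptions, not the architecture from the paper.
import torch
import torch.nn as nn

class SubjectCNN(nn.Module):
    """Stage 1: single-subject CER from concatenated speech + body-motion features."""
    def __init__(self, in_dim=64, hidden=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_dim, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # Per-frame affect estimate; 2 dims assumed (activation, valence).
        self.head = nn.Conv1d(hidden, 2, kernel_size=1)

    def forward(self, x):           # x: (batch, in_dim, time)
        h = self.conv(x)            # feature maps: (batch, hidden, time)
        return self.head(h), h      # per-frame affect + feature maps

class DyadicFoA(nn.Module):
    """Stage 2: refine the target subject's estimate with the partner's stage-1 affect (FoA)."""
    def __init__(self, in_dim=64, hidden=32):
        super().__init__()
        self.stage1 = SubjectCNN(in_dim, hidden)  # shared across subjects (assumption)
        self.fuse = nn.Sequential(
            nn.Conv1d(hidden + 2, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, 2, kernel_size=1),
        )

    def forward(self, x_target, x_partner):
        _, h_target = self.stage1(x_target)
        a_partner, _ = self.stage1(x_partner)           # cross-subject affect context
        # Detach so stage 2 treats the partner's affect as fixed context.
        fused = torch.cat([h_target, a_partner.detach()], dim=1)
        return self.fuse(fused)                         # refined per-frame affect

if __name__ == "__main__":
    model = DyadicFoA()
    x_a = torch.randn(1, 64, 100)   # target subject features (batch, dim, frames)
    x_b = torch.randn(1, 64, 100)   # interacting partner features
    print(model(x_a, x_b).shape)    # torch.Size([1, 2, 100])

The FoM variant would instead concatenate the partner's stage-1 feature maps (the h tensor above) rather than the affect estimate, and the ConvLSTM model would replace the purely convolutional stage-2 fusion with a recurrent layer to capture long-term cross-subject dependencies.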



Updated: 2021-06-09