AIA-Net: Adaptive Interactive Attention Network for Text-Audio Emotion Recognition.
IEEE Transactions on Cybernetics (IF 11.8), Pub Date: 2023-11-29, DOI: 10.1109/tcyb.2022.3195739
Tong Zhang, Shuzhen Li, Bianna Chen, Haozhang Yuan, C. L. Philip Chen

Emotion recognition based on text-audio modalities is the core technology for transforming a graphical user interface into a voice user interface, and it plays a vital role in natural human-computer interaction systems. Mainstream multimodal learning research has designed various fusion strategies to learn intermodality interactions but rarely considers that not all modalities play equal roles in emotion recognition. Therefore, the main challenge in multimodal emotion recognition is how to implement effective fusion algorithms based on a primary-auxiliary structure. To address this problem, this article proposes an adaptive interactive attention network (AIA-Net). In AIA-Net, text is treated as the primary modality and audio as an auxiliary modality. AIA-Net adapts to textual and acoustic features of different dimensions and learns their dynamic interactive relations in a flexible way. These interactive relations are encoded as interactive attention weights that focus on the acoustic features most useful for the textual emotional representation, so that acoustic emotional information adaptively assists the textual representation. Moreover, stacking multiple collaborative learning (co-learning) layers in AIA-Net yields repeated multimodal interactions and a deep, bottom-up evolution of the emotional representations. Experimental results on three benchmark datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art methods.
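To make the primary-auxiliary idea concrete, below is a minimal, hypothetical sketch (not the authors' released code) of a text-primary, audio-auxiliary cross-modal attention block stacked as co-learning layers. The class names (InteractiveAttentionLayer, AIANetSketch), feature dimensions (768 for text, 74 for audio), layer count, and class count are all illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch, assuming PyTorch; not the authors' implementation.
import torch
import torch.nn as nn


class InteractiveAttentionLayer(nn.Module):
    """Text queries attend over audio; the attention weights select the acoustic
    features most useful for refining the textual emotional representation."""

    def __init__(self, text_dim: int, audio_dim: int, hidden_dim: int = 128):
        super().__init__()
        # Linear projections let text and audio features of different dimensions interact.
        self.q_proj = nn.Linear(text_dim, hidden_dim)
        self.k_proj = nn.Linear(audio_dim, hidden_dim)
        self.v_proj = nn.Linear(audio_dim, text_dim)
        self.norm = nn.LayerNorm(text_dim)

    def forward(self, text_feats: torch.Tensor, audio_feats: torch.Tensor) -> torch.Tensor:
        # text_feats: (B, T_text, text_dim); audio_feats: (B, T_audio, audio_dim)
        q = self.q_proj(text_feats)                              # (B, T_text, H)
        k = self.k_proj(audio_feats)                             # (B, T_audio, H)
        v = self.v_proj(audio_feats)                             # (B, T_audio, text_dim)
        scores = q @ k.transpose(1, 2) / (q.size(-1) ** 0.5)     # (B, T_text, T_audio)
        attn = scores.softmax(dim=-1)                            # interactive attention weights
        # Acoustic information residually refines the textual representation.
        return self.norm(text_feats + attn @ v)


class AIANetSketch(nn.Module):
    """Stack of co-learning layers; a classifier reads the refined text features."""

    def __init__(self, text_dim=768, audio_dim=74, num_layers=3, num_classes=4):
        super().__init__()
        self.layers = nn.ModuleList(
            InteractiveAttentionLayer(text_dim, audio_dim) for _ in range(num_layers)
        )
        self.classifier = nn.Linear(text_dim, num_classes)

    def forward(self, text_feats, audio_feats):
        for layer in self.layers:
            text_feats = layer(text_feats, audio_feats)
        # Mean-pool over text tokens before classification.
        return self.classifier(text_feats.mean(dim=1))


if __name__ == "__main__":
    model = AIANetSketch()
    text = torch.randn(2, 20, 768)   # e.g., token-level text features (assumed shape)
    audio = torch.randn(2, 50, 74)   # e.g., frame-level acoustic features (assumed shape)
    print(model(text, audio).shape)  # torch.Size([2, 4])
```

The key design choice in this sketch is asymmetry: only the text stream is updated, while audio serves as keys and values, mirroring the paper's description of audio as an auxiliary modality that assists the textual emotional representation.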

Updated: 2022-08-22