Hierarchical Delta-Attention Method for Multimodal Fusion
arXiv - CS - Computation and Language. Pub Date: 2020-11-22, DOI: arxiv-2011.10916
Kunjal Panchal

In vision and linguistics, the main input modalities are facial expressions, speech patterns, and the words uttered. The issue with analyzing any one mode of expression (visual, verbal, or vocal) is that much of the contextual information can be lost. This motivates researchers to inspect multiple modalities to gain a thorough understanding of the cross-modal dependencies and the temporal context of the situation when analyzing an expression. This work attempts to preserve the long-range dependencies within and across different modalities, which would otherwise be bottlenecked by the use of recurrent networks, and adds the concept of delta-attention to focus on local differences per modality and capture the idiosyncrasies of different people. We explore a cross-attention fusion technique to obtain a global view of the emotion expressed through these delta-self-attended modalities, fusing all the local nuances and the global context together. Attention is a recent addition to the multimodal fusion field, and the stage at which the attention mechanism should be applied is still being scrutinized; this work achieves overall and per-class classification accuracy close to the current state of the art with almost half the number of parameters.
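A minimal sketch of the two ideas named in the abstract, delta-attention within a modality followed by cross-attention fusion across modalities, is given below. The module names, dimensions, pooling, and the exact way the local-difference ("delta") signal enters attention are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: delta-self-attention per modality + cross-attention fusion.
import torch
import torch.nn as nn


class DeltaSelfAttention(nn.Module):
    """Self-attention over one modality where queries are built from
    frame-to-frame differences, so attention focuses on local changes."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim). Delta = difference of consecutive steps,
        # prepended so the sequence length is preserved.
        delta = torch.diff(x, dim=1, prepend=x[:, :1, :])
        # Queries come from the deltas; keys/values from the raw sequence.
        attended, _ = self.attn(query=delta, key=x, value=x)
        return self.norm(x + attended)  # residual keeps the global context


class CrossModalFusion(nn.Module):
    """Cross-attention: each modality queries the other, then the two views
    are pooled into a single utterance-level vector for classification."""

    def __init__(self, dim: int, num_heads: int = 4, num_classes: int = 6):
        super().__init__()
        self.cross_ab = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_ba = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # a, b: (batch, time, dim) delta-self-attended modality streams.
        a2b, _ = self.cross_ab(query=a, key=b, value=b)   # a attends to b
        b2a, _ = self.cross_ba(query=b, key=a, value=a)   # b attends to a
        pooled = torch.cat([a2b.mean(dim=1), b2a.mean(dim=1)], dim=-1)
        return self.classifier(pooled)


if __name__ == "__main__":
    batch, time, dim = 2, 20, 64
    text = torch.randn(batch, time, dim)    # e.g. word embeddings
    audio = torch.randn(batch, time, dim)   # e.g. acoustic features

    delta_text = DeltaSelfAttention(dim)(text)
    delta_audio = DeltaSelfAttention(dim)(audio)
    logits = CrossModalFusion(dim)(delta_text, delta_audio)
    print(logits.shape)  # torch.Size([2, 6])
```

In this reading, the residual connection around the delta-attended output is what lets local differences sharpen, rather than replace, the long-range context carried by the full sequence.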

Updated: 2020-11-25