M2Lens: Visualizing and Explaining Multimodal Models for Sentiment Analysis
arXiv - CS - Multimedia | Pub Date: 2021-07-17 | DOI: arxiv-2107.08264
Xingbo Wang, Jianben He, Zhihua Jin, Muqiao Yang, Huamin Qu

Multimodal sentiment analysis aims to recognize people's attitudes from multiple communication channels such as verbal content (i.e., text), voice, and facial expressions. It has become a vibrant and important research topic in natural language processing. Much research focuses on modeling the complex intra- and inter-modal interactions between different communication channels. However, current high-performing multimodal models are often deep-learning-based and work like black boxes: it is not clear how they utilize multimodal information for sentiment predictions. Although recent advances have improved the explainability of machine learning models, these techniques often target unimodal scenarios (e.g., images, sentences), and little research has been done on explaining multimodal models. In this paper, we present M2Lens, an interactive visual analytics system to visualize and explain multimodal models for sentiment analysis. M2Lens provides explanations of intra- and inter-modal interactions at the global, subset, and local levels. Specifically, it summarizes the influence of three typical interaction types (i.e., dominance, complement, and conflict) on the model predictions. Moreover, M2Lens identifies frequent and influential multimodal features and supports multi-faceted exploration of model behaviors across the language, acoustic, and visual modalities. Through two case studies and expert interviews, we demonstrate that our system helps users gain deep insights into multimodal models for sentiment analysis.
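The three interaction types named in the abstract can be pictured as simple rules over per-modality influence scores. The Python sketch below is purely illustrative: the modality names, the signed influence scores, and the dominance_ratio threshold are assumptions made for exposition, not the definitions or algorithm used by M2Lens itself.

```python
# Hypothetical sketch: labeling dominance / complement / conflict from
# per-modality influence scores on a sentiment prediction. Thresholds and
# inputs are illustrative assumptions, not M2Lens's actual definitions.
from typing import Dict


def interaction_type(influence: Dict[str, float],
                     dominance_ratio: float = 2.0) -> str:
    """Classify an instance's inter-modal interaction.

    `influence` maps a modality ("text", "audio", "vision") to a signed
    influence score: positive pushes toward positive sentiment, negative
    toward negative sentiment.
    """
    scores = sorted(influence.values(), key=abs, reverse=True)
    top, rest = scores[0], scores[1:]

    # Dominance: the strongest modality outweighs every other one by a margin.
    if all(abs(top) >= dominance_ratio * abs(s) for s in rest):
        return "dominance"

    # Conflict: some modality pushes the prediction in the opposite direction.
    if any(s * top < 0 for s in rest):
        return "conflict"

    # Complement: modalities agree in direction and jointly shape the output.
    return "complement"


if __name__ == "__main__":
    print(interaction_type({"text": 0.9, "audio": 0.1, "vision": 0.05}))   # dominance
    print(interaction_type({"text": 0.4, "audio": -0.5, "vision": 0.1}))   # conflict
    print(interaction_type({"text": 0.4, "audio": 0.3, "vision": 0.35}))   # complement
```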

Updated: 2021-07-20