A Novel Context-Aware Multimodal Framework for Persian Sentiment Analysis,arXiv - CS - Computation and Language



A Novel Context-Aware Multimodal Framework for Persian Sentiment Analysis
arXiv - CS - Computation and Language. Pub Date: 2021-03-03, DOI: arxiv-2103.02636
Kia Dashtipour, Mandar Gogate, Erik Cambria, Amir Hussain

Most recent work on sentiment analysis has exploited the text modality. However, millions of hours of video recordings posted on social media platforms every day hold vital unstructured information that can be exploited to gauge public perception more effectively. Multimodal sentiment analysis offers an innovative way to computationally understand and harvest sentiments from videos by contextually exploiting audio, visual and textual cues. In this paper, we first present a first-of-its-kind Persian multimodal dataset comprising more than 800 utterances, as a benchmark resource for researchers to evaluate multimodal sentiment analysis approaches in the Persian language. Second, we present a novel context-aware multimodal sentiment analysis framework that simultaneously exploits acoustic, visual and textual cues to determine the expressed sentiment more accurately. We employ both decision-level (late) and feature-level (early) fusion methods to integrate affective cross-modal information. Experimental results demonstrate that the contextual integration of multimodal features (textual, acoustic and visual) delivers better performance (91.39%) than unimodal features (89.24%).
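The distinction between feature-level (early) and decision-level (late) fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy logistic scorer, the function names, the feature values and all weights below are hypothetical, chosen only to show where the modalities are combined in each scheme.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sigmoid(x: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_score(features: Vector, weights: Vector, bias: float = 0.0) -> float:
    """Toy logistic classifier (an assumption): probability of positive sentiment."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sigmoid(sum(f * w for f, w in zip(features, weights)) + bias)


def early_fusion(modalities: Dict[str, Vector], weights: Vector, bias: float = 0.0) -> float:
    """Feature-level fusion: concatenate all modality features, then classify once."""
    fused: List[float] = []
    for name in sorted(modalities):  # fixed order so the weights line up
        fused.extend(modalities[name])
    return linear_score(fused, weights, bias)


def late_fusion(
    modalities: Dict[str, Vector],
    models: Dict[str, Vector],
    modality_weights: Dict[str, float],
) -> float:
    """Decision-level fusion: classify each modality, then average the probabilities."""
    total = sum(modality_weights[name] for name in modalities)
    if total <= 0:
        raise ValueError("modality weights must sum to a positive value")
    weighted = sum(
        modality_weights[name] * linear_score(modalities[name], models[name])
        for name in modalities
    )
    return weighted / total


# Hypothetical per-utterance features for the three modalities.
utterance = {"acoustic": [0.2, -0.1], "textual": [0.9, 0.4], "visual": [0.3]}

# Early fusion: one weight per concatenated feature (acoustic, textual, visual order).
p_early = early_fusion(utterance, [0.5, 0.5, 1.0, 1.0, 0.5])

# Late fusion: one small model per modality, combined by a weighted average.
per_modality_models = {"acoustic": [0.5, 0.5], "textual": [1.0, 1.0], "visual": [0.5]}
p_late = late_fusion(
    utterance,
    per_modality_models,
    {"acoustic": 1.0, "textual": 2.0, "visual": 1.0},
)
```

In the early scheme a single model sees all cues at once and can learn cross-modal interactions. In the late scheme each modality is scored independently and only the decisions are merged, which makes it easier to down-weight an unreliable modality.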


Updated: 2021-03-05
