Socializing the Videos: A Multimodal Approach for Social Relation Recognition
ACM Transactions on Multimedia Computing, Communications, and Applications (IF 5.2), Pub Date: 2021-04-16, DOI: 10.1145/3416493
Tong Xu 1, Peilun Zhou 1, Linkang Hu 1, Xiangnan He 1, Yao Hu 2, Enhong Chen 1

As a crucial task for video analysis, social relation recognition for characters not only provides a semantically rich description of video content but also supports intelligent applications, e.g., video retrieval and visual question answering. Unfortunately, due to the semantic gap between visual and semantic features, traditional solutions may fail to reveal the accurate relations among characters. At the same time, the development of social media platforms has promoted the emergence of crowdsourced comments, which may enhance the recognition task with semantic and descriptive cues. To that end, in this article, we propose a novel multimodal solution to the character relation recognition task. Specifically, we capture the target character pairs via a search module and then design a multistream architecture for jointly embedding the visual and textual information, in which feature fusion and an attention mechanism are adapted to better integrate the multimodal inputs. Finally, supervised learning is applied to classify character relations. Experiments on real-world data sets validate that our solution outperforms several competitive baselines.
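
The abstract does not give the details of the fusion step, so the following is only a minimal sketch of one common way to combine a visual feature with a set of comment features through attention. It is not the authors' architecture: the use of scaled dot-product attention, concatenation as the fusion operator, and the names `attend` and `fuse` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, items):
    """Pool `items` into one vector, weighting each by its similarity to `query`.

    Assumption: scaled dot-product attention; the paper may use another form.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, item) / scale for item in items])
    dim = len(items[0])
    pooled = [sum(w * item[i] for w, item in zip(weights, items)) for i in range(dim)]
    return pooled, weights


def fuse(visual, comments):
    """Concatenate the visual feature with the attention-pooled comment feature.

    Hypothetical helper: concatenation is one simple fusion choice among many.
    """
    pooled, weights = attend(visual, comments)
    return visual + pooled, weights


# Toy inputs: one visual feature for a character pair, two comment embeddings.
visual = [1.0, 0.0]
comments = [[1.0, 0.0], [0.0, 1.0]]
fused, weights = fuse(visual, comments)
```

In this toy run the first comment embedding aligns with the visual feature, so it receives the larger attention weight, and the fused vector could then be passed to a supervised classifier over relation labels.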

Updated: 2021-04-16