A Multimodal Memes Classification: A Survey and Open Research Issues, arXiv - CS - Multimedia

A Multimodal Memes Classification: A Survey and Open Research Issues
arXiv - CS - Multimedia, Pub Date: 2020-09-17, DOI: arxiv-2009.08395
Tariq Habib Afridi, Aftab Alam, Muhammad Numan Khan, Jawad Khan, Young-Koo Lee

Memes are graphics and text overlaid so that together they present concepts that become ambiguous if either one is absent. They spread mostly on social media platforms, in the form of jokes, sarcasm, motivational messages, etc. After the success of BERT in Natural Language Processing (NLP), researchers have turned to Visual-Linguistic (VL) multimodal problems such as memes classification, image captioning, Visual Question Answering (VQA), and many more. Unfortunately, many memes are uploaded to social media platforms each day, and they need automatic censoring to curb misinformation and hate. Recently, this issue has attracted the attention of researchers and practitioners. State-of-the-art methods that perform well on other VL datasets tend to fail on memes classification. In this context, this work aims to conduct a comprehensive study of memes classification and, more generally, of VL multimodal problems and cutting-edge solutions. We propose a generalized framework for VL problems. We cover the early and next-generation works on VL problems. Finally, we identify and articulate several open research issues and challenges. To the best of our knowledge, this is the first study that presents a generalized view of advanced classification techniques for memes classification. We believe this study presents a clear road-map for the Machine Learning (ML) research community to implement and enhance memes classification techniques.

Updated: 2020-09-18
