Deep Attentive Multimodal Network Representation Learning for Social Media Images, ACM Transactions on Internet Technology



ACM Transactions on Internet Technology (IF 3.9), Pub Date: 2021-06-16, DOI: 10.1145/3417295
Feiran Huang₁, Chaozhuo Li₂, Boyu Gao₁, Yun Liu₃, Sattam Alotaibi₄, Hao Chen₃


The analysis of social networks, such as the socially connected Internet of Things, has shown that intelligent information processing technology deeply influences industrial systems for Smart Cities. The goal of social media representation learning is to learn dense, low-dimensional, and continuous representations of multimodal data within social networks, facilitating many real-world applications. Since social media images are usually accompanied by rich metadata (e.g., textual descriptions, tags, groups, and submitting users), modeling the image alone is not sufficient to capture the comprehensive information they carry. In this work, we treat the image and its textual description as multimodal content, and transform the other metadata into links between contents (for example, two images are linked if they are marked by the same tag or submitted by the same user). Based on the multimodal content and social links, we propose a Deep Attentive Multimodal Graph Embedding model, named DAMGE, for more effective social image representation learning. We conduct extensive experiments on both small- and large-scale datasets, and the results confirm the superiority of the proposed model on social image classification and link prediction.
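
The link-construction idea in the abstract (two images become connected when they share a tag or a submitting user) can be illustrated with a minimal sketch. This is not the authors' implementation and does not reproduce DAMGE itself; the function name `build_links`, the field names `tags` and `user`, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_links(images):
    """Return undirected links between images that share a tag or a user.

    `images` maps an image id to a dict with a "tags" set and a "user"
    string (hypothetical field names, not taken from the paper).
    Each link is a sorted (id_a, id_b) tuple so that it is stored once.
    """
    # Group image ids by each piece of shared metadata.
    members_by_key = defaultdict(set)
    for image_id, meta in images.items():
        for tag in meta.get("tags", ()):
            members_by_key[("tag", tag)].add(image_id)
        user = meta.get("user")
        if user is not None:
            members_by_key[("user", user)].add(image_id)

    # Every pair of images inside the same group becomes a link.
    links = set()
    for members in members_by_key.values():
        for a, b in combinations(sorted(members), 2):
            links.add((a, b))
    return links


# Toy records, invented for this example.
images = {
    "img1": {"tags": {"sunset", "beach"}, "user": "alice"},
    "img2": {"tags": {"beach"}, "user": "bob"},
    "img3": {"tags": {"city"}, "user": "alice"},
    "img4": {"tags": {"forest"}, "user": "carol"},
}
links = build_links(images)
```

Here `img1` and `img2` are linked through the shared tag, `img1` and `img3` through the shared user, and `img4` stays isolated. A graph embedding model would then consume these links together with the image and text content of each node.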


Updated: 2021-06-16
