Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism
arXiv - CS - Multimedia Pub Date : 2020-03-09 , DOI: arxiv-2003.03955
Hao Wang, Doyen Sahoo, Chenghao Liu, Ke Shu, Palakorn Achananuparp, Ee-peng Lim, Steven C. H. Hoi

Cross-modal food retrieval is an important task for analyzing food-related information, such as food images and cooking recipes. The goal is to learn an embedding of images and recipes in a common feature space, so that precise matching can be realized. Compared with existing cross-modal retrieval approaches, two major challenges in this specific problem are: 1) the large intra-class variance across cross-modal food data; and 2) the difficulty of obtaining discriminative recipe representations. To address these problems, we propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities by aligning their output semantic probabilities. In addition, we exploit a self-attention mechanism to improve the embedding of recipes. We evaluate the performance of the proposed method on the large-scale Recipe1M dataset, and the results show that it outperforms the state of the art.
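The two ideas in the abstract can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: here "aligning output semantic probabilities" is modeled as a symmetric KL divergence between the two modalities' class-probability outputs (the paper's exact regularizer may differ), and the recipe encoder's self-attention is a single scaled dot-product head with tied query/key/value.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Single-head scaled dot-product self-attention over recipe
    token embeddings of shape (T, d); Q = K = V = tokens (a simplification)."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)   # (T, T) pairwise similarities
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ tokens                   # context-mixed token embeddings

def semantic_consistency_loss(img_logits, rec_logits):
    """Hypothetical consistency term: symmetric KL divergence between the
    image branch's and recipe branch's semantic class probabilities."""
    p = softmax(img_logits)
    q = softmax(rec_logits)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * (kl(p, q) + kl(q, p))

rng = np.random.default_rng(0)
recipe_tokens = rng.normal(size=(5, 8))       # 5 recipe tokens, 8-dim embeddings
attended = self_attention(recipe_tokens)      # same shape, tokens now contextual

img_logits = rng.normal(size=10)              # 10 hypothetical semantic classes
rec_logits = rng.normal(size=10)
loss = semantic_consistency_loss(img_logits, rec_logits)
```

The consistency term is zero exactly when both branches predict identical class distributions, so minimizing it pushes the image and recipe embeddings toward agreeing semantics even before any retrieval loss is applied.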

Updated: 2020-03-10