Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism
arXiv - CS - Multimedia Pub Date : 2020-03-09 , DOI: arxiv-2003.03955
Hao Wang, Doyen Sahoo, Chenghao Liu, Ke Shu, Palakorn Achananuparp, Ee-peng Lim, Steven C. H. Hoi

Cross-modal food retrieval is an important task for analyzing food-related information, such as food images and cooking recipes. The goal is to learn an embedding of images and recipes in a common feature space, so that precise matching can be realized. Compared with existing cross-modal retrieval approaches, two major challenges in this specific problem are: 1) the large intra-class variance across cross-modal food data; and 2) the difficulty of obtaining discriminative recipe representations. To address these problems, we propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities by aligning their output semantic probabilities. In addition, we exploit a self-attention mechanism to improve the embedding of recipes. We evaluate the performance of the proposed method on the large-scale Recipe1M dataset, and the results show that it outperforms the state of the art.
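The two ideas in the abstract can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: here "aligning output semantic probabilities" is modeled as a symmetric KL divergence between the two modalities' class-probability outputs (the paper's exact regularizer may differ), and the recipe encoder's self-attention is a single scaled dot-product head with tied query/key/value.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Single-head scaled dot-product self-attention over recipe
    token embeddings of shape (T, d); Q = K = V = tokens (a simplification)."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)   # (T, T) pairwise similarities
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ tokens                   # context-mixed token embeddings

def semantic_consistency_loss(img_logits, rec_logits):
    """Hypothetical consistency term: symmetric KL divergence between the
    image branch's and recipe branch's semantic class probabilities."""
    p = softmax(img_logits)
    q = softmax(rec_logits)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * (kl(p, q) + kl(q, p))

rng = np.random.default_rng(0)
recipe_tokens = rng.normal(size=(5, 8))       # 5 recipe tokens, 8-dim embeddings
attended = self_attention(recipe_tokens)      # same shape, tokens now contextual

img_logits = rng.normal(size=10)              # 10 hypothetical semantic classes
rec_logits = rng.normal(size=10)
loss = semantic_consistency_loss(img_logits, rec_logits)
```

The consistency term is zero exactly when both branches predict identical class distributions, so minimizing it pushes the image and recipe embeddings toward agreeing semantics even before any retrieval loss is applied.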

Updated: 2020-03-10