Learning Cross-Modal Embeddings with Adversarial Networks for Cooking Recipes and Food Images,arXiv - CS - Multimedia



arXiv - CS - Multimedia. Pub Date: 2019-05-03, DOI: arxiv-1905.01273
Hao Wang, Doyen Sahoo, Chenghao Liu, Ee-peng Lim, Steven C. H. Hoi

Food computing plays an increasingly important role in daily life and has found many applications in guiding behavior towards smart food consumption and a healthy lifestyle. An important task under the food-computing umbrella is retrieval, which is particularly helpful for health-related applications that need to retrieve important information about food (e.g., ingredients, nutrition). In this paper, we investigate the open research task of cross-modal retrieval between cooking recipes and food images, and propose a novel framework, Adversarial Cross-Modal Embedding (ACME), for this task in the food domain. Specifically, the goal is to learn a common embedding feature space for the two modalities. Our approach combines several novel ideas: (i) learning with a new triplet loss scheme together with an effective sampling strategy, (ii) imposing modality alignment using an adversarial learning strategy, and (iii) imposing cross-modal translation consistency, such that the embedding of one modality can recover important information about the corresponding instance in the other modality. ACME achieves state-of-the-art performance on the benchmark Recipe1M dataset, validating the efficacy of the proposed technique.
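
To make idea (i) concrete, the following is a minimal sketch of a bidirectional triplet loss with in-batch hardest-negative sampling over paired image and recipe embeddings. It is not the loss defined in the paper: the abstract does not give its form, so the cosine distance, the margin value of 0.3, and the names `l2_normalize` and `hard_triplet_loss` are all assumptions chosen for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def hard_triplet_loss(img_emb, rec_emb, margin=0.3):
    """Bidirectional triplet loss with in-batch hardest negatives.

    Row i of img_emb and row i of rec_emb are assumed to be a matching
    image/recipe pair. The margin of 0.3 is an illustrative assumption.
    """
    img = l2_normalize(np.asarray(img_emb, dtype=float))
    rec = l2_normalize(np.asarray(rec_emb, dtype=float))

    # Cosine distance between every image and every recipe, shape (n, n).
    dist = 1.0 - img @ rec.T
    pos = np.diag(dist)  # distance of each matching pair

    # Put +inf on the diagonal so a positive is never chosen as a negative.
    masked = dist + np.diag(np.full(len(dist), np.inf))
    hardest_rec = masked.min(axis=1)  # image anchor: closest wrong recipe
    hardest_img = masked.min(axis=0)  # recipe anchor: closest wrong image

    # Hinge on (positive distance - negative distance + margin), both ways.
    loss_i2r = np.maximum(0.0, pos - hardest_rec + margin)
    loss_r2i = np.maximum(0.0, pos - hardest_img + margin)
    return float((loss_i2r + loss_r2i).mean())
```

The loss is zero when every matching pair is closer than its hardest in-batch negative by at least the margin, and it grows as mismatched items crowd in. In a real training setup this would be written in an autodiff framework and combined with the adversarial alignment and translation-consistency terms that the abstract lists as ideas (ii) and (iii).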


Updated: 2020-03-10
