Few-shot Food Recognition via Multi-view Representation Learning, ACM Transactions on Multimedia Computing, Communications, and Applications



Few-shot Food Recognition via Multi-view Representation Learning
ACM Transactions on Multimedia Computing, Communications, and Applications (IF 5.2), Pub Date: 2020-07-07, DOI: 10.1145/3391624
Shuqiang Jiang, Weiqing Min, Yongqiang Lyu, Linhu Liu


This article considers the problem of few-shot learning for food recognition. Automatic food recognition can support various applications, e.g., dietary assessment and food journaling. Most existing works focus on food recognition with large numbers of labelled samples, and fail to recognize food categories with few samples. To address this problem, we propose a Multi-View Few-Shot Learning (MVFSL) framework to explore additional ingredient information for few-shot food recognition. Besides category-oriented deep visual features, we introduce an ingredient-supervised deep network to extract ingredient-oriented features. As general and intermediate attributes of food, ingredient-oriented features are informative and complementary to category-oriented features, and thus they play an important role in improving food recognition. Particularly in few-shot food recognition, ingredient information can bridge the gap between disjoint training categories and test categories. To take advantage of ingredient information, we fuse these two kinds of features by first combining the feature maps from their respective deep networks and then convolving the combined feature maps. This convolution is further incorporated into a multi-view relation network, which is capable of comparing pairwise images to enable fine-grained feature learning. MVFSL is trained in an end-to-end fashion for joint optimization of the two types of feature learning subnetworks and the relation subnetworks. Extensive experiments on different food datasets have consistently demonstrated the advantage of MVFSL in multi-view feature fusion. Furthermore, we extend two other types of networks, namely, Siamese Network and Matching Network, by introducing ingredient information for few-shot food recognition. Experimental results have also shown that introducing ingredient information into these two networks can improve the performance of few-shot food recognition.
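
As a rough illustration of the fusion step described in the abstract (combine the feature maps of the two views, convolve the combined maps, then score an image pair with a relation module), the following is a minimal pure-Python sketch. It is not the authors' implementation: the 1x1 convolution, the random weights, the tiny map sizes, the average-pooled sigmoid relation score, and all function names are assumptions chosen only to make the data flow concrete.

```python
import math
import random

# A feature map is a nested list indexed as [channel][row][col].

def concat_channels(map_a, map_b):
    """Stack two feature maps along the channel axis (the 'combining' step)."""
    return map_a + map_b

def conv1x1(feature_map, weights):
    """1x1 convolution: every output channel is a per-pixel weighted sum
    of the input channels. weights[o][c] maps input channel c to output o."""
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [
        [
            [
                sum(w * feature_map[c][i][j] for c, w in enumerate(out_weights))
                for j in range(width)
            ]
            for i in range(height)
        ]
        for out_weights in weights
    ]

def fuse_views(category_map, ingredient_map, weights):
    """Fuse category-oriented and ingredient-oriented maps (assumed 1x1 conv)."""
    return conv1x1(concat_channels(category_map, ingredient_map), weights)

def relation_score(fused_a, fused_b, relation_weights):
    """Toy relation module (an assumption): concatenate the pair, reduce to one
    channel with a 1x1 conv, average-pool, and squash with a sigmoid."""
    pair = concat_channels(fused_a, fused_b)
    response = conv1x1(pair, [relation_weights])[0]
    pooled = sum(sum(row) for row in response) / (len(response) * len(response[0]))
    return 1.0 / (1.0 + math.exp(-pooled))

def random_map(rng, channels, height, width):
    """Stand-in for the output of a feature-learning subnetwork."""
    return [
        [[rng.random() for _ in range(width)] for _ in range(height)]
        for _ in range(channels)
    ]

rng = random.Random(0)
C, H, W = 2, 3, 3  # toy sizes: channels per view, height, width
fusion_weights = [[rng.uniform(-1, 1) for _ in range(2 * C)] for _ in range(C)]
relation_weights = [rng.uniform(-1, 1) for _ in range(2 * C)]

# One support image and one query image, each seen through both views.
support = fuse_views(random_map(rng, C, H, W), random_map(rng, C, H, W), fusion_weights)
query = fuse_views(random_map(rng, C, H, W), random_map(rng, C, H, W), fusion_weights)
score = relation_score(query, support, relation_weights)
print(f"relation score: {score:.3f}")
```

In a trained model the fusion and relation weights would be learned jointly with the two feature-learning subnetworks, and a query would be assigned to the support category with the highest relation score; here the weights are random, so the printed score carries no meaning beyond showing the shapes line up.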


Updated: 2020-07-07
