Multi-Scale Multi-View Deep Feature Aggregation for Food Recognition.
IEEE Transactions on Image Processing (IF 10.6), Pub Date: 2019-07-29, DOI: 10.1109/tip.2019.2929447
Shuqiang Jiang, Weiqing Min, Linhu Liu, Zhengdong Luo

Recently, food recognition has received increasing attention in image processing and computer vision because of its great potential for applications in human health. Most existing methods directly extract deep visual features via convolutional neural networks (CNNs) for food recognition. Such methods ignore the characteristics of food images and therefore struggle to achieve optimal recognition performance. In contrast to general object recognition, food images typically do not exhibit distinctive spatial arrangements or common semantic patterns. In this paper, we propose a multi-scale multi-view feature aggregation (MSMVFA) scheme for food recognition. MSMVFA aggregates high-level semantic features, mid-level attribute features, and deep visual features into a unified representation. These three types of features describe the food image at different granularities, so the aggregated features can capture the semantics of food images with high probability. To this end, we utilize additional ingredient knowledge to obtain mid-level attribute representations via ingredient-supervised CNNs, while high-level semantic features and deep visual features are extracted from class-supervised CNNs. Because food images in many cases do not exhibit a distinctive spatial layout, MSMVFA fuses multi-scale CNN activations for each type of feature, making the aggregated features more discriminative and invariant to geometric deformation. Finally, the aggregated features become more robust, comprehensive, and discriminative through two-level fusion: multi-scale fusion for each type of feature and multi-view aggregation across the different types of features. In addition, MSMVFA is general, and different deep networks can easily be applied within this scheme. Extensive experiments and evaluations demonstrate that our method achieves state-of-the-art Top-1 recognition accuracy on three popular large-scale food benchmark datasets. Furthermore, we expect this paper to further the agenda of food recognition in the image processing and computer vision community.
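
To make the two-level fusion concrete, below is a minimal sketch in PyTorch (a framework assumed here for illustration only): multi-scale pooling fuses activations at several grid sizes for each feature type, and the three views (deep visual, high-level semantic, mid-level attribute) are then aggregated into one representation fed to a classifier. The module names, the sum-fusion across scales, and the concatenation across views are illustrative assumptions, not the authors' exact implementation.

# Minimal sketch, under the assumptions stated above; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScalePool(nn.Module):
    """Multi-scale fusion for one type of feature: pool a conv activation map
    at several grid sizes and merge the pooled descriptors."""
    def __init__(self, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales

    def forward(self, feat):                         # feat: (B, C, H, W)
        pooled = []
        for s in self.scales:
            p = F.adaptive_avg_pool2d(feat, s)       # (B, C, s, s)
            p = p.flatten(2).mean(dim=2)             # average the cells -> (B, C)
            pooled.append(F.normalize(p, dim=1))
        return torch.stack(pooled, 0).sum(0)         # sum-fuse the scales (assumption)

class MSMVFASketch(nn.Module):
    """Multi-view aggregation: deep visual and high-level semantic features from
    class-supervised extractors plus mid-level attribute features from an
    ingredient-supervised extractor, concatenated into one representation."""
    def __init__(self, visual_net, semantic_net, attribute_net, feat_dims, num_classes):
        super().__init__()
        self.views = nn.ModuleList([visual_net, semantic_net, attribute_net])
        self.ms_pool = MultiScalePool()
        self.classifier = nn.Linear(sum(feat_dims), num_classes)

    def forward(self, x):
        fused = torch.cat([self.ms_pool(net(x)) for net in self.views], dim=1)
        return self.classifier(fused)

if __name__ == "__main__":
    def tiny_cnn(out_ch):                            # stand-in for a real backbone
        return nn.Sequential(nn.Conv2d(3, out_ch, 3, stride=2, padding=1), nn.ReLU())
    model = MSMVFASketch(tiny_cnn(64), tiny_cnn(64), tiny_cnn(64), (64, 64, 64), num_classes=101)
    print(model(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 101])

In practice each view would come from a pretrained backbone (the attribute view trained with ingredient labels, the other two with class labels); the sketch uses tiny stand-in networks only so it runs end to end.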

Updated: 2020-04-22