Product image recognition with guidance learning and noisy supervision, Computer Vision and Image Understanding

Product image recognition with guidance learning and noisy supervision
Computer Vision and Image Understanding (IF 4.5), Pub Date: 2020-04-17, DOI: 10.1016/j.cviu.2020.102963
Qing Li, Xiaojiang Peng, Liangliang Cao, Wenbin Du, Hao Xing, Yu Qiao, Qiang Peng

This paper considers recognizing products in everyday photos, an important problem in real-world applications that is also challenging because of background clutter, category diversity, noisy labels, and other factors. We address this problem with two contributions. First, we introduce a novel large-scale product image dataset, termed Product-90. Instead of collecting product images through laborious and time-intensive image capture, we take advantage of the web and download images from the reviews on several e-commerce websites, where the images are casually captured by consumers. Labels are assigned automatically from the categories of the e-commerce websites. In total, Product-90 consists of more than 140K images in 90 categories. Because consumers may upload unrelated images, Product-90 inevitably contains noisy labels. As the second contribution, we develop a simple yet efficient guidance learning (GL) method for training convolutional neural networks (CNNs) with noisy supervision. The GL method first trains an initial teacher network on the full noisy dataset, and then trains a target/student network on both the large-scale noisy set and a small manually verified clean set in a multi-task manner. Specifically, during student network training, the large-scale noisy data is supervised by its guidance knowledge, which is the combination of its given noisy label and the softened label from the teacher network. We conduct extensive experiments on Product-90 and four public datasets, namely Food-101, Food-101N, Clothing1M, and synthetic noisy CIFAR-10. Our guidance learning method achieves performance superior to state-of-the-art methods on these datasets.
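The student-training objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything specific here is an assumption: the guidance target is taken to be a convex mix (weight `alpha`) of the one-hot noisy label and a temperature-softened teacher output, and the multi-task loss is taken to be the noisy-set loss plus `beta` times the clean-set loss. The function names, `alpha`, `beta`, and `temperature` are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; temperature > 1 softens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits):
    """Log-probabilities computed with the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def one_hot(label, num_classes):
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def guidance_target(noisy_label, teacher_logits, alpha=0.5, temperature=2.0):
    """Assumed form: mix the noisy one-hot label with the softened teacher output."""
    soft = softmax(teacher_logits, temperature)
    hard = one_hot(noisy_label, len(teacher_logits))
    return [alpha * h + (1.0 - alpha) * s for h, s in zip(hard, soft)]


def cross_entropy(target, student_logits):
    """Cross-entropy between a target distribution and the student prediction."""
    log_probs = log_softmax(student_logits)
    return -sum(t * lp for t, lp in zip(target, log_probs))


def multitask_loss(noisy_batch, clean_batch, beta=1.0):
    """Assumed multi-task objective: noisy-set loss + beta * clean-set loss.

    noisy_batch: list of (student_logits, teacher_logits, noisy_label)
    clean_batch: list of (student_logits, verified_label)
    """
    noisy = sum(
        cross_entropy(guidance_target(y, t), s) for s, t, y in noisy_batch
    ) / len(noisy_batch)
    clean = sum(
        cross_entropy(one_hot(y, len(s)), s) for s, y in clean_batch
    ) / len(clean_batch)
    return noisy + beta * clean
```

In this sketch, when the teacher disagrees with a noisy label, the guidance target moves part of the probability mass to the teacher's preferred class, so a mislabeled image penalizes the student less than it would under the hard noisy label alone. Setting `alpha=1.0` recovers plain training on the noisy labels.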

Updated: 2020-04-21
