Learning visual variation for object recognition
Image and Vision Computing (IF 4.2), Pub Date: 2020-04-08, DOI: 10.1016/j.imavis.2020.103912
Jatuporn Toy Leksut, Jiaping Zhao, Laurent Itti

We propose visual variation learning to improve object recognition with convolutional neural networks (CNNs). While a typical CNN regards visual variations as nuisances and marginalizes them out of the data, we speculate that some variations are informative. We study the impact of visual variation learning, used as an auxiliary task during training only, on classification and similarity embedding problems. To train the network, we introduce the iLab-20M dataset, a large-scale controlled parametric dataset of toy vehicle objects under systematically annotated variations of viewpoint, lighting, focal setting, and background. After training, we strip out the network components related to visual variations and test classification accuracy on images with no visual variation labels. Our experiments on 1.75 million images from iLab-20M show significant improvements in object recognition accuracy: AlexNet from 84.49% to 91.15%; ResNet from 86.14% to 90.70%; and DenseNet from 85.56% to 91.55%. Our key contribution is to show that, at the cost of visual variation annotation during training only, a CNN enhanced with visual variation learning focuses its attention on distinctive features and learns better object representations, reducing the classification error rate of AlexNet by 42%, ResNet by 32%, and DenseNet by 41%, without significantly sacrificing training time or model complexity.
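The training scheme described in the abstract is, in essence, multi-task learning with disposable auxiliary heads. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: a shared AlexNet backbone feeds an object-classification head plus one auxiliary head per annotated variation (viewpoint, lighting, focal setting, background); all heads are trained jointly, and at test time the auxiliary heads are stripped so the model runs as a plain classifier. The per-variation class counts and the auxiliary loss weight are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch (not the authors' exact code) of visual-variation learning as an
# auxiliary task: a shared CNN backbone feeds one object-class head plus auxiliary
# heads for viewpoint, lighting, focal setting, and background. All heads are
# trained jointly; at test time the auxiliary heads are ignored and only the
# object head is used. Class counts and `aux_weight` are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

class VariationAuxNet(nn.Module):
    def __init__(self, num_objects=10, variation_classes=None, aux_weight=0.5):
        super().__init__()
        # Shared backbone: AlexNet features plus the 4096-d fully connected stack.
        alex = models.alexnet(weights=None)
        self.backbone = nn.Sequential(
            alex.features, alex.avgpool, nn.Flatten(),
            alex.classifier[:-1],            # drop the final 1000-way layer
        )
        feat_dim = 4096
        # Main head: object category (kept at test time).
        self.object_head = nn.Linear(feat_dim, num_objects)
        # Auxiliary heads: one per annotated visual variation (training only).
        variation_classes = variation_classes or {
            "viewpoint": 88, "lighting": 4, "focus": 3, "background": 14}
        self.aux_heads = nn.ModuleDict(
            {name: nn.Linear(feat_dim, n) for name, n in variation_classes.items()})
        self.aux_weight = aux_weight
        self.ce = nn.CrossEntropyLoss()

    def forward(self, x):
        feats = self.backbone(x)
        return self.object_head(feats), {k: h(feats) for k, h in self.aux_heads.items()}

    def training_loss(self, x, object_labels, variation_labels):
        # Joint objective: object cross-entropy plus weighted variation cross-entropies.
        obj_logits, aux_logits = self(x)
        loss = self.ce(obj_logits, object_labels)
        for name, logits in aux_logits.items():
            loss = loss + self.aux_weight * self.ce(logits, variation_labels[name])
        return loss

# Inference: test images carry no variation labels, so only the object head is read.
model = VariationAuxNet()
model.eval()
with torch.no_grad():
    logits, _ = model(torch.randn(2, 3, 224, 224))
    predictions = logits.argmax(dim=1)
```

Because the auxiliary heads are simple linear layers on top of the shared representation, discarding them after training leaves the inference-time cost and parameter count essentially identical to the baseline classifier, which matches the abstract's claim that the gains come without significant added model complexity.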




Updated: 2020-04-08