Recognition of varying size scene images using semantic analysis of deep activation maps
Machine Vision and Applications (IF 2.4) | Pub Date: 2021-03-01 | DOI: 10.1007/s00138-021-01168-8
Shikha Gupta , A. D. Dileep , Veena Thenkanidiyoor

Understanding the complex semantic structure of scene images requires mapping an image from pixel space to a high-level semantic space. In semantic space, a scene image is represented by the posterior probabilities of the concepts (e.g., 'car,' 'chair,' 'window') present in it; such a representation is known as a semantic multinomial (SMN) representation. Generating SMNs requires a concept-annotated dataset for concept modeling, which is infeasible to produce manually given the large size of the databases. To tackle this issue, we propose a novel approach that builds the concept model via pseudo-concepts. A pseudo-concept acts as a proxy for an actual concept, giving a cue for its presence rather than its actual identity. We propose to use filter responses from the deeper convolutional layers of convolutional neural networks (CNNs) as pseudo-concepts, since filters in deeper convolutional layers are trained for different semantic concepts. Most prior work considers fixed-size (\(\approx 227\times 227\)) images for semantic analysis, which suppresses many of the concepts present in the images. In this work, we preserve the true concept structure of images by passing them at their original resolution to the convolutional layers of CNNs. We further propose to prune non-prominent pseudo-concepts, group similar ones using kernel clustering, and then model them using a dynamic kernel-based support vector machine. We demonstrate that the resulting SMN representation indeed captures semantic concepts better and yields state-of-the-art classification accuracy on varying-size scene image datasets such as MIT67 and SUN397.
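The core idea above — treating each deep filter's activation map as a pseudo-concept detector over an image of arbitrary resolution, then turning per-filter presence cues into an SMN-like multinomial — can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: it uses random 3×3 filters in place of trained CNN filters, max response as the presence cue, and a softmax in place of a learned posterior model; the function names are hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid-mode 2D cross-correlation of one filter over one grayscale image.

    Works for any image size at least as large as the kernel, so images
    need not be resized to a fixed resolution."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def pseudo_concept_smn(image, filters):
    """Map a variable-size image to an SMN-like multinomial vector.

    Each filter's activation map acts as a pseudo-concept detector; its
    maximum response is taken as a presence cue, and a softmax converts
    the cues into a multinomial over pseudo-concepts (toy stand-in for
    the learned posterior probabilities in the paper)."""
    cues = np.array([conv2d_valid(image, f).max() for f in filters])
    exp = np.exp(cues - cues.max())  # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(0)
filters = [rng.standard_normal((3, 3)) for _ in range(4)]

# Two "scene images" of different sizes -- no resizing needed, because
# convolution handles any resolution not smaller than the kernel.
small = rng.standard_normal((32, 48))
large = rng.standard_normal((120, 90))
for img in (small, large):
    smn = pseudo_concept_smn(img, filters)
    print(img.shape, smn.round(3))
```

Both images, despite their different resolutions, map to a fixed-length multinomial whose dimension equals the number of pseudo-concepts — which is what allows a downstream kernel-based SVM to compare scenes of varying size.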




Updated: 2021-03-02