Scene Categorization Using Deeply Learned Gaze Shifting Kernel
IEEE Transactions on Cybernetics (IF 9.4), Pub Date: 5-11-2018, DOI: 10.1109/tcyb.2018.2820731
Xiao Sun , Luming Zhang , Zepeng Wang , Jie Chang , Yiyang Yao , Ping Li , Roger Zimmermann

Accurately recognizing sophisticated scenes spanning a rich variety of semantic categories is an indispensable component of many intelligent systems, e.g., scene parsing, video surveillance, and autonomous driving. Recently, a large number of deep architectures for scene categorization have emerged and achieved promising performance. However, these models cannot explicitly encode human visual perception of different scenes, i.e., the sequence in which humans allocate their gaze. To solve this problem, we propose a deeply learned gaze shifting kernel to distinguish scenes from different categories. Specifically, we first project the regions of each scene into a so-called perceptual space, which is established by combining color, texture, and semantic features. Then, a novel non-negative matrix factorization (NMF) algorithm is developed that decomposes the regions' feature matrix into the product of a basis matrix and sparse codes, where the sparse codes indicate the saliency levels of the different regions. In this way, the gaze shifting path of each scene is derived, and an aggregation-based convolutional neural network is designed accordingly to learn its deep representation. Finally, the deep representations of the gaze shifting paths from all scene images are incorporated into an image kernel, which is in turn fed into a kernel SVM for scene categorization. Comprehensive experiments on six scene data sets demonstrate the superiority of our method over a series of shallow and deep recognition models. Moreover, eye-tracking experiments show that our predicted gaze shifting paths are 94.6% consistent with real human gaze allocations.
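The pipeline above has two numeric steps that a short sketch can make concrete: the sparse NMF that rates region saliency, and the kernel SVM applied to a precomputed image kernel over path representations. The Python sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the multiplicative-update solver with an L1 penalty (weight lam), the rank k, the synthetic region features, and the RBF kernel are all stand-ins for the paper's novel NMF algorithm, its aggregation-based CNN features, and its deeply learned image kernel.

# Minimal sketch of the saliency and classification steps; all parameter
# values and random features here are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

def sparse_nmf(V, k, lam=0.1, n_iter=200, eps=1e-9):
    """Decompose V (features x regions) into a basis W and sparse codes H.

    Multiplicative updates for ||V - WH||_F^2 + lam * ||H||_1 with W, H >= 0;
    the L1 term encourages the sparse codes that rate region saliency.
    """
    d, n = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((d, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def gaze_shifting_path(region_features, k=8):
    """Order regions by descending saliency (L1 mass of each sparse code)."""
    _, H = sparse_nmf(region_features, k)
    saliency = H.sum(axis=0)
    return np.argsort(-saliency)  # region indices, most salient first

# Toy usage: 30-D perceptual features (color + texture + semantics), 12 regions.
V = np.abs(np.random.default_rng(1).standard_normal((30, 12)))
path = gaze_shifting_path(V)

# Kernel SVM over path representations: an RBF kernel stands in for the
# learned image kernel, random vectors for the CNN path features.
X = np.random.default_rng(2).standard_normal((40, 64))  # deep path representations
y = np.random.default_rng(3).integers(0, 6, size=40)    # six scene categories
gamma = 1.0 / X.shape[1]
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-gamma * sq_dists)                           # precomputed image kernel
clf = SVC(kernel="precomputed").fit(K, y)

Ordering regions by the L1 mass of their sparse codes follows the abstract's statement that the sparse codes indicate region saliency; the precomputed-kernel interface shows how any image kernel, including the one learned here, can be handed to a standard kernel SVM.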

Updated: 2024-08-22