Deep semantic-aware network for zero-shot visual urban perception
International Journal of Machine Learning and Cybernetics (IF 3.1), Pub Date: 2021-08-17, DOI: 10.1007/s13042-021-01401-w
Chunyun Zhang, Yunfeng Zhang, Tingwen Wang, Chaoran Cui, Tianze Wu, Baolin Zhao, Yilong Yin

Visual urban perception has recently attracted considerable research attention owing to its importance in many fields. Traditional methods for visual urban perception mostly need to collect adequate training instances for each newly added perceptual attribute. In this paper, we consider a novel formulation, zero-shot learning, to avoid this cumbersome data curation. Based on the idea that different images containing similar objects are more likely to possess the same perceptual attribute, we learn a semantic correlation space formed by object semantic information and perceptual attributes. For newly added attributes, we synthesize their prototypes by transferring similar object vector representations from the training (seen) perceptual attributes to the unseen attributes. To this end, we propose a deep semantic-aware network for zero-shot visual urban perception. It is a new two-step zero-shot learning architecture, comprising a supervised visual urban perception step for the training attributes and a zero-shot prediction step for the unseen attributes. In the first step, we highlight the important role of semantic information and introduce it into a supervised deep visual urban perception framework for the training attributes. In the second step, we use visualization techniques to extract the correlations between semantic information and visual perceptual attributes from the well-trained supervised model, and we learn the prototypes of the unseen attributes to predict the perception scores of test images on those attributes. Experimental results on a large-scale benchmark dataset validate the effectiveness of our method.
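The prototype-synthesis idea sketched in the abstract can be illustrated with a minimal numerical example. This is a hypothetical sketch, not the authors' implementation: it assumes seen attributes have learned prototypes in an object-semantic space, synthesizes an unseen attribute's prototype as a similarity-weighted combination of the seen prototypes, and scores an image by similarity to that prototype. All function names, shapes, and the softmax weighting are illustrative assumptions.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def synthesize_prototype(unseen_vec, seen_vecs, seen_prototypes):
    """Synthesize an unseen attribute's prototype by transferring
    seen-attribute prototypes, weighted by semantic similarity
    between the unseen attribute's object vector and the seen ones."""
    sims = np.array([cosine(unseen_vec, v) for v in seen_vecs])
    weights = np.exp(sims) / np.exp(sims).sum()  # softmax over similarities
    return (weights[:, None] * seen_prototypes).sum(axis=0)

def perception_score(image_repr, prototype):
    """Score an image on the unseen attribute via prototype similarity."""
    return cosine(image_repr, prototype)
```

A usage sketch: with three seen attributes whose object vectors and prototypes live in an 8-dimensional space, `synthesize_prototype` returns an 8-dimensional prototype for the unseen attribute, and `perception_score` yields a similarity in [-1, 1] for any test image representation.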




Updated: 2021-08-19