当前位置: X-MOL 学术Sensors › 论文详情
Our official English website, www.x-mol.net, welcomes your feedback! (Note: you will need to create a separate account there.)
ero-Shot Image Classification Based on a Learnable Deep Metric
Sensors ( IF 3.9 ) Pub Date : 2021-05-07 , DOI: 10.3390/s21093241
Jingyi Liu , Caijuan Shi , Dongjing Tu , Ze Shi , Yazhi Liu

The supervised model based on deep learning has made great achievements in the field of image classification after training with a large number of labeled samples. However, there are many categories without or only with a few labeled training samples in practice, and some categories even have no training samples at all. The proposed zero-shot learning greatly reduces the dependence on labeled training samples for image classification models. Nevertheless, there are limitations in learning the similarity of visual features and semantic features with a predefined fixed metric (e.g., as Euclidean distance), as well as the problem of semantic gap in the mapping process. To address these problems, a new zero-shot image classification method based on an end-to-end learnable deep metric is proposed in this paper. First, the common space embedding is adopted to map the visual features and semantic features into a common space. Second, an end-to-end learnable deep metric, that is, the relation network is utilized to learn the similarity of visual features and semantic features. Finally, the invisible images are classified, according to the similarity score. Extensive experiments are carried out on four datasets and the results indicate the effectiveness of the proposed method.

中文翻译:

基于可学习的深度度量的ero-Shot图像分类

经过大量标记样本训练后,基于深度学习的监督模型在图像分类领域取得了巨大成就。但是,实际上有许多类别没有或只有一些带标签的训练样本,有些类别甚至根本没有训练样本。提出的零镜头学习大大降低了图像分类模型对标记训练样本的依赖。然而,在学习视觉特征和语义特征与预定义的固定度量(例如,欧几里得距离)的相似性方面存在局限性,并且在映射过程中存在语义间隙的问题。为了解决这些问题,本文提出了一种新的基于端到端可学习深度度量的零镜头图像分类方法。第一的,采用公共空间嵌入将视觉特征和语义特征映射到公共空间。其次,使用端到端的可学习深度度量,即关系网络来学习视觉特征和语义特征的相似性。最后,根据相似度分数对不可见图像进行分类。在四个数据集上进行了广泛的实验,结果表明了该方法的有效性。
更新日期:2021-05-07
down
wechat
bug