Semantic granularity metric learning for visual search, Journal of Visual Communication and Image Representation



Semantic granularity metric learning for visual search
Journal of Visual Communication and Image Representation (IF 2.6), Pub Date: 2020-08-17, DOI: 10.1016/j.jvcir.2020.102871
Dipu Manandhar, Muhammet Bastan, Kim-Hui Yap

Existing metric learning methods often do not consider the different granularities of visual similarity. However, in many domains, images exhibit similarity at multiple granularities of visual semantic concepts; e.g., fashion demonstrates similarity ranging from clothing of the exact same instance to similar looks/design or a common category. Therefore, training image triplets/pairs inherently carry different degrees of information. Nevertheless, existing methods often treat them with equal importance, which hinders capturing the underlying granularities in image similarity. In view of this, we propose a new semantic granularity metric learning (SGML) framework that detects and leverages the attribute semantic space and integrates it into deep metric learning to capture multiple granularities of similarity. The proposed framework simultaneously learns image attributes and embeddings with a multitask CNN, where the tasks are linked by a semantic granularity similarity mapping to leverage correlations between the tasks. To this end, we propose a new soft-binomial deviance loss that effectively integrates the informativeness of training samples into metric learning on the fly during training. Compared to recent ensemble-based methods, SGML is conceptually elegant and computationally simple yet effective. Extensive experiments on benchmark datasets demonstrate its superiority, e.g., a 1–4.5% Recall@1 improvement over the state of the art (Kim et al., 2018; Cakir et al., 2019) on the DeepFashion-Inshop dataset.
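To make the idea of weighting training pairs by their informativeness more concrete, the following is a minimal sketch of a standard binomial deviance loss with a per-pair soft weight derived from attribute similarity. It is not the paper's formulation: the abstract does not give the actual semantic granularity similarity mapping, so the `pair_weight` rule, the function names, and the values of `alpha`, `beta`, and `neg_cost` are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def binomial_deviance(sim, is_positive, alpha=2.0, beta=0.5, neg_cost=1.0):
    """Standard binomial deviance loss for one pair with embedding similarity `sim`.

    alpha, beta and neg_cost are illustrative hyperparameter values.
    """
    m = 1.0 if is_positive else -neg_cost
    return softplus(-alpha * (sim - beta) * m)


def pair_weight(attr_a, attr_b, is_positive):
    """Hypothetical soft weight (an assumption, not taken from the paper).

    Negative pairs that share many attributes are down-weighted, so that
    semantically close items are pushed apart less strongly than unrelated ones.
    Assumes non-negative attribute vectors, so the cosine lies in [0, 1].
    """
    if is_positive:
        return 1.0
    return 1.0 - 0.5 * cosine(attr_a, attr_b)


def soft_weighted_loss(emb_a, emb_b, attr_a, attr_b, is_positive):
    """Binomial deviance loss scaled by the attribute-based soft weight."""
    sim = cosine(emb_a, emb_b)
    return pair_weight(attr_a, attr_b, is_positive) * binomial_deviance(sim, is_positive)


if __name__ == "__main__":
    emb_a, emb_b = [1.0, 0.0], [1.0, 0.0]
    # Same embeddings, both negative pairs: one shares all attributes, one shares none.
    print(soft_weighted_loss(emb_a, emb_b, [1, 1, 0], [1, 1, 0], is_positive=False))
    print(soft_weighted_loss(emb_a, emb_b, [1, 0, 0], [0, 1, 0], is_positive=False))
```

In this sketch the two negative pairs have identical embeddings, yet the pair that shares all attributes contributes a smaller loss than the pair that shares none, which illustrates how attribute information could modulate each pair's contribution during training.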


Updated: 2020-08-17
