Conversational Image Search
IEEE Transactions on Image Processing ( IF 10.8 ) Pub Date : 2021-09-03 , DOI: 10.1109/tip.2021.3108724
Liqiang Nie , Fangkai Jiao , Wenjie Wang , Yinglong Wang , Qi Tian

Conversational image search, a revolutionary search mode, interactively elicits user responses to clarify their intent step by step. Several efforts have been dedicated to the conversation part, namely automatically asking the right question at the right time to elicit user preferences, while few studies focus on the image search part given a well-prepared conversational query. In this paper, we work towards conversational image search, which is much more difficult than the traditional image search task due to the following challenges: 1) understanding complex user intents from a multimodal conversational query; 2) utilizing multiform knowledge associated with images from a memory network; and 3) enhancing the image representation with distilled knowledge. To address these problems, we present a novel contextuaL imAge seaRch sCHeme (LARCH for short), consisting of three components. In the first component, we design a multimodal hierarchical graph-based neural network, which learns the conversational query embedding for better user intent understanding. In the second, we devise a multi-form knowledge embedding memory network to unify heterogeneous knowledge structures into a homogeneous base that greatly facilitates relevant knowledge retrieval. In the third component, we learn the knowledge-enhanced image representation via a novel gated neural network, which selects the useful knowledge from the retrieved relevant knowledge. Extensive experiments have shown that our LARCH yields significant performance gains on an extended benchmark dataset. As a side contribution, we have released the data, code, and parameter settings to facilitate other researchers in the conversational image search community.
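The abstract does not give the equations of the gated network in the third component. As a rough illustration of the general idea only, the sketch below shows a generic sigmoid gate that decides, per dimension, how much of a retrieved knowledge embedding is added to an image embedding. The function names, the residual form `image + gate * knowledge`, and the weight layout are all assumptions made for this example; they are not taken from LARCH or its released code.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(image_vec, knowledge_vec, gate_weights, gate_bias):
    """Hypothetical gate: fuse an image embedding with a knowledge embedding.

    image_vec, knowledge_vec: lists of floats with the same length d.
    gate_weights: d rows, each of length 2 * d, applied to the
        concatenation [image_vec; knowledge_vec] (assumed layout).
    gate_bias: list of d floats.
    Returns a list of d floats: image + gate * knowledge, per dimension.
    """
    concat = list(image_vec) + list(knowledge_vec)
    fused = []
    for i, (img, know) in enumerate(zip(image_vec, knowledge_vec)):
        pre_activation = sum(w * x for w, x in zip(gate_weights[i], concat))
        gate = sigmoid(pre_activation + gate_bias[i])  # value in [0, 1]
        fused.append(img + gate * know)
    return fused
```

With all-zero gate weights and biases every gate equals 0.5, so half of each knowledge component is mixed in; a strongly negative bias closes the gate and leaves the image embedding essentially unchanged. In a trained model the weights would be learned so that the gate opens only for knowledge that helps retrieval.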

Updated: 2021-09-03