More Robust Dense Retrieval with Contrastive Dual Learning
arXiv - CS - Information Retrieval Pub Date : 2021-07-16 , DOI: arxiv-2107.07773
Yizhi Li, Zhenghao Liu, Chenyan Xiong, Zhiyuan Liu

Dense retrieval conducts text retrieval in the embedding space and has shown many advantages over sparse retrieval. Existing dense retrievers optimize the representations of queries and documents with contrastive training and map them to the embedding space. The embedding space is optimized by aligning matched query-document pairs and pushing negative documents away from the query. However, in such a training paradigm, the queries are only optimized to align with the documents and are coarsely positioned, leading to an anisotropic query embedding space. In this paper, we analyze the embedding space distributions and propose an effective training paradigm, Contrastive Dual Learning for Approximate Nearest Neighbor (DANCE), to learn fine-grained query representations for dense retrieval. DANCE incorporates an additional dual training objective of query retrieval, inspired by the classic information retrieval training axiom, query likelihood. With contrastive learning, the dual training objective of DANCE learns more tailored representations for queries and documents to keep the embedding space smooth and uniform, which improves the ranking performance of DANCE on the MS MARCO document retrieval task. Unlike ANCE, which is optimized only with the document retrieval task, DANCE concentrates the query embeddings closer to the document representations while making the document distribution more discriminative. Such a concentrated query embedding distribution assigns more uniform negative sampling probabilities to queries and helps to sufficiently optimize query representations in the query retrieval task. Our code is released at https://github.com/thunlp/DANCE.
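
The dual objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the released code for that). It assumes dot-product similarity, a softmax-based contrastive loss, and a simple weighted sum of the two objectives. The function names, the `dual_weight` parameter, and the toy vectors are all hypothetical.

```python
import math


def dot(u, v):
    """Dot-product similarity between two embedding vectors (an assumption)."""
    return sum(a * b for a, b in zip(u, v))


def nll_of_positive(anchor, positive, negatives):
    """Negative log-likelihood of the positive under a softmax over similarities."""
    scores = [dot(anchor, positive)] + [dot(anchor, n) for n in negatives]
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[0]


def dual_contrastive_loss(query, pos_doc, neg_docs, neg_queries, dual_weight=1.0):
    """Document-retrieval loss plus a weighted dual query-retrieval loss.

    dual_weight is a hypothetical knob; the abstract does not say how the
    two objectives are combined.
    """
    # Document retrieval: given the query, prefer the matched document.
    doc_loss = nll_of_positive(query, pos_doc, neg_docs)
    # Dual task (query retrieval): given the document, prefer the matched query.
    query_loss = nll_of_positive(pos_doc, query, neg_queries)
    return doc_loss + dual_weight * query_loss


# Toy 2-d embeddings, purely illustrative.
loss = dual_contrastive_loss(
    query=[1.0, 0.0],
    pos_doc=[1.0, 0.0],
    neg_docs=[[0.0, 1.0]],
    neg_queries=[[0.0, 1.0]],
)
print(loss)
```

Setting `dual_weight=0.0` reduces the sketch to a document-retrieval-only contrastive loss, which is the kind of single-task training the abstract contrasts DANCE with.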

Updated: 2021-07-19