Improving zero-shot retrieval using dense external expansion
Information Processing & Management (IF 7.4), Pub Date: 2022-08-02, DOI: 10.1016/j.ipm.2022.103026
Xiao Wang , Craig Macdonald , Iadh Ounis

Pseudo-relevance feedback (PRF) is a classical technique for improving search engine retrieval effectiveness by closing the vocabulary gap between users' query formulations and the relevant documents. While PRF is typically applied on the same target corpus as the final retrieval, external expansion techniques have sometimes been applied in the past to obtain a high-quality pseudo-relevant feedback set from an external corpus. However, such external expansion approaches have only been studied for sparse (BoW) retrieval methods, and their effectiveness for recent dense retrieval methods remains under-investigated. Indeed, dense retrieval approaches such as ANCE and ColBERT, which conduct similarity search based on encoded contextualised query and document embeddings, are of increasing importance. Moreover, pseudo-relevance feedback mechanisms have been proposed to further enhance dense retrieval effectiveness. In this work, we examine the application of dense external expansion to improve zero-shot retrieval effectiveness, i.e. evaluation on corpora without further training. Zero-shot retrieval experiments on six datasets (two TREC datasets and four BEIR datasets), using the MSMARCO passage collection as the external corpus, indicate that obtaining external feedback documents using ColBERT can significantly improve NDCG@10 for sparse retrieval (by up to 28%) and for dense retrieval (by up to 12%). In addition, using ANCE on the external corpus brings NDCG@10 improvements of up to 30% for sparse retrieval and up to 29% for dense retrieval.
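
The general shape of external expansion described in the abstract can be illustrated with a small sketch. This is not the authors' ColBERT or ANCE pipeline: it assumes each query and document is already encoded as a single dense vector, and it swaps in a simple Rocchio-style interpolation between the query and the centroid of the feedback documents. All names (`top_k`, `expand_query`, `external_expansion_search`, `fb_docs`, `beta`) and the toy vectors are hypothetical and chosen only for illustration.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)

def top_k(query_vec, corpus, k):
    """Return the ids of the k documents with the highest inner product."""
    ranked = sorted(corpus, key=lambda doc_id: dot(query_vec, corpus[doc_id]),
                    reverse=True)
    return ranked[:k]

def expand_query(query_vec, feedback_vecs, beta=0.5):
    """Assumed Rocchio-style expansion: mix the query with the feedback centroid."""
    if not feedback_vecs:
        return normalize(query_vec)
    n = len(feedback_vecs)
    centroid = [sum(v[i] for v in feedback_vecs) / n
                for i in range(len(query_vec))]
    mixed = [(1 - beta) * q + beta * c for q, c in zip(query_vec, centroid)]
    return normalize(mixed)

def external_expansion_search(query_vec, external, target,
                              fb_docs=2, k=3, beta=0.5):
    """Take feedback from the external corpus, then search the target corpus."""
    # Step 1: pseudo-relevant feedback documents come from the EXTERNAL corpus.
    feedback_ids = top_k(query_vec, external, fb_docs)
    # Step 2: refine the query representation using those feedback documents.
    expanded = expand_query(query_vec, [external[d] for d in feedback_ids], beta)
    # Step 3: the final retrieval runs on the TARGET corpus with the refined query.
    return top_k(expanded, target, k)

# Toy 2-D embeddings (made up for illustration, not real model output).
QUERY = [0.8, 0.6]
EXTERNAL = {"e1": [0.6, 0.8], "e2": [0.5, 0.866], "e3": [-1.0, 0.0]}
TARGET = {"t1": [1.0, 0.0], "t2": [0.0, 1.0], "t3": [0.7, 0.7]}

if __name__ == "__main__":
    print("no expansion:      ", top_k(QUERY, TARGET, 3))
    print("external expansion:", external_expansion_search(QUERY, EXTERNAL, TARGET))
```

In this toy setup the external feedback documents pull the query towards the second dimension, so the ranking on the target corpus changes even though no target documents were used as feedback. A real system would replace the hand-written vectors with encoder output and the exhaustive `sorted` scan with an approximate nearest-neighbour index.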




Updated: 2022-08-03