Learning Passage Impacts for Inverted Indexes
arXiv - CS - Information Retrieval, Pub Date: 2021-04-24, DOI: arxiv-2104.12016
Antonio Mallia, Omar Khattab, Nicola Tonellotto, Torsten Suel

Neural information retrieval systems typically use a cascading pipeline, in which a first-stage model retrieves a candidate set of documents and one or more subsequent stages re-rank this set using contextualized language models such as BERT. In this paper, we propose DeepImpact, a new document term-weighting scheme suitable for efficient retrieval using a standard inverted index. Compared to existing methods, DeepImpact improves impact-score modeling and tackles the vocabulary-mismatch problem. In particular, DeepImpact leverages DocT5Query to enrich the document collection and, using a contextualized language model, directly estimates the semantic importance of tokens in a document, producing a single-value representation for each token in each document. Our experiments show that DeepImpact significantly outperforms prior first-stage retrieval approaches, improving effectiveness metrics by up to 17% relative to DocT5Query, and, when deployed in a re-ranking scenario, can reach the same effectiveness as state-of-the-art approaches with up to a 5.1x speedup.
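To make the "single value per token per document" idea concrete, here is a minimal Python sketch of first-stage retrieval over an impact-scored inverted index. It is not the authors' code. The impact values in `DOC_IMPACTS` are made up, standing in for scores a contextualized model would predict over a DocT5Query-expanded passage. `build_index` and `search` are hypothetical helper names. The sketch assumes a document's score is simply the sum of the impacts of the distinct query terms it contains.

```python
from collections import defaultdict
import heapq

# Hypothetical per-document impact scores: one float per (term, document).
# In DeepImpact these would be predicted by a contextualized language model over a
# DocT5Query-expanded passage; here they are invented numbers for illustration only.
DOC_IMPACTS = {
    "d1": {"pup": 2.1, "sleeps": 1.3, "dog": 1.7},  # "dog" plays the role of an expansion term
    "d2": {"dog": 2.4, "barks": 1.9},
    "d3": {"cat": 2.2, "sleeps": 1.1},
}


def build_index(doc_impacts):
    """Invert doc -> {term: impact} into term -> [(doc_id, impact), ...] posting lists."""
    index = defaultdict(list)
    for doc_id, terms in doc_impacts.items():
        for term, impact in terms.items():
            index[term].append((doc_id, impact))
    return index


def search(index, query, k=10):
    """Score each document as the sum of impacts of the distinct query terms it contains."""
    scores = defaultdict(float)
    for term in set(query.split()):  # each query term contributes at most once
        for doc_id, impact in index.get(term, []):  # .get avoids adding empty lists
            scores[doc_id] += impact
    # Return the top-k (doc_id, score) pairs, highest score first.
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    index = build_index(DOC_IMPACTS)
    for doc_id, score in search(index, "dog sleeps"):
        print(doc_id, round(score, 2))
```

With these toy values, the query "dog sleeps" ranks `d1` first even though its original wording is "pup". This happens because the expansion term "dog" carries its own impact, which is how document expansion helps with vocabulary mismatch. A real deployment would typically store quantized impacts in compressed posting lists and use dynamic pruning rather than scoring every posting. In a cascading pipeline, the top-k list returned here would then be passed to a re-ranker such as BERT.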

Updated: 2021-04-27