Contextualized query expansion via unsupervised chunk selection for text retrieval
Information Processing & Management (IF 8.6) | Pub Date: 2021-07-09 | DOI: 10.1016/j.ipm.2021.102672
Zhi Zheng, Kai Hui, Ben He, Xianpei Han, Le Sun, Andrew Yates

When ranking a list of documents relative to a given query, vocabulary mismatches between the language used in the queries and in the documents can compromise performance. Though BERT-based re-rankers have significantly advanced the state of the art, such mismatches still exist. Moreover, recent work has demonstrated that it is non-trivial to use established query expansion methods to boost the performance of BERT-based re-rankers. Therefore, this paper proposes a novel query expansion model using unsupervised chunk selection, coined BERT-QE. In particular, BERT-QE consists of three phases. After performing the first-round re-ranking in phase one, BERT-QE leverages the strength of the BERT model to select relevant text chunks from feedback documents in phase two and uses them for the final re-ranking in phase three. Furthermore, different variants of BERT-QE are thoroughly investigated for a better trade-off between effectiveness and efficiency, including the use of smaller BERT variants and of recently proposed late interaction methods. On the standard TREC Robust04 and GOV2 test collections, the proposed BERT-QE model significantly outperforms BERT-Large models. Notably, the best variant of BERT-QE significantly outperforms BERT-Large on shallow metrics with less than 1% extra computation.
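The three-phase pipeline described in the abstract can be summarized in the minimal Python sketch below. It assumes a rel(text_a, text_b) function that returns a relevance score from a fine-tuned BERT re-ranker; the chunk length, the feedback-document and chunk counts (k_d, k_c), and the interpolation weight alpha are illustrative placeholders rather than values from the paper.

import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def bert_qe_rerank(query, docs, rel, chunk_len=10, k_d=10, k_c=10, alpha=0.5):
    # Phase 1: first-round re-ranking of the candidate documents with BERT.
    ranked = sorted(docs, key=lambda d: rel(query, d), reverse=True)
    feedback_docs = ranked[:k_d]

    # Phase 2: split feedback documents into fixed-length chunks and keep the
    # k_c chunks that the BERT model scores as most relevant to the query.
    chunks = []
    for d in feedback_docs:
        words = d.split()
        chunks += [" ".join(words[i:i + chunk_len])
                   for i in range(0, len(words), chunk_len)]
    chunks = sorted(chunks, key=lambda c: rel(query, c), reverse=True)[:k_c]
    weights = softmax([rel(query, c) for c in chunks])

    # Phase 3: final re-ranking, interpolating the original query-document
    # score with evidence from the selected chunks.
    def final_score(d):
        chunk_evidence = sum(w * rel(c, d) for w, c in zip(weights, chunks))
        return (1 - alpha) * rel(query, d) + alpha * chunk_evidence

    return sorted(docs, key=final_score, reverse=True)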




Updated: 2021-07-09