


Phrase Retrieval Learns Passage Retrieval, Too
arXiv - CS - Computation and Language. Pub Date: 2021-09-16, DOI: arxiv-2109.08133
Jinhyuk Lee, Alexander Wettig, Danqi Chen

Dense retrieval methods have shown great promise over sparse retrieval methods in a range of NLP problems. Among them, dense phrase retrieval (the most fine-grained retrieval unit) is appealing because phrases can be directly used as the output for question answering and slot filling tasks. In this work, we follow the intuition that retrieving phrases naturally entails retrieving larger text blocks and study whether phrase retrieval can serve as the basis for coarse-level retrieval including passages and documents. We first observe that a dense phrase-retrieval system, without any retraining, already achieves better passage retrieval accuracy (+3-5% in top-5 accuracy) compared to passage retrievers, which also helps achieve superior end-to-end QA performance with fewer passages. Then, we provide an interpretation for why phrase-level supervision helps learn better fine-grained entailment compared to passage-level supervision, and also show that phrase retrieval can be improved to achieve competitive performance in document-retrieval tasks such as entity linking and knowledge-grounded dialogue. Finally, we demonstrate how phrase filtering and vector quantization can reduce the size of our index by 4-10x, making dense phrase retrieval a practical and versatile solution in multi-granularity retrieval.
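
The intuition that retrieving phrases entails retrieving larger text blocks can be illustrated with a minimal sketch. It assumes each retrieved phrase carries the identifier of the passage that contains it, and that a passage is scored by its best-scoring phrase; the function name, the data layout, and the max-score aggregation are illustrative assumptions, not details stated in the abstract.

```python
from typing import Dict, List, Tuple


def passages_from_phrases(
    phrase_hits: List[Tuple[str, float]],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Turn phrase-level hits into a passage ranking.

    phrase_hits: (passage_id, phrase_score) pairs, one per retrieved phrase.
    Assumption: a passage's score is the maximum score among its phrases.
    """
    best: Dict[str, float] = {}
    for passage_id, score in phrase_hits:
        # Keep only the strongest phrase found in each passage.
        if passage_id not in best or score > best[passage_id]:
            best[passage_id] = score
    # Rank passages by their best phrase score, highest first.
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Hypothetical phrase hits: two phrases fall in passage "p1".
hits = [("p1", 0.9), ("p2", 0.7), ("p1", 0.4), ("p3", 0.8)]
top2 = passages_from_phrases(hits, top_k=2)
```

Because several top-ranked phrases may come from the same passage, a phrase retriever generally has to fetch more than k phrases to return k distinct passages.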


Updated: 2021-09-17
