Phrase Retrieval Learns Passage Retrieval, Too
arXiv - CS - Computation and Language. Pub Date: 2021-09-16. DOI: arxiv-2109.08133. Authors: Jinhyuk Lee, Alexander Wettig, Danqi Chen
Dense retrieval methods have shown great promise over sparse retrieval
methods in a range of NLP problems. Among them, dense phrase retrieval (the most
fine-grained retrieval unit) is appealing because phrases can be directly used
as the output for question answering and slot filling tasks. In this work, we
follow the intuition that retrieving phrases naturally entails retrieving
larger text blocks and study whether phrase retrieval can serve as the basis
for coarse-level retrieval including passages and documents. We first observe
that a dense phrase-retrieval system, without any retraining, already achieves
better passage retrieval accuracy (+3-5% in top-5 accuracy) compared to passage
retrievers, which also helps achieve superior end-to-end QA performance with
fewer passages. Then, we provide an interpretation for why phrase-level
supervision helps learn better fine-grained entailment compared to
passage-level supervision, and also show that phrase retrieval can be improved
to achieve competitive performance in document-retrieval tasks such as entity
linking and knowledge-grounded dialogue. Finally, we demonstrate how phrase
filtering and vector quantization can reduce the size of our index by 4-10x,
making dense phrase retrieval a practical and versatile solution in
multi-granularity retrieval.
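
The intuition that retrieving phrases entails retrieving larger text blocks can be illustrated with a small sketch. It assumes that each retrieved phrase carries the identifier of the passage it came from, and that a passage is scored by its best-scoring phrase. The function name `passages_from_phrases`, the passage IDs, and the scores are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
from typing import Dict, List, Tuple


def passages_from_phrases(
    phrase_hits: List[Tuple[str, float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Collapse retrieved phrases into a ranked list of passages.

    phrase_hits holds (passage_id, phrase_score) pairs, one per retrieved
    phrase. Assumption: a passage's score is the maximum score among the
    phrases retrieved from it.
    """
    best: Dict[str, float] = {}
    for passage_id, score in phrase_hits:
        # Keep only the highest phrase score seen for each passage.
        if passage_id not in best or score > best[passage_id]:
            best[passage_id] = score
    # Rank passages by their best phrase score, highest first.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical phrase hits: two phrases come from passage "p1".
hits = [("p1", 0.9), ("p2", 0.7), ("p1", 0.4), ("p3", 0.8)]
top2 = passages_from_phrases(hits, k=2)
print(top2)
```

Because several top phrases can fall in the same passage, the number of distinct passages returned may be smaller than the number of phrases retrieved, so in practice one would likely retrieve more phrases than the number of passages wanted.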
Updated: 2021-09-17