SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking
arXiv - CS - Information Retrieval, Pub Date: 2021-07-12, DOI: arxiv-2107.05720
Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant
In neural Information Retrieval, much ongoing research aims to improve the
first-stage retriever of ranking pipelines. Learning dense embeddings and
retrieving with efficient approximate nearest neighbor methods has proven to
work well. Meanwhile, there is growing interest in learning sparse
representations of documents and queries, which could inherit desirable
properties of bag-of-words models such as exact term matching and the
efficiency of inverted indexes. In this work, we present a new first-stage
ranker based on explicit sparsity regularization and a log-saturation effect on
term weights, leading to highly sparse representations and results competitive
with state-of-the-art dense and sparse methods. Our approach is simple and is
trained end-to-end in a single stage. We also explore the trade-off between
effectiveness and efficiency by controlling the contribution of the sparsity
regularization.
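The abstract names three mechanisms: a log-saturation effect on term weights, an explicit sparsity regularizer, and retrieval through an inverted index. The sketch below illustrates them on toy numbers under stated assumptions. We assume each term weight is the sum over input tokens i of log(1 + ReLU(w_ij)), where w_ij is a masked-language-model logit for vocabulary term j, and we assume the FLOPS regularizer (squared mean weight per term over a batch) as the sparsity penalty. All function names are hypothetical, plain Python lists stand in for model outputs, and no encoder or training loop is shown.

```python
import math
from collections import defaultdict


def splade_weights(logits):
    """Hypothetical helper: log-saturated sparse term weights.

    logits: one row per input token, each row a list of vocab-size floats
    (assumed MLM logits w_ij). Returns {term_id: weight} with zeros dropped.
    """
    vocab_size = len(logits[0])
    weights = {}
    for j in range(vocab_size):
        # ReLU zeroes negative logits; log1p saturates large ones.
        w = sum(math.log1p(max(row[j], 0.0)) for row in logits)
        if w > 0.0:
            weights[j] = w
    return weights


def flops_regularizer(batch):
    """Assumed FLOPS penalty: sum over terms of (mean weight in batch)^2."""
    n = len(batch)
    totals = defaultdict(float)
    for rep in batch:
        for j, w in rep.items():
            totals[j] += w
    return sum((t / n) ** 2 for t in totals.values())


def regularized_loss(ranking_loss, query_batch, doc_batch, lambda_q, lambda_d):
    """Hypothetical total loss: larger lambdas push toward sparser vectors."""
    return (ranking_loss
            + lambda_q * flops_regularizer(query_batch)
            + lambda_d * flops_regularizer(doc_batch))


def build_inverted_index(docs):
    """Map each term id to a posting list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, rep in docs.items():
        for j, w in rep.items():
            index[j].append((doc_id, w))
    return index


def search(query_rep, index):
    """Dot-product scoring that only visits postings of the query's terms."""
    scores = defaultdict(float)
    for j, qw in query_rep.items():
        for doc_id, dw in index.get(j, []):
            scores[doc_id] += qw * dw
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": splade_weights([[2.0, -1.0, 0.0, 0.5], [1.0, -3.0, -0.2, 0.0]]),
        "d2": splade_weights([[-1.0, 3.0, 0.0, -0.5]]),
    }
    query = splade_weights([[1.0, 0.0, -2.0, 0.2]])
    print(search(query, build_inverted_index(docs)))
    print(regularized_loss(0.5, [query], list(docs.values()), 1e-3, 1e-3))
```

In this toy example `d2` shares no non-zero term with the query, so it is never scored: only posting lists for the query's active terms are read, which is where the inverted-index efficiency comes from. Raising `lambda_q` and `lambda_d` is one way to realize the effectiveness/efficiency knob the abstract describes.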
Updated: 2021-07-14