SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking
arXiv - CS - Information Retrieval Pub Date : 2021-07-12 , DOI: arxiv-2107.05720
Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant

In neural Information Retrieval, ongoing research is directed towards improving the first retriever in ranking pipelines. Learning dense embeddings to conduct retrieval using efficient approximate nearest neighbor methods has proven to work well. Meanwhile, there has been growing interest in learning sparse representations for documents and queries that could inherit the desirable properties of bag-of-words models, such as the exact matching of terms and the efficiency of inverted indexes. In this work, we present a new first-stage ranker based on explicit sparsity regularization and a log-saturation effect on term weights, leading to highly sparse representations and competitive results with respect to state-of-the-art dense and sparse methods. Our approach is simple and is trained end-to-end in a single stage. We also explore the trade-off between effectiveness and efficiency by controlling the contribution of the sparsity regularization.
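
The abstract names two ingredients, a log-saturation effect on term weights and an explicit sparsity regularizer, without giving formulas. The sketch below is one plausible reading, not the authors' implementation: it assumes each input token yields a vector of vocabulary logits, that term weights are aggregated as a sum of log(1 + ReLU(logit)), and that sparsity is encouraged with a FLOPS-style penalty (the squared mean activation per vocabulary term). The function names `log_saturate`, `flops_regularizer`, and `score` are hypothetical, chosen only for illustration.

```python
import math


def log_saturate(logits_per_token):
    """Aggregate per-token vocabulary logits into one term-weight vector.

    Assumed form: weight_j = sum over tokens i of log(1 + max(0, logit_ij)).
    Negative logits contribute exactly zero, which is what makes the
    resulting vector sparse; the log dampens very large logits.
    """
    vocab_size = len(logits_per_token[0])
    weights = [0.0] * vocab_size
    for token_logits in logits_per_token:
        for j, logit in enumerate(token_logits):
            weights[j] += math.log1p(max(0.0, logit))
    return weights


def flops_regularizer(batch_weights):
    """FLOPS-style sparsity penalty (an assumption, see lead-in).

    For each vocabulary term, average its weight over the batch, square
    the average, and sum over the vocabulary.
    """
    batch_size = len(batch_weights)
    vocab_size = len(batch_weights[0])
    return sum(
        (sum(doc[j] for doc in batch_weights) / batch_size) ** 2
        for j in range(vocab_size)
    )


def score(query_weights, doc_weights):
    """Relevance as a dot product over the shared vocabulary dimensions."""
    return sum(q * d for q, d in zip(query_weights, doc_weights))
```

Under these assumptions, a training loss would add `flops_regularizer` scaled by a coefficient to the ranking loss; raising that coefficient yields sparser vectors, which is the effectiveness-efficiency trade-off the abstract refers to.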

Updated: 2021-07-14