A Discriminative Semantic Ranker for Question Retrieval
arXiv - CS - Information Retrieval, Pub Date: 2021-07-18, DOI: arxiv-2107.08345
Yinqiong Cai, Yixing Fan, Jiafeng Guo, Ruqing Zhang, Yanyan Lan, Xueqi Cheng

Similar question retrieval is a core task in community-based question answering (CQA) services. To balance effectiveness and efficiency, a question retrieval system is typically implemented as a multi-stage ranking pipeline: the first-stage ranker aims to recall potentially relevant questions from a large repository, and the later stages re-rank the retrieved results. Most existing work on question retrieval has focused on the re-ranking stages, leaving the first-stage ranker to traditional term-based methods. However, term-based methods often suffer from the vocabulary mismatch problem, especially on short texts, which can prevent relevant questions from ever reaching the re-rankers. An alternative is to employ embedding-based methods for the first-stage ranker, which compress texts into dense vectors to enhance semantic matching. However, these methods often lack the discriminative power of term-based methods, and thus introduce noise during retrieval and hurt recall. In this work, we aim to tackle this dilemma of the first-stage ranker, and propose a discriminative semantic ranker, namely DenseTrans, for high-recall retrieval. Specifically, DenseTrans is a densely connected Transformer, which learns semantic embeddings for texts based on Transformer layers. Meanwhile, DenseTrans promotes low-level features through dense connections to preserve the discriminative power of the learned representations. DenseTrans is inspired by DenseNet in computer vision (CV), but uses dense connectivity in a new way that differs entirely from its original design purpose. Experimental results on two question retrieval benchmark datasets show that our model obtains significant gains in recall over strong term-based methods as well as state-of-the-art embedding-based methods.
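
As a rough illustration of the two ideas in the abstract (dense connectivity that carries low-level features forward, and first-stage recall by vector similarity), here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the architecture in detail, so the names `build_dense_encoder` and `retrieve`, the random linear-plus-tanh "layer" standing in for a Transformer layer, the DenseNet-style concatenation of all layer outputs, and the inner-product scoring are all assumptions made for illustration.

```python
import math
import random


def build_dense_encoder(input_dim, num_layers=3, hidden=4, seed=0):
    """Return an encode(x) function with DenseNet-style connectivity.

    Each "layer" is a hypothetical stand-in (random linear map + tanh),
    not a real Transformer layer.
    """
    rng = random.Random(seed)
    layers = []
    in_dim = input_dim
    for _ in range(num_layers):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                   for _ in range(hidden)]
        layers.append(weights)
        in_dim += hidden  # the next layer also sees this layer's output

    def encode(x):
        features = list(x)  # running concatenation, starting with the raw input
        for weights in layers:
            out = [math.tanh(sum(w * v for w, v in zip(row, features)))
                   for row in weights]
            features = features + out  # dense connection: keep everything so far
        return features  # low-level input features survive in the final vector

    return encode


def retrieve(query_vec, corpus_vecs, k):
    """First-stage recall: indices of the k corpus vectors with the
    highest inner product with the query (assumed scoring function)."""
    scored = [(sum(q * d for q, d in zip(query_vec, vec)), idx)
              for idx, vec in enumerate(corpus_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Usage: encode a toy corpus and a query, then recall the top 2 candidates.
encode = build_dense_encoder(input_dim=3)
corpus = [encode(v) for v in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0])]
top = retrieve(encode([1.0, 0.0, 0.0]), corpus, k=2)
```

Because every layer receives the concatenation of the input and all earlier outputs, the final vector still contains the original low-level features alongside the higher-level ones, which is the property the abstract credits with preserving discriminative power. A real system would replace the stand-in layers with trained Transformer layers and the linear scan in `retrieve` with an approximate nearest-neighbor index.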

Updated: 2021-07-20
