A discriminative method for global query expansion and term reweighting using co-occurrence graphs


Journal of Information Science (IF 1.8), Pub Date: 2021-03-29, DOI: 10.1177/0165551521998047
Billel Aklouche, Ibrahim Bounhas, Yahya Slimani


This article presents a new query expansion (QE) method that aims to tackle term mismatch in information retrieval (IR). Previous research has shown that selecting good expansion terms that do not hurt retrieval effectiveness remains an open and challenging research question. Our method investigates how global statistics of term co-occurrence can be used effectively to enhance expansion term selection and reweighting. Specifically, we build a co-occurrence graph using a context window approach over the entire collection, thus adopting a global QE approach. Then, we employ a semantic similarity measure inspired by the Okapi BM25 model, which makes it possible to evaluate the discriminative power of words and to select relevant expansion terms based on their similarity to the query as a whole. The proposed method includes a reweighting step in which selected terms are assigned weights according to their relevance to the query. Moreover, our method does not require matrix factorisation or complex text mining processes. It only requires simple co-occurrence statistics about terms, which reduces complexity and ensures scalability. Finally, it has two free parameters that may be tuned to adapt the model to the context of a given collection and to control co-occurrence normalisation. Extensive experiments on four standard datasets in English (TREC Robust04 and Washington Post) and French (CLEF2000 and CLEF2003) show that our method improves both retrieval effectiveness and robustness across various evaluation metrics and significantly outperforms competitive state-of-the-art baselines. We also investigate the impact of varying the number of expansion terms on retrieval results.
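The pipeline described in the abstract (windowed co-occurrence graph, BM25-inspired similarity to the whole query, reweighting of the selected terms) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption made for illustration: the function names, the parameter names `k1` and `b` (borrowed from standard BM25), the IDF-like factor based on a candidate's number of distinct neighbours, and the reweighting by division by the top score. It should not be read as the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def build_cooccurrence_graph(docs, window=2):
    """Count co-occurrences of distinct terms at most `window` tokens apart.

    `docs` is an iterable of token lists covering the whole collection.
    """
    graph = defaultdict(Counter)
    for tokens in docs:
        for i, term in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != term:
                    graph[term][other] += 1
                    graph[other][term] += 1
    return graph


def expansion_scores(graph, query, k1=1.2, b=0.75):
    """Score candidates against the query as a whole (assumed BM25-like form)."""
    if not graph:
        return Counter()
    n_terms = len(graph)
    # Total co-occurrence mass per term, used for length-style normalisation.
    total = {t: sum(nbrs.values()) for t, nbrs in graph.items()}
    avg_total = sum(total.values()) / n_terms
    scores = Counter()
    for q in query:
        for cand, count in graph.get(q, {}).items():
            if cand in query:
                continue
            # Assumption: a term with few distinct neighbours is more discriminative.
            degree = len(graph[cand])
            idf = math.log(1 + (n_terms - degree + 0.5) / (degree + 0.5))
            # Assumption: b controls normalisation, k1 controls saturation.
            norm = 1 - b + b * total[cand] / avg_total
            scores[cand] += idf * count * (k1 + 1) / (count + k1 * norm)
    return scores


def expand(graph, query, n=3, **params):
    """Return the top-n expansion terms with weights scaled to (0, 1]."""
    top = expansion_scores(graph, query, **params).most_common(n)
    if not top:
        return []
    best = top[0][1]
    return [(term, score / best) for term, score in top]
```

Under these assumptions, `build_cooccurrence_graph(tokenised_docs, window=2)` is called once over the collection, and `expand(graph, ["query", "terms"], n=10)` then returns candidate terms ranked by their summed similarity to all query terms. Because the scores rely only on co-occurrence counts, no matrix factorisation is involved, which matches the simplicity argument made in the abstract.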


Updated: 2021-03-30
