Pseudo relevance feedback optimization, Information Retrieval Journal



Pseudo relevance feedback optimization
Information Retrieval Journal (IF 2.5) Pub Date: 2021-05-25, DOI: 10.1007/s10791-021-09393-5
Avi Arampatzis, Georgios Peikos, Symeon Symeonidis

We propose a method for automatically optimizing pseudo relevance feedback (PRF) in information retrieval. Based on the conjecture that the initial query’s contribution to the final query may not be necessary once a good model is built from pseudo-relevant documents, we set out to optimize, per query, only the number of top-retrieved documents to be used for feedback. The optimization uses several query performance predictors for the initial query as input to a linear regression model, with the optimal machine learning pipeline discovered via genetic programming. Even with only 50–100 training queries, the method yields statistically significant improvements in MAP of 18–35% over the initial query, 7–11% over the feedback model with the best fixed number of pseudo-relevant documents, and up to 10% (5.5% on median) over the standard method of optimizing both the balance coefficient and the number of feedback documents by grid search on the training set. Compared to state-of-the-art PRF methods from the recent literature, our method outperforms them by up to 21%, with an average of 10%. Further analysis shows that we are still far from the method’s effectiveness ceiling (in contrast to the standard method), leaving ample room for further improvements.
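The per-query idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes a single, hypothetical query performance predictor score per query, fits a plain one-feature least-squares line instead of a pipeline found by genetic programming, and uses raw term counts as a stand-in for a proper feedback model. All function names and the toy numbers are made up for illustration.

```python
from collections import Counter


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_k(qpp_score, slope, intercept, k_min=1, k_max=50):
    """Predict the number of feedback documents, rounded and clamped."""
    k = round(slope * qpp_score + intercept)
    return max(k_min, min(k_max, k))


def feedback_query(ranked_docs, k, n_terms=3):
    """Build the final query from the top-k documents only.

    No terms from the initial query are mixed in, mirroring the conjecture
    that its contribution may be unnecessary. Term counting is an assumption
    standing in for a real feedback model.
    """
    counts = Counter()
    for doc in ranked_docs[:k]:
        counts.update(doc.split())
    return [term for term, _ in counts.most_common(n_terms)]


# Toy training data (made-up): one predictor score per training query and
# the number of feedback documents that worked best for that query.
train_qpp = [0.2, 0.4, 0.6, 0.8]
train_best_k = [5, 10, 15, 20]

slope, intercept = fit_line(train_qpp, train_best_k)

# For a new query, predict k from its predictor score, then expand.
k_new = predict_k(0.6, slope, intercept)
ranked = ["neural ranking models", "neural retrieval models", "cooking pasta"]
expanded = feedback_query(ranked, k=2, n_terms=2)
```

Clamping the prediction keeps an out-of-range regression output from requesting zero or an unreasonably large number of feedback documents; the bounds used here are arbitrary defaults.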


Updated: 2021-05-25
