Supervised approaches for explicit search result diversification
Information Processing & Management (IF 7.4), Pub Date: 2020-07-30, DOI: 10.1016/j.ipm.2020.102356
Sevgi Yigit-Sert , Ismail Sengor Altingovde , Craig Macdonald , Iadh Ounis , Özgür Ulusoy

Diversification of web search results aims to promote documents with diverse content (i.e., covering different aspects of a query) to the top-ranked positions, to satisfy more users, enhance fairness and reduce bias. In this work, we focus on the explicit diversification methods, which assume that the query aspects are known at the diversification time, and leverage supervised learning methods to improve their performance in three different frameworks with different features and goals. First, in the LTRDiv framework, we focus on applying typical learning to rank (LTR) algorithms to obtain a ranking where each top-ranked document covers as many aspects as possible. We argue that such rankings optimize various diversification metrics (under certain assumptions), and hence, are likely to achieve diversity in practice. Second, in the AspectRanker framework, we apply LTR for ranking the aspects of a query with the goal of more accurately setting the aspect importance values for diversification. As features, we exploit several pre- and post-retrieval query performance predictors (QPPs) to estimate how well a given aspect is covered among the candidate documents. Finally, in the LmDiv framework, we cast the diversification problem into an alternative fusion task, namely, the supervised merging of rankings per query aspect. We again use QPPs computed over the candidate set for each aspect, and optimize an objective function that is tailored for the diversification goal. We conduct thorough comparative experiments using both the basic systems (based on the well-known BM25 matching function) and the best-performing systems (with more sophisticated retrieval methods) from previous TREC campaigns. Our findings reveal that the proposed frameworks, especially AspectRanker and LmDiv, outperform both non-diversified rankings and two strong diversification baselines (i.e., xQuAD and its variant) in terms of various effectiveness metrics.
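For readers unfamiliar with explicit diversification, the sketch below illustrates the xQuAD-style greedy re-ranking that the paper uses as a strong baseline: at each step, the next document is chosen by trading off relevance to the query against coverage of aspects not yet covered by the already-selected documents. This is a minimal, assumed formulation of xQuAD for illustration only; the scores, aspect weights, and data in the example are hypothetical placeholders, and it does not implement the proposed LTRDiv, AspectRanker, or LmDiv frameworks.

```python
# Minimal xQuAD-style explicit diversification sketch (illustrative only).
# All inputs (relevance scores, aspect weights) are hypothetical placeholders.

def xquad_rerank(doc_scores, aspect_weights, aspect_doc_scores, k=10, lam=0.5):
    """Greedy re-ranking: pick, at each step, the document that best balances
    query relevance against coverage of aspects not yet covered by the
    selected set.

    doc_scores:        {doc_id: P(d|q)}            relevance to the query
    aspect_weights:    {aspect: P(a|q)}            importance of each aspect
    aspect_doc_scores: {aspect: {doc_id: P(d|a)}}  relevance to each aspect
    """
    selected = []
    # "Novelty" of an aspect decays as selected documents already cover it.
    uncovered = {a: 1.0 for a in aspect_weights}
    candidates = set(doc_scores)

    while candidates and len(selected) < k:
        def gain(d):
            rel = doc_scores[d]
            div = sum(aspect_weights[a]
                      * aspect_doc_scores[a].get(d, 0.0)
                      * uncovered[a]
                      for a in aspect_weights)
            return (1 - lam) * rel + lam * div

        best = max(candidates, key=gain)
        selected.append(best)
        candidates.remove(best)
        # Update how much of each aspect remains uncovered.
        for a in aspect_weights:
            uncovered[a] *= 1.0 - aspect_doc_scores[a].get(best, 0.0)

    return selected


if __name__ == "__main__":
    docs = {"d1": 0.9, "d2": 0.8, "d3": 0.7}
    aspects = {"a1": 0.6, "a2": 0.4}
    per_aspect = {"a1": {"d1": 0.9, "d2": 0.1, "d3": 0.2},
                  "a2": {"d1": 0.1, "d2": 0.8, "d3": 0.7}}
    print(xquad_rerank(docs, aspects, per_aspect, k=3))
```

The supervised frameworks described above replace hand-set components of this scheme: AspectRanker learns the aspect importance values (the role played by `aspect_weights` here) from query performance predictors, while LmDiv learns how to merge the per-aspect rankings instead of combining them with a fixed interpolation.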
Updated: 2020-07-30