Learning Early Exit Strategies for Additive Ranking Ensembles
arXiv - CS - Information Retrieval Pub Date : 2021-05-06 , DOI: arxiv-2105.02568
Francesco Busolin, Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Salvatore Trani

Modern search engine ranking pipelines are commonly based on large machine-learned ensembles of regression trees. We propose LEAR, a novel learned technique that reduces the average number of trees each document traverses to accumulate its score, thus reducing the overall query response time. LEAR exploits a classifier that predicts whether a document can exit the ensemble early because it is unlikely to be ranked among the final top-k results. The early exit decision occurs at a sentinel point, i.e., after a limited number of trees have been evaluated, and the partial scores are exploited to filter out non-promising documents. We evaluate LEAR by deploying it in a production-like setting, adopting a state-of-the-art algorithm for ensemble traversal. We provide a comprehensive experimental evaluation on two public datasets. The experiments show that LEAR has a significant impact on the efficiency of query processing without hindering ranking quality. In detail, on the first dataset, LEAR achieves a speedup of 3x without any loss in NDCG@10, while on the second dataset the speedup is larger than 5x with a negligible NDCG@10 loss (< 0.05%).
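
The early exit mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `score_with_early_exit`, the classifier interface (features plus partial score in, a continuation probability out), the `threshold` parameter, and the choice to keep the partial score as the final score of an exited document are all assumptions made for illustration. Trees are modeled as plain callables rather than with the optimized ensemble traversal algorithm the paper adopts.

```python
from typing import Callable, List, Sequence, Tuple

# A tree is modeled as any callable mapping a feature vector to a score.
Tree = Callable[[Sequence[float]], float]
# Hypothetical classifier interface (an assumption, not from the paper):
# (features, partial_score) -> probability that the document should
# continue through the rest of the ensemble.
ContinueClassifier = Callable[[Sequence[float], float], float]


def score_with_early_exit(
    docs: List[Sequence[float]],
    trees: List[Tree],
    sentinel: int,
    classifier: ContinueClassifier,
    threshold: float = 0.5,
) -> Tuple[List[float], int]:
    """Score one query's candidates; return (scores, trees_evaluated)."""
    head = trees[:sentinel]   # trees evaluated before the sentinel point
    tail = trees[sentinel:]   # trees skipped by documents that exit early
    scores: List[float] = []
    trees_evaluated = 0
    for features in docs:
        # Additive ensemble: the score is the sum of the tree outputs.
        partial = sum(tree(features) for tree in head)
        trees_evaluated += len(head)
        if classifier(features, partial) < threshold:
            # Early exit. Assumption: the partial score is kept as final.
            scores.append(partial)
            continue
        scores.append(partial + sum(tree(features) for tree in tail))
        trees_evaluated += len(tail)
    return scores, trees_evaluated


# Toy usage: four linear "trees" and a rule-based stand-in classifier.
toy_trees = [lambda x, w=w: w * x[0] for w in (1.0, 2.0, 3.0, 4.0)]
toy_docs = [[1.0], [0.0], [2.0]]
toy_classifier = lambda x, partial: 1.0 if partial >= 3.0 else 0.0

scores, cost = score_with_early_exit(toy_docs, toy_trees, 2, toy_classifier)
print(scores, cost)  # [10.0, 0.0, 20.0] 10
```

In this toy run the second document exits at the sentinel, so 10 tree evaluations are performed instead of the 12 a full traversal would need. A real deployment would replace the rule-based stand-in with a trained classifier and tune the sentinel position and threshold against ranking quality.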

Updated: 2021-05-07