Using an Inverted Index Synopsis for Query Latency and Performance Prediction, ACM Transactions on Information Systems



ACM Transactions on Information Systems (IF 5.4), Pub Date: 2020-05-22, DOI: 10.1145/3389795
Nicola Tonellotto, Craig Macdonald


Predicting the query latency by a search engine has important benefits, for instance, in allowing the search engine to adjust its configuration to address long-running queries without unnecessarily sacrificing its effectiveness. However, for the dynamic pruning techniques that underlie many commercial search engines, achieving accurate predictions of query latencies is difficult. We propose the use of index synopses—which are stochastic samples of the full index—for attaining accurate timing predictions. Indeed, we experiment using the TREC ClueWeb09 collection, and a large set of real user queries, and find that using small index synopses it is possible to very accurately estimate properties of the larger index, including sizes of posting list unions and intersections. Thereafter, we demonstrate that index synopses facilitate two key use cases: first, for query efficiency prediction, we show that predicting the query latencies on the full index and classifying long-running queries can be accurately achieved using index synopses; second, for query performance prediction, we show that the effectiveness of queries can be estimated more accurately using a synopsis index post-retrieval predictor than a pre-retrieval predictor. Overall, our experiments demonstrate the value of such a stochastic sample of a larger index at predicting the properties of the larger index.
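To make the idea of estimating posting list union and intersection sizes from a sample concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes the synopsis is built by keeping each document independently with a fixed probability `rate`, and that sizes measured on the synopsis are scaled back up by `1 / rate`. The function names `build_index`, `build_synopsis`, and `estimate_sizes` are hypothetical and chosen only for illustration.

```python
import random


def build_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = {}
    for doc_id, terms in docs.items():
        for term in set(terms):
            index.setdefault(term, set()).add(doc_id)
    return index


def build_synopsis(index, rate, seed=0):
    """Assumed sampling scheme: keep each document with probability `rate`."""
    rng = random.Random(seed)
    all_docs = sorted(set().union(*index.values()))
    kept = {doc_id for doc_id in all_docs if rng.random() < rate}
    # Every posting list is restricted to the same sampled documents,
    # so co-occurrence between terms is preserved inside the sample.
    return {term: postings & kept for term, postings in index.items()}


def estimate_sizes(synopsis, terms, rate):
    """Estimate full-index union and intersection sizes for `terms`."""
    lists = [synopsis.get(term, set()) for term in terms]
    union = set().union(*lists)
    intersection = set.intersection(*lists) if lists else set()
    # Scale the sample counts back up by the inverse sampling rate.
    return len(union) / rate, len(intersection) / rate
```

Sampling whole documents, rather than sampling each posting list separately, is the design choice that keeps intersections meaningful in this sketch: a document that survives the sample appears in every posting list it belonged to in the full index.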


Updated: 2020-05-22
