Anytime Ranking on Document-Ordered Indexes,arXiv - CS - Databases



Anytime Ranking on Document-Ordered Indexes
arXiv - CS - Databases, Pub Date: 2021-04-18, DOI: arxiv-2104.08976
Joel Mackenzie, Matthias Petri, Alistair Moffat

Inverted indexes continue to be a mainstay of text search engines, allowing efficient querying of large document collections. While there are a number of possible organizations, document-ordered indexes are the most common, since they are amenable to various query types, support index updates, and allow for efficient dynamic pruning operations. One disadvantage of document-ordered indexes is that high-scoring documents can be distributed across the document identifier space, meaning that index traversal algorithms that terminate early might put search effectiveness at risk. The alternative is impact-ordered indexes, which primarily support top-k disjunctions, but also allow for anytime query processing, where the search can be terminated at any time, with search quality improving as processing latency increases. Anytime query processing can be used to effectively reduce high-percentile tail latency, which is essential for operational scenarios in which a service level agreement (SLA) imposes response time requirements. In this work, we show how document-ordered indexes can be organized such that they can be queried in an anytime fashion, enabling strict latency control with effective early termination. Our experiments show that processing document-ordered topical segments selected by a simple score estimator outperforms existing anytime algorithms, and allows query runtimes to be accurately limited in order to comply with SLA requirements.
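
The idea of visiting document-ordered segments in estimated-score order under a time budget can be illustrated with a small Python sketch. This is not the authors' implementation: the in-memory segment layout (a dict from term to a list of (doc_id, score) postings), the estimator (sum of each query term's maximum score within the segment), and the names estimate_segment_score and anytime_top_k are all assumptions made for illustration.

```python
import heapq
import time
from collections import defaultdict


def estimate_segment_score(segment, query_terms):
    """Hypothetical estimator: sum of each query term's maximum score in the segment."""
    return sum(
        max((score for _, score in segment.get(term, [])), default=0.0)
        for term in query_terms
    )


def anytime_top_k(segments, query_terms, k, budget_seconds, clock=time.perf_counter):
    """Process segments in decreasing estimated score until the budget runs out.

    Each segment is assumed to be a dict: term -> list of (doc_id, score),
    with every document appearing in exactly one segment.
    Returns (top-k list of (doc_id, score), number of segments processed).
    """
    deadline = clock() + budget_seconds
    order = sorted(
        segments,
        key=lambda seg: estimate_segment_score(seg, query_terms),
        reverse=True,
    )
    heap = []  # min-heap of (score, doc_id), holding the current top-k
    processed = 0
    for segment in order:
        # The budget is only checked between segments, so the result is
        # always a complete ranking of the segments visited so far.
        if clock() >= deadline:
            break
        totals = defaultdict(float)
        for term in query_terms:
            for doc_id, score in segment.get(term, []):
                totals[doc_id] += score
        for doc_id, score in totals.items():
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))
        processed += 1
    ranked = [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]
    return ranked, processed
```

In this sketch a larger budget can only add segments to the ones already visited, so the ranking can only improve as latency increases. A real system would operate on compressed postings with dynamic pruning inside each segment; the exhaustive per-segment scoring here is a simplification.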


Updated: 2021-04-20
