Topical Result Caching in Web Search Engines
arXiv - CS - Databases; Pub Date: 2020-01-09; DOI: arxiv-2001.03010
Ida Mele, Nicola Tonellotto, Ophir Frieder, Raffaele Perego

Caching search results is employed in information retrieval systems to expedite query processing and reduce back-end server workload. Motivated by the observation that queries belonging to different topics have different temporal-locality patterns, we investigate a novel caching model called STD (Static-Topic-Dynamic cache). It improves on the traditional SDC (Static-Dynamic Cache), which stores the results of popular queries in a static cache and manages a dynamic cache with a replacement policy to capture temporal variations in the query stream. Our proposed caching scheme adds another layer for topic-based caching, in which entries are allocated to different topics (e.g., weather, education). The results of queries characterized by a topic are kept in the fraction of the cache dedicated to that topic. This allows cache-space utilization to adapt to the temporal locality of the various topics and reduces cache misses caused by queries that are neither popular enough to be in the static portion nor requested within short enough time intervals to be in the dynamic portion. We simulate different configurations of STD using two real-world query streams. Experiments demonstrate that our approach outperforms SDC, increasing hit rates by up to 3% and reducing the gap between SDC and the theoretically optimal caching algorithm by up to 36%.
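The abstract describes the STD lookup path only in prose. What follows is a minimal simulation sketch under stated assumptions, not the authors' implementation. It assumes LRU as the replacement policy in both the topic and dynamic sections, fixed per-topic capacities, a hypothetical `classify` function standing in for whatever topic classifier the paper uses, and a hypothetical `backend` callable for query processing. `gap_reduction` encodes one reading of the "gap reduction w.r.t. SDC" metric: the fraction of SDC's hit-rate distance to the optimal cache that STD closes.

```python
from collections import OrderedDict
from typing import Callable, Dict, Hashable, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[Hashable, object]" = OrderedDict()

    def lookup(self, key: Hashable) -> bool:
        """Return True on a hit and mark the entry as most recently used."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        return False

    def insert(self, key: Hashable, value: object) -> None:
        if self.capacity <= 0:
            return
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the LRU entry


class STDCache:
    """Static-Topic-Dynamic cache sketch (names and policies are assumptions)."""

    def __init__(
        self,
        static_results: Dict[str, object],
        topic_capacities: Dict[str, int],
        dynamic_capacity: int,
        classify: Callable[[str], Optional[str]],  # hypothetical classifier
        backend: Callable[[str], object],  # hypothetical query processor
    ) -> None:
        self.static = dict(static_results)  # read-only, filled offline
        self.topics = {t: LRUCache(c) for t, c in topic_capacities.items()}
        self.dynamic = LRUCache(dynamic_capacity)
        self.classify = classify
        self.backend = backend
        self.hits = 0
        self.requests = 0

    def serve(self, query: str) -> bool:
        """Process one query; return True if it was answered from cache."""
        self.requests += 1
        if query in self.static:
            self.hits += 1
            return True
        # Queries with a known topic use that topic's partition;
        # unclassified queries (None) fall back to the dynamic section.
        section = self.topics.get(self.classify(query), self.dynamic)
        if section.lookup(query):
            self.hits += 1
            return True
        section.insert(query, self.backend(query))  # miss: ask the back-end
        return False

    def hit_rate(self) -> float:
        return self.hits / self.requests if self.requests else 0.0


def gap_reduction(h_std: float, h_sdc: float, h_opt: float) -> float:
    """Fraction of SDC's hit-rate gap to the optimal cache closed by STD."""
    return (h_std - h_sdc) / (h_opt - h_sdc)
```

For example, with `news` in the static section, one-entry `weather` and `education` partitions, and a one-entry dynamic section, the stream `news, rain today, mba ranking, rain today, misc, news` yields 3 hits out of 6 requests. The repeated weather query survives the intervening education query, which would have evicted it from a single one-entry LRU cache; this is the kind of miss the topic layer is meant to avoid.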

Updated: 2020-01-10