Exploiting temporal changes in query submission behavior for improving the search engine result cache performance
Information Processing & Management (IF 8.6), Pub Date: 2021-02-09, DOI: 10.1016/j.ipm.2021.102533
Tayfun Kucukyilmaz

A commonly used technique for improving query response time in retrieval systems is to store query-relevant information in fast-access memory such as a result cache. The effectiveness of the result cache relies heavily on making correct admission and eviction decisions, which require careful analysis of query and stream characteristics. Because user access is anonymous and global, search engines are often treated as time-invariant architectures: query characteristics are frequently collected globally and assumed to remain unchanged for long periods of time. However, the highly distributed nature of modern search engine frameworks leads to noticeable changes in user access patterns over short periods of time at each data center within the distributed network.

The work presented here aims to evaluate temporal variations in query submissions and exploit them to improve result caching performance. To this end, query logs are analyzed to verify that such variations exist, and a new caching framework that takes advantage of them is proposed. The proposed framework partitions the result cache into three segments: a static segment that stores the most frequent queries in an offline fashion, a newly introduced semi-static segment, added on top of the state-of-the-art Static–Dynamic Cache (SDC), whose content changes during different periods of the day, and a dynamic segment that is maintained in a Least Recently Used (LRU) fashion. Experiments demonstrate that the proposed caching framework improves the hit rate of a search engine result cache by up to 3.31% and query response time by up to 7.27% with respect to state-of-the-art techniques.
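
The three-segment layout described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the class name `SegmentedResultCache`, the method names, the lookup order (static, then semi-static, then dynamic), the period labels, and the way per-period content is supplied are all assumptions made for illustration. The abstract does not say how semi-static content is selected or when it is swapped.

```python
from collections import OrderedDict


class SegmentedResultCache:
    """Toy result cache with static, semi-static, and dynamic (LRU) segments.

    Hypothetical sketch: names and lookup order are assumptions, not the
    method from the paper.
    """

    def __init__(self, static_results, semi_static_by_period, dynamic_capacity):
        self.static = dict(static_results)            # filled offline, never changes
        self.semi_static_by_period = semi_static_by_period  # period -> {query: result}
        self.semi_static = {}                         # content for the active period
        self.dynamic = OrderedDict()                  # LRU order, oldest entry first
        self.dynamic_capacity = dynamic_capacity
        self.hits = 0
        self.lookups = 0

    def switch_period(self, period):
        """Replace the semi-static content with the set prepared for `period`."""
        self.semi_static = dict(self.semi_static_by_period.get(period, {}))

    def get(self, query, compute):
        """Return the cached result, or call compute(query) on a miss."""
        self.lookups += 1
        if query in self.static:
            self.hits += 1
            return self.static[query]
        if query in self.semi_static:
            self.hits += 1
            return self.semi_static[query]
        if query in self.dynamic:
            self.hits += 1
            self.dynamic.move_to_end(query)           # mark as most recently used
            return self.dynamic[query]
        result = compute(query)                       # miss: ask the backend
        if self.dynamic_capacity > 0:
            self.dynamic[query] = result
            if len(self.dynamic) > self.dynamic_capacity:
                self.dynamic.popitem(last=False)      # evict least recently used
        return result

    def hit_rate(self):
        return self.hits / self.lookups if self.lookups else 0.0


# Usage with made-up queries and a stand-in backend.
cache = SegmentedResultCache(
    static_results={"weather": "R(weather)"},
    semi_static_by_period={
        "morning": {"news": "R(news)"},
        "evening": {"tv guide": "R(tv guide)"},
    },
    dynamic_capacity=2,
)
cache.switch_period("morning")
for q in ["weather", "news", "maps", "maps", "tv guide"]:
    cache.get(q, lambda query: f"R({query})")
print(cache.hit_rate())  # 0.6: three hits out of five lookups
```

In this toy run, "tv guide" misses because only the "morning" content is active; after `cache.switch_period("evening")` the same query would be served from the semi-static segment. That is the kind of time-of-day effect the abstract describes.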




Updated: 2021-02-09