Techniques for Inverted Index Compression
ACM Computing Surveys (IF 23.8), Pub Date: 2020-12-06, DOI: 10.1145/3415148
Giulio Ermanno Pibiri, Rossano Venturini

The data structure at the core of large-scale search engines is the inverted index, which is essentially a collection of sorted integer sequences called inverted lists. Because such engines index many documents and face stringent performance requirements under a heavy query load, the inverted index stores billions of integers that must be searched efficiently. In this scenario, index compression is essential because it makes better use of the computer memory hierarchy for faster query processing and, at the same time, reduces the number of storage machines required. The aim of this article is twofold: first, to survey the encoding algorithms suitable for inverted index compression and, second, to characterize the performance of the inverted index through experimentation.
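
To make the abstract's description concrete, the following is a minimal sketch of an inverted index whose lists are sorted document IDs, compressed with gap (delta) encoding followed by variable-byte coding. The abstract does not name any specific encoder, so the choice of variable-byte coding here is an assumption made for illustration, and the function names (build_inverted_index, vbyte_encode, vbyte_decode) are hypothetical.

```python
from typing import Dict, Iterable, List


def build_inverted_index(docs: List[str]) -> Dict[str, List[int]]:
    """Map each term to the sorted list of IDs of documents containing it."""
    index: Dict[str, List[int]] = {}
    for doc_id, text in enumerate(docs):
        # set() ensures each document ID appears at most once per term.
        for term in set(text.lower().split()):
            index.setdefault(term, []).append(doc_id)
    # Document IDs are appended in increasing order, so every list is sorted.
    return index


def vbyte_encode(sorted_ids: Iterable[int]) -> bytes:
    """Encode a sorted list of non-negative integers as variable-byte gaps."""
    out = bytearray()
    prev = 0
    for doc_id in sorted_ids:
        gap = doc_id - prev  # small gaps need fewer bytes than raw IDs
        prev = doc_id
        while gap >= 128:
            out.append(gap & 0x7F)  # low 7 bits, high bit clear: more follows
            gap >>= 7
        out.append(gap | 0x80)  # high bit set marks the last byte of a gap
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Invert vbyte_encode, rebuilding the original sorted list."""
    ids: List[int] = []
    value, shift, prev = 0, 0, 0
    for byte in data:
        if byte & 0x80:  # terminating byte of the current gap
            value |= (byte & 0x7F) << shift
            prev += value  # prefix sum turns gaps back into IDs
            ids.append(prev)
            value, shift = 0, 0
        else:
            value |= byte << shift
            shift += 7
    return ids
```

In this sketch, encoding the list [3, 7, 300] stores the gaps 3, 4, and 293 in four bytes in total, and vbyte_decode recovers the original list. Production indexes typically use more elaborate, block-based layouts; this example only shows why sorted lists compress well.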

Updated: 2020-12-06