Space/time-efficient RDF stores based on circular suffix sorting
arXiv - CS - Data Structures and Algorithms. Pub Date: 2020-09-21, DOI: arxiv-2009.10045
Nieves R. Brisaboa, Ana Cerdeira-Pena, Guillermo de Bernardo, Antonio Fariña, Gonzalo Navarro

In recent years, RDF has gained popularity as a format for the standardized publication and exchange of information in the Web of Data. In this paper we introduce RDFCSA, a data structure that self-indexes an RDF dataset in small space and supports efficient querying. RDFCSA regards the triples of the RDF store as short circular strings and applies suffix sorting on those strings, so that triple-pattern queries reduce to prefix searching on the string set. The RDF store is then represented compactly using a Compressed Suffix Array (CSA), a proven technology in text indexing that efficiently supports prefix searches. Our experimental evaluation shows that RDFCSA is able to answer triple-pattern queries in a few microseconds per result while using less than 60% of the space required by the raw original data. We also support join queries, which provide the basis for full SPARQL query support. Even though smaller-space solutions exist, as well as faster ones, RDFCSA is shown to provide an excellent space/time tradeoff, with fast and consistent query times within much less space than alternatives that compete in time.
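
The reduction from triple patterns to prefix searches can be illustrated with a small Python sketch. It is not the RDFCSA implementation: a plain sorted list of rotations stands in for the Compressed Suffix Array, terms are kept as strings rather than dictionary-encoded IDs, and the names build_index, to_triple and match are hypothetical, chosen only for this illustration. Tagging each symbol with its role (0 = subject, 1 = predicate, 2 = object) is likewise an assumption of the sketch, made so that equal labels in different positions stay distinct.

```python
from bisect import bisect_left


def build_index(triples):
    """Sort all three rotations of every (subject, predicate, object) triple.

    Each symbol is stored as (role, value) with role 0=s, 1=p, 2=o.
    Assumption of this sketch: a plain sorted list replaces the CSA.
    """
    rows = []
    for triple in triples:
        for start in range(3):
            # Rotation of the circular string that begins at position `start`.
            rows.append(tuple(((start + k) % 3, triple[(start + k) % 3])
                              for k in range(3)))
    rows.sort()
    return rows


def to_triple(row):
    """Undo the rotation: put every value back at its role's position."""
    out = [None, None, None]
    for role, value in row:
        out[role] = value
    return tuple(out)


def match(index, s=None, p=None, o=None):
    """Answer a triple pattern; None marks an unbound (variable) position."""
    pattern = (s, p, o)
    bound = [i for i in range(3) if pattern[i] is not None]
    if not bound:
        # Fully unbound pattern: each triple occurs exactly once among the
        # rotations that start at the subject.
        return [to_triple(r) for r in index if r[0][0] == 0]
    n = len(bound)
    # On a circle of three positions the bound ones are always contiguous,
    # so some rotation has all of them as its first n symbols.
    start = next(i for i in bound
                 if all(pattern[(i + k) % 3] is not None for k in range(n)))
    prefix = tuple(((start + k) % 3, pattern[(start + k) % 3])
                   for k in range(n))
    pos = bisect_left(index, prefix)  # first row that is >= prefix
    results = []
    # Rows sharing the prefix are adjacent in sorted order; scan them.
    while pos < len(index) and index[pos][:n] == prefix:
        results.append(to_triple(index[pos]))
        pos += 1
    return results


if __name__ == "__main__":
    data = [("alice", "knows", "bob"), ("bob", "knows", "alice")]
    idx = build_index(data)
    print(match(idx, s="bob", o="alice"))  # pattern (bob, ?p, alice)
```

Because the strings are circular, a pattern such as (s, ?p, o) is answered by the rotation that starts at the object and continues with the subject, so a single sorted set covers every combination of bound positions. The sketch scans the matching range linearly after one binary search; a compressed suffix array would locate both ends of the range and then extract the results from its compact representation.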

Updated: 2020-09-22