An analytical study of large SPARQL query logs, The VLDB Journal

An analytical study of large SPARQL query logs
The VLDB Journal (IF 2.8), Pub Date: 2019-08-02, DOI: 10.1007/s00778-019-00558-9
Angela Bonifati, Wim Martens, Thomas Timm

With the adoption of RDF as the data model for Linked Data and the Semantic Web, query specification from end users has become more and more common in SPARQL endpoints. In this paper, we conduct an in-depth analytical study of the queries formulated by end users and harvested from large and up-to-date structured query logs from a wide variety of RDF data sources. As opposed to previous studies, ours is the first assessment of a voluminous query corpus, spanning several years and covering many representative SPARQL endpoints. Apart from the syntactic structure of the queries, which already yields interesting results on this generalized corpus, we drill deeper into the structural characteristics related to the graph and hypergraph representations of queries. We outline the most common shapes of queries when visually displayed as undirected graphs and characterize their treewidth, the length of their cycles, the maximal degree of their nodes, and more. For queries that cannot be adequately represented as graphs, we investigate their hypergraphs and hypertreewidth. Moreover, we analyze the evolution of queries over time by introducing the novel concept of a streak, i.e., a sequence of queries that appear as subsequent modifications of a seed query. Our study offers several fresh insights into the already rich query features of real SPARQL queries formulated by real users and leads us to draw a number of conclusions and pinpoint future directions for SPARQL query evaluation, query optimization, tuning, and benchmarking.
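
To make the graph view of queries more concrete, here is a minimal sketch, not the authors' tooling, of how a set of SPARQL triple patterns might be turned into an undirected graph and given a coarse shape label together with its maximal node degree. It assumes that each (subject, predicate, object) triple pattern becomes one edge between its subject and object terms, that constants are nodes just like variables, that repeated edges between the same two terms collapse into one, and that predicates are ignored. The function names build_query_graph, count_components, and classify_shape are hypothetical. The labels chain, star, tree, forest, and cyclic are a simplified subset and not the paper's exact shape taxonomy.

```python
def build_query_graph(triple_patterns):
    """Undirected graph of a set of triple patterns (hypothetical helper).

    Each (subject, predicate, object) triple pattern becomes an edge between
    its subject and object terms; predicates are ignored and parallel edges
    collapse into one. Self-loops (subject == object) are flagged separately.
    """
    adj = {}
    has_self_loop = False
    for s, _p, o in triple_patterns:
        adj.setdefault(s, set())
        adj.setdefault(o, set())
        if s == o:
            has_self_loop = True
        else:
            adj[s].add(o)
            adj[o].add(s)
    return adj, has_self_loop


def count_components(adj):
    """Number of connected components, via iterative depth-first search."""
    seen = set()
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return components


def classify_shape(triple_patterns):
    """Return (shape, max_degree) using a simplified set of shape labels."""
    if not triple_patterns:
        return "empty", 0
    adj, has_self_loop = build_query_graph(triple_patterns)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    c = count_components(adj)
    max_degree = max(len(nbrs) for nbrs in adj.values())
    # A simple undirected graph is a forest iff #edges == #nodes - #components.
    if has_self_loop or m != n - c:
        return "cyclic", max_degree
    if c > 1:
        return "forest", max_degree
    if max_degree <= 2:
        return "chain", max_degree
    # A tree with exactly one non-leaf node is a star.
    inner_nodes = sum(1 for nbrs in adj.values() if len(nbrs) > 1)
    return ("star" if inner_nodes == 1 else "tree"), max_degree


# Usage: one variable ?p with three outgoing triple patterns forms a star.
star = [("?p", "rdf:type", "ex:Person"), ("?p", "ex:name", "?n"), ("?p", "ex:age", "?a")]
print(classify_shape(star))  # ('star', 3)
```

The acyclicity test uses the fact that a simple undirected graph is a forest exactly when its number of edges equals its number of nodes minus its number of connected components, and such graphs have treewidth at most 1. Because predicates are dropped, this sketch says nothing about queries whose structure is only captured adequately by a hypergraph.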

Updated: 2019-08-02
