Fast and Practical Snippet Generation for RDF Datasets
ACM Transactions on the Web (IF 3.5), Pub Date: 2019-11-18, DOI: 10.1145/3365575
Daxin Liu¹, Gong Cheng², Qingxia Liu², Yuzhong Qu²

Triple-structured open data creates value in many ways. However, the reuse of datasets is still challenging: users find it difficult to assess the usefulness of a large dataset containing thousands or millions of triples. To address this need, existing abstractive methods produce a concise, high-level abstraction of the data. Complementary to that, we adopt the extractive strategy and aim to select an optimal small subset of triples from a dataset as a snippet that compactly illustrates the content of the dataset. In our previous work, this was formulated as a combinatorial optimization problem. In this article, we design a new algorithm for the problem that is an order of magnitude faster than the previous one while achieving the same approximation ratio. We also develop an anytime algorithm that can generate empirically better solutions when given additional time. To suit datasets that are only partially accessible via online query services (e.g., SPARQL endpoints for RDF data), we adapt our algorithms to trade off snippet quality for feasibility and efficiency in the Web environment. We carry out extensive experiments on real RDF datasets and SPARQL endpoints to evaluate snippet quality and running time. The results demonstrate the effectiveness and practicality of our proposed algorithms.
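The abstract does not spell out the objective function, so the following is only a minimal sketch of the general idea of extractive snippet generation as a combinatorial optimization problem: choose at most k triples that together cover as much weighted "illustrative content" as possible. Everything specific here is an assumption for illustration: the features (a triple's property, plus its class when the property is rdf:type), the frequency-based weights, and the hypothetical names `features`, `greedy_snippet` and `anytime_improve`. The selection step is the textbook greedy for weighted maximum coverage (which has a (1 − 1/e) approximation ratio for that simpler problem), and the refinement step is a swap-based local search standing in for an anytime algorithm. Neither is the algorithm proposed in the article.

```python
import time
from collections import Counter

# Abbreviated IRI used for illustration; real RDF data would use full IRIs.
RDF_TYPE = "rdf:type"


def features(triple):
    """Hypothetical 'illustrative features' of a triple: its property, plus
    its class when the property is rdf:type."""
    _, p, o = triple
    feats = {("property", p)}
    if p == RDF_TYPE:
        feats.add(("class", o))
    return feats


def feature_weights(triples):
    """Assumption: weight each feature by its relative frequency in the dataset."""
    counts = Counter()
    for t in triples:
        counts.update(features(t))
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}


def score(snippet, weights):
    """Total weight of the distinct features covered by a snippet."""
    covered = set().union(*(features(t) for t in snippet))
    return sum(weights[f] for f in covered)


def greedy_snippet(triples, k, weights):
    """Textbook greedy for weighted maximum coverage: repeatedly add the
    triple with the largest marginal gain until k triples are chosen."""
    snippet, covered = [], set()
    candidates = list(dict.fromkeys(triples))  # drop duplicate triples, keep order
    while len(snippet) < k and candidates:
        best = max(candidates,
                   key=lambda t: sum(weights[f] for f in features(t) - covered))
        snippet.append(best)
        covered |= features(best)
        candidates.remove(best)
    return snippet


def anytime_improve(triples, snippet, weights, time_budget_s):
    """Swap-based local search: replace one snippet triple with an outside
    triple whenever that raises the score. It can stop at any moment and
    always holds the best snippet found so far."""
    best = list(snippet)
    best_score = score(best, weights)
    outside = [t for t in dict.fromkeys(triples) if t not in best]
    deadline = time.monotonic() + time_budget_s
    improved = True
    while improved and time.monotonic() < deadline:
        improved = False
        for i in range(len(best)):
            for j, cand in enumerate(outside):
                trial = best[:i] + [cand] + best[i + 1:]
                trial_score = score(trial, weights)
                if trial_score > best_score + 1e-12:  # tolerance for float noise
                    outside[j] = best[i]  # the swapped-out triple becomes a candidate
                    best, best_score, improved = trial, trial_score, True
                    break
            if improved or time.monotonic() >= deadline:
                break
    return best, best_score


# Toy dataset (hypothetical) using abbreviated IRIs.
EXAMPLE_TRIPLES = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:bob", "ex:name", '"Bob"'),
    ("ex:acme", RDF_TYPE, "ex:Company"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
]

if __name__ == "__main__":
    w = feature_weights(EXAMPLE_TRIPLES)
    snip = greedy_snippet(EXAMPLE_TRIPLES, 3, w)
    print(snip, round(score(snip, w), 3))
    refined, refined_score = anytime_improve(EXAMPLE_TRIPLES, snip, w, time_budget_s=0.05)
    print(refined, round(refined_score, 3))
```

On the toy data, the greedy pass selects `ex:alice rdf:type ex:Person`, `ex:alice ex:name "Alice"` and `ex:alice ex:knows ex:bob` (covered weight 0.8), which is already a local optimum for the swap search. On larger inputs the local search would only ever keep a snippet at least as good as the greedy one. A faithful version would also have to enforce whatever additional constraints the article's formulation imposes, which this sketch omits.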

Updated: 2019-11-18