Enhancing virtual ontology based access over tabular data with Morph-CSV
Semantic Web (IF 3.0), Pub Date: 2021-04-09, DOI: 10.3233/sw-210432
David Chaves-Fraga¹, Edna Ruckhaus¹, Freddy Priyatna¹, Maria-Esther Vidal², Oscar Corcho¹

Ontology-Based Data Access (OBDA) has traditionally focused on providing a unified view of heterogeneous datasets (e.g., relational databases, CSV and JSON files), either by materializing the integrated data into RDF or by querying on the fly through SPARQL query translation. For tabular datasets represented as several CSV or Excel files, query translation approaches have treated each source as a single table that can be loaded into a relational database management system (RDBMS). However, constraints over these tables are not represented (e.g., referential integrity among sources, datatypes, or data integrity), so neither consistency among attributes nor indexes over tables are enforced. As a consequence, both the efficiency of the SPARQL-to-SQL translation process and the completeness of the answers produced when evaluating the generated SQL query may be affected. Our work focuses on applying implicit constraints to the OBDA query translation process over tabular data. We propose Morph-CSV, a framework for querying tabular data that exploits information from typical OBDA inputs (e.g., mappings, queries) to enforce constraints, and that can be used together with any SPARQL-to-SQL OBDA engine. Morph-CSV relies on a constraint component and a set of constraint operators. For a given set of constraints, the operators are applied to each type of constraint with the aim of improving query completeness and performance. We evaluate Morph-CSV in several domains: e-commerce with the BSBM benchmark, transportation with the GTFS-Madrid benchmark, and biology with a use case extracted from the Bio2RDF project. We compare and report the performance of two SPARQL-to-SQL OBDA engines, with and without Morph-CSV. The observed results suggest that Morph-CSV can reduce total query execution time by up to two orders of magnitude while producing all the query answers.
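The claim that loading each CSV as an unconstrained table can hurt answer completeness can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module, not Morph-CSV itself. The table names, columns, values, and the SQL standing in for a translated SPARQL `FILTER` are all hypothetical. The constraints are also written by hand here, whereas Morph-CSV derives them from mappings and queries.

```python
import csv
import io
import sqlite3

# Hypothetical CSV sources; names and values are illustrative only.
PRODUCTS_CSV = "id,label,price\n1,lamp,95\n2,desk,250\n3,sofa,1000\n"
OFFERS_CSV = "offer_id,product_id,vendor\n10,1,acme\n11,3,globex\n"


def rows(text):
    """Parse CSV text into (header, list of data rows)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, list(reader)


def load_naive(conn):
    """Load each CSV as one table: every column TEXT, no keys, no indexes."""
    for name, text in (("product", PRODUCTS_CSV), ("offer", OFFERS_CSV)):
        header, data = rows(text)
        cols = ", ".join(f"{c} TEXT" for c in header)
        conn.execute(f"CREATE TABLE {name} ({cols})")
        marks = ", ".join("?" for _ in header)
        conn.executemany(f"INSERT INTO {name} VALUES ({marks})", data)


def load_constrained(conn):
    """Load the same data with datatypes, keys and an index on the join column."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute(
        "CREATE TABLE product (id INTEGER PRIMARY KEY, label TEXT, price INTEGER)"
    )
    conn.execute(
        "CREATE TABLE offer (offer_id INTEGER PRIMARY KEY, "
        "product_id INTEGER REFERENCES product(id), vendor TEXT)"
    )
    conn.execute("CREATE INDEX idx_offer_product ON offer(product_id)")
    # INTEGER affinity converts numeric strings such as '95' into integers.
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)", rows(PRODUCTS_CSV)[1])
    conn.executemany("INSERT INTO offer VALUES (?, ?, ?)", rows(OFFERS_CSV)[1])


# Hypothetical SQL a SPARQL-to-SQL engine might emit for FILTER(?price < 100).
QUERY = "SELECT label FROM product WHERE price < 100"

naive = sqlite3.connect(":memory:")
load_naive(naive)
constrained = sqlite3.connect(":memory:")
load_constrained(constrained)

naive_answers = naive.execute(QUERY).fetchall()
constrained_answers = constrained.execute(QUERY).fetchall()
print("naive:", naive_answers)              # → naive: []
print("constrained:", constrained_answers)  # → constrained: [('lamp',)]
```

With every column stored as TEXT, SQLite compares `price` with `'100'` as strings. Because `'95' < '100'` is false in a string comparison, the lamp is silently dropped. Declaring `price INTEGER` restores the numeric comparison and the complete answer. The primary key, foreign key and index on `offer.product_id` do not change this particular query's result. They stand in for the referential-integrity and index constraints, which mainly affect the cost of joins in the generated SQL.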

Updated: 2021-04-09