Dataset Generation Patterns for Evaluating Knowledge Graph Construction,arXiv - CS - Databases


arXiv - CS - Databases. Pub Date: 2021-04-28, DOI: arxiv-2104.13576
Markus Schröder, Christian Jilek, Andreas Dengel

Confidentiality hinders the publication of authentic, labeled datasets of personal and enterprise data, although such datasets would be useful for evaluating knowledge graph construction approaches in industrial scenarios. We therefore plan to generate such data synthetically so that it appears as authentic as possible. We assume that knowledge workers follow certain habits when they produce or manage data; if so, generation patterns can be discovered and used by data generators to imitate real datasets. In this paper, we derive an initial set of 11 distinct patterns found in real spreadsheets from industry and demonstrate a suitable generator, called Data Sprout, that is able to reproduce them. We describe how the generator produces spreadsheets in general and how the implemented patterns alter its output.
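To make the idea of a pattern-driven generator concrete, here is a minimal sketch in Python. It is not Data Sprout and does not reproduce any of the paper's 11 patterns, which the abstract does not name. The two patterns used here (abbreviating first names, and listing several people in one cell) are hypothetical examples, as are all entity names, identifiers, and function names. The sketch only illustrates the general principle: start from known entities, render them into cells with habit-like variations, and keep the entity identifiers next to each cell as ground-truth labels.

```python
import random

# Hypothetical ground-truth entities; a tiny stand-in for a knowledge graph.
PEOPLE = {
    "p1": ("Ada", "Lovelace"),
    "p2": ("Grace", "Hopper"),
    "p3": ("Alan", "Turing"),
}
PROJECTS = [
    ("proj1", "Apollo", ["p1", "p2"]),
    ("proj2", "Gemini", ["p3"]),
]


def render_person(person_id, rng, abbreviate_prob):
    """Render a person's name, sometimes abbreviating the first name.

    The abbreviation is an assumed, illustrative pattern.
    """
    first, last = PEOPLE[person_id]
    if rng.random() < abbreviate_prob:
        return f"{first[0]}. {last}"
    return f"{first} {last}"


def generate_sheet(seed, abbreviate_prob=0.5):
    """Generate labeled rows: one row per project, two cells per row.

    Each cell is a dict holding the visible text and the identifiers of
    the entities it refers to (the ground-truth labels).
    """
    rng = random.Random(seed)  # seeded, so a dataset can be regenerated
    sheet = []
    for project_id, name, members in PROJECTS:
        names = [render_person(m, rng, abbreviate_prob) for m in members]
        sheet.append([
            {"text": name, "entities": [project_id]},
            # Assumed pattern: several people share a single cell.
            {"text": "; ".join(names), "entities": list(members)},
        ])
    return sheet


if __name__ == "__main__":
    for row in generate_sheet(seed=42):
        print([cell["text"] for cell in row])
```

Because the labels are produced together with the cell text, the visible content can be made as messy as desired while the expected entities stay exact, which is the property an evaluation of knowledge graph construction would need.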


Updated: 2021-04-29
