Tuple-Independent Representations of Infinite Probabilistic Databases, arXiv - CS - Logic in Computer Science


Tuple-Independent Representations of Infinite Probabilistic Databases
arXiv - CS - Logic in Computer Science. Pub Date: 2020-08-21, DOI: arxiv-2008.09511
Nofar Carmeli, Martin Grohe, Peter Lindner, Christoph Standke

Probabilistic databases (PDBs) are probability spaces over database instances. They provide a framework for handling uncertainty in databases, as occurs due to data integration, noisy data, data from unreliable sources, or randomized processes. Most of the existing theoretical literature has investigated finite, tuple-independent PDBs (TI-PDBs), where the occurrences of tuples are independent events. Only recently, Grohe and Lindner (PODS '19) introduced independence assumptions for PDBs beyond the finite domain assumption. In the finite setting, a major argument for discussing the theoretical properties of TI-PDBs is that they can be used to represent any finite PDB via views. This is no longer the case once the number of tuples is countably infinite. In this paper, we systematically study the representability of infinite PDBs in terms of TI-PDBs and the related block-independent disjoint PDBs. The central question is which infinite PDBs are representable as first-order views over tuple-independent PDBs. We give a necessary condition for the representability of PDBs and provide a sufficient criterion for representability in terms of the probability distribution of a PDB. With various examples, we explore the limits of our criteria. We show that conditioning on first-order properties yields no additional power in terms of expressivity. Finally, we discuss the relation between purely logical and arithmetic reasons for (non-)representability.
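To make the notion of tuple independence concrete, here is a minimal Python sketch of a finite TI-PDB. It is an illustration only, not code from the paper: the facts, their marginal probabilities, and the helper names `instance_probability` and `all_instances` are all hypothetical. Under tuple independence, the probability of a database instance is the product of the marginals of the facts it contains and the complements of the marginals of the facts it omits.

```python
from itertools import combinations
from math import isclose, prod


def instance_probability(marginals, instance):
    """Probability that exactly the facts in `instance` are present,
    assuming every fact occurs independently with its marginal probability."""
    return prod(
        p if fact in instance else 1.0 - p
        for fact, p in marginals.items()
    )


def all_instances(marginals):
    """Enumerate every subset of the facts, i.e. every possible instance."""
    facts = list(marginals)
    for size in range(len(facts) + 1):
        for subset in combinations(facts, size):
            yield frozenset(subset)


# Hypothetical facts and marginal probabilities, chosen only for illustration.
marginals = {
    ("R", "a"): 0.5,
    ("R", "b"): 0.25,
    ("S", "a", "b"): 0.8,
}

# The induced probability space over database instances.
distribution = {
    instance: instance_probability(marginals, instance)
    for instance in all_instances(marginals)
}

# The instance probabilities form a distribution: they sum to 1.
total = sum(distribution.values())

# Summing over the instances that contain a fact recovers its marginal.
marginal_R_a = sum(
    p for instance, p in distribution.items() if ("R", "a") in instance
)

# Independence: the joint probability of two facts is the product of marginals.
joint_R_a_R_b = sum(
    p for instance, p in distribution.items()
    if ("R", "a") in instance and ("R", "b") in instance
)
```

This enumeration only works in the finite case, where the instance space has 2^n elements for n facts. The abstract's point is that the infinite case behaves differently, so this sketch should not be read as covering it.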


Updated: 2020-08-24
