Infinite Probabilistic Databases
arXiv - CS - Databases. Pub Date: 2019-04-14, DOI: arxiv-1904.06766
Martin Grohe and Peter Lindner

Probabilistic databases (PDBs) are used to model uncertainty in data in a quantitative way. In the standard formal framework, PDBs are finite probability spaces over relational database instances. It has been argued convincingly that this is not compatible with an open-world semantics (Ceylan et al., KR 2016) and with application scenarios that are modeled by continuous probability distributions (Dalvi et al., CACM 2009). We recently introduced a model of PDBs as infinite probability spaces that addresses these issues (Grohe and Lindner, PODS 2019). While that work was mainly concerned with countably infinite probability spaces, our focus here is on uncountable spaces. Such an extension is necessary to model typical continuous probability distributions that appear in many applications. However, an extension beyond countable probability spaces raises nontrivial foundational issues concerning the measurability of events and queries and, ultimately, the question of whether queries have a well-defined semantics. It turns out that so-called finite point processes are the appropriate model from probability theory for dealing with probabilistic databases. This model allows us to construct suitable (uncountable) probability spaces of database instances in a systematic way. Our main technical results are measurability statements for relational algebra queries as well as aggregate queries and datalog queries.
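
To make the idea of a finite point process over database facts more concrete, the following is a minimal illustrative sketch, not the construction from the paper. It assumes a hypothetical relation Temp(sensor, value), a Poisson-distributed number of facts, and Gaussian attribute values; all function names and parameters are invented for illustration. Each sample is a finite database instance drawn from an uncountable space, and the probability that a selection query returns a nonempty answer is estimated by simulation.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's method: multiply uniform draws until the product drops to exp(-rate) or below."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_instance(rng, rate=3.0, sensors=("s1", "s2"), mean=20.0, stddev=2.0):
    """Draw one finite instance of the hypothetical relation Temp(sensor, value).

    Assumption: the number of facts is Poisson(rate) and the facts are i.i.d.
    """
    size = sample_poisson(rate, rng)
    return [(rng.choice(sensors), rng.gauss(mean, stddev)) for _ in range(size)]


def select_hot(instance, limit=22.0):
    """Selection query: keep the facts whose value exceeds `limit`."""
    return [fact for fact in instance if fact[1] > limit]


def estimate_nonempty(trials=10_000, seed=0):
    """Monte Carlo estimate of the probability that select_hot has a nonempty answer."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if select_hot(sample_instance(rng)))
    return hits / trials


if __name__ == "__main__":
    print(estimate_nonempty())
```

The estimate is meaningful only because the event "the query answer is nonempty" is measurable in the underlying probability space; establishing such measurability for relational algebra, aggregate, and datalog queries is what the abstract describes as the main technical contribution.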

Updated: 2020-01-09