Independence in Infinite Probabilistic Databases

Paper title

Independence in Infinite Probabilistic Databases

Authors

Martin Grohe, Peter Lindner

Abstract

Probabilistic databases (PDBs) model uncertainty in data. The current standard is to view PDBs as finite probability spaces over relational database instances. Since many attributes in typical databases have infinite domains, such as integers, strings, or real numbers, it is often more natural to view PDBs as infinite probability spaces over database instances. In this paper, we lay the mathematical foundations of infinite probabilistic databases. Our focus is on independence assumptions. Tuple-independent PDBs play a central role in the theory and practice of PDBs. Here, we study infinite tuple-independent PDBs as well as related models such as infinite block-independent disjoint PDBs. While the standard model of PDBs focuses on a set-based semantics, we also study tuple-independent PDBs with a bag semantics and independence in PDBs over uncountable fact spaces. We also propose a new approach to PDBs with an open-world assumption, addressing issues raised by Ceylan et al. (Proc. KR 2016) and generalizing their work, which is still rooted in finite tuple-independent PDBs. Moreover, for countable PDBs we propose an approximate query answering algorithm.
