大约拒绝约束

论文标题

大约拒绝约束

Approximate Denial Constraints

论文作者

Livshits, Ester, Heidari, Alireza, Ilyas, Ihab F., Kimelfeld, Benny

论文摘要

在过去的二十年中，已经对数据的挖掘完整性约束的问题进行了广泛的研究，包括经典的功能依赖性（FDS）和更一般的拒绝约束（DC）。在本文中，我们研究了挖掘近似DC（即从数据中“满足”的DC）的问题。考虑近似约束，我们可以发现不一致的数据库中的更准确的约束，检测通常正确但可能有一些例外的规则，并且避免过度拟合并获得更一般且更不符合人为的约束。我们介绍了用于开采大约DC的算法Adcminer。该算法的一个重要特征是，它不假定近似DC的任何特定定义，而是将语义作为输入。由于有多种定义近似直流的方法，并且不同的定义可能会产生截然不同的结果，因此我们不关注一个定义，而是在一个近似函数的一般家族上，该函数满足本文定义的某些自然公理，并捕获了近似约束的常用定义。我们还展示了如何将算法与采样相结合，以高精度返回结果，同时大大减少了运行时间。

The problem of mining integrity constraints from data has been extensively studied over the past two decades for commonly used types of constraints including the classic Functional Dependencies (FDs) and the more general Denial Constraints (DCs). In this paper, we investigate the problem of mining approximate DCs (i.e., DCs that are "almost" satisfied) from data. Considering approximate constraints allows us to discover more accurate constraints in inconsistent databases, detect rules that are generally correct but may have a few exceptions, as well as avoid overfitting and obtain more general and less contrived constraints. We introduce the algorithm ADCMiner for mining approximate DCs. An important feature of this algorithm is that it does not assume any specific definition of an approximate DC, but takes the semantics as input. Since there is more than one way to define an approximate DC and different definitions may produce very different results, we do not focus on one definition, but rather on a general family of approximation functions that satisfies some natural axioms defined in this paper and captures commonly used definitions of approximate constraints. We also show how our algorithm can be combined with sampling to return results with high accuracy while significantly reducing the running time.

下载PDF全文

下载文献需遵守相关版权规定

论文标题