Paper Title
Hypothetical Reasoning via Provenance Abstraction
Paper Authors
Paper Abstract
Data analytics often involves hypothetical reasoning: repeatedly modifying the data and observing the induced effect on the computation result of a data-centric application. Previous work has shown that fine-grained data provenance can help make such analysis more efficient: instead of a costly re-execution of the underlying application, hypothetical scenarios are applied to a pre-computed provenance expression. However, storing provenance for complex queries and large-scale data incurs a significant overhead, which is often a barrier to the adoption of provenance-based solutions. To this end, we present a framework that reduces provenance size. Our approach coarsens the provenance granularity using user-defined abstraction trees over the provenance variables, where the granularity is chosen based on the anticipated hypothetical scenarios. We formalize the tradeoff between provenance size and the supported granularity of hypothetical reasoning, and study the complexity of the resulting optimization problem, providing efficient algorithms for tractable cases and heuristics for others. We experimentally study the performance of our solution for various queries and abstraction trees. Our study shows that the algorithms generally lead to a substantial speedup of hypothetical reasoning, with a reasonable loss of accuracy.
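To make the idea concrete, the following is a minimal sketch (not the paper's actual system) of the two mechanisms the abstract describes: evaluating a hypothetical scenario directly over a pre-computed provenance expression instead of re-running the query, and coarsening the expression with an abstraction mapping (one level of an abstraction tree). The tuple and group names are illustrative assumptions.

```python
# Provenance of a query result as a sum-of-products over tuple variables:
# each inner frozenset is one monomial (a joint derivation); the result
# holds if at least one monomial has all of its tuples present.
provenance = [frozenset({"t1", "t2"}), frozenset({"t3"})]

def holds(prov, present):
    """Evaluate the provenance under a hypothetical scenario: `present`
    maps each variable to True (kept) or False (hypothetically deleted)."""
    return any(all(present[v] for v in m) for m in prov)

# Hypothetical scenario: delete t3, keep the rest -- no query re-execution.
scenario = {"t1": True, "t2": True, "t3": False}
print(holds(provenance, scenario))  # True: t1 and t2 still derive the result

# Abstraction: collapse variables into coarser groups, shrinking the stored
# provenance at the cost of only supporting group-level scenarios.
abstraction = {"t1": "groupA", "t2": "groupA", "t3": "groupB"}
abstract_prov = [frozenset(abstraction[v] for v in m) for m in provenance]
print(abstract_prov)  # [frozenset({'groupA'}), frozenset({'groupB'})]
```

Note how the abstracted expression is smaller (each monomial mentions fewer variables), illustrating the size/granularity tradeoff: scenarios can now only delete `groupA` or `groupB` as a whole, not individual tuples.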