Paper title
How and Why is An Answer (Still) Correct? Maintaining Provenance in Dynamic Knowledge Graphs
Paper authors
Paper abstract
Knowledge graphs (KGs) have increasingly become the backbone of many critical knowledge-centric applications. Most large-scale KGs used in practice are constructed automatically by an ensemble of extraction techniques applied over diverse data sources. It is therefore important to establish the provenance of query results, that is, to determine how they were computed. Provenance has been shown to be useful for assigning confidence scores to results, for debugging the KG generation itself, and for providing answer explanations. In many such applications, certain queries are registered as standing queries because their answers are needed often. However, KGs change continuously for reasons such as changes in the source data, improvements to the extraction techniques, and refinement or enrichment of information. This raises the problem of efficiently maintaining the provenance polynomials of complex graph pattern queries over large, dynamic KGs, instead of recomputing them from scratch each time the KG is updated. To address this problem, we present HUKA, which uses provenance polynomials to track the derivation of query results over knowledge graphs by encoding the edges involved in generating each answer. More importantly, HUKA also maintains these provenance polynomials in the face of updates (insertions as well as deletions of facts) to the underlying KG. Experimental results over large real-world KGs such as YAGO and DBpedia with various benchmark SPARQL query workloads reveal that HUKA can be almost 50 times faster than existing systems for provenance computation on dynamic KGs.
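To make the idea of maintaining a provenance polynomial under updates concrete, the following is a minimal Python sketch. It is not HUKA's algorithm, which the abstract does not detail. It assumes a fixed two-edge path pattern `(?x, p1, ?y), (?y, p2, ?z)`, represents each derivation of an answer as a monomial (a tuple of edge identifiers), and treats the polynomial of an answer as the set of its monomials. The class `PathProvenance`, its methods, and the example edge identifiers are all hypothetical names introduced for illustration.

```python
from collections import defaultdict


class PathProvenance:
    """Illustrative only: how-provenance for the pattern (?x, p1, ?y), (?y, p2, ?z)."""

    def __init__(self, p1, p2):
        self.p1, self.p2 = p1, p2
        self.edges = {}                   # edge id -> (subject, predicate, object)
        self.by_obj = defaultdict(set)    # node y -> ids of p1 edges ending at y
        self.by_subj = defaultdict(set)   # node y -> ids of p2 edges starting at y
        self.poly = defaultdict(set)      # answer (x, z) -> set of monomials
        self.uses = defaultdict(set)      # edge id -> {(answer, monomial)} using it

    def _add(self, answer, mono):
        self.poly[answer].add(mono)
        for e in mono:
            self.uses[e].add((answer, mono))

    def insert(self, eid, s, p, o):
        """Insert one edge and add only the derivations that involve it."""
        self.edges[eid] = (s, p, o)
        if p == self.p1:
            self.by_obj[o].add(eid)
            for other in self.by_subj[o]:          # join with p2 edges leaving o
                self._add((s, self.edges[other][2]), (eid, other))
        if p == self.p2:
            self.by_subj[s].add(eid)
            for other in self.by_obj[s]:           # join with p1 edges entering s
                self._add((self.edges[other][0], o), (other, eid))

    def delete(self, eid):
        """Delete one edge and drop only the monomials that contain it."""
        s, p, o = self.edges.pop(eid)
        if p == self.p1:
            self.by_obj[o].discard(eid)
        if p == self.p2:
            self.by_subj[s].discard(eid)
        for answer, mono in self.uses.pop(eid, set()):
            monos = self.poly.get(answer)
            if monos is None:
                continue
            monos.discard(mono)
            if not monos:                          # no derivation left: answer is gone
                del self.poly[answer]
            for e in mono:
                if e != eid:
                    self.uses[e].discard((answer, mono))

    def polynomial(self, answer):
        """Render the polynomial as a string; "0" means the answer has no derivation."""
        monos = self.poly.get(answer, ())
        return " + ".join("*".join(m) for m in sorted(monos)) or "0"


kg = PathProvenance("worksAt", "locatedIn")
kg.insert("e1", "alice", "worksAt", "acme")
kg.insert("e2", "acme", "locatedIn", "berlin")
kg.insert("e3", "alice", "worksAt", "globex")
kg.insert("e4", "globex", "locatedIn", "berlin")
print(kg.polynomial(("alice", "berlin")))   # e1*e2 + e3*e4
kg.delete("e3")
print(kg.polynomial(("alice", "berlin")))   # e1*e2
```

In this sketch an insertion joins the new edge only against matching edges on the other side of the pattern, and a deletion consults an inverted index from edges to the monomials that use them, so neither update re-evaluates the whole query. A system handling general SPARQL graph patterns would need a more elaborate structure than this two-edge case.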