Paper Title

What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization

Paper Authors

Caleb Belth, Xinyi Zheng, Jilles Vreeken, Danai Koutra

Paper Abstract

Knowledge graphs (KGs) store highly heterogeneous information about the world in the structure of a graph, and are useful for tasks such as question answering and reasoning. However, they often contain errors and are missing information. Vibrant research in KG refinement has worked to resolve these issues, tailoring techniques to either detect specific types of errors or complete a KG. In this work, we introduce a unified solution to KG characterization by formulating the problem as unsupervised KG summarization with a set of inductive, soft rules, which describe what is normal in a KG, and thus can be used to identify what is abnormal, whether it be strange or missing. Unlike first-order logic rules, our rules are labeled, rooted graphs, i.e., patterns that describe the expected neighborhood around a (seen or unseen) node, based on its type, and information in the KG. Stepping away from the traditional support/confidence-based rule mining techniques, we propose KGist, Knowledge Graph Inductive SummarizaTion, which learns a summary of inductive rules that best compress the KG according to the Minimum Description Length principle, a formulation that we are the first to use in the context of KG rule mining. We apply our rules to three large KGs (NELL, DBpedia, and Yago), and tasks such as compression, various types of error detection, and identification of incomplete information. We show that KGist outperforms task-specific, supervised and unsupervised baselines in error detection and incompleteness identification (identifying the location of up to 93% of missing entities, over 10% more than baselines), while also being efficient for large knowledge graphs.
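To make the Minimum Description Length formulation described above more concrete, the sketch below illustrates the general idea of MDL-based rule selection: a rule is worth keeping only if the bits it costs to store are outweighed by the bits it saves when encoding the KG. This is a minimal illustration, not the authors' KGist implementation; the Rule class, the per-rule and per-triple costs, and the toy KG are hypothetical simplifications, whereas the paper's rules are labeled, rooted graphs with a more elaborate encoding.

```python
# Illustrative sketch of greedy MDL-based rule selection over a toy KG.
# All names and cost constants here are hypothetical simplifications.
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Toy stand-in for a labeled, rooted-graph rule:
    # "nodes of `root_type` are expected to have an outgoing `predicate`
    # edge to a node of `child_type`."
    root_type: str
    predicate: str
    child_type: str


def rule_cost(rule: Rule) -> float:
    # L(rule): a fixed per-rule cost in this sketch; a real encoding would
    # depend on the rule's structure and the label vocabulary sizes.
    return 8.0


def data_cost(kg_triples, node_types, rules) -> float:
    # L(KG | rules): triples not explained by any chosen rule must be
    # encoded explicitly; explained triples are assumed to cost nothing.
    explained = set()
    for (s, p, o) in kg_triples:
        for r in rules:
            if (r.root_type in node_types.get(s, ())
                    and p == r.predicate
                    and r.child_type in node_types.get(o, ())):
                explained.add((s, p, o))
                break
    return 10.0 * (len(kg_triples) - len(explained))  # toy per-triple cost


def greedy_mdl_summary(kg_triples, node_types, candidate_rules):
    # Greedily add the candidate rule that most reduces the total
    # description length L(rules) + L(KG | rules); stop when no rule helps.
    chosen = []
    best_total = data_cost(kg_triples, node_types, chosen)
    improved = True
    while improved:
        improved = False
        for r in candidate_rules:
            if r in chosen:
                continue
            trial = chosen + [r]
            total = (sum(rule_cost(x) for x in trial)
                     + data_cost(kg_triples, node_types, trial))
            if total < best_total:
                best_total, best_rule = total, r
                improved = True
        if improved:
            chosen.append(best_rule)
    return chosen, best_total


if __name__ == "__main__":
    triples = [("ann", "bornIn", "paris"), ("bob", "bornIn", "rome"),
               ("ann", "likes", "bob")]
    types = {"ann": {"Person"}, "bob": {"Person"},
             "paris": {"City"}, "rome": {"City"}}
    candidates = [Rule("Person", "bornIn", "City"),
                  Rule("Person", "likes", "Person")]
    rules, cost = greedy_mdl_summary(triples, types, candidates)
    print(rules, cost)
```

Under this view, triples left unexplained by the selected rules are candidates for being "strange" (errors), while rules that a node's neighborhood fails to satisfy point to where information may be missing, which is the unifying intuition behind the error detection and incompleteness identification tasks in the abstract.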
