Open Knowledge Graphs Canonicalization using Variational Autoencoders

Paper Title

Open Knowledge Graphs Canonicalization using Variational Autoencoders

Authors

Sarthak Dash, Gaetano Rossiello, Nandana Mihindukulasooriya, Sugato Bagchi, Alfio Gliozzo

Abstract

Noun phrases and Relation phrases in open knowledge graphs are not canonicalized, leading to an explosion of redundant and ambiguous subject-relation-object triples. Existing approaches to solve this problem take a two-step approach. First, they generate embedding representations for both noun and relation phrases, then a clustering algorithm is used to group them using the embeddings as features. In this work, we propose Canonicalizing Using Variational Autoencoders (CUVA), a joint model to learn both embeddings and cluster assignments in an end-to-end approach, which leads to a better vector representation for the noun and relation phrases. Our evaluation over multiple benchmarks shows that CUVA outperforms the existing state-of-the-art approaches. Moreover, we introduce CanonicNell, a novel dataset to evaluate entity canonicalization systems.
