ClusterEA: Scalable Entity Alignment with Stochastic Training and Normalized Mini-batch Similarities

Paper Title

ClusterEA: Scalable Entity Alignment with Stochastic Training and Normalized Mini-batch Similarities

Authors

Gao, Yunjun; Liu, Xiaoze; Wu, Junyang; Li, Tianyi; Wang, Pengfei; Chen, Lu

Abstract

Entity alignment (EA) aims at finding equivalent entities in different knowledge graphs (KGs). Embedding-based approaches have dominated the EA task in recent years. Those methods face problems that come from the geometric properties of embedding vectors, including hubness and isolation. To solve these geometric problems, many normalization approaches have been adopted for EA. However, the increasing scale of KGs renders it hard for EA models to adopt the normalization processes, thus limiting their usage in real-world applications. To tackle this challenge, we present ClusterEA, a general framework that is capable of scaling up EA models and enhancing their results by leveraging normalization methods on mini-batches with a high entity equivalent rate. ClusterEA contains three components to align entities between large-scale KGs, including stochastic training, ClusterSampler, and SparseFusion. It first trains a large-scale Siamese GNN for EA in a stochastic fashion to produce entity embeddings. Based on the embeddings, a novel ClusterSampler strategy is proposed for sampling highly overlapped mini-batches. Finally, ClusterEA incorporates SparseFusion, which normalizes local and global similarity and then fuses all similarity matrices to obtain the final similarity matrix. Extensive experiments with real-life datasets on EA benchmarks offer insight into the proposed framework, and suggest that it is capable of outperforming the state-of-the-art scalable EA framework by up to 8 times in terms of Hits@1.
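
The abstract's central idea, normalizing similarities inside mini-batches rather than over the full entity sets, can be illustrated with a small sketch. The abstract does not say which normalization is used, so this example swaps in CSLS (cross-domain similarity local scaling), a common hubness correction, purely as a stand-in. The function names (`cosine_similarity`, `csls_normalize`, `align_in_batches`), the batch format, and the toy embeddings are all hypothetical; this is not the ClusterEA implementation, and it assumes non-zero embedding vectors.

```python
import math

def cosine_similarity(src, tgt):
    """Return the len(src) x len(tgt) cosine-similarity matrix (nested lists)."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))  # assumes v is non-zero
        return [x / norm for x in v]
    src_u = [unit(v) for v in src]
    tgt_u = [unit(v) for v in tgt]
    return [[sum(a * b for a, b in zip(s, t)) for t in tgt_u] for s in src_u]

def csls_normalize(sim, k=2):
    """CSLS stand-in: 2*sim minus each row's and column's top-k mean.

    Entities that are close to everything (hubs) have a high top-k mean,
    so their scores are pushed down.
    """
    def top_k_mean(values):
        top = sorted(values, reverse=True)[:k]
        return sum(top) / len(top)
    row_mean = [top_k_mean(row) for row in sim]
    col_mean = [top_k_mean(col) for col in zip(*sim)]
    return [[2 * sim[i][j] - row_mean[i] - col_mean[j]
             for j in range(len(sim[0]))]
            for i in range(len(sim))]

def align_in_batches(src_emb, tgt_emb, batches, k=2):
    """Normalize and match within each (src_ids, tgt_ids) mini-batch.

    Only batch-sized similarity matrices are ever built, which is why the
    batches need to contain many truly equivalent entity pairs.
    """
    pred = {}
    for src_ids, tgt_ids in batches:
        sim = cosine_similarity([src_emb[i] for i in src_ids],
                                [tgt_emb[j] for j in tgt_ids])
        normalized = csls_normalize(sim, k)
        for s, row in zip(src_ids, normalized):
            best = max(range(len(row)), key=row.__getitem__)
            pred[s] = tgt_ids[best]
    return pred

# Toy data (hypothetical): target embeddings are shifted copies of the source.
src_emb = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
tgt_emb = [[x + 0.1 for x in v] for v in src_emb]
batches = [([0, 1], [0, 1]), ([2, 3], [2, 3])]
print(align_in_batches(src_emb, tgt_emb, batches))
```

A source entity can only be matched to a target in the same batch, so the quality of the batch sampling bounds the achievable accuracy. This is the role the abstract assigns to ClusterSampler, with SparseFusion then combining the per-batch results with global similarity.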

Paper Title