Paper title
FONDUE: A Framework for Node Disambiguation Using Network Embeddings
Paper authors
Paper abstract
Real-world data often takes the form of a network. Examples include social networks, citation networks, biological networks, and knowledge graphs. In their simplest form, networks represent real-life entities (e.g., people, papers, proteins, concepts) as nodes and describe the relations between those entities by means of edges between the nodes. This representation is valuable for a range of purposes, from the study of information diffusion to bibliographic analysis, bioinformatics research, and question answering. However, the quality of networks is often problematic, which affects downstream tasks. This paper focuses on the common problem where a single node in the network in fact corresponds to multiple real-life entities. In particular, we introduce FONDUE, an algorithm based on network embedding for node disambiguation. Given a network, FONDUE identifies nodes that correspond to multiple entities, for subsequent splitting. Extensive experiments on twelve benchmark datasets demonstrate that FONDUE is substantially and uniformly more accurate at identifying ambiguous nodes than the existing state of the art, at a comparable computational cost, while being less optimal at determining the best way to split ambiguous nodes.
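To make the problem setting concrete, the following minimal sketch shows how an ambiguous node can arise and what splitting one means. It is not the FONDUE algorithm: it does not use network embeddings, and the neighbour groups passed to the split are supplied by hand. All names (`build_graph`, `merge_nodes`, `split_node`, and the toy co-authorship data) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_graph(edges):
    """Return an undirected graph as a dict mapping node -> set of neighbours."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)


def merge_nodes(graph, nodes, merged_name):
    """Collapse several nodes into one, simulating an ambiguous node."""
    nodes = set(nodes)
    merged = defaultdict(set)
    for u, neighbours in graph.items():
        new_u = merged_name if u in nodes else u
        for v in neighbours:
            new_v = merged_name if v in nodes else v
            if new_u != new_v:  # skip self-loops created by the merge
                merged[new_u].add(new_v)
                merged[new_v].add(new_u)
    return dict(merged)


def split_node(graph, node, groups):
    """Replace `node` by one new node per group of its neighbours."""
    # Copy the graph without `node` and without edges pointing to it.
    result = {u: set(nbrs) - {node} for u, nbrs in graph.items() if u != node}
    for i, group in enumerate(groups):
        new_name = f"{node}#{i}"
        result[new_name] = set(group)
        for v in group:
            result[v].add(new_name)
    return result


# Toy co-authorship network (assumed data): two different authors who
# happen to share the name "J. Smith".
edges = [
    ("smith_bio", "alice"), ("smith_bio", "bob"), ("alice", "bob"),
    ("smith_phys", "carol"), ("smith_phys", "dave"), ("carol", "dave"),
]
true_graph = build_graph(edges)

# A data-quality error merges both authors into one ambiguous node.
observed = merge_nodes(true_graph, ["smith_bio", "smith_phys"], "j_smith")

# Splitting the ambiguous node by neighbour group recovers two entities.
repaired = split_node(observed, "j_smith",
                      [{"alice", "bob"}, {"carol", "dave"}])
```

In this toy case the merged node `j_smith` is linked to two otherwise disconnected groups of co-authors, which is the kind of structural signal a disambiguation method can exploit. Deciding which nodes are ambiguous and how to partition their neighbours is the task the abstract describes; here both choices are hard-coded.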