Paper Title
Joint Use of Node Attributes and Proximity for Semi-Supervised Classification on Graphs
Paper Authors
Abstract
The task of node classification is to infer unknown node labels, given the labels for some of the nodes along with the network structure and other node attributes. Typically, approaches for this task assume homophily, whereby neighboring nodes have similar attributes and a node's label can be predicted from the labels of its neighbors or other proximate (i.e., nearby) nodes in the network. However, such an assumption may not always hold; in fact, there are cases where labels are better predicted from the individual attributes of each node than from the labels of its proximate nodes. Ideally, node classification methods should flexibly adapt to a range of settings wherein unknown labels are predicted either from the labels of proximate nodes, from individual node attributes, or partly from both. In this paper, we propose a principled approach, JANE, based on a generative probabilistic model that jointly weighs the roles of node attributes and node proximity, via embeddings, in predicting labels. Our experiments on a variety of network datasets demonstrate that JANE exhibits the desired combination of versatility and competitive performance compared to standard baselines.
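The abstract's central contrast, predicting a label from proximate nodes' labels versus from a node's own attributes, can be made concrete with a toy sketch. This is not JANE: it replaces the paper's generative model and learned embeddings with two hand-written scorers (a vote over labeled neighbors, and a softmax over negative squared distances to class centroids in attribute space) blended by a fixed, hand-set weight `alpha`. The function names, the toy graph, and `alpha` are all hypothetical, chosen only for illustration. On a graph where every edge joins nodes of different classes (homophily fails), the neighbor vote mislabels both unlabeled nodes while the attributes recover them.

```python
import numpy as np

# Illustrative sketch only, NOT the JANE model: two hand-written scorers
# mixed with a fixed weight alpha (all names here are hypothetical).


def neighbor_vote_scores(adj, labels, num_classes):
    """Proximity-based scores: class distribution over each node's labeled neighbors.

    labels[i] == -1 marks an unlabeled node. Nodes with no labeled
    neighbor fall back to a uniform distribution.
    """
    n = adj.shape[0]
    known = labels >= 0
    onehot = np.zeros((n, num_classes))
    onehot[np.flatnonzero(known), labels[known]] = 1.0
    counts = adj @ onehot  # per-class counts of labeled neighbors
    totals = counts.sum(axis=1, keepdims=True)
    return np.where(totals > 0, counts / np.maximum(totals, 1e-12), 1.0 / num_classes)


def attribute_scores(features, labels, num_classes):
    """Attribute-based scores: softmax over negative squared distances to class centroids.

    Assumes every class has at least one labeled node.
    """
    known = labels >= 0
    centroids = np.stack(
        [features[known & (labels == c)].mean(axis=0) for c in range(num_classes)]
    )
    neg_sq_dist = -((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = neg_sq_dist - neg_sq_dist.max(axis=1, keepdims=True)  # stabilize exp
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def predict(adj, features, labels, num_classes, alpha):
    """alpha=1 trusts only neighbor labels; alpha=0 trusts only node attributes."""
    scores = alpha * neighbor_vote_scores(adj, labels, num_classes) + (
        1 - alpha
    ) * attribute_scores(features, labels, num_classes)
    return scores.argmax(axis=1)


# Toy heterophilous graph: nodes 0, 1 are class 0; nodes 2, 3 are class 1,
# and every edge joins nodes of different classes.
adj = np.array(
    [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [1, 1, 0, 0],
        [1, 1, 0, 0],
    ],
    dtype=float,
)
features = np.array([[0.0, 0.0], [0.5, 0.0], [5.0, 5.0], [5.0, 4.5]])
labels = np.array([0, -1, 1, -1])  # true labels of nodes 1 and 3 are 0 and 1
unlabeled = labels < 0

print(predict(adj, features, labels, 2, alpha=1.0)[unlabeled])  # [1 0]: neighbors mislead
print(predict(adj, features, labels, 2, alpha=0.0)[unlabeled])  # [0 1]: attributes recover labels
```

On a homophilous graph the situation reverses and a large `alpha` would be preferable. The fixed weight here only shows why such a balance matters; per the abstract, JANE instead weighs the two sources jointly within its generative model.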