Paper Title
ImGCL: Revisiting Graph Contrastive Learning on Imbalanced Node Classification
Paper Authors
Paper Abstract
Graph contrastive learning (GCL) has attracted a surge of attention due to its superior performance for learning node/graph representations without labels. However, in practice, the underlying class distribution of unlabeled nodes for the given graph is usually imbalanced. This highly imbalanced class distribution inevitably deteriorates the quality of learned node representations in GCL. Indeed, we empirically find that most state-of-the-art GCL methods cannot obtain discriminative representations and exhibit poor performance on imbalanced node classification. Motivated by this observation, we propose a principled GCL framework on Imbalanced node classification (ImGCL), which automatically and adaptively balances the representations learned from GCL without labels. Specifically, we first introduce the online clustering based progressively balanced sampling (PBS) method with theoretical rationale, which balances the training sets based on pseudo-labels obtained from learned representations in GCL. We then develop the node centrality based PBS method to better preserve the intrinsic structure of graphs, by upweighting the important nodes of the given graph. Extensive experiments on multiple imbalanced graph datasets and imbalanced settings demonstrate the effectiveness of our proposed framework, which significantly improves the performance of the recent state-of-the-art GCL methods. Further experimental ablations and analyses show that the ImGCL framework consistently improves the representation quality of nodes in under-represented (tail) classes.
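To make the sampling idea in the abstract concrete, below is a minimal, illustrative Python sketch of what progressively balanced sampling (PBS) with online-clustering pseudo-labels could look like. It is not the authors' exact algorithm: the function name `pbs_sample`, the use of k-means as the clustering step, the linear interpolation schedule via `progress`, and the omission of the node-centrality weighting are all simplifying assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def pbs_sample(embeddings, num_classes, progress, rng=None):
    """Illustrative sketch of progressively balanced sampling (PBS).

    embeddings : (N, d) node representations from a GCL encoder.
    num_classes: assumed number of (pseudo-)classes C.
    progress   : float in [0, 1]; 0 = instance-balanced, 1 = fully class-balanced.
    Returns indices of a re-balanced training set of size N.
    """
    rng = rng or np.random.default_rng(0)
    n = embeddings.shape[0]

    # 1) Stand-in for online clustering: pseudo-labels via k-means on the embeddings.
    pseudo = KMeans(n_clusters=num_classes, n_init=10).fit_predict(embeddings)

    # 2) Per-class sampling probabilities: interpolate between instance-balanced
    #    (proportional to pseudo-class size) and class-balanced (uniform).
    counts = np.bincount(pseudo, minlength=num_classes).astype(float)
    p_instance = counts / counts.sum()
    p_class = np.full(num_classes, 1.0 / num_classes)
    p_mix = (1.0 - progress) * p_instance + progress * p_class

    # 3) Draw a pseudo-class for each slot, then a node uniformly from that class,
    #    so tail classes are sampled more often as `progress` grows.
    sampled = []
    for c in rng.choice(num_classes, size=n, p=p_mix):
        members = np.flatnonzero(pseudo == c)
        sampled.append(rng.choice(members))
    return np.asarray(sampled)
```

In this reading, `progress` would be increased over training so that early epochs follow the natural (imbalanced) distribution while later epochs oversample under-represented pseudo-classes; the paper's centrality-based variant additionally upweights structurally important nodes, which this sketch does not model.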