Paper Title
Sub-graph Contrast for Scalable Self-Supervised Graph Representation Learning
Paper Authors
Paper Abstract
Graph representation learning has attracted much attention recently. Existing graph neural networks fed with the complete graph data are not scalable due to their high computation and memory costs. Thus, it remains a great challenge to capture the rich information in large-scale graph data. Besides, these methods mainly focus on supervised learning and depend heavily on node label information, which is expensive to obtain in the real world. As for unsupervised network embedding approaches, they overemphasize node proximity, and their learned representations can hardly be used directly in downstream application tasks. In recent years, emerging self-supervised learning has provided a potential solution to the aforementioned problems. However, existing self-supervised works also operate on the complete graph data and are biased to fit either global or very local (1-hop neighborhood) graph structures when defining their mutual-information-based loss terms. In this paper, a novel self-supervised representation learning method via Subgraph Contrast, namely \textsc{Subg-Con}, is proposed; it exploits the strong correlation between central nodes and their sampled subgraphs to capture regional structure information. Instead of learning on the complete input graph, \textsc{Subg-Con} uses a novel data augmentation strategy to learn node representations through a contrastive loss defined on subgraphs sampled from the original graph. Compared with existing graph representation learning approaches, \textsc{Subg-Con} has prominent advantages in weaker supervision requirements, model learning scalability, and parallelization. Extensive experiments on multiple real-world large-scale benchmark datasets from different domains verify both the effectiveness and the efficiency of our work against classic and state-of-the-art graph representation learning approaches.
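To make the subgraph-contrast idea concrete, below is a minimal PyTorch sketch of the core objective described in the abstract: a central node's embedding is pulled toward a pooled summary of its own sampled subgraph and pushed away from summaries of other subgraphs in the batch. The one-layer encoder, the margin value, and the shuffled-negatives scheme are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Toy one-layer GNN encoder (a stand-in for any message-passing GNN)."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, hid_dim)

    def forward(self, x, adj):
        # x:   (B, n, in_dim) node features of B sampled subgraphs
        # adj: (B, n, n) row-normalized adjacency of each subgraph
        return F.relu(self.lin(adj @ x))  # mean-style neighborhood aggregation

def subgraph_contrast_loss(center, summary, margin=0.75):
    # center:  (B, d) embeddings of the central nodes
    # summary: (B, d) pooled embeddings of their own subgraphs
    # Positive pair: (center_i, summary_i); negatives come from shuffling
    # the summaries within the batch (an assumed negative-sampling scheme).
    pos = F.cosine_similarity(center, summary, dim=-1)
    neg = F.cosine_similarity(center, summary.roll(1, dims=0), dim=-1)
    return F.relu(neg - pos + margin).mean()  # margin triplet-style loss

# Usage sketch on toy data: 8 subgraphs of 5 nodes, central node at index 0.
enc = Encoder(in_dim=32, hid_dim=64)
x = torch.randn(8, 5, 32)
adj = torch.full((8, 5, 5), 0.2)                       # toy row-normalized adjacencies
h = enc(x, adj)                                        # (8, 5, 64) node embeddings
loss = subgraph_contrast_loss(h[:, 0], h.mean(dim=1))  # mean pooling as readout
loss.backward()
```

Because each subgraph is a small, self-contained training instance, batches of subgraphs can be encoded independently, which is what enables the scalability and parallelization claimed in the abstract.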