Paper Title
Federated Graph Representation Learning using Self-Supervision
Paper Authors
Paper Abstract
Federated graph representation learning (FedGRL) brings the benefits of distributed training to graph-structured data while simultaneously addressing some privacy and compliance concerns related to data curation. However, several interesting real-world graph data characteristics, viz. label deficiency and downstream task heterogeneity, are not taken into consideration in current FedGRL setups. In this paper, we consider a realistic and novel problem setting wherein cross-silo clients have access to vast amounts of unlabeled data with limited or no labeled data, and additionally have diverse downstream class label domains. We then propose a novel FedGRL formulation based on model interpolation, where we aim to learn a shared global model that is optimized collaboratively using a self-supervised objective and receives downstream task supervision through local client models. We provide a specific instantiation of our general formulation using BGRL, a SoTA self-supervised graph representation learning method, and we empirically verify its effectiveness on realistic cross-silo datasets: (1) we adapt the Twitch Gamer Network, which naturally simulates a cross-geo scenario, and show that our formulation provides consistent gains of 6.1% on average over traditional supervised federated learning objectives and 1.7% on average over individual client-specific self-supervised training; and (2) we construct and introduce a new cross-silo dataset called the Amazon Co-purchase Networks, which has both characteristics of the motivating problem setting, and observe average gains of 11.5% over traditional supervised federated learning and 1.9% over individually trained self-supervised models. Both experimental results point to the effectiveness of our proposed formulation. Finally, both our novel problem setting and dataset contributions provide new avenues for FedGRL research.
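The two mechanisms the abstract describes, collaborative aggregation of a shared global model and model interpolation with task-supervised local client models, can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: function names, the FedAvg-style averaging, and the single interpolation coefficient `alpha` are all illustrative choices, and the "models" are flat parameter lists rather than BGRL graph encoders.

```python
# Toy sketch of the described FedGRL formulation (illustrative only).
# Each "model" is a flat list of parameters; in the paper these would be
# BGRL-style graph encoders trained with a self-supervised objective.

def fedavg(client_weights):
    """Aggregate client updates into a shared global model by
    coordinate-wise averaging (FedAvg-style aggregation)."""
    n = len(client_weights)
    return [sum(params) / n for params in zip(*client_weights)]

def interpolate(global_w, local_w, alpha=0.5):
    """Model interpolation: blend the collaboratively trained global
    (self-supervised) model with a client's local (task-supervised)
    model. alpha is a hypothetical mixing coefficient."""
    return [alpha * g + (1 - alpha) * l for g, l in zip(global_w, local_w)]

# One simulated communication round with three cross-silo clients.
client_updates = [
    [0.1, 0.2, 0.3, 0.4],  # client A: update from its self-supervised loss
    [0.3, 0.2, 0.1, 0.0],  # client B
    [0.2, 0.2, 0.2, 0.2],  # client C
]
global_w = fedavg(client_updates)

# A client then mixes the shared model with its own locally supervised
# model, so downstream task supervision stays client-specific.
local_w = [1.0, 1.0, 1.0, 1.0]
client_model = interpolate(global_w, local_w, alpha=0.5)
print(global_w)     # [0.2, 0.2, 0.2, 0.2]
print(client_model)  # [0.6, 0.6, 0.6, 0.6]
```

The interpolation step is what lets clients with diverse class label domains share representation learning without sharing label spaces: only the self-supervised global model is aggregated, while each client's task head and interpolated model remain local.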