Open Graph Benchmark: Datasets for Machine Learning on Graphs

Paper title

Open Graph Benchmark: Datasets for Machine Learning on Graphs

Authors

Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, Jure Leskovec

Abstract

We present the Open Graph Benchmark (OGB), a diverse set of challenging and realistic benchmark datasets to facilitate scalable, robust, and reproducible graph machine learning (ML) research. OGB datasets are large-scale, encompass multiple important graph ML tasks, and cover a diverse range of domains, ranging from social and information networks to biological networks, molecular graphs, source code ASTs, and knowledge graphs. For each dataset, we provide a unified evaluation protocol using meaningful application-specific data splits and evaluation metrics. In addition to building the datasets, we also perform extensive benchmark experiments for each dataset. Our experiments suggest that OGB datasets present significant challenges of scalability to large-scale graphs and out-of-distribution generalization under realistic data splits, indicating fruitful opportunities for future research. Finally, OGB provides an automated end-to-end graph ML pipeline that simplifies and standardizes the process of graph data loading, experimental setup, and model evaluation. OGB will be regularly updated and welcomes inputs from the community. OGB datasets as well as data loaders, evaluation scripts, baseline code, and leaderboards are publicly available at https://ogb.stanford.edu.
