Paper Title
Hierarchical Graph Transformer with Adaptive Node Sampling
Paper Authors
Paper Abstract
The Transformer architecture has achieved remarkable success in a number of domains, including natural language processing and computer vision. However, when it comes to graph-structured data, Transformers have not achieved competitive performance, especially on large graphs. In this paper, we identify two main deficiencies of current graph transformers: (1) existing node sampling strategies in Graph Transformers are agnostic to the graph characteristics and the training process; (2) most sampling strategies focus only on local neighbors and neglect long-range dependencies in the graph. We conduct experimental investigations on synthetic datasets to show that existing sampling strategies are sub-optimal. To tackle these problems, we formulate the optimization of node sampling strategies in Graph Transformers as an adversarial bandit problem, where the rewards are related to the attention weights and can vary over the course of training. Meanwhile, we propose a hierarchical attention scheme with graph coarsening to capture long-range interactions while reducing computational complexity. Finally, we conduct extensive experiments on real-world datasets to demonstrate the superiority of our method over existing graph transformers and popular GNNs.
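To make the adversarial-bandit formulation concrete, the following is a minimal sketch of an EXP3-style sampler that treats each candidate node as an arm and feeds the attention weight it received back as its reward, so frequently-attended nodes are sampled more often as training proceeds. The class name `Exp3NodeSampler` and all hyperparameters are illustrative assumptions, not the paper's actual implementation; the full method additionally couples this sampler with the hierarchical attention scheme and graph coarsening described in the abstract.

```python
import numpy as np

class Exp3NodeSampler:
    """Hypothetical EXP3-style sampler over a node's candidate neighbors.

    Each candidate node is an arm with a weight; sampling probabilities
    mix the normalized weights with uniform exploration. Drawing k arms
    at once is a simplification of the single-play EXP3 setting.
    """

    def __init__(self, num_candidates, gamma=0.1):
        self.gamma = gamma                      # exploration rate in [0, 1]
        self.weights = np.ones(num_candidates)  # one arm per candidate node

    def probabilities(self):
        # Mix the weight distribution with the uniform distribution.
        w = self.weights / self.weights.sum()
        return (1 - self.gamma) * w + self.gamma / len(self.weights)

    def sample(self, k):
        """Draw k candidate nodes without replacement."""
        p = self.probabilities()
        return np.random.choice(len(self.weights), size=k, replace=False, p=p)

    def update(self, chosen, rewards):
        """EXP3 update: importance-weight each observed reward.

        Rewards are assumed to lie in [0, 1], e.g. attention weights
        from the transformer layer, which keeps the update stable.
        """
        p = self.probabilities()
        for arm, r in zip(chosen, rewards):
            est = r / p[arm]  # unbiased estimate of the arm's reward
            self.weights[arm] *= np.exp(self.gamma * est / len(self.weights))

# Usage sketch: sample nodes, run a training step, feed attention back.
sampler = Exp3NodeSampler(num_candidates=256)
idx = sampler.sample(k=16)               # nodes fed into the transformer
attn = np.random.dirichlet(np.ones(16))  # stand-in for attention weights
sampler.update(idx, attn)
```

Because the rewards track the attention weights, the sampling distribution adapts during training: candidates the model attends to strongly are drawn more often, while the uniform exploration term keeps every candidate reachable.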