Dynamic Graph Message Passing Networks for Visual Recognition

Paper Title

Dynamic Graph Message Passing Networks for Visual Recognition

Authors

Zhang, Li; Chen, Mohan; Arnab, Anurag; Xue, Xiangyang; Torr, Philip H. S.

Abstract

Modelling long-range dependencies is critical for scene understanding tasks in computer vision. Although convolutional neural networks (CNNs) have excelled in many vision tasks, they are still limited in capturing long-range structured relationships, as they typically consist of layers of local kernels. A fully-connected graph, such as the self-attention operation in Transformers, is beneficial for such modelling; however, its computational overhead is prohibitive. In this paper, we propose a dynamic graph message passing network that significantly reduces the computational complexity compared to related works modelling a fully-connected graph. This is achieved by adaptively sampling nodes in the graph, conditioned on the input, for message passing. Based on the sampled nodes, we dynamically predict node-dependent filter weights and the affinity matrix for propagating information between them. This formulation allows us to design a self-attention module and, more importantly, a new Transformer-based backbone network, which we use both for image classification pretraining and for addressing various downstream tasks (object detection, instance segmentation and semantic segmentation). Using this model, we show significant improvements over strong, state-of-the-art baselines on four different tasks. Our approach also outperforms fully-connected graphs while using substantially fewer floating-point operations and parameters. Code and models will be made publicly available at https://github.com/fudan-zvg/DGMN2
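The central idea described above is to let each node exchange messages with a small, input-conditioned sample of nodes rather than with every node. It can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the DGMN2 implementation. The offset predictor `predict_offsets`, the channel-wise filter (`tanh` of the query feature) and the scaled dot-product affinity are hypothetical stand-ins for the learned predictors described in the paper. Offsets are integers clamped to the grid boundary.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def predict_offsets(feature, k, max_shift):
    # HYPOTHETICAL offset predictor: a fixed function of the query feature
    # stands in for a learned layer, so the sampled positions depend on the input.
    # Returns k integer (dy, dx) offsets with |dy|, |dx| <= max_shift.
    assert max_shift >= 1
    phase = sum(feature)
    offsets = []
    for i in range(k):
        angle = 2 * math.pi * i / k + phase
        r = 1 + (i % max_shift)
        offsets.append((round(r * math.sin(angle)), round(r * math.cos(angle))))
    return offsets


def dynamic_message_passing(grid, k=4, max_shift=2):
    """One round of sampled message passing over an H x W grid of C-dim features.

    Each node exchanges messages with k sampled nodes instead of all H*W nodes.
    Returns the updated grid and the number of pairwise affinities computed.
    """
    h, w = len(grid), len(grid[0])
    out, pairs = [], 0
    for y in range(h):
        row = []
        for x in range(w):
            q = grid[y][x]
            # 1) Sample k nodes at input-conditioned offsets, clamped to the grid.
            samples = []
            for dy, dx in predict_offsets(q, k, max_shift):
                sy = min(max(y + dy, 0), h - 1)
                sx = min(max(x + dx, 0), w - 1)
                samples.append(grid[sy][sx])
            # 2) Scaled dot-product affinity to each sample, softmax-normalised.
            scale = 1.0 / math.sqrt(len(q))
            affinity = softmax([dot(q, s) * scale for s in samples])
            pairs += len(samples)
            # 3) Node-dependent channel-wise filter weights.
            #    HYPOTHETICAL: tanh of the query stands in for a learned predictor.
            filt = [math.tanh(v) for v in q]
            # 4) Affinity-weighted sum of sampled features, filtered, plus a residual.
            row.append([
                q[c] + filt[c] * sum(a * s[c] for a, s in zip(affinity, samples))
                for c in range(len(q))
            ])
        out.append(row)
    return out, pairs


if __name__ == "__main__":
    grid = [[[0.1 * (y + x), 0.2 * (y - x), 0.05 * y * x] for x in range(8)]
            for y in range(8)]
    updated, pairs = dynamic_message_passing(grid, k=4)
    n = 8 * 8
    print(pairs, n * n)  # 256 4096
```

In this example, 64 nodes with k = 4 samples each need 256 affinity evaluations, compared with 64 × 64 = 4096 for a fully-connected graph. In general the count falls from N² to N·k. For simplicity, the sketch runs a single round of message passing and uses a residual update.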
