Paper Title
Efficient Attention-free Video Shift Transformers
Paper Authors
Paper Abstract
This paper tackles the problem of efficient video recognition. In this area, video transformers have recently dominated the efficiency (top-1 accuracy vs FLOPs) spectrum. At the same time, there have been some attempts in the image domain which challenge the necessity of the self-attention operation within the transformer architecture, advocating the use of simpler approaches for token mixing. However, there are no results yet for the case of video recognition, where the self-attention operator has a significantly higher impact (compared to the case of images) on efficiency. To address this gap, in this paper, we make the following contributions: (a) we construct a highly efficient & accurate attention-free block based on the shift operator, coined the Affine-Shift block, specifically designed to approximate as closely as possible the operations in the MHSA block of a Transformer layer. Based on our Affine-Shift block, we construct our Affine-Shift Transformer and show that it already outperforms all existing shift/MLP-based architectures for ImageNet classification. (b) We extend our formulation to the video domain to construct the Video Affine-Shift Transformer (VAST), the very first purely attention-free shift-based video transformer. (c) We show that VAST significantly outperforms recent state-of-the-art transformers on the most popular action recognition benchmarks for the case of models with low computational and memory footprint. Code will be made available.
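To make the core idea concrete, the sketch below illustrates a generic shift-based token-mixing step of the kind the abstract alludes to: channels are partitioned into groups, each group is shifted along a spatial axis so that neighboring tokens exchange information without any attention computation, and a per-channel affine transform re-weights the result. This is only an illustrative sketch under our own assumptions (function name, channel grouping, and shift pattern are hypothetical), not the paper's actual Affine-Shift block.

```python
import numpy as np

def affine_shift_mix(x, shift=1):
    """Hypothetical shift-based token mixing for a (T, H, W, C) video feature map.

    Channels are split into five groups: four are shifted by `shift` along
    +/-H and +/-W, one is left in place. A per-channel affine (scale, bias)
    then re-weights the mixed features; both would be learnable in practice.
    No attention is computed, so the mixing itself is FLOP-free.
    """
    T, H, W, C = x.shape
    g = C // 5
    out = x.copy()
    # Group 0: shift down along H; group 1: shift up along H.
    out[:, :, :, 0 * g:1 * g] = np.roll(x[:, :, :, 0 * g:1 * g], shift, axis=1)
    out[:, :, :, 1 * g:2 * g] = np.roll(x[:, :, :, 1 * g:2 * g], -shift, axis=1)
    # Group 2: shift right along W; group 3: shift left along W.
    out[:, :, :, 2 * g:3 * g] = np.roll(x[:, :, :, 2 * g:3 * g], shift, axis=2)
    out[:, :, :, 3 * g:4 * g] = np.roll(x[:, :, :, 3 * g:4 * g], -shift, axis=2)
    # Remaining channels stay in place; apply the per-channel affine.
    scale = np.ones(C)   # learnable scale (identity here for illustration)
    bias = np.zeros(C)   # learnable bias (zero here for illustration)
    return out * scale + bias
```

A video extension in the spirit of VAST would additionally shift some channel groups along the temporal axis (axis 0 here), so that tokens also mix across frames at no extra FLOP cost.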